
NOTE: This is LUMPY 0.2.13 with additional changes that allow lumpyexpress to work when the main file is CRAM rather than BAM. The splitter and discordant files must still be BAM files, because LUMPY itself does not yet support CRAM as input. The CRAM support requires the hexdump command.

For questions and discussion about LUMPY, please visit the forum at:

https://groups.google.com/forum/#!forum/lumpy-discuss

LUMPY

A probabilistic framework for structural variant discovery.

Ryan M Layer, Colby Chiang, Aaron R Quinlan, and Ira M Hall. 2014. "LUMPY: a Probabilistic Framework for Structural Variant Discovery." Genome Biology 15 (6): R84. doi:10.1186/gb-2014-15-6-r84.

Table of Contents

Quick start
Installation
LUMPY Express usage: automated breakpoint detection for standard analyses
LUMPY (traditional) usage: flexible and customizable breakpoint detection for advanced users
Example workflows
Test data
Troubleshooting

Quick start

Note that smoove is the recommended way to run lumpy: it collects the best practices of lumpy and associated tools, and it has a shorter runtime and a lower false-positive rate than the lumpyexpress workflow described below.

Download and install

 git clone --recursive https://github.com/arq5x/lumpy-sv.git
cd lumpy-sv
make
cp bin/* /usr/local/bin/.

Run LUMPY Express

lumpyexpress \
    -B my.bam \
    -S my.splitters.bam \
    -D my.discordants.bam \
    -o output.vcf

Installation

Requirements

LUMPY
- g++ compiler
- CMake
LUMPY Express (optional)
- Samtools (0.1.18+) (htslib.org/)
- SAMBLASTER (0.1.19+) (GitHub repo)
- Python 2.7 (python.org/) with pysam (0.8.3+) and NumPy (1.8.1+)
- sambamba (GitHub repo)
- gawk (GNU project)
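
Before building, it can help to confirm that the tools listed above are reachable on the PATH. The following is a minimal sketch, not part of LUMPY: `check_tools` is an illustrative helper, and the executable names passed to it are assumptions that may differ on your system (Python 2.7, for instance, is sometimes installed as `python2.7`).

```shell
#!/bin/sh
# Report which of the given executables cannot be found on PATH.
# Prints one "missing: NAME" line per absent tool and returns non-zero
# if anything is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Executable names are assumptions; adjust them to match your system.
check_tools g++ cmake samtools samblaster sambamba python gawk \
    || echo "install the missing tools before running make"
```

The check only looks for executables; it does not verify the minimum versions listed above or the pysam and NumPy modules.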

Install

Default method to install:

git clone --recursive git@github.com:arq5x/lumpy-sv.git
cd lumpy-sv
make
cp bin/* /usr/local/bin/.

Installing with a custom zlib (for the gzopen64 compilation error):

git clone --recursive git@github.com:arq5x/lumpy-sv.git
cd lumpy-sv
export ZLIB_PATH="/usr/lib/x86_64-linux-gnu/"; #when /usr/lib/x86_64-linux-gnu/libz.so
make
cp bin/* /usr/local/bin/.

LUMPY Express usage

Automated breakpoint detection for standard analyses.

 usage:   lumpyexpress [options]

Required arguments

     -B FILE  coordinate-sorted BAM file(s) (comma separated)
     -S FILE  split reads BAM file(s) (comma separated)
     -D FILE  discordant reads BAM files(s) (comma separated)

Optional arguments

 -o STR    output [fullBam.bam.vcf]
-x FILE   BED file to exclude
-P        output probability curves for each variant
-m INT    minimum sample weight for a call [4]
-r FLOAT  trim threshold [0]
-T DIR    temp directory [./output_prefix.XXXXXXXXXXXX]
-k        keep temporary files
-K FILE   path to lumpyexpress.config file
            (default: same directory as lumpyexpress)
-v        verbose
-h        show this message

Configuration

LUMPY Express runs several external programs whose paths are specified in scripts/lumpyexpress.config. This config file must either reside in the same directory as lumpyexpress or be specified explicitly with the -K flag.

The Makefile installation automatically generates a lumpyexpress.config file and places it in the "bin" directory.

Input

LUMPY Express expects BWA-MEM aligned BAM files as input. It automatically parses sample, library, and read group information using the @RG tags in the BAM header. Each BAM file is expected to contain exactly one sample.

The minimum input is a coordinate-sorted BAM file (-B), from which LUMPY Express extracts splitters and discordants using SAMBLASTER before running LUMPY. Optionally, users may supply coordinate-sorted splitter (-S) and discordant (-D) BAM files, which bypasses SAMBLASTER extraction for faster analysis.
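
Because each BAM file is expected to hold exactly one sample, it can be worth checking the @RG lines before a run. The sketch below is an illustration only: `rg_samples` and `has_single_sample` are hypothetical helper names, not LUMPY commands, and they operate on SAM header text read from standard input.

```shell
# Print the distinct SM (sample) values found in @RG header lines read
# from standard input, one per line.
rg_samples() {
    awk -F '\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)
        }' | sort -u
}

# Succeed only if the header on standard input names exactly one sample.
has_single_sample() {
    [ "$(rg_samples | wc -l | tr -d ' ')" -eq 1 ]
}

# Typical use (requires samtools; "sample.bam" is a placeholder):
#   samtools view -H sample.bam | has_single_sample || echo "expected one sample"
```

`samtools view -H` prints only the header, so the check is cheap even for large files.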

Output

LUMPY Express produces a VCF file according to the VCF 4.2 specification.

LUMPY (traditional) usage

Flexible and customizable breakpoint detection for advanced users.

 usage:    lumpy [options]

Options

 -g       Genome file (defines chromosome order)
-e       Show evidence for each call
-w       File read windows size (default 1000000)
-mw      minimum weight across all samples for a call
-msw     minimum per-sample weight for a call
-tt      trim threshold
-x       exclude file bed file
-t       temp file prefix, must be to a writeable directory
-P       output probability curve for each variant
-b       output as BEDPE instead of VCF

-sr      bam_file:<file name>,
         id:<sample name>,
         back_distance:<distance>,
         min_mapping_threshold:<mapping quality>,
         weight:<sample weight>,
         min_clip:<minimum clip length>,
         read_group:<string>

-pe      bam_file:<file name>,
         id:<sample name>,
         histo_file:<file name>,
         mean:<value>,
         stdev:<value>,
         read_length:<length>,
         min_non_overlap:<length>,
         discordant_z:<z value>,
         back_distance:<distance>,
         min_mapping_threshold:<mapping quality>,
         weight:<sample weight>,
         read_group:<string>

-bedpe   bedpe_file:<bedpe file>,
         id:<sample name>,
         weight:<sample weight>
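
The -sr, -pe, and -bedpe options each take a single comma-separated string of key:value pairs. Long strings are easy to mistype, so one option is to assemble them from separate arguments. This is a minimal sketch: `join_opts` is a hypothetical helper, not part of LUMPY, and the final line only prints the resulting command instead of running it.

```shell
# Join key:value arguments with commas, the form that lumpy's -pe, -sr,
# and -bedpe options take.
join_opts() {
    out=""
    for kv in "$@"; do
        out="${out:+$out,}$kv"
    done
    printf '%s\n' "$out"
}

# File names and values below are placeholders.
pe=$(join_opts id:sample bam_file:sample.discordants.bam \
    histo_file:sample.lib1.histo mean:500 stdev:50 read_length:101 \
    min_non_overlap:101 discordant_z:5 back_distance:10 weight:1 \
    min_mapping_threshold:20)

# Print the command for inspection; remove "echo" to execute it.
echo lumpy -mw 4 -tt 0 -pe "$pe"
```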

Example workflows

Pre-processing

We recommend aligning data with SpeedSeq, which performs BWA-MEM alignment, marks duplicates, and extracts split and discordant read pairs.

speedseq align -R "@RG\tID:id\tSM:sample\tLB:lib" \
    human_g1k_v37.fasta \
    sample.1.fq \
    sample.2.fq

Otherwise, data may be aligned with BWA-MEM.

# Align the data
bwa mem -R "@RG\tID:id\tSM:sample\tLB:lib" human_g1k_v37.fasta sample.1.fq sample.2.fq \
    | samblaster --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 \
    | samtools view -S -b - \
    > sample.bam

# Extract the discordant paired-end alignments.
samtools view -b -F 1294 sample.bam > sample.discordants.unsorted.bam

# Extract the split-read alignments
samtools view -h sample.bam \
    | scripts/extractSplitReads_BwaMem -i stdin \
    | samtools view -Sb - \
    > sample.splitters.unsorted.bam

# Sort both alignments
# (samtools 0.1.x syntax; recent samtools 1.x releases instead expect:
#  samtools sort -o out.bam in.bam)
samtools sort sample.discordants.unsorted.bam sample.discordants
samtools sort sample.splitters.unsorted.bam sample.splitters
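
The `-F 1294` filter used above drops every alignment that has at least one of the bits in 1294 set. To see which SAM flag bits that number combines, the sketch below decodes an integer against the standard SAM FLAG table. `decode_flag` is an illustrative helper, not a samtools or LUMPY command, and the short bit names are informal labels.

```shell
# Print the SAM flag bits that are set in the given integer,
# one "VALUE (name)" line per bit.
decode_flag() {
    flag=$1
    for entry in \
        "1:paired" "2:proper_pair" "4:unmapped" "8:mate_unmapped" \
        "16:reverse" "32:mate_reverse" "64:read1" "128:read2" \
        "256:secondary" "512:qc_fail" "1024:duplicate" "2048:supplementary"
    do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( flag & bit )) -ne 0 ]; then
            printf '%s (%s)\n' "$bit" "$name"
        fi
    done
}

decode_flag 1294
# 2 (proper_pair)
# 4 (unmapped)
# 8 (mate_unmapped)
# 256 (secondary)
# 1024 (duplicate)
```

In other words, the filter keeps primary, non-duplicate alignments in which both mates are mapped but the pair is not flagged as properly paired.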

Running LUMPY

LUMPY has two distinct execution alternatives. LUMPY Express is a simplified wrapper for standard analyses. LUMPY (traditional) is more customizable, for advanced users and specialized experiments.

LUMPY Express

Run LUMPY Express on a single sample with pre-extracted splitters and discordants.
```
lumpyexpress \
    -B sample.bam \
    -S sample.splitters.bam \
    -D sample.discordants.bam \
    -o sample.vcf
```

Run LUMPY Express jointly on multiple samples with pre-extracted splitters and discordants.

lumpyexpress \
    -B sample1.bam,sample2.bam,sample3.bam \
    -S sample1.splitters.bam,sample2.splitters.bam,sample3.splitters.bam \
    -D sample1.discordants.bam,sample2.discordants.bam,sample3.discordants.bam \
    -o multi_sample.vcf

Run LUMPY Express on a tumor-normal pair.

lumpyexpress \
    -B tumor.bam,normal.bam \
    -S tumor.splitters.bam,normal.splitters.bam \
    -D tumor.discordants.bam,normal.discordants.bam \
    -o tumor_normal.vcf

LUMPY (traditional)

First, generate empirical insert size statistics on each library in the BAM file.

samtools view -r readgroup1 sample.bam \
    | tail -n+100000 \
    | scripts/pairend_distro.py \
    -r 101 \
    -X 4 \
    -N 10000 \
    -o sample.lib1.histo

The script above (scripts/pairend_distro.py) prints the mean and stdev to the screen. For these examples, we will assume that the mean is 500 and the stdev is 50.
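
To make the two numbers concrete, here is a minimal sketch of computing a mean and a population standard deviation from a list of insert sizes. It is not the algorithm that pairend_distro.py uses (this sketch does no sampling or outlier handling), and `insert_stats` is a hypothetical helper name.

```shell
# Read one insert size per line on standard input and print the mean
# and the population standard deviation, separated by a tab.
insert_stats() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0   # guard against rounding error
            printf "mean:%.2f\tstdev:%.2f\n", mean, sqrt(var)
        }'
}

# Toy input; prints mean:500.00 and stdev:40.82, separated by a tab.
printf '%s\n' 450 500 550 | insert_stats
```

Values like these are what go into the `mean:` and `stdev:` fields of a `-pe` option string.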

Run LUMPY with paired-end and split-reads.

lumpy \
    -mw 4 \
    -tt 0 \
    -pe id:sample,bam_file:sample.discordants.bam,histo_file:sample.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample,bam_file:sample.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    > sample.vcf

Run LUMPY on a BAM file with multiple libraries.

lumpy \
    -mw 4 \
    -tt 0 \
    -pe id:sample,read_group:rg1,bam_file:sample.discordants.bam,histo_file:sample.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -pe id:sample,read_group:rg2,bam_file:sample.discordants.bam,histo_file:sample.lib2.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample,bam_file:sample.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    > sample.vcf

Run LUMPY on multiple samples with multiple libraries.

lumpy \
    -mw 4 \
    -tt 0 \
    -pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg1,read_group:rg2,histo_file:sample1.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg3,histo_file:sample1.lib2.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -pe id:sample2,bam_file:sample2.discordants.bam,read_group:rg4,histo_file:sample2.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample1,bam_file:sample1.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample2,bam_file:sample2.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    > multi_sample.vcf

Run LUMPY excluding low complexity regions.
Heng Li provides a set of low complexity regions in the supplementary information of his paper, "Toward better understanding of artifacts in variant calling from high-coverage samples", at https://doi.org/10.1093/bioinformatics/btu356
```
unzip btu356_Supplementary_Data.zip
unzip btu356-suppl_data.zip
lumpy \
    -mw 4 \
    -tt 0.0 \
    -x btu356_LCR-hs37d5.bed/btu356_LCR-hs37d5.bed \
    -pe bam_file:sample.discordants.bam,histo_file:sample.pe.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,id:sample,min_mapping_threshold:1 \
    -sr bam_file:sample.sr.sort.bam,back_distance:10,weight:1,id:sample,min_mapping_threshold:1 \
    > sample.exclude.vcf
```
Run LUMPY excluding very high coverage regions.
We can direct lumpy to ignore certain regions by using the exclude region option. In this example, we find and then exclude regions that have very high coverage. First, we use the get_coverages.py script to find the min, max, and mean coverages of the sr and pe bam files, and to create coverage profiles for both files.
```
python ../scripts/get_coverages.py \
    sample.pe.sort.bam \
    sample.sr.sort.bam
# sample.pe.sort.bam.coverage  min:1   max:14  mean(non-zero):2.35557521272
# sample.sr.sort.bam.coverage  min:1   max:7   mean(non-zero):1.08945936729
```
From this output, we choose to exclude regions that have more than 10x coverage. To create the exclude file (exclude.bed), we use the get_exclude_regions.py script.
```
python ../scripts/get_exclude_regions.py \
    10 \
    exclude.bed \
    sample.pe.sort.bam \
    sample.sr.sort.bam
```
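
The thresholding idea can be pictured with a small sketch. It assumes a hypothetical four-column, tab-separated coverage profile (chrom, start, end, coverage). That layout is an assumption made for illustration and is not necessarily what get_coverages.py writes, and `high_coverage_bed` is an illustrative name, not part of LUMPY.

```shell
# Print BED intervals (chrom, start, end) for every input record whose
# coverage, taken from column 4, is strictly greater than the threshold.
# The four-column input layout is an assumption made for this sketch.
high_coverage_bed() {
    awk -F '\t' -v max="$1" '
        BEGIN { OFS = "\t" }
        $4 + 0 > max + 0 { print $1, $2, $3 }'
}

# Toy profile: only the 14x interval exceeds the 10x threshold.
printf 'chr1\t0\t100\t3\nchr1\t100\t200\t14\nchr2\t0\t50\t10\n' \
    | high_coverage_bed 10
```

The comparison is strict, so an interval at exactly 10x is kept out of the exclude list, matching the "more than 10x" wording.
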
We now rerun lumpy with the exclude (-x) option.
```
lumpy \
    -mw 4 \
    -tt 0.0 \
    -x exclude.bed \
    -pe bam_file:sample.discordants.bam,histo_file:sample.pe.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,id:sample,min_mapping_threshold:1 \
    -sr bam_file:sample.sr.sort.bam,back_distance:10,weight:1,id:sample,min_mapping_threshold:1 \
    > sample.exclude.vcf
```

Post-processing

SVTyper can call genotypes on LUMPY output VCF files using a Bayesian maximum likelihood algorithm.

svtyper \
    -B sample.bam \
    -S sample.splitters.bam \
    -i sample.vcf \
    > sample.gt.vcf

Test data

The test/test.sh script executes lumpy against several simulated data sets and compares the results to the known correct result. The sample data sets can be found at http://layerlab.org/lumpy/data.tar.gz. This tarball should be extracted into the top-level lumpy directory. The test/test.sh script checks for the existence of this directory before running LUMPY.

Troubleshooting

All of the bam files that lumpy processes must be position sorted. To check whether your bams are sorted correctly, use the check_sorting.py script.

python ../scripts/check_sorting.py \
    pe.pos_sorted.bam \
    sr.pos_sorted.bam \
    pe.name_sorted.bam
# pe.pos_sorted.bam
# in order
# sr.pos_sorted.bam
# in order
# pe.name_sorted.bam
# out of order:   chr10   102292476   occurred after   chr10   102292893
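
As a rough illustration of what a position-sort check involves, the sketch below scans chromosome and position pairs and reports the first record whose position is smaller than the previous one on the same chromosome. It is a simplified stand-in, not the logic of check_sorting.py: `check_order` is a hypothetical helper, and it does not verify that chromosomes appear in header order.

```shell
# Read "chrom<TAB>pos" records on standard input and report the first
# position that is smaller than the previous one on the same chromosome.
# Exits non-zero when such a record is found.
check_order() {
    awk -F '\t' '
        $1 == chrom && $2 + 0 < pos {
            printf "out of order:\t%s\t%s\toccurred after\t%s\t%s\n", $1, $2, chrom, pos
            bad = 1
            exit
        }
        { chrom = $1; pos = $2 + 0 }
        END { if (!bad) print "in order"; exit bad + 0 }'
}

# Typical use (requires samtools; columns 3 and 4 of SAM are RNAME and POS):
#   samtools view pe.pos_sorted.bam | cut -f 3,4 | check_order
```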
