The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome

Similar documents
Genome mapping in salmonid fish

Figure S1a Diagram of the HA412- HO x RHA415 cross. J. E. Bowers et al.

7.013 Problem Set

Working with Marker Maps Tutorial

SUPPLEMENT MATERIALS

Identification of Species-Diagnostic SNP Markers in Tilapias Using ddradseq

Assigning breed origin to alleles in crossbred animals

Estimating breeding value

VOLLEYBALL Match players ranking USA USA

x trna * (C ) x , o 596 trna, 3 x, 8000 x [1]., 32 6 SCIENCE IN CHINA ( Series C ) A U a) U&C G U&C A b)

DNA Breed Profile Testing FAQ. What is the history behind breed composition testing?

Genetics Discussion for the American Black Hereford Association. David Greg Riley Texas A&M University

Biol 321 Genetics S 02 Exam #1

Selective Genotyping for Marker Assisted Selection Strategies for Soybean Yield Improvement. Ben Fallen

Waves Practice Problems AP Physics In a wave, the distance traveled by a wave during one period is called:

Genetic analysis of radio-tagged westslope cutthroat trout from St. Mary s River and Elk River. April 9, 2002

Sulaiman M Al-Mayouf1*, Asma Sunker2*, Reem Abdwani3*, Safiya Al Abrawi5, Fathiya

Bighorn Sheep Research Activity Love Stowell & Ernest_1May2017 Wildlife Genomics & Disease Ecology Lab Updated 04/27/2017 SMLS

GENES AND CHROMOSOMES CHROMOSOMES IN SEX CELLS. Horse Science: How Inheritance Works in Horses Page 3. dam unite and grow into the new animal.

Identification of the dwarfing gene Dw1

Genetic analyses of the Franches-Montagnes horse breed with genome-wide SNP data

Challenges related to introducing genomic selection in sport horse breeding EAAP 2015

Waves Multiple Choice

Impact of the Friesian POLLED Mutation on Milk Production Traits in Holstein Friesian

Compaction Theory and Best Practices

Insights into the genome of the De Donno strain of Xylella fastidiosa. Annalisa Giampetruzzi PhD researcher

FINAL PERFORMANCE REPORT

Test Results Summary. Owner: Darby Dan Farm 8 th Nov Distance Plus (USA) v1.0. Distance Plus (SAF) v1.0. Distance Plus (ANZ) v1.

VOLLEYBALL Match players ranking. ITA Italy

Supplementary Appendix

H orses are recognized as extremely successful domestic animals. Humans in many parts of the world have

AmpFlSTR Identifiler PCR Amplification Kit

Section 4.2. Travelling Waves

LAB 06 Organismal Respiration

Background. Do we need an equine biobank in the UK? Online Survey Results. B.iii Online survey results

Mission to Mars. Day 8 Heredity

1) A child is born with blue eyes even though BOTH his parents have brown eyes. How is this possible?

Supplementary Figure 1 An insect model based on Drosophila melanogaster. (a)

xbreed: An R package for genomic simulation of purebred and crossbred populations

Final Report: Measurements of Pile Driving Noise from Control Piles and Noise-Reduced Piles at the Vashon Island Ferry Dock

Xiu Feng, Xiaomu Yu, Beide Fu, Xinhua Wang, Haiyang Liu, Meixia Pang and Jingou Tong *

SynchroGait. Text: Lisa S. Andersson, Capilet Genetics and Kim Jäderkvist, Swedish University of agricultural Sciences (SLU).

Leopard-class Dropship

20 th Int. Symp. Animal Science Days, Kranjska gora, Slovenia, Sept. 19 th 21 st, 2012.

Society for Wildlife Forensic Science Develop Wildlife Forensic Science into a comprehensive, integrated and mature discipline.

The new stress negative Pietrain line developed at the Faculty of Veterinary Medicine of the University of Liege.

Annapolis Good Old Boat Regatta 2016 Sailing Instructions. Class Identification

CHAPTER III RESULTS. sampled from 22 streams, representing 4 major river drainages in New Jersey, and 1 trout

Genetics Test Review

The purpose of this experiment is to find this acceleration for a puck moving on an inclined air table.

Archival copy: for current recommendations see or your local extension office.

Today: waves. Exam Results. Wave Motion. What is moving? Motion of a piece of the rope. Energy transport

Exam Results, HW4 reminder. Wave Motion. Today: waves. What is moving? Motion of a piece of the rope. Exam Results. Average

RM-80 respiration monitor

Design of Optimal Depth of Coverage

Trial of a process to estimate depth of cover on buried pipelines

-2- A. schlegeli and A. latus inhabit Hiroshima Bay. Although the former is abundant, the

End of Chapter Exercises

Characteristics of modern soccer balls

Dehorning cattle via genetics

Enteroaggregative Escherichia coli and other unusual VTEC: a world in motion

Unit 8: GENETICS PACKET

Human gas exchange. Question Paper. Save My Exams! The Home of Revision. Cambridge International Examinations. 56 minutes. Time Allowed: Score: /46

Special edition paper

Revealing the Past and Present of Bison Using Genome Analysis

Stop Chasing & Start Managing Density

Waves Chapter Problems

THE GENERAL STUD BOOK OF SOUTHERN AFRICA CONDITIONS OF ENTRY

Weedy Seadragon Ecology Project. Annual Report Underwater Research Group of NSW Fish Ecology Lab, UTS

The Galway Community of Practice (CoP) All about Athletics

INSTRUCTOR RESOURCES

Slide 1 / The distance traveled by a wave in one period is called? Frequency Period Speed of wave Wavelength Amplitude

Haplotype analysis revealed candidate region for black/brown coat color gene in cattle

Two types of physical and biological standards are used to judge the performance of the Wheeler North Reef 1) Absolute standards are measured against

Practice Questions: Waves (AP Physics 1) Multiple Choice Questions:

Topic 3 Other patterns of Inheritance (heredity) Pre Class Reading Assignment. 1. Read pgs

LAB. NATURAL SELECTION OF STRAWFISH

First results on genomic selection in French show-jumping horses

Physics 1520, Spring 2014 Quiz 1A, Form: A

Physics 1520, Spring 2014 Quiz 1B, Form: A

Population structure and genetic diversity of Bovec sheep from Slovenia preliminary results

Supplemental figure 1. Collection sites and numbers of strains sequenced.

Continued Genetic Monitoring of the Kootenai Tribe of Idaho White Sturgeon Conservation Aquaculture Program

WSF ASSESSMENT SHEET

Supporting information

PCR primers for 100 microsatellites in red drum (Sciaenops ocellatus)

Unit 2 Kinetic Theory, Heat, and Thermodynamics: 2.A.1 Problems Temperature and Heat Sections of your book.

TRAINING THE : PUTTING IT TOGETHER

What environmental factors trigger a fruit fly response?

Bison Conservation Genetics and Disease

Typical KRT25 and SP6 Crosses

ERC Consolidator grant RHEA: the story of my application

INTRODUCTION: EXPERIMENTAL NOTES:

Maximizing genetic progress in the new age of genomics

End of Chapter Exercises

Full Name: Period: Heredity EOC Review

INTRODUCTION TO PATTERN RECOGNITION

Supplemental Data. Gutjahr et al. (2008). Arbuscular mycorrhiza-specific signaling in rice transcends the common symbiosis signaling pathway.

wi Astuti, Hidayat Ashari, and Siti N. Prijono

59 th EAAP Annual Meeting August 24-27, 2008, Vilnius, Lithuania Abstract # 2908, Session 08

Transcription:

The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome Hiroaki Sakai 1, Ken Naito 2 *, Eri Ogiso-Tanaka 2, Yu Takahashi 2, Kohtaro Iseki 2, Chiaki Muto 2, Kazuhito Satou 3, Kuniko Teruya 3, Akino Shiroma 3, Makiko Shimoji 3, Takashi Hirano 3, Takeshi Itoh 1, Akito Kaga 2, and Norihiko Tomooka 2 1 Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 35-862, Japan. 2 Genetic Resources Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 35-862, Japan. 3 Okinawa Institute of Advanced Sciences, Uruma, 94-2234, Japan. Correspondence should be addressed to Ken Naito (knaito@affrc.go.jp).

Assembly_1 Assembly_2 454 reads PacBio reads Illumina reads Assemble (CA) Assemble (SGA) Error correction (pacbiotoca) Assemble (ALLPATHS-LG) Initial contigs Rm. contaminated contigs ecpbreads Rm. contaminated scaffolds Error correction (BWA, samtools) Trimming Fin-contigs Fin-scaffolds Illumina reads eccontigs Scaffolding (SSPACE) Mate-pair reads (454 & Illumina) Mapping (BWA) Scaffolds Unmapped Illumina reads Gap closing (PBJelly) Fin-scaffolds Supplementary Figure 1. Assembly flows of Assembly_1 and Assembly_2. Blue drums indicate sequence data, whereas red boxes indicate assembly tools. CA: Celera Assembler. SGA: String Graph Assembler. eccontigs: error-corrected contigs. ecpbreads: error-corrected PacBio reads. Rm: remove. Fin-contigs: final contigs. Fin-scaffolds: final scaffolds.

LG2 LG3 LG4 LG6 LG7 LG8 LG9 1 1 2 3 4 (cm) 5 6 7 8 9 1 11 12 Supplementary Figure 2. A high-density genetic linkage map of the azuki bean. In total, 4,912 marker loci were genotyped in 995 F2 plants derived from azuki bean x V. nepalensis.

Preprocessing Subreads Assembly Assemble (CA) Post-assembly processing Correcting indel errors by Illumina reads (BWA, GATK) Error correction (Sprai) Initial contigs Further correction using genetic map ecpbreads Rm. duplicate contigs (BLASTN) Rm. contaminated contigs Nr-contigs Fin-contigs Polishing (Quiver) Scaffolding (SGA) Trimming off low-quality edges Connecting overlapping contigs Cns-contigs Initial scaffolds Gap closing (PBJelly2) Supplementary Figure 3. Assembly flows of Assembly_3. Blue drums indicate sequence data, whereas red boxes indicate assembly tools. CA: Celera Assembler. SGA: String Graph Assembler. eccontigs: error-corrected contigs. ecpb reads: error-corrected PacBio reads. Rm: remove. Nr-contigs: non-redundant contigs. Cns-contigs: consensus contigs. Fin-contigs: final contigs. Fin-scaffolds: final scaffolds. Correcting indel errors by Illumina reads (BWA, GATK) Fin-scaffolds

35.27cM 35.27cM 68.49cM 68.54cM 68.54cM 35.27cM mis-assembly 68.49cM Supplementary Figure 4. A selection of misassembly sites. In this contig, DNA markers on and were present. Blue thick lines indicate the uniquely mapped paired-end reads with 25 bp inserts 25 bp, while the blue dots bridged with thin lines indicate those of 3 kb inserts. The yellow lines indicate the reads with multiple hits.

c d a b 45 4 35 3 25 2 15 1 5 7 6 5 4 3 2 1 8 7 6 5 4 3 2 1 8 7 6 5 4 3 2 1 A/T Whole genome Sites of indel errors 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 G/C Whole genome Sites of indel errors 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 A/T 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 G/C e f 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 5 4 3 2 1 7 6 5 4 3 2 1 A/T G/C Supplementary Figure 5. Frequency of homopolymer length at indel-errors in PacBio assemblies. (a) Homopolymers of A/T or (b) Homopolymers of G/C in the azuki bean genome. Gray bars indicate frequency of homopolymers in the whole assembled sequences as a control. (c)(d) Arabidopsis thaliana. (e)(f) Drosophila melanogaster.

Supplementary Table 3. Summary of pseudomolecules in the Assembly_3. No. of scaffolds Total scaffolds length Pseudomolecule length Chr1 35 65,919,377 67,115,795 Chr2 28 45,96,54 45,515,668 Chr3 27 41,962,598 43,462,759 Chr4 25 53,47,133 54,73,736 Chr5 25 36,897,451 37,38,367 Chr6 23 38,789,64 38,86,97 Chr7 2 32,2,224 33,495,452 Chr8 32 46,426,616 46,951,433 Chr9 26 35,726,726 37,388,52 Chr1 17 28,785,286 28,886,4 Chr11 21 37,398,86 38,115,44 Unmapped 225 51,515,385 -

Gene Density (Genes/Mb) 18 18 LG2 18 LG3 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 6 12 18 24 3 36 42 Repeat Rate/Mb 1. 1. LG2 1. LG3.8.8.8.6.6.6.4.4.4.2.2.2. 8 16 24 32 4 48 56 64. 6 12 18 24 3 36 42. 6 12 18 24 3 36 42 Recombination Frequency (cm/mb) 7. 7. LG2 6. 6. 5. 5. 4. 4. 3. 3. 2. 2. 1. 1... 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 Recombination Frequency (cm/gene) 7. 6. 5. 4. 3. 2. 1.. LG3 6 12 18 24 3 36 42.6.6 LG2.6 LG3.5.5.5.4.4.4.3.3.3.2.2.2.1.1.1 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 6 12 18 24 3 36 42 Supplementary Figure 6. An overview of the azuki bean genome.

Gene Density (Genes/Mb) 18 LG4 18 18 LG6 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 1 2 3 4 5 6 12 18 24 3 36 4 8 12 16 2 24 28 32 36 Repeat Rate/Mb 1. LG4 1. 1. LG6.8.8.8.6.6.6.4.4.4.2.2.2.. 1 2 3 4 5 6 12 18 24 3 36 Recombination Frequency (cm/mb). 6 12 18 24 3 36 7. LG4 7. 7. LG6 6. 6. 6. 5. 5. 5. 4. 4. 4. 3. 3. 3. 2. 2. 2. 1. 1. 1.. 1 2 3 4 5. 6 12 18 24 3 36. 6 12 18 24 3 36 Recombination Frequency (cm/gene).6 LG4.6.6 LG6.5.5.5.4.4.4.3.3.3.2.2.2.1.1.1 1 2 3 4 5 4 8 12 16 2 24 28 32 36 4 8 12 16 2 24 28 32 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.

Gene Density (Genes/Mb) 18 LG7 18 LG8 18 LG9 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 4 8 12 16 2 24 28 32 Repeat Rate/Mb 6 12 18 24 3 36 42 6 12 18 24 3 36 1. LG7 1. LG8 1. LG9.8.8.8.6.6.6.4.4.4.2.2.2... 4 8 12 16 2 24 28 32 6 12 18 24 3 36 42 6 12 18 24 3 36 Recombination Frequency (cm/mb) 7. LG7 7. LG8 7. LG9 6. 6. 6. 5. 5. 5. 4. 4. 4. 3. 3. 3. 2. 2. 2. 1. 1. 1.... 4 8 12 16 2 24 28 32 6 12 18 24 3 36 42 6 12 18 24 3 36 Recombination Frequency (cm/gene).6 LG7.5.4.3.2.1 4 8 12 16 2 24 28 32.6 LG8.5.4.3.2.1 6 12 18 24 3 36 42.6 LG9.5.4.3.2.1 6 12 18 24 3 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.

Gene Density (Genes/Mb) 18 15 12 9 6 3 4 8 12 16 2 24 Repeat Rate/Mb 1..8.6.4.2. 1 18 15 12 9 6 3 6 12 18 24 3 36 1. 1.8.6.4.2. 4 8 12 16 2 24 6 12 18 24 3 36 Recombination Freqeucy (cm/mb) 7. 7. 1 6. 6. 5. 5. 4. 4. 3. 3. 2. 2. 1. 1... 4 8 12 16 2 24 6 12 18 24 3 36 Recombination Frequency (cm/gene).6.6 1.5.5.4.4.3.3.2.2.1.1 4 8 12 16 2 24 6 12 18 24 3 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.