The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome Hiroaki Sakai 1, Ken Naito 2 *, Eri Ogiso-Tanaka 2, Yu Takahashi 2, Kohtaro Iseki 2, Chiaki Muto 2, Kazuhito Satou 3, Kuniko Teruya 3, Akino Shiroma 3, Makiko Shimoji 3, Takashi Hirano 3, Takeshi Itoh 1, Akito Kaga 2, and Norihiko Tomooka 2 1 Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 35-862, Japan. 2 Genetic Resources Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 35-862, Japan. 3 Okinawa Institute of Advanced Sciences, Uruma, 94-2234, Japan. Correspondence should be addressed to Ken Naito (knaito@affrc.go.jp).
Assembly_1 Assembly_2 454 reads PacBio reads Illumina reads Assemble (CA) Assemble (SGA) Error correction (pacbiotoca) Assemble (ALLPATHS-LG) Initial contigs Rm. contaminated contigs ecpbreads Rm. contaminated scaffolds Error correction (BWA, samtools) Trimming Fin-contigs Fin-scaffolds Illumina reads eccontigs Scaffolding (SSPACE) Mate-pair reads (454 & Illumina) Mapping (BWA) Scaffolds Unmapped Illumina reads Gap closing (PBJelly) Fin-scaffolds Supplementary Figure 1. Assembly flows of Assembly_1 and Assembly_2. Blue drums indicate sequence data, whereas red boxes indicate assembly tools. CA: Celera Assembler. SGA: String Graph Assembler. eccontigs: error-corrected contigs. ecpbreads: error-corrected PacBio reads. Rm: remove. Fin-contigs: final contigs. Fin-scaffolds: final scaffolds.
LG2 LG3 LG4 LG6 LG7 LG8 LG9 1 1 2 3 4 (cm) 5 6 7 8 9 1 11 12 Supplementary Figure 2. A high-density genetic linkage map of the azuki bean. In total, 4,912 marker loci were genotyped in 995 F2 plants derived from azuki bean x V. nepalensis.
Preprocessing Subreads Assembly Assemble (CA) Post-assembly processing Correcting indel errors by Illumina reads (BWA, GATK) Error correction (Sprai) Initial contigs Further correction using genetic map ecpbreads Rm. duplicate contigs (BLASTN) Rm. contaminated contigs Nr-contigs Fin-contigs Polishing (Quiver) Scaffolding (SGA) Trimming off low-quality edges Connecting overlapping contigs Cns-contigs Initial scaffolds Gap closing (PBJelly2) Supplementary Figure 3. Assembly flows of Assembly_3. Blue drums indicate sequence data, whereas red boxes indicate assembly tools. CA: Celera Assembler. SGA: String Graph Assembler. eccontigs: error-corrected contigs. ecpb reads: error-corrected PacBio reads. Rm: remove. Nr-contigs: non-redundant contigs. Cns-contigs: consensus contigs. Fin-contigs: final contigs. Fin-scaffolds: final scaffolds. Correcting indel errors by Illumina reads (BWA, GATK) Fin-scaffolds
35.27cM 35.27cM 68.49cM 68.54cM 68.54cM 35.27cM mis-assembly 68.49cM Supplementary Figure 4. A selection of misassembly sites. In this contig, DNA markers on and were present. Blue thick lines indicate the uniquely mapped paired-end reads with 25 bp inserts 25 bp, while the blue dots bridged with thin lines indicate those of 3 kb inserts. The yellow lines indicate the reads with multiple hits.
c d a b 45 4 35 3 25 2 15 1 5 7 6 5 4 3 2 1 8 7 6 5 4 3 2 1 8 7 6 5 4 3 2 1 A/T Whole genome Sites of indel errors 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 G/C Whole genome Sites of indel errors 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 A/T 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 G/C e f 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 >15 5 4 3 2 1 7 6 5 4 3 2 1 A/T G/C Supplementary Figure 5. Frequency of homopolymer length at indel-errors in PacBio assemblies. (a) Homopolymers of A/T or (b) Homopolymers of G/C in the azuki bean genome. Gray bars indicate frequency of homopolymers in the whole assembled sequences as a control. (c)(d) Arabidopsis thaliana. (e)(f) Drosophila melanogaster.
Supplementary Table 3. Summary of pseudomolecules in the Assembly_3. No. of scaffolds Total scaffolds length Pseudomolecule length Chr1 35 65,919,377 67,115,795 Chr2 28 45,96,54 45,515,668 Chr3 27 41,962,598 43,462,759 Chr4 25 53,47,133 54,73,736 Chr5 25 36,897,451 37,38,367 Chr6 23 38,789,64 38,86,97 Chr7 2 32,2,224 33,495,452 Chr8 32 46,426,616 46,951,433 Chr9 26 35,726,726 37,388,52 Chr1 17 28,785,286 28,886,4 Chr11 21 37,398,86 38,115,44 Unmapped 225 51,515,385 -
Gene Density (Genes/Mb) 18 18 LG2 18 LG3 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 6 12 18 24 3 36 42 Repeat Rate/Mb 1. 1. LG2 1. LG3.8.8.8.6.6.6.4.4.4.2.2.2. 8 16 24 32 4 48 56 64. 6 12 18 24 3 36 42. 6 12 18 24 3 36 42 Recombination Frequency (cm/mb) 7. 7. LG2 6. 6. 5. 5. 4. 4. 3. 3. 2. 2. 1. 1... 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 Recombination Frequency (cm/gene) 7. 6. 5. 4. 3. 2. 1.. LG3 6 12 18 24 3 36 42.6.6 LG2.6 LG3.5.5.5.4.4.4.3.3.3.2.2.2.1.1.1 8 16 24 32 4 48 56 64 6 12 18 24 3 36 42 6 12 18 24 3 36 42 Supplementary Figure 6. An overview of the azuki bean genome.
Gene Density (Genes/Mb) 18 LG4 18 18 LG6 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 1 2 3 4 5 6 12 18 24 3 36 4 8 12 16 2 24 28 32 36 Repeat Rate/Mb 1. LG4 1. 1. LG6.8.8.8.6.6.6.4.4.4.2.2.2.. 1 2 3 4 5 6 12 18 24 3 36 Recombination Frequency (cm/mb). 6 12 18 24 3 36 7. LG4 7. 7. LG6 6. 6. 6. 5. 5. 5. 4. 4. 4. 3. 3. 3. 2. 2. 2. 1. 1. 1.. 1 2 3 4 5. 6 12 18 24 3 36. 6 12 18 24 3 36 Recombination Frequency (cm/gene).6 LG4.6.6 LG6.5.5.5.4.4.4.3.3.3.2.2.2.1.1.1 1 2 3 4 5 4 8 12 16 2 24 28 32 36 4 8 12 16 2 24 28 32 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.
Gene Density (Genes/Mb) 18 LG7 18 LG8 18 LG9 15 15 15 12 12 12 9 9 9 6 6 6 3 3 3 4 8 12 16 2 24 28 32 Repeat Rate/Mb 6 12 18 24 3 36 42 6 12 18 24 3 36 1. LG7 1. LG8 1. LG9.8.8.8.6.6.6.4.4.4.2.2.2... 4 8 12 16 2 24 28 32 6 12 18 24 3 36 42 6 12 18 24 3 36 Recombination Frequency (cm/mb) 7. LG7 7. LG8 7. LG9 6. 6. 6. 5. 5. 5. 4. 4. 4. 3. 3. 3. 2. 2. 2. 1. 1. 1.... 4 8 12 16 2 24 28 32 6 12 18 24 3 36 42 6 12 18 24 3 36 Recombination Frequency (cm/gene).6 LG7.5.4.3.2.1 4 8 12 16 2 24 28 32.6 LG8.5.4.3.2.1 6 12 18 24 3 36 42.6 LG9.5.4.3.2.1 6 12 18 24 3 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.
Gene Density (Genes/Mb) 18 15 12 9 6 3 4 8 12 16 2 24 Repeat Rate/Mb 1..8.6.4.2. 1 18 15 12 9 6 3 6 12 18 24 3 36 1. 1.8.6.4.2. 4 8 12 16 2 24 6 12 18 24 3 36 Recombination Freqeucy (cm/mb) 7. 7. 1 6. 6. 5. 5. 4. 4. 3. 3. 2. 2. 1. 1... 4 8 12 16 2 24 6 12 18 24 3 36 Recombination Frequency (cm/gene).6.6 1.5.5.4.4.3.3.2.2.1.1 4 8 12 16 2 24 6 12 18 24 3 36 Supplementary Figure 6. An overview of the azuki bean genome. Continued.