Data availability
Code availability
All source code for the Minigraph-Cactus pangenome pipeline, as well as release binaries, Docker images and user manuals, can be found at https://github.com/ComparativeGenomicsToolkit/cactus.
Acknowledgements
We thank A. D. Long for many suggestions and insights regarding the D. melanogaster data and the entire vg team for their work to create and maintain vg, upon which much of this work depends. B.P., A.N., J.M.E. and J.M. were partly supported by National Institutes of Health (NIH) grants R01HG010485, U24HG010262, U24HG011853, OT3HL142481, U01HG010961 (with H.L.) and OT2OD033761. H.L. was partly supported by NIH grant R01HG010040 and T.M. by U01HG010973. Computational infrastructure and support for running PanGenie were provided by the Centre for Information and Media Technology at Heinrich Heine University Düsseldorf.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hickey, G., Monlong, J., Ebler, J. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus.
Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01793-w
- Received:
- Accepted:
- Published:
- DOI: https://doi.org/10.1038/s41587-023-01793-w