SUPPLEMENTARY INFORMATION


doi:10.1038/nature15259

For the first time, we describe the repertoire of P/Q-rich SCPP genes in the genome of the spotted gar (Lepisosteus oculatus). All of these genes in gar are located in a cluster on chromosome LG4 and can be divided into three main groups: enamel matrix protein genes, represented by ambn and enam; maturation stage genes, represented by orthologues of tetrapod AMTN, ODAM and SCPPPQ1, and also by odam-like; and actinopterygian-specific SCPP genes: scpp5, scpp7, scpp5/7, scpp9, scpp10 and scpp3a-c. We have identified 14 P/Q-rich SCPP genes in the spotted gar genome (Fig. 2). All of the genes have the same orientation on the gar chromosome, and all have exclusively phase 0 introns, similar to other vertebrate SCPP genes [1]. We have predicted signal peptides and analyzed exon structure and protein sequence motifs for all of these sequences (Extended Data Fig. 1). We have analyzed the number of paired-end RNASEQ reads in the brain, embryo, eye, heart, kidney, larvae, liver, muscle, skin and testis tissue samples represented in the dataset. Ensembl and RNASEQ transcript numbers are listed in Extended Data Table 1. In the gar genome, the P/Q-rich SCPP genes are located in a cluster on chromosome LG4 (Fig. 2), downstream from the sparcl1 gene and upstream from the sparcl2 gene; gar sparcl2 shares 62% amino acid identity with its coelacanth orthologue, Sparcl2 (SparclB) [1,2]. In the coelacanth genome, Sparcl2 and the enamel matrix protein genes are located on separate scaffolds (Fig. 2). The single cluster in gar suggests that these two coelacanth scaffolds probably belong to the same chromosomal segment. The last six exons of gar sparcl1 and sparcl2 share around 56% amino acid identity. The presence of Sparcl2 orthologues in the coelacanth and gar genomes indicates that this gene was already present before the divergence of actinopterygians (ray-finned bony fishes) and sarcopterygians (lobe-finned fishes) around 420 MYA [3].
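Intron phase describes where an intron interrupts the reading frame: phase 0 introns fall between codons, while phase 1 and phase 2 introns fall after the first or second base of a codon. As a minimal sketch, assuming only the lengths of the coding portion of each exon are known (the lengths below are hypothetical, not gar data), the phase of each intron is the cumulative coding length upstream of it modulo 3:

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of each intron given coding-exon lengths in nucleotides.

    coding_exon_lengths: lengths of the coding (CDS) part of each exon, in
    transcript order, counted from the start codon. The phase of the intron
    after exon i is the total coding length of exons 1..i modulo 3.
    """
    phases = []
    cumulative = 0
    # The last exon is not followed by an intron, so it is skipped.
    for length in coding_exon_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def all_phase_zero(coding_exon_lengths):
    """True if every intron falls between codons (phase 0)."""
    return all(p == 0 for p in intron_phases(coding_exon_lengths))


# Hypothetical CDS exon lengths (nt).
print(intron_phases([60, 45, 99, 120]))  # [0, 0, 0]
print(intron_phases([61, 44, 100]))      # [1, 0]
```

All introns of a gene are phase 0 exactly when every coding exon except the last has a length divisible by 3, so such exons can be duplicated or lost without shifting the downstream reading frame.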
Enamel matrix protein genes

Three proteins constitute the enamel matrix in tetrapods: amelogenin (AMEL), ameloblastin (AMBN) and enamelin (ENAM). Each of these proteins is essential for proper enamel structure in mammals, and inactivation of any one of them disrupts enamel formation [4-6]. A recent study identified Amel, Ambn and Enam orthologues in the genome of the sarcopterygian fish coelacanth [1], whereas none of the enamel matrix proteins (EMPs) were identified in teleosts [7]. This suggested that EMP genes are a sarcopterygian novelty, not present at the osteichthyan crown group node (the last common ancestor of actinopterygians and sarcopterygians).

WWW.NATURE.COM/NATURE 1

Amel has a conserved genomic location in coelacanth and tetrapods, in the first intron of the Arhgap6 gene [1]. In this region of the spotted gar genome, we did not identify a signal peptide, RNASEQ model or GENSCAN [8] prediction similar to AMEL (Fig. 2). We did, however, identify a gene similar to ambn in the genome of the spotted gar (Fig. 2). Its 12 exons code for a 501 aa protein. For comparison, coelacanth Ambn has 12 exons coding for 536 aa, and human AMBN has 13 exons coding for 447 aa. As in sarcopterygians, gar Ambn has a signal peptide in exon 2, followed by the putative phosphorylation motif Ser-Xaa-Ser-Xaa-Glu [9] in exon 3 (Extended Data Fig. 1). Human, coelacanth and gar Ambn share the Gln-Gln-Tyr-Glu-Tyr motif bridging exons 5 and 6, another potential phosphorylation motif, Ser-Ser-Glu-Glu, in the penultimate exon, and a long last exon carrying motifs shared between coelacanth and gar (Extended Data Fig. 1). Altogether, we identify this gar gene as an orthologue of Ambn, demonstrating that this EMP was already present before the divergence of actinopterygians and sarcopterygians.

Interestingly, when we compared the exon organization and motifs of the zebrafish scpp6 gene with the coelacanth and gar Ambn sequences, we found some similarities in exon organization at the N- and C-termini and some partially shared motifs (Extended Data Fig. 1). scpp6 has been identified only in zebrafish, and its expression pattern is still unknown [7]. We speculate that Ambn diverged into scpp6 in the lineage leading to zebrafish, but functional analysis of this gene is necessary before any firm conclusion can be drawn.

Enamelin is the longest of the EMPs.
In tetrapods, the N-terminal region of ENAM contains a signal peptide followed by a potential phosphorylation site, whereas the C-terminal region is encoded by a very long exon with a number of Cys residues, N-glycosylation sites and the potential integrin-binding motif Arg-Gly-Asp (RGD) [1]. We have identified a very long RNASEQ model in the skin of spotted gar, coding for 805 aa and located directly upstream from the sparcl2 gene (Fig. 2). This sequence has a signal peptide nearly identical to that of coelacanth ENAM (Extended Data Fig. 1). In contrast to other ENAM sequences, the first predicted phosphorylation site is missing; however, the second site, in the penultimate exon, is conserved. Because of a noncanonical splice site, the intron between the last two exons was treated as coding sequence. The last exon, encoding the C-terminal part, is the longest. Although it is shorter than in other species, we were able to identify Cys residues, an N-glycosylation site and an RGD motif in it (Extended Data Fig. 1). The middle part of the gar protein is encoded by very short P/Q-rich exons, several of which have almost identical sequences (Extended Data Fig. 1). Based on this structural analysis, we identify this sequence as a probable orthologue of tetrapod ENAM, although it remains to be understood what role gar Enam plays in the formation of the enamel matrix and whether it is similar to that in tetrapods.

Enamel maturation stage genes

Odontogenic ameloblast-associated protein (ODAM) is expressed during the maturation stage [10]. The Odam gene is conserved in sarcopterygians, including coelacanth [1]. It is also present in teleosts, where it is the only orthologue of tetrapod P/Q-rich SCPP genes identified so far [7]. As expected, odam is also present in the gar genome, downstream from sparcl1 (Fig. 2). According to synteny analysis, the location of odam close to sparcl1 is conserved not only in gar but also in a number of other species, such as coelacanth, fugu, stickleback and medaka (data not shown), whereas in zebrafish sparcl1 has been translocated (Fig. 2). ODAM was previously reported to contain a Ser-Xaa-Glu motif in the 3′ part of coding exon 2 that is phosphorylated by bone morphogenetic protein receptor type IB (BMPR-IB) during enamel mineralization [11]. This motif is conserved in vertebrate ODAM sequences, including that of gar, indicating that this signaling cascade is probably conserved between actinopterygians and sarcopterygians.

Unexpectedly, directly upstream from odam in the gar genome, we identified an additional gene that shares its exon structure and 35% amino acid identity with gar odam; we named it odam-like (Fig. 2, Extended Data Fig. 1).
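Several of the arguments above rest on short sequence motifs: the Ser-Xaa-Ser-Xaa-Glu and Ser-Xaa-Glu phosphorylation motifs, the integrin-binding RGD tripeptide and N-glycosylation sites. The following is a minimal sketch of how such motifs can be located in a protein sequence with regular expressions. It assumes one-letter amino acid codes, takes "Xaa" to mean any residue, and uses the standard N-glycosylation sequon Asn-Xaa-Ser/Thr with Xaa not Pro. The sequence is a made-up toy, not a gar protein.

```python
import re

# Motifs written as regular expressions over one-letter amino acid codes.
# "." stands for Xaa (any residue).
MOTIFS = {
    "Ser-Xaa-Ser-Xaa-Glu": "S.S.E",
    "Ser-Xaa-Glu": "S.E",
    "Arg-Gly-Asp (RGD)": "RGD",
    # N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
    "N-glycosylation sequon": "N[^P][ST]",
}


def find_motif(sequence, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


def scan(sequence, motifs=MOTIFS):
    """Map each motif name to the positions where it occurs."""
    sequence = sequence.upper()
    return {name: find_motif(sequence, pattern) for name, pattern in motifs.items()}


toy = "MKSASAEGRGDNPTNGS"  # hypothetical sequence
for name, positions in scan(toy).items():
    print(name, positions)
```

Short motifs such as Ser-Xaa-Glu occur by chance in almost any protein, so a single hit says little. The informative signal in the text is the conservation of a motif at the same position across species, and a match does not show that the site is actually modified.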
The similarity of odam-like to vertebrate Odam genes may indicate a similar role during the maturation stage. In tetrapods, an additional gene, amelotin (AMTN), is expressed during the maturation stage of ameloblasts and is involved in the mineralization of mature enamel [12]. This gene is present in tetrapods and in coelacanth [1], but absent in teleosts [13]. We have identified a gene similar to amtn in the genome of the spotted gar. Synteny, exon organization, shared motifs and conserved residues (Extended Data Fig. 1) indicate that this gene is the orthologue of AMTN. Important Ser residues that can be phosphorylated, and the Arg-Gly-Asp (RGD) motif that can bind cell-surface integrins [14], are conserved between lizard, coelacanth and gar AMTN sequences (Extended Data Fig. 1 and ref. 1). We have identified two RGD motifs in the last exon of gar Amtn, one of which is the bidirectional motif DGRGD. Because the RGD motif is absent from the human AMTN sequence owing to the proposed novel splice site [1], we looked for additional conserved motifs and identified a Pro-Xaa-Gly-Ile-Leu-Pro motif conserved between human, coelacanth and gar in the C-terminal region of AMTN (Extended Data Fig. 1).

SCPPPQ1 (secretory calcium-binding phosphoprotein-proline-glutamine-rich 1) was recently identified as an additional member of the P/Q-rich SCPP gene cluster that functions during the maturation stage of amelogenesis [15]. A recent study reported the presence of SCPPPQ1 in the basal lamina of the junctional epithelium during the maturation stage of amelogenesis [16]. SCPPPQ1 has been identified in a number of mammalian species and also in the anole lizard (Anolis carolinensis) [16,17]. A recent analysis of the SCPP genes in coelacanth did not identify an SCPPPQ1 orthologue, and it was proposed that SCPPPQ1 probably arose in the amniote lineage [1]. Remarkably, we have identified scpppq1 in the gar genome directly downstream from sparcl1. This location of SCPPPQ1 is conserved between mammals and gar. Its presence in gar indicates that this gene had already arisen before the divergence of ray-finned fishes and tetrapods. We therefore searched for Scpppq1 in the coelacanth genome between the Odam and Sparcl1 genes and identified a GENSCAN prediction of coelacanth Scpppq1 (Fig. 2 and Extended Data Table 1). Lizard SCPPPQ1 shares 42% amino acid identity with coelacanth SCPPPQ1, which in turn shares 37% identity with gar Scpppq1. Gar Scpppq1 shares 34-35% identity with amniote SCPPPQ1 protein sequences. Vertebrate SCPPPQ1 proteins display similarities in exon organization and sequence motifs (Extended Data Fig. 1).
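The identity values quoted here, for example 42% between lizard and coelacanth SCPPPQ1, are percentages of identical residues in a pairwise alignment. Percent identity can be computed under several conventions, and the text does not state which one was used. The sketch below assumes the two sequences are already aligned (gaps written as "-"). It divides the number of identical residues by the number of aligned columns in which neither sequence has a gap. The sequences are toy examples.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded from the
    denominator; this is one common convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment: 8 ungapped columns, 6 of them identical.
print(percent_identity("MKT-AYGQP", "MKSGAYGEP"))  # 75.0
```

Because the denominator differs between conventions (ungapped columns, the shorter sequence, or the full alignment length), identities reported for the same pair of proteins can differ by several percentage points.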
The C-terminal sequences of SCPPPQ1 are rich in Pro in mammals, whereas in lizard they are also rich in Phe, Gly and Tyr. In coelacanth and gar, the C-terminal sequences contain mostly Gly and Tyr residues (Extended Data Fig. 1). It remains to be understood, however, whether this shift in typical residues in the SCPPPQ1 C-terminal region has any functional importance for enamel development.

Altogether, there are four candidate genes for the maturation stage of amelogenesis in the genome of the spotted gar (Fig. 2). All of these genes were already distinct from each other and from other P/Q-rich SCPP genes before the divergence of actinopterygians and sarcopterygians. Three of them, ODAM, AMTN and SCPPPQ1, are still present in mammalian genomes. This possibly indicates an evolutionarily conserved, advanced enamel mineralization program shared between gar, coelacanth and mammals. Teleosts, however, seem to have a divergent mineralization program, with odam as the only clear orthologue present in both teleosts and tetrapods. Expression analyses in zebrafish and fugu have revealed additional players during the maturation stage: zebrafish scpp9 [7] and fugu scpp4 [18]. scpp9 is also present in the gar genome, whereas no orthologue of fugu scpp4 could be identified.

Actinopterygian-specific P/Q-rich SCPP genes

A number of teleost SCPP genes with no obvious orthologues in tetrapods have been considered teleost-lineage specific [7,18] (Fig. 2). Most of these teleost-specific SCPPs, such as scpp3a, scpp3b, scpp3c, scpp5 and scpp7, have orthologues in several teleost species, including zebrafish and fugu. Others are more restricted: zebrafish scpp6 has not been reported in other teleost species, scpp4 has been identified only in fugu and tetraodon, and scpp9 has been reported in zebrafish, medaka and stickleback [7]. Our analysis revealed P/Q-rich SCPP genes in the spotted gar genome that are similar to teleost SCPP genes (Fig. 2). We identified three genes similar to teleost scpp3a-c downstream from odam. The first shows the highest amino acid identity to fugu scpp3b (ENSTRUT00000025817) and zebrafish scpp3b (RNASEQT00000009201), 44% and 39%, respectively. The second has its highest identity, 34%, to zebrafish scpp3c (RNASEQT00000009200), and the third is most similar to fugu scpp3a (ENSTRUT00000008490), with 32% amino acid identity (Fig. 2). Chromosomal synteny between gar and teleost scpp3 genes is conserved, with the genes located next to each other downstream from the odam gene. Exon number and organization, with exclusively phase 0 introns, are also conserved between teleosts and gar (data not shown).
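Each of the three gar scpp3-like genes is matched to the teleost gene with which it shares the highest amino acid identity. The following is a minimal sketch of that best-hit bookkeeping, using only the identity values reported above. The labels gar_scpp3_1 to gar_scpp3_3 are hypothetical placeholders for the three genes in their order downstream from odam. Only the hits quoted in the text are included, so this is not a full similarity table.

```python
# Identity values (%) reported in the text for the three gar scpp3-like genes.
# Keys are hypothetical labels in cluster order downstream from odam.
REPORTED_HITS = {
    "gar_scpp3_1": {"fugu scpp3b": 44, "zebrafish scpp3b": 39},
    "gar_scpp3_2": {"zebrafish scpp3c": 34},
    "gar_scpp3_3": {"fugu scpp3a": 32},
}


def best_hit(hits):
    """Return (subject, identity) for the subject with the highest identity."""
    subject = max(hits, key=hits.get)
    return subject, hits[subject]


def assign_best_hits(table):
    """Map each query gene to its highest-identity subject."""
    return {query: best_hit(hits) for query, hits in table.items()}


for query, (subject, identity) in assign_best_hits(REPORTED_HITS).items():
    print(f"{query}: {subject} ({identity}%)")
```

At identities of roughly 30-45%, a best hit on its own is weak evidence of orthology. For that reason, the text also relies on conserved synteny and exon organization to support these assignments.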
In the same cluster of P/Q-rich SCPP genes in gar, between ambn and enam, we identified three genes similar to teleost scpp5 and scpp7. The gene located upstream from enam is predicted by GENSCAN, has 8 exons with phase 0 introns, and codes for a protein with 33% identity to zebrafish scpp5. Interestingly, the amino acid sequences of its signal peptide and penultimate exon are very similar to the signal peptide and part of the C-terminal region of tetrapod AMEL (Extended Data Table 1). The next gene, with 6 exons predicted by GENSCAN, displays 33% amino acid identity to zebrafish Scpp7. In contrast, the third gene, represented by an RNASEQ model with 8 exons, shows around 30% amino acid identity to both zebrafish Scpp5 and Scpp7 and was therefore named scpp5/7. Downstream from scpppq1 we identified another P/Q-rich SCPP gene with no previously reported orthologues, named scpp10. Next to this gene, we identified the orthologue of teleost scpp9, which shares 35% amino acid identity with zebrafish scpp9. Altogether, we identified 7 gar orthologues of teleost scpp3, scpp5, scpp7 and scpp9, which places the divergence of these genes before the split of the teleost and gar lineages.

References

1. Kawasaki, K. & Amemiya, C.T. SCPP genes in the coelacanth: tissue mineralization genes shared by sarcopterygians. J. Exp. Zool. B Mol. Dev. Evol. 322, 390-402 (2014).
2. Bertrand, S., et al. A dynamic history of gene duplications and losses characterizes the evolution of the SPARC family in eumetazoans. Proc. Biol. Sci. 280, 20122963 (2013).
3. Benton, M.J. & Donoghue, P.C. Paleontological evidence to date the tree of life. Mol. Biol. Evol. 24, 26-53 (2007).
4. Gibson, C.W., et al. Amelogenin-deficient mice display an amelogenesis imperfecta phenotype. J. Biol. Chem. 276, 31871-31875 (2001).
5. Fukumoto, S., et al. Ameloblastin is a cell adhesion molecule required for maintaining the differentiation state of ameloblasts. J. Cell Biol. 167, 973-983 (2004).
6. Hu, J.C., et al. Enamel defects and ameloblast-specific expression in Enam knock-out/lacZ knock-in mice. J. Biol. Chem. 283, 10858-10871 (2008).
7. Kawasaki, K. The SCPP gene repertoire in bony vertebrates and graded differences in mineralized tissues. Dev. Genes Evol. 219, 147-157 (2009).
8. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94 (1997).
9. Fukae, M., et al. Primary structure of the porcine 89-kDa enamelin. Adv. Dent. Res. 10, 111-118 (1996).
10. Park, J.C., et al. The amyloid protein APin is highly expressed during enamel mineralization and maturation in rat incisors. Eur. J. Oral Sci. 115, 153-160 (2007).
11. Lee, H.K., et al. Odontogenic ameloblasts-associated protein (ODAM), via phosphorylation by bone morphogenetic protein receptor type IB (BMPR-IB), is implicated in ameloblast differentiation. J. Cell. Biochem. 113, 1754-1765 (2012).
12. Nakayama, Y., Holcroft, J. & Ganss, B. Enamel hypomineralization and structural defects in amelotin-deficient mice. J. Dent. Res. 94, 697-705 (2015).
13. Kawasaki, K. The SCPP gene family and the complexity of hard tissues in vertebrates. Cells Tissues Organs 194, 108-112 (2011).
14. Fisher, L.W. & Fedarko, N.S. Six genes expressed in bones and teeth encode the current members of the SIBLING family of proteins. Connect. Tissue Res. 44 Suppl. 1, 33-40 (2003).
15. Moffatt, P., Smith, C.E., Sooknanan, R., St-Arnaud, R. & Nanci, A. Identification of secreted and membrane proteins in the rat incisor enamel organ using a signal-trap screening approach. Eur. J. Oral Sci. 114 Suppl. 1, 139-146; discussion 164-165, 380-381 (2006).
16. Moffatt, P., Wazen, R.M., Dos Santos Neves, J. & Nanci, A. Characterisation of secretory calcium-binding phosphoprotein-proline-glutamine-rich 1: a novel basal lamina component expressed at cell-tooth interfaces. Cell Tissue Res. 358, 843-855 (2014).
17. Kawasaki, K., Lafont, A.G. & Sire, J.Y. The evolution of milk casein genes from tooth genes before the origin of mammals. Mol. Biol. Evol. 28, 2053-2061 (2011).
18. Kawasaki, K., Suzuki, T. & Weiss, K.M. Phenogenetic drift in evolution: the changing genetic basis of vertebrate teeth. Proc. Natl. Acad. Sci. U.S.A. 102, 18063-18068 (2005).