Introduction

Fishes are an extremely diverse group of aquatic vertebrates that usually breathe through gills throughout life and have fins and scales. They include jawless fishes (hagfishes, lampreys), cartilaginous fishes (sharks, rays) and bony fishes (coelacanth, lungfishes and ray-finned fishes) (Nelson, 1994). With more than 23 500 species, ray-finned fishes (actinopterygians) represent more than 95% of all living fish species, and make up roughly half of the extant vertebrate species. More than 99.8% of ray-finned fishes belong to the teleosts. Bichirs and sturgeons are examples of nonteleost ray-finned fishes (Figure 1). According to traditional views, fishes do not form a unique monophyletic group distinct from tetrapods (mammals, reptiles, birds and amphibians) (Nelson, 1994; but see also Rasmussen and Arnason, 1999). Rather, bony fishes are thought to be more closely related to tetrapods than to jawless and cartilaginous fishes (Figure 1). This means that humans and other land vertebrates probably all share a fish ancestor that lived 360–450 million years ago.

Figure 1

The fish lineage. Origin of fish pictures: Manfred Schartl and Christoph Winkler (medaka, zebrafish, platyfish and Tetraodon), Erwin Schraml (African cichlid), John F Scarola (rainbow trout and Atlantic salmon), Suzanne L and Joseph T Collins (channel catfish), Bernd Ueberschaer (Nile tilapia), Konrad Schmidt (three spined stickleback) and Greg Elgar (Torafugu).

Fishes show a remarkable level of diversity affecting their morphology, ecology, behavior and genomes as well as multiple other facets of their biology (Nelson, 1994). This makes them extremely attractive for the study of many evolutionary questions related to diverse aspects of biology. Fish biodiversity is important to humans from the economic, ecological and cultural points of view, and its maintenance is an important challenge for the next generations. New insights from several fish species and sequencing projects have shed new light on the organization and evolution of fish genomes and now allow one to approach the evolutionary mechanisms possibly underlying biodiversity in the fish lineage.

Principal teleost fish models

Several teleost fish species are intensively studied at the genetic and genomic levels; some of them have been, are or will certainly be subjects of whole-genome sequencing projects (Figure 1 and Table 1). For species like the Atlantic salmon Salmo salar, the rainbow trout Oncorhynchus mykiss, the Nile tilapia Oreochromis niloticus and the channel catfish Ictalurus punctatus, genetics and genomics programmes have at least partially an economic motivation. Such programmes essentially aim to identify at the molecular level qualitative and quantitative trait loci (QTLs) controlling, among others, growth, reproduction, pigmentation, environmental tolerance or resistance to disease, which are highly relevant traits for aquaculture.

Table 1 Teleost fish species with genetic and genomic resources

Other fish species are widely used as ‘piscine mice’ in developmental biology. This is particularly true for the zebrafish Danio rerio and the Japanese medaka Oryzias latipes, two well-established complementary models for the study of different aspects of vertebrate organogenesis (Grunwald and Eisen, 2002; Wittbrodt et al, 2002; Furutani-Seiki and Wittbrodt, 2004). In both species, gene function can be studied routinely in vivo in the laboratory particularly through transgenesis and ‘morpholino’ antisense oligonucleotide ‘knockdown’ technology (the classical knockout technology as used in the mouse is not available at the moment in fish). Large-scale mutagenesis programmes have been performed in the zebrafish and more recently in the medaka (Driever et al, 1996; Haffter et al, 1996; Furutani-Seiki et al, 2004). Some mutants were obtained in one species and not in the other, demonstrating the complementarity of the models.

Another small aquarium fish, the platyfish Xiphophorus maculatus, is a traditional model for the study of the development of pigment cells and melanoma (Schartl, 1995; Meierjohann et al, 2004), but its use for in vivo analysis is restricted by the fact that this fish is viviparous (livebearer). The pufferfishes Takifugu rubripes (Torafugu) and Tetraodon nigroviridis (green spotted pufferfish) are pure genomics models. They are studied because of the compactness of their genome, characterized by small intronic and intergenic regions (Brenner et al, 1993; Roest Crollius et al, 2000), but at the moment these animals cannot be crossed routinely in the laboratory and are not usable in vivo for the analysis of gene function. Finally, cichlids from the great lakes of East Africa, sticklebacks and guppies (Poecilia reticulata) are important models to study the molecular basis of speciation and the evolutionary mechanisms shaping development and behaviour (Peichel et al, 2001; Brooks, 2002; Verheyen et al, 2003; Kocher, 2004; Shapiro et al, 2004).

The different domains of research described here for each fish species do not exclude other types of investigations. For example, the rainbow trout is not only studied with regard to its economic interest but is also a model for more fundamental research in carcinogenesis, toxicology, comparative immunology, physiology and others (Thorgaard et al, 2002). Several topics are investigated in parallel in different species. Sex determination and sex differentiation are studied in numerous fishes including medaka, zebrafish, salmonids, platyfish, sticklebacks, tilapia and others (Baroiller and D'Cotta, 2001; Devlin and Nagahama, 2002; Volff and Schartl, 2002). Such comparative analyses are of the highest importance to understand the mechanisms driving differential evolution in fish sublineages.

Fish genome projects

The genome of the pufferfish Takifugu rubripes was, after the human genome, the second vertebrate genome to be sequenced (Aparicio et al, 2002). Its sequencing through a whole-genome shotgun (WGS) strategy allowed the first genome-wide comparison between two vertebrate species. Pufferfish and mammals have approximately the same number of genes, but the Torafugu genome is 8–9 times smaller than the human genome. This is principally due to the fact that nonexonic regions (intronic and intergenic sequences) are generally, but not always, much shorter in the pufferfish than in humans, because of a relative paucity of repetitive sequences. The third assembly of the Torafugu genome is available and consists of approximately 8000 genomic scaffolds covering approximately 95% of the nonrepetitive fraction of the genome. Genome compaction is also observed in the green spotted pufferfish T. nigroviridis, the genome of which has been almost completely sequenced too (Jaillon et al, 2004). Sequence data have been assembled in approximately 50 000 contigs covering 312 Mbp of the 385 Mbp genome. These contigs have been further linked particularly through fluorescent in situ hybridization of genomic clones on Tetraodon chromosomes (Jaillon et al, 2004). The genome sequence of both smooth pufferfishes is useful for the identification of coding and regulatory sequences in humans and other vertebrates through sequence comparison, since sequences with functions should be more conserved than ‘useless’ sequences. The sequence of T. nigroviridis has been used to predict the number of genes in the human genome (Roest Crollius et al, 2000). Importantly, hundreds of putative novel human genes have been discovered by comparing the pufferfish and human genome sequences (Aparicio et al, 2002; Jaillon et al, 2004). In addition, analysis of the T. nigroviridis genome revealed the basic structure of the ancestral bony vertebrate genome, which was composed of 12 chromosomes, and allowed the reconstruction of many of the chromosome rearrangements that led to the modern human karyotype (Jaillon et al, 2004). Finally, Takifugu/Tetraodon comparisons might provide new information about differences between relatively closely related species, in a manner similar to the comparison between rat and mouse.

The sequencing of the genomes of both zebrafish and medaka is close to completion. Sequence reads and preliminary contigs are publicly available, but the final analyses have not been published to date. The sequencing of the zebrafish genome was initiated in 2001 following two strategies: clone mapping and sequencing from genomic clone libraries, and WGS sequencing with subsequent assembly (Table 1). The fourth WGS assembly, Zv4, was released in July 2004. This assembly consists of approximately 21 000 contigs covering 1500 Mbp of the zebrafish genome. The sequencing of the genome of the medaka Oryzias latipes, mainly based on a whole-genome shotgun sequencing strategy, was started in 2002. A nine-fold coverage of the genome had already been obtained in May 2004, and a first WGS assembly was released in July 2004 (about 116 000 sequences covering 840 Mbp; see Naruse et al (2004a) for additional information and useful web resources). The availability and comparison of the zebrafish and medaka genome drafts will allow linking mutant phenotypes to gene functions and shedding a new ‘evo-devo’ light on the fish and vertebrate lineages.

Finally, for other fishes (Table 1), important expressed sequence tag (EST) resources are already available. There is no doubt that some of these species will be subjected to genome sequencing in the near future due to their economic and/or academic relevance, and several proposals have already been submitted to funding agencies. On the subgenomic level, a project dealing with the physical mapping and sequencing of the sex chromosomes of the platyfish Xiphophorus maculatus has been started (Froschauer et al, 2002).

Fish-specific gene and genome duplications

Hox gene clusters and a plethora of other genes have been duplicated in the teleost fish lineage after its divergence from tetrapods (Amores et al, 1998; Wittbrodt et al, 1998; Meyer and Schartl, 1999; Robinson-Rechavi et al, 2001a; Loh et al, 2004; Postlethwait et al, 2004). For some gene pairs (for example Xmrk/egfrb in Xiphophorus; Volff and Schartl, 2003), the duplication events are rather recent and clearly affected only a restricted chromosomal region in a particular fish sublineage. However, in multiple other cases, the paralogous (duplicated) sequences are much more ancient (Taylor et al, 2001a). Such duplicates have been mapped on different chromosomes within larger duplicated regions (paralogons) in several divergent teleost fish species (Postlethwait et al, 2000; Woods et al, 2000; Morizot et al, 2001; Taylor et al, 2003; Winkler et al, 2003b; Naruse et al, 2004b). Phylogenomic analysis confirmed the presence in Torafugu, Tetraodon and zebrafish of hundreds of duplicates being co-orthologous to single-copy tetrapod genes (Taylor et al, 2003; Christoffels et al, 2004; Jaillon et al, 2004; Vandepoele et al, 2004). Sequence divergence analysis between paralogs suggests that these genes have been duplicated within the same time window 300–450 million years ago (Taylor et al, 2001a) after divergence of sturgeons from the lineage that led to teleosts (Hoegg et al, 2004). Taken together, these observations have suggested that the ray-finned fish lineage has experienced an event of tetraploidization early during its evolution after its divergence from tetrapods (Figure 1). The large paralogous segments observed in different fish genomes might correspond to remnants of this whole-genome duplication that have been maintained after rediploidization. 
Two rounds of tetraploidization/rediploidization (the 1-2-4 rule or 2R hypothesis) might also have occurred earlier during the evolution of the vertebrate lineage before the split between tetrapods and ray-finned fishes (Meyer and Schartl (1999) and references therein), and much more recent independent events of polyploidization have been detected in different fish sublineages including the salmonids (Figure 1; Venkatesh (2003) and references therein). However, the existence of rounds of tetraploidization/rediploidization in the early evolution of vertebrates, and subsequently in ray-finned fishes (the 1-2-4-8 hypothesis, Meyer and Schartl, 1999), is difficult to demonstrate unambiguously because (i) the duplication events are ancient, (ii) the genome has been rediploidized, (iii) in most cases, one of the duplicates has been lost during evolution and (iv) differential evolutionary rates within a pair of paralogs frequently obscure their phylogenetic relationship with orthologs from other species. Therefore, the involvement of successive tetraploidization events in vertebrate evolution is still a matter of intense debate, and an alternative hypothesis involves more regional events of DNA duplication (Robinson-Rechavi et al, 2001b; Seoighe, 2003). Strikingly, some gene families like hox or egfr (epidermal growth factor receptor) almost perfectly recapitulate the presumed duplication history in fish and other vertebrates. While only one egfr gene is present in nonvertebrate animals like the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans, mammals have four egfr-related genes, hence supporting the 1-2-4 hypothesis. Teleost fishes have seven egfr-related genes, an observation consistent with an additional event of genome duplication followed by the loss of one gene. 
The much more recent duplication of egfrb having generated the Xmrk oncogene demonstrates that fish genomes continue to duplicate their genes by more regional events (Volff and Schartl, 2003; Gómez et al, 2004). The same mathematics holds true for the hox clusters (Amores et al, 2004).
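The egfr gene counts quoted in this section follow from simple doubling arithmetic. The following is a minimal sketch, assuming that each whole-genome duplication doubles every gene copy and that losses are simply subtracted afterwards; the function name expected_copies and its parameters are illustrative and do not come from the cited studies.

```python
def expected_copies(ancestral_copies, duplication_rounds, losses=0):
    """Copies expected after a number of whole-genome duplications.

    Simplifying assumption (not a claim from the cited studies): every
    round doubles all copies, and `losses` copies are lost afterwards.
    """
    return ancestral_copies * 2 ** duplication_rounds - losses


# One ancestral egfr gene and two rounds (1-2-4): four genes, as in mammals.
mammal_like = expected_copies(1, 2)

# A third, fish-specific round (1-2-4-8) followed by the loss of one gene:
# seven genes, as reported here for teleosts.
teleost_like = expected_copies(1, 3, losses=1)

print(mammal_like, teleost_like)
```

Counts that fit this pattern are consistent with the 1-2-4-8 hypothesis but do not prove it, since regional duplications and losses could in principle produce the same numbers.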

At least 500 fish-specific ancient pairs of paralogs are present in the genome of Torafugu and Tetraodon (Christoffels et al, 2004; Jaillon et al, 2004; Vandepoele et al, 2004). Even if this value might be an underestimation, this indicates that most genes are not present as pairs of duplicates but rather as single-copy genes in fish. Therefore, if a whole-genome duplication has taken place during the evolution of the ray-finned lineage, one copy has been subsequently lost within most pairs of paralogs. This is consistent with the observation that, for the vast majority of gene duplicates, one copy evolves as a pseudogene through degenerative mutations (nonfunctionalization) and/or is eliminated because of its dispensability (Lynch and Conery (2000) and references therein).

Nevertheless, independently of the mechanisms by which they have been generated, the evolutionary scenario behind the persistence of hundreds of functional paralog pairs over hundreds of millions of years of evolution in fish is an extremely interesting question, since the duplication of genetic information is thought to be a major seed of evolution (Ohno, 1970). Rarely, one duplicate might have acquired by chance a mutation conferring a new positively selected beneficial function (neofunctionalization), the other copy fulfilling alone the original function. Alternatively, persistence of gene duplicates could be due to subfunctionalization, that is, through partitioning of ancestral functions between duplicates after complementary degenerative mutations in different regulatory or structural sequences (the duplication-degeneration-complementation model; Force et al, 1999; Lynch and Force, 2000a). In this case, both copies are preserved, since the presence of both is necessary to perform the original function of the ancestral single-copy gene. In teleosts, the evolution of numerous gene duplicates is consistent with the subfunctionalization model (eg Lister et al, 2001; Serluca et al, 2001; Altschmied et al, 2002; McClintock et al, 2002; Cresko et al, 2003; Yu et al, 2003; Amores et al, 2004). For example, mammals and birds have a unique microphthalmia-associated transcription factor gene mitf, from which different isoforms are expressed through the use of different promoter sequences and alternative exons. In contrast, fish have two different mitf genes present in species as divergent as zebrafish, pufferfish and platyfish. Interestingly, the two mitf genes in fish each encode one of the different isoforms that are generated from the single gene in ‘higher’ vertebrates. Hence, the two mitf genes are required together to perform the functions of the unique mitf gene in mammals and birds. 
This partitioning of functions is associated with the degeneration of isoform-specific exons and regulatory sequences (Lister et al, 2001; Altschmied et al, 2002).

It is quite difficult to assess with certainty which mechanisms have been at the origin of the maintenance of paralogs in fish. For example, function partitioning might not be primarily responsible for the persistence of the duplicates but might have arisen subsequently during evolution. In addition, the evolution of several paralog pairs does not completely fit any of the three major simple evolutionary models (non-, neo- and subfunctionalization) since they present divergent functions in teleosts and tetrapods (see for instance Winkler et al, 2003b). More information about ancestral gene function in the fish lineage before the proposed genome duplication is therefore required for a better understanding of the evolutionary mechanism behind the preservation of hundreds of gene duplicates in teleost fish genomes.

Gen(om)e duplication and speciation in fish

Genome duplication/rediploidization leads to a massive duplication of genetic information. This might allow the occurrence of evolutionary novelties necessary for major transitions in evolution and might favour the formation of new species (Ohno, 1970). In addition, divergent resolution of gene duplicates might play an important role in genomic incompatibility between species leading to reduced fertility and/or viability of interspecific hybrids (Werth and Windham, 1991; Lynch and Conery, 2000; Lynch and Force, 2000b; Taylor et al, 2001b; Postlethwait et al, 2004). Imagine the presence, after one round of genome duplication/rediploidization, of two paralogs of the gene G (Ga and Gb), with redundant functions and located on different chromosomes (Figure 2). After geographic isolation between two populations and divergent resolution, Ga might be nonfunctionalized through deleterious mutations (pseudogene psGa) or simply lost in one population, and Gb might undergo the same fate in the second population (pseudogene psGb). F1 hybrids between both populations would be (Ga/psGa; Gb/psGb). If G is essential for gamete function, 25% of the gametes produced by F1 hybrids will be nonfunctional (no functional copy of the G gene) (Lynch and Conery, 2000; Figure 2). If not, crossing between F1 individuals will generate about 6% (1/16) of F2 individuals without any functional G gene. In addition, haploinsufficiency (when a single functional allele of G is not sufficient to support its normal function) might occur in 25% of the progeny (Figure 2). If divergent resolution occurs for multiple different pairs of duplicates generated by whole-genome duplication, this will result in the passive build-up of postmating reproductive isolation without affecting intraspecific fitness (Lynch and Conery, 2000).
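The proportions given above can be checked by enumerating the gametes and zygotes of the F1 hybrids. The following is a minimal sketch, assuming independent assortment of the two loci (they lie on different chromosomes) and equal transmission of all alleles; the names resolution_gametes, nonfunctional_gamete_fraction and f2_copy_distribution are illustrative only.

```python
from fractions import Fraction
from itertools import product

# F1 hybrid genotype Ga/psGa ; Gb/psGb.
# 1 stands for a functional allele, 0 for a pseudogene (or a lost copy).
F1_LOCUS_A = (1, 0)
F1_LOCUS_B = (1, 0)


def resolution_gametes():
    """The four equally likely F1 gametes as (allele at a, allele at b)."""
    return list(product(F1_LOCUS_A, F1_LOCUS_B))


def nonfunctional_gamete_fraction():
    """Fraction of F1 gametes carrying no functional copy of G."""
    gametes = resolution_gametes()
    empty = sum(1 for a, b in gametes if a + b == 0)
    return Fraction(empty, len(gametes))


def f2_copy_distribution():
    """Probability of each number of functional G copies (0 to 4) in F2."""
    gametes = resolution_gametes()
    weight = Fraction(1, len(gametes) ** 2)
    distribution = {}
    for egg, sperm in product(gametes, gametes):
        copies = sum(egg) + sum(sperm)
        distribution[copies] = distribution.get(copies, Fraction(0)) + weight
    return distribution


print("nonfunctional gametes:", nonfunctional_gamete_fraction())
for copies, probability in sorted(f2_copy_distribution().items()):
    print(copies, "functional copies:", probability)
```

Under these assumptions, the class with zero functional copies corresponds to the 1/16 of F2 individuals lacking G, and the class with exactly one functional copy to the 25% of progeny in which haploinsufficiency might occur.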

Figure 2

Divergent resolution of gene duplicates. Black boxes show functional genes, white boxes pseudogenes (ps).

A role of divergent resolution in speciation does not necessarily involve gene silencing or deletion and can be readily reconciled with the subfunctionalization model (Lynch and Conery, 2000). For instance, if the ancestral single-copy gene G has two important functions (functions 1 and 2), each involving a specific regulatory sequence, reciprocal divergent partitioning might occur after duplication and geographic isolation (Figure 3). In one population, Ga will perform function 1 and Gb function 2, while in the second population Ga will be responsible for function 2 and Gb for function 1. Half of the gametes of the F1 progeny will have genes for only one function. This means that 25% of the gametes will be nonfunctional if one of these functions is important for the gametes, and 50% if both are involved. If not, 12.5% (2/16) of individuals in the F2 progeny will completely lack one function, and as many as 50% might show haploinsufficiency in either function 1 or function 2 (Figure 3). Here again, divergent partitioning for different gene duplicates would result in reproductive isolation and speciation.
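These proportions can again be verified by enumeration. The following is a minimal sketch, assuming independent assortment of the two loci and equal transmission of all alleles; each allele is represented by the number of the function it performs, and the names partition_gametes, gamete_fractions and f2_fractions are illustrative only.

```python
from fractions import Fraction
from itertools import product

# F1 hybrid: at locus a, the allele from population 1 performs function 1
# and the allele from population 2 performs function 2; the reverse holds
# at locus b. Each allele is encoded by the function it performs.
F1_LOCUS_A = (1, 2)
F1_LOCUS_B = (2, 1)


def partition_gametes():
    """The four equally likely F1 gametes as (allele at a, allele at b)."""
    return list(product(F1_LOCUS_A, F1_LOCUS_B))


def gamete_fractions():
    """Fractions of gametes with one function only, and lacking function 1."""
    gametes = partition_gametes()
    single = sum(1 for a, b in gametes if a == b)
    no_function_1 = sum(1 for gamete in gametes if 1 not in gamete)
    return Fraction(single, len(gametes)), Fraction(no_function_1, len(gametes))


def f2_fractions():
    """Fractions of F2 lacking a function, and with a single-copy function."""
    gametes = partition_gametes()
    total = len(gametes) ** 2
    lacking = single_copy = 0
    for egg, sperm in product(gametes, gametes):
        alleles = egg + sperm
        copies = (alleles.count(1), alleles.count(2))
        if 0 in copies:
            lacking += 1        # one function is completely absent
        elif 1 in copies:
            single_copy += 1    # one function is carried by a single allele
    return Fraction(lacking, total), Fraction(single_copy, total)


print("gametes (one function only, lacking function 1):", gamete_fractions())
print("F2 (lacking a function, single-copy function):", f2_fractions())
```

By symmetry, the fraction of gametes lacking function 2 equals the fraction lacking function 1, so half of the gametes are nonfunctional when both functions are required.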

Figure 3

Divergent subfunction partitioning between gene duplicates.

Is there any evidence for divergent resolution in the fish lineage? Several instances of inactivation or loss of a gene duplicate in one species but not in the other have been described in fish, for example, in duplicated hox gene clusters (Amores et al, 2004). Demonstrating reciprocal differential loss of duplicates in two divergent fish lineages might be more difficult. If Ga is lost in species 1 and Gb in species 2, Gb of species 1 and Ga of species 2 may just look like true orthologs, and additional phylogenetic analyses and mapping experiments will be necessary to demonstrate their paralogy. Putative examples of reciprocal loss have already been detected through comparative analysis of the genomes of medaka and zebrafish (Naruse et al, 2004b). Two related helix-loop-helix transcription factor genes called hey1 and hey2 are present in the genome of fish and mammals. The hey1 gene has been duplicated during the early evolution of the ray-finned fish lineage (Winkler et al, 2003a). Both hey1a and hey1b have been maintained in the pufferfishes T. nigroviridis and T. rubripes, but hey1b has apparently been lost in the zebrafish. On the other hand, hey2 could not be detected in T. nigroviridis, and is extremely divergent in T. rubripes compared to the well-conserved hey2 gene of the zebrafish (Winkler et al, 2003a). This situation might correspond to a form of reciprocal divergent resolution, with hey1b possibly compensating for the loss or extreme divergence of hey2 in pufferfishes.

Cases of divergent partitioning of subfunctions have also been described in teleost fish. In the rare examples like mitfa/mitfb or sox9a/sox9b, for which gene duplicate expression and function have been examined in divergent fish species, the major partitioning of ancestral gene functions appears to be ancient (Lister et al, 2001; Altschmied et al, 2002; Cresko et al, 2003). This is manifested by paralog-specific subfunctions conserved in divergent fishes. However, species-specific differences in expression indicative of lineage-specific partitioning have also been found for sox9a and sox9b between zebrafish and stickleback, suggesting that this phenomenon might indeed be involved in teleost diversification (Cresko et al, 2003). Clearly, traditional and functional comparative genomics between pufferfishes, zebrafish, medaka and others will shed new light on the role of divergent resolution and partitioning in teleost radiation and subsequent events of speciation.

Gene duplication might also be directly involved in fish diversification by creating ‘speciation genes’ reducing hybrid fitness. One possible example for that is the Xmrk gene (Xiphophorus melanoma receptor tyrosine kinase), corresponding to the dominant Tu (Tumour) locus inducing the formation of melanoma in certain interspecific hybrids of the genus Xiphophorus (Gordon, 1927; Kosswig, 1928; Anders and Anders, 1978; Schartl, 1995; Meierjohann et al, 2004). After crossing between the Xmrk-containing platyfish Xiphophorus maculatus and the Xmrk-free swordtail X. helleri, the F1 hybrid progeny develop noninvasive, superficially spreading nonmalignant pigment cell lesions. Backcrossing of the F1 with the swordtail parent results in the formation of highly invasive melanoma in 25% of the progeny. This corresponds clearly to a reduction of fitness in hybrids, since fishes with melanoma will generally die more or less rapidly depending on the allele of Tu they have inherited. In contrast, the platyfish parent only very rarely develops Xmrk-mediated melanoma. Xmrk has been recently formed by duplication of the epidermal growth factor receptor gene co-ortholog egfrb. Subsequently, mutations in the promoter region have drastically modified its pattern of expression, and mutations conferring ligand-independent constitutive activation have arisen in its extracellular domain (Meierjohann et al (2004) and references therein). According to the classical genetic model, the oncogenic potential of Xmrk is repressed by an unlinked tumour suppressor locus called R (Anders and Anders, 1978; Schartl, 1995; Meierjohann et al, 2004). The allele of R present in the swordtail is unable to repress Xmrk (or R is simply absent from the swordtail), and the progressive elimination of the platyfish R allele in hybrids through crossing leads to the derepression of Xmrk and to the formation of melanoma. 
This situation might be consistent with the Dobzhansky–Muller model of hybrid incompatibility (Dobzhansky, 1970; Orr and Presgraves, 2000; Wu and Ting, 2004). In this model, two populations are derived from an ancestral population with an AA/BB genotype (A and B are two different genes). A evolves into a in one population (aa/BB genotype) and B into b in the other population (AA/bb). When separated in their own populations, the a and b alleles do not alter fitness. In contrast, when brought together, incompatibility can occur, resulting in a reduction of fitness in Aa/Bb hybrids through partial or full sterility or nonviability. In the Xiphophorus model, a (Xmrk) would have been created by duplication of A (egfrb); B and b would correspond to the different alleles of R in the platyfish and the swordtail, respectively. Importantly, the occurrence and significance of hybridization between Xiphophorus species under natural conditions remain to be demonstrated.
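The 25% of backcross progeny developing melanoma is what the classical two-locus model predicts. The following is a minimal sketch, assuming (as a simplification not stated in this form in the text) that the platyfish parent is homozygous for both Xmrk and R, that the swordtail carries neither, that the two loci are unlinked, and that melanoma develops whenever Xmrk is present without any copy of R; the name backcross_melanoma_fraction is illustrative only.

```python
from fractions import Fraction
from itertools import product

# 1 marks the presence of the platyfish allele (Xmrk or R), 0 its absence.
# Assumed parental genotypes: platyfish Xmrk/Xmrk ; R/R, swordtail -/- ; -/-.
F1_HYBRID = {"Xmrk": (1, 0), "R": (1, 0)}
SWORDTAIL = {"Xmrk": (0, 0), "R": (0, 0)}


def backcross_melanoma_fraction(hybrid, recurrent_parent):
    """Fraction of backcross progeny carrying Xmrk but no copy of R."""
    offspring = list(product(hybrid["Xmrk"], recurrent_parent["Xmrk"],
                             hybrid["R"], recurrent_parent["R"]))
    affected = sum(1 for x1, x2, r1, r2 in offspring
                   if (x1 or x2) and not (r1 or r2))
    return Fraction(affected, len(offspring))


print(backcross_melanoma_fraction(F1_HYBRID, SWORDTAIL))
```

In Dobzhansky–Muller terms, the affected class combines the derived allele a (Xmrk) with a background lacking the compatible allele of the second gene (R), which the sketch treats as an all-or-nothing effect.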

Diversity of transposable elements (TEs) in fish

TEs are sequences able to integrate into new sites within genomes. They are classified into two major classes according to their structure and mechanism of transposition (Curcio and Derbyshire, 2003). Sequences whose transposition requires an RNA intermediate that is reverse-transcribed into complementary DNA (retrotransposition) are called retroelements. They include reverse transcriptase retrotransposons (LTR or non-LTR retrotransposons depending on the presence of flanking long terminal repeats), retroviruses (reverse transcriptase LTR elements with an envelope gene) and various categories of nonautonomous retroelements like the short interspersed nuclear elements (SINEs). Other elements transposing without reverse transcription are called DNA transposable elements.

Mobile elements can disrupt genes. In addition, ectopic homologous recombination between nonallelic copies of TEs can lead to the formation of deletions, duplications, inversions and translocations, and transposition itself can induce various types of rearrangements at the target site (for a review, see Kazazian, 2004). TEs can also be recruited as exons disrupting an open reading frame, or modify the level and specificity of expression of neighbouring resident genes. The contribution of TEs to mutant phenotypes and genetic diseases can vary considerably between different organisms (Kazazian, 1999).

There is no doubt that transposable elements are drivers of genome evolution (Brosius, 2003; Deininger et al, 2003; Kazazian, 2004). They have been involved in chromosome rearrangements during the evolution of a wide variety of organisms, and retrotransposition has generated at least half of the human and mouse genomes. Particularly, retrotransposition has generated intronless copies of cellular genes (retrogenes), some of them, for example, forming a family of Y-chromosomal genes expressed exclusively in the testis and implicated in male fertility in human (Lahn et al, 2002). TE-derived sequences have been frequently recruited during evolution as regulatory and coding sequences for the host genes (Nekrutenko and Li, 2001; Jordan et al, 2003; Van de Lagemaat et al, 2003). Finally, some TEs, like the telomeric retrotransposons of Drosophila, are apparently directly beneficial to their host, and some mobile sequences have even been domesticated to fulfil new cellular functions (Pardue and DeBaryshe, 2003; Brandt et al, in press).

Almost all types of eukaryotic TEs have been described in teleost fish genomes (Aparicio et al, 2002). Some of these elements are capable of natural insertional mutagenesis (Izsvak et al, 1996; Koga et al, 1996). In order to understand the evolution of TEs in the vertebrate lineage, comparisons between teleost fish and mammalian genomes have been performed particularly for reverse transcriptase retroelements (Volff et al, 2003a). Interestingly, numerous retrotransposons present in vertebrates but absent from mammalian genomes have been identified in the genome of different teleost fish species. As many as nine clades (ancient phylogenetic groups of TEs, the origin of which can be traced back prior to vertebrates) of Ty3/Gypsy-like LTR retrotransposons are found in fish (Poulter and Butler, 1998; Volff et al, 2001b, 2003a), while none of them (with the exception of some domesticated sequences, Brandt et al, in press) are present in the genome of mouse and human. Other major groups of reverse transcriptase retrotransposons present in fish but with no functional equivalent in mammals include Ty1/Copia LTR retrotransposons (Volff et al, 2003a), tyrosine recombinase-encoding retrotransposons (Goodwin and Poulter, 2001, 2004), BEL-like LTR retrotransposons (Frame et al, 2001), Uri endonuclease-encoding Penelope-like elements (Lyozin et al, 2001; Volff et al, 2001a) and non-LTR retrotransposons with restriction enzyme-like endonuclease (Volff et al, 2001c; Bouneau et al, 2003). Even for non-LTR retrotransposons with apurinic-apyrimidinic endonuclease, which are extremely well represented in mouse and human genomes (Deininger et al, 2003; Kazazian, 2004), more clades are found in fish genomes (five) than in mammals (three) (Poulter et al, 1999; Volff et al, 1999, 2003a). Taken together, as many as 16–23 clades of reverse transcriptase retroelements have been detected in different fish species, while only six clades are present in mouse and human genomes. 
A similar situation is also observed for some major families of DNA transposable elements (for example, see Poulter et al, 2003). Hence, the diversity of TEs is much higher in teleost fish than in mammalian genomes, and this phenomenon is also observed inside a particular clade of TE (Furano et al, 2004). Strikingly, even the compact genomes of smooth pufferfishes display a higher diversity of TEs than mammalian genomes, despite their low content of repetitive sequence. Evidence for frequent and recent activity has been provided for numerous families of fish TEs, but their copy number is generally much lower in zebrafish and pufferfish than in mammals. Hence, mobile sequences apparently undergo a higher turnover in teleost fish genomes.

The genomic organization of TEs has been extensively analysed, particularly by fluorescent in situ hybridization, in the compact genome of the pufferfish T. nigroviridis (Dasilva et al, 2002; Bouneau et al, 2003; Fischer et al, 2004). Almost all categories of TEs generally colocalize with other types of repeats (duplicated pseudogenes, minisatellites) in specific heterochromatic regions of the genome. These observations showed that TEs and other repeated elements are generally excluded from gene-rich regions in T. nigroviridis, underlining the extreme degree of compartmentalization of this compact genome. Hence, the global organization of the genome of the pufferfish is clearly different from that observed in humans, where repeated sequences make up an important fraction of euchromatic DNA, and is more similar to that observed in the fruit fly D. melanogaster (Volff et al, 2003a).

Transposable elements and speciation

Transposable elements might be able to contribute to pre- and postmating reproductive isolation, and therefore might be involved in the formation of new species (Hurst and Schilthuizen, 1998; Hurst and Werren, 2001). TEs are generally active in germ cells, where they might induce insertional mutations and other kinds of rearrangements that could lead to speciation. Fixation of different rearrangements such as translocations and inversions in different populations might result in reproductive isolation. In addition, interspecific crossing might activate transposition in hybrids, leading to hybrid sterility or inviability. Several cases of hybrid dysgenesis involving transposable elements have been described in Drosophila. This phenomenon is observed in the progeny of crosses between strains containing multiple functional copies of a particular TE and strains devoid of this active element. For example, the I retrotransposon of D. melanogaster is repressed and does not transpose in I strains containing functional I elements, but retrotransposes at very high frequency in the germ line of hybrid females that are produced after crossing R females (devoid of active I elements) with I males (Bucheton et al, 1992). This results in an increased rate of insertions and DNA rearrangements in the germ line of the hybrid females. One important consequence is female sterility, manifested by the failure of most eggs to hatch due to an early block in embryonic development. Hybrid dysgenesis in Drosophila can also be mediated by other TEs (Kidwell et al, 1977; Lozovskaya et al, 1990). Derepression of transposition through interspecific hybridization might occur in divergent animal lineages (O'Neill et al, 1998; Labrador et al, 1999).

At the moment, there is no information about a role of TEs in speciation in fish, or about their possible activation in fish hybrids. However, phylogenetic analysis of several retrotransposons from various fish species has revealed the presence of multiple waves of retrotransposition, which might have been associated with speciation events (Volff et al, 2001d). The multiple active lineages of TEs present in fish genomes might therefore predispose fish to rapid speciation. Further studies will be necessary to establish the activity and genomic impact of fish TEs in germ cells and hybrids, and thereby to assess their role in reproductive isolation and species formation.

Sex chromosome evolution and the diversity of sex determination in fish

Some parts of fish genomes are apparently evolving extremely rapidly. This is particularly true for the sex chromosomes, and this phenomenon might be related to the amazing variety of sex determination systems observed in teleosts (for reviews, Baroiller and D'Cotta, 2001; Devlin and Nagahama, 2002; Volff and Schartl, 2002). All forms of genetic sex determination have been observed in fish, including both male heterogamety (males are XY and females are XX) and female heterogamety (males are ZZ and females are ZW), autosomal influences and polygenic sex determination. Sex chromosomes can display very variable degrees of molecular differentiation. Sex determination can also be influenced or determined by environmental factors, including water temperature and pH, or fish density. Numerous fish species are hermaphrodites, either simultaneous (male and female at the same time), protandrous (first male and then female) or protogynous (first female and then male). Different types of sex determination systems can coexist in the same fish (eg genetic sex determination and influence of temperature in the Nile tilapia Oreochromis niloticus). Different systems of genetic sex determination can be found in the same fish genus (eg Oreochromis spp.) and even in the same species (eg Xiphophorus maculatus).

The molecular and evolutionary mechanisms driving sex determination and its variability in teleost fish are poorly understood. The Sry gene, which induces the male phenotype in human, mouse and other mammals, is clearly absent from fish genomes. Importantly, recent studies on the medaka Oryzias latipes, a fish with an XX/XY sex determination system, have revealed how rapidly novel master sex-determining genes and sex chromosomes can evolve in teleosts. Using a positional cloning strategy (Matsuda et al, 2002) and a candidate gene approach (Nanda et al, 2002), two different groups have independently identified in this small fish the first master sex-determining gene of a nonmammalian vertebrate. This gene is dmrt1bY (aka DMY), a Y chromosome-specific duplicate of an autosomal gene called dmrt1. Dmrt1 is a putative transcription factor apparently ubiquitously involved in sex determination/differentiation in vertebrates, and is a member of a family of proteins containing a conserved DNA-binding motif called the DM domain (Volff et al, 2003d and cited references). Some DM domain proteins are involved in the induction of sexual dimorphism in divergent invertebrates including flies and nematodes (Zarkower, 2002). DM domain genes other than dmrt1 have been identified in fish and mammals, and several of them might be involved in gonad development in the mouse (Brunner et al, 2001; Kondo et al, 2002; Kim et al, 2003; Winkler et al, 2004).

Medaka males have two types of dmrt1 genes, the autosomal dmrt1 and the Y-specific dmrt1bY. Dmrt1bY was probably formed by a large transchromosomal duplication from linkage group 9 onto another autosome, which thereby became the neo-Y chromosome (Nanda et al, 2002; Volff and Schartl, 2002; Schartl, 2004). Other genes were included in this duplication, but, in contrast to dmrt1bY, they all subsequently degenerated (Nanda et al, 2002). Hence, dmrt1bY is apparently the only functional gene in the Y-specific part of the sex chromosomes of the medaka, strongly suggesting that it indeed corresponds to the master sex-determining gene. Its expression pattern is also consistent with a role in sex determination: dmrt1bY is expressed only in male embryos, and expression occurs prior to the morphological differentiation of gonads. In adults, transcripts are found exclusively in the Sertoli cells of the testis. Finally, natural mutations in dmrt1bY result in XY sex-reversed females (Matsuda et al, 2002). However, the existence of spontaneous sex-reversed XX males in the medaka indicates that a full male phenotype can also occasionally develop in the absence of dmrt1bY (Nanda et al, 2003).

The high degree of sequence identity between the autosomal dmrt1 gene and the Y-specific dmrt1bY suggests a recent origin for the master sex-determining gene of the medaka (Nanda et al, 2002). This was confirmed by evolutionary analyses, and dmrt1bY was detected only in a very restricted number of Oryzias species (Kondo et al, 2003; Matsuda et al, 2003; Veith et al, 2003). Hence, this gene is not the universal master sex-determining gene in teleost fish (Volff et al, 2003b), and the gene(s) driving sexual dimorphism remain(s) to be discovered for the vast majority of teleost fish species.

No sex-linked markers and no sex chromosomes have been identified so far in the zebrafish and smooth pufferfishes, which is why alternative models such as salmonids, platyfish, tilapia and sticklebacks are necessary to analyse sex determination and sex chromosome evolution in fish. The master sex-determining genes of these fishes will most likely be identified by positional cloning. This will shed new light on the molecular mechanisms driving the evolution of sex determination and sex chromosomes in fish. In salmonids (male heterogamety), comparative mapping of sex-linked microsatellite markers has already shown that Arctic charr, brown trout, Atlantic salmon and rainbow trout have evolved different sex chromosomes (Woram et al, 2003). Sex-linked markers have been found in the Nile tilapia Oreochromis niloticus (XX/XY) (Lee et al, 2003) and the blue tilapia O. aureus (ZW/ZZ) (Lee et al, 2004), and the putative sex chromosomes have been identified by synaptonemal complex analysis (for a review, Griffin et al, 2002). In the threespine stickleback, sequencing of X- and Y-specific bacterial artificial chromosome clones from the sex determination region revealed many sequence differences between X and neo-Y chromosomes (Peichel et al, 2004). In the platyfish Xiphophorus maculatus, a species with three sex chromosomes (X, Y and W), megabase-sized bacterial artificial chromosome contigs covering the sex-determining region of the X and Y chromosomes have been constructed and partially sequenced (Volff and Schartl, 2001; Froschauer et al, 2002, unpublished). As observed for the Y chromosome-specific region of both medaka and threespine stickleback (Nanda et al, 2002; Peichel et al, 2004), the sex determination region of the platyfish displays a high level of genomic instability characterized by frequent transpositions, duplications and deletions (Volff et al, 2003c). Genes present in this region frequently undergo mutations and rearrangements.
This phenomenon is associated with a high genetic variability of traits like pigmentation, melanoma phenotype or puberty, which are controlled by gene loci closely linked to the master sex-determining gene in the platyfish (Volff and Schartl, 2001). Whether the genomic plasticity of sex-determining regions is directly implicated in the variability of sex determination systems is still an open question.

Sex chromosome evolution and speciation

Speciation is intimately associated with the evolution of sex- and reproduction-related traits (eg mating behaviour, fertilization, spermatogenesis, sex determination), which are frequently controlled by gene loci located on the sex chromosomes. The modification of visual mating cues is probably involved in the establishment of premating barriers between closely related species (prezygotic isolation; Coyne and Orr, 1998). In particular, colour patterns are a central feature of fish behaviour and evolution: they can serve as mate recognition signals and evolve by sexual selection. African cichlid radiations have been strongly influenced by sexual selection, which principally resulted in the diversification of male colour patterns (Danley and Kocher, 2001, and references therein). Numerous examples of sex chromosomal pigmentation loci have been described in different fish species, including species of the genus Xiphophorus (Volff and Schartl, 2001). In the guppy Poecilia reticulata, many of the gene loci controlling the polymorphic male colour patterns involved in mate choice are located in or near the nonrecombining sex-determining region of the Y chromosome (Brooks, 2002, and cited references; Lindholm et al, 2004).

Sex-determining regions are apparently very unstable in some fishes, and this phenomenon might account for the high polymorphism generally affecting traits controlled by sex chromosomal loci (Volff and Schartl, 2001). Hence, the rapid divergence of gene loci involved in mate choice within a sex-determining region might speed up prezygotic isolation between two populations. This might affect not only genes involved in pigmentation but also genes playing a role, for example, in sexual maturity, since differences in the time of reproduction might also induce prezygotic isolation (Coyne and Orr, 1998, and references therein). Interestingly, a highly polymorphic locus controlling the onset of sexual maturity is closely linked to or located inside the unstable sex-determining region in Xiphophorus (Volff and Schartl, 2001, and references therein).

In addition, creation of a neo-sex chromosome might disrupt the linkage between the master sex-determining gene and genes involved in mate choice. This might occur, for example, through transposition of the master sex-determining gene from a sex chromosome onto an autosome, as suggested in salmonids (Woram et al, 2003). Another possibility is the creation of a novel master sex-determining gene on an autosome, as observed in the medaka (Nanda et al, 2002). If the mate choice gene, for instance a Y-linked pigmentation pattern gene, is not controlled directly or indirectly by the master sex-determining gene itself, the pattern will no longer be sex-specific, or may even disappear from males. This might initiate the isolation of the population with the ancestral Y chromosome from the population with the neo-Y chromosome. Speciation models based on sexual selection on sex-determining genes associated with colour polymorphisms and incorporating the lability of sex determination in fish have been proposed for the African cichlids (Lande et al, 2001). Divergence of sex determination systems might also lead to hybrid progeny with a reduced fitness due to sex ratio distortion (Volff and Schartl, 2001), and selection for nonbiased sex ratios has been proposed to be involved in sympatric speciation in cichlids (Seehausen et al, 1999).

Differences in sex chromosomes might be a potential source of postzygotic isolation too. For example, if two species have developed different heteromorphic pairs of sex chromosomes, abnormalities in meiotic pairing might occur in hybrids, with hybrid sterility as a possible consequence. In addition, divergent resolution between X- and Y-chromosomal alleles of genes with male-specific functions (for example, involved in fertility) in different populations might lead to male sterility (or inviability, if one of the functions is essential for the survival of the males) (Lynch and Force, 2000b; Figure 4). If genes A and B are both located on the proto-sex chromosomes of an ancestral population, A and B might become Y- and X-specific, respectively, in one population, while they would be X- and Y-specific, respectively, in the other population. All males in the F1 progeny will consequently be sterile, since they completely lack either A or B (Figure 4). This might correspond to an early stage of reproductive isolation. If this phenomenon only occurs with gene A, male sterility will be observed only in one cross, but not in the reciprocal one (Lynch and Force, 2000b). This model is consistent with Haldane's rule, stating that when the F1 hybrid offspring of a cross between a male parent from one line and a female parent from the other line is sterile although otherwise healthy, it will tend to be of the heterogametic sex (Haldane, 1922). Divergent gene loss would be favoured by the high frequency of rearrangement-mediated nonfunctionalization affecting (at least some) sex-determining regions in fish (Volff et al, 2003c). The same model can apply if A and B correspond to different subfunctions of the same gene, which are then divergently resolved in the two different populations.
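
The divergent resolution model described above can be illustrated with a minimal sketch. The code below is an illustration only, not part of the cited model: each X or Y chromosome is represented as the set of functional male-specific genes it still carries, and an F1 male is assumed to be fertile only if his maternal X and paternal Y together provide both A and B. All names (`REQUIRED`, `f1_male_genes`, `reciprocal_crosses`) are hypothetical and introduced here for illustration.

```python
# Toy model of divergent resolution of X/Y alleles (illustrative assumption:
# a male is fertile only if he carries a functional copy of every required gene).
REQUIRED = frozenset({"A", "B"})  # genes with male-specific functions


def f1_male_genes(mother_x, father_y):
    """Functional genes of an F1 male: maternal X plus paternal Y."""
    return frozenset(mother_x) | frozenset(father_y)


def is_fertile(genes):
    """True if all required male-specific genes are present."""
    return REQUIRED <= genes


def reciprocal_crosses(pop1, pop2):
    """Fertility of F1 males for both directions of a cross.

    Each population is a dict {"X": set_of_genes, "Y": set_of_genes}.
    """
    return {
        "pop1 female x pop2 male": is_fertile(f1_male_genes(pop1["X"], pop2["Y"])),
        "pop2 female x pop1 male": is_fertile(f1_male_genes(pop2["X"], pop1["Y"])),
    }


# Case 1: A and B are both divergently resolved.
# Population 1: A is Y-specific, B is X-specific; population 2: the reverse.
both = reciprocal_crosses(
    {"X": {"B"}, "Y": {"A"}},
    {"X": {"A"}, "Y": {"B"}},
)

# Case 2: only A is divergently resolved; B remains on all X and Y chromosomes.
only_a = reciprocal_crosses(
    {"X": {"B"}, "Y": {"A", "B"}},
    {"X": {"A", "B"}, "Y": {"B"}},
)

print(both)    # F1 males are sterile in both directions
print(only_a)  # F1 males are sterile in one direction only
```

In this sketch, males within each parental population remain fertile because their own X and Y chromosomes are complementary; sterility appears only when an X from one population is combined with a Y from the other.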

Figure 4

Divergent resolution of Y- and X-chromosomal alleles of genes with male-specific functions. Black boxes show functional genes, white boxes pseudogenes (ps).

Conclusions

Teleost fish provide an outstanding model to study a multitude of questions related to evolution. This may be linked to the apparently considerable plasticity of their genome, manifested, for example, by a high variability in genome size and chromosome number (Venkatesh, 2003). Fish genomes also have intrinsic characteristics that might have been involved in the formation of the amazing diversity of species observed in the teleost lineage. In particular, there is now substantial evidence that an ancient genome duplication event (tetraploidization) has provided the evolutionary framework for the diversification of gene functions and for speciation in fish. Completing and comparing the genome sequences of different fish species, together with subsequent functional genomics approaches, will help explain why hundreds of paralogs have been maintained over hundreds of millions of years of evolution in fish genomes. Such analyses will also provide new information concerning the evolution and evolutionary impact of transposable elements in fish genomes. The multiple families of active transposable elements present in teleosts potentially represent powerful evolutionary factors, which might also have played an important role in speciation. Finally, the frequent switching between different sex determination systems and the rapid evolution of sex chromosomes might also be linked to the formation of new species. Many more comparative studies will be necessary to understand why sex determination is so variable in fish in contrast to the situation observed in birds and mammals.

Studies on fish will probably help to better understand the evolution of our own genome and to characterize the functions of its genes. Hundreds of new genes and regulatory sequences have already been identified through sequence comparison between pufferfish and human genomes. The knowledge gained from analyses in zebrafish and medaka will very likely shed new light on organogenesis in vertebrates, even if there is already evidence for important differences between humans and fish. Subfunctionalized gene paralogs are highly relevant for such experiments, since they will allow the separate analysis of gene functions performed by a single gene in humans (Volff and Schartl, 2003; Postlethwait et al, 2004). Finally, analysis of sex determination in fish might allow the discovery of alternative strategies to compensate for the predicted disappearance of the Y chromosome in humans (Marshall Graves, 2002; Volff et al, 2003b). Lessons from current fish models have shown that one species is rarely representative of the complete teleost fish lineage. Multiple comparative analyses will be necessary to understand the evolution of this very diverse group of animals.