The first human members of the MAGE gene family that have been described are expressed in tumor cells but silent in normal adult tissues except in the male germ line. Hence, they encode strictly tumor-specific antigens that represent attractive targets for cancer immunotherapy. However, other members of the family were recently found to be expressed in normal cells, indicating that the family is larger and more disparate than initially expected. We therefore performed a database screening to identify all of the recorded members of both classes of human MAGE genes. This report provides an overview of the MAGE family and proposes a general nomenclature for all of the MAGE genes identified thus far. We found that the MAGE-D genes were particularly well conserved between man and mouse, suggesting that they exert important functions. In addition, the genomic structure of the MAGE-D genes indicates that one of them corresponds to the founder member of the family, and that all of the other MAGE genes are retrogenes derived from that common ancestral gene. Intriguingly, the COOH-terminal domain of MAGE-D3 was found to be identical to trophinin, a previously described protein believed to be involved in embryo implantation.
The first member of the human MAGE family was identified as a gene encoding a tumor-specific antigen (1). This gene was later found to belong to a cluster of 12 hMAGE-A genes located in the q28 region of the X chromosome (2, 3). A sequencing effort directed at the human Xp21 region led to the discovery of a second cluster that was named hMAGE-B (4, 5, 6) and, more recently, a hMAGE-C cluster was identified in Xq26–27 (7, 8). The genes belonging to the hMAGE-A, -B, and -C subfamilies are characterized by a large terminal exon encoding the entire protein. They are completely silent in normal tissues, with the exception of male germ cells, and, for some of them, placenta. Some are expressed in tumor cells of various histological types, where they code for antigens recognized by cytolytic T lymphocytes (9). Because of their specific expression on tumor cells, these antigens are of particular interest for antitumor immunotherapy. Preliminary results of clinical trials suggest that tumor regression can be induced in a significant number of cancer patients by immunization with an antigen encoded by gene hMAGE-A3 (10, 11, 12, 13).
Two groups of mMage genes have been identified thus far in the mouse (14, 15, 16). Like their human counterparts, they are silent in normal adult tissues with the exception of male germ cells (17, 18), and some of them are expressed in tumor cells (14, 15). These murine genes were named mMage-a and -b because the sequences and isoelectric points of the corresponding proteins were closest to those of the human MAGE-A and -B proteins, respectively. However, the overall sequence identity between MAGE-A and -B orthologues is weak, and MAGE-C genes have not been identified in the mouse, implying that the members of these three subfamilies are poorly conserved during evolution.
More recently, we (19) and others (20) have reported the identification of two sequences that define a fourth subfamily of human MAGE genes, hMAGE-D. These genes differ from the previously described members of the family by their expression pattern: they are expressed in all normal tissues tested. They also differ by their genomic structure, the open reading frame of hMAGE-D2 being split over 11 exons. Importantly, MAGE-D1 was recently found to interact with the p75 neurotrophin receptor and to facilitate nerve growth factor-dependent apoptosis (21). MAGE-D1 was also recently reported to interact with members of the Dlx/Msx homeodomain family and to regulate the transcriptional function of Dlx5 (22). These observations suggest that the members of the MAGE-D subfamily exert important functions and prompted us to systematically screen the public databases to identify all of the recorded members of the human MAGE family. In this paper, we report the results of this screening as well as the first detailed analysis of the murine Mage-d genes.
MATERIALS AND METHODS
With the aim of identifying new human and murine genes belonging to the MAGE family, we performed tblastn searches in the DNA databases (nr, ests, htgs, and the working draft sequence of the human genome) available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/), using the MAGE domain of protein MAGE-D2 as the query. We ran the gapped tblastn program (version 2.1.2), using the BLOSUM-62 substitution matrix and the default values for the gap costs (existence, 11; extension, 1; Ref. 23). All hits with expected values (E) <0.001 were recorded for additional analysis. The retrieved nucleotide sequences were aligned into clusters, each corresponding to an individual gene. These alignments were manually evaluated, and a consensus sequence was derived for each of them. The consensus sequences were then compared with the sequences of the known members of the MAGE family.
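The hit-selection step described above can be illustrated with a minimal Python sketch. It assumes that the search results have been exported as tab-separated lines giving a subject identifier and an E value; this two-column layout, the function names, and the example identifiers are hypothetical and are not part of the original analysis, which also relied on manual evaluation of the alignments.

```python
E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


def parse_hits(tabular_text):
    """Parse lines of 'subject_id<TAB>e_value' (hypothetical export layout)."""
    hits = []
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, e_value = line.split("\t")[:2]
        hits.append((subject_id, float(e_value)))
    return hits


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E value is strictly below the cutoff."""
    return [(subject, e) for subject, e in hits if e < cutoff]


def best_e_value_per_subject(hits):
    """Collapse repeated hits on the same subject, keeping the lowest E value."""
    best = {}
    for subject, e in hits:
        if subject not in best or e < best[subject]:
            best[subject] = e
    return best


# Invented example records; identifiers and values are illustrative only.
example = "est_A\t1e-40\nest_B\t0.5\nest_A\t2e-12\nhtg_C\t0.0009\n"
kept = significant_hits(parse_hits(example))
print(best_e_value_per_subject(kept))
```

Collapsing repeated hits per subject is only a convenience for inspection; it does not replace the clustering of sequences into genes, which required alignment and manual review.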
Multiple Sequence Alignment.
Protein sequences of the MAGE conserved domains were aligned using CLUSTAL W neighbor-joining (24). Calculations were performed with the MacVector 6.5 package (Oxford Molecular Group, Oxford, United Kingdom) using the BLOSUM-30 substitution matrix and the default values for the gap costs (10 and 0.05).
Screening of a PAC Mouse Library.
The large-insert RPCI21 PAC library of genomic DNA from 129/SvEvTacfBr female mice was obtained from Peter de Jong (Roswell Park Cancer Institute, Buffalo, NY). The high density filters were screened by hybridization with mMage-d1 and mMage-d2 cDNA probes. Six clones containing mMage-d1, seven clones containing mMage-d2, and four clones containing mMage-d3 were identified.
Determination of the Genomic Structure of the mMage-d Genes.
The location and size of the introns of mMage-d1, -d2, and -d3 were determined by a combination of sequencing and PCR experiments performed on plasmid clones containing restriction fragments of these genes.
The expression of the newly identified MAGE genes in normal and tumoral tissues was evaluated by RT-PCR using standard procedures. The primers were: 5′-AAAGAGCAACTGTGCCATTGG and 5′-ACTTTCATCTTACTGGTTTCAAG for hMAGE-B7; 5′-CAAGAGCAGAGATGCAGATGA and 5′-GAGCACACACCCCTATTGCAT for hMAGE-C4; and 5′-CCAAGGACACTCCCAGGCTGA and 5′-CATGTTCCTCGGCCATATCCA for hMAGE-D4. Sequences of the primers used to specifically amplify the other newly identified MAGE genes are available on request. A mouse poly(A)+ RNA dot blot (Clontech, Palo Alto, CA) was hybridized with 32P-labeled probes specific for mMage-d1 (probe 1), mMage-d2 (probe 2), or mMage-d3 (probes 3a and 3c). The probes were PCR fragments obtained from cDNA with the following primers: probe 1, sense primer 5′-TGACTGGACTGCACAGTTC and antisense primer 5′-GCATGCCACTCTCAGTCAACAGG; probe 2, sense primer 5′-AGGATCCCAAGGAATGGGCAG and antisense primer 5′-TCACTTGTAGGAGAAACCACAG; probe 3a, sense primer 5′-GACCACAAATACTGACAATG and antisense primer 5′-GGAAGAAGGGTAACAATA; and probe 3c, sense primer 5′-ACTGCCTAACAAGGGAAGAG and antisense primer 5′-CCCAGTTCTATTGTTGGCTT. Radioactive signals were quantified by phosphorimager analysis. A Northern blot of total mouse brain RNA was hybridized with probe 3b, which is a PCR fragment corresponding to the MAGE conserved domain of mMage-d3 obtained with sense primer 5′-GTTGGTGAAATACCTGTTGG and antisense primer 5′-CGAGACTAGCAAGATGAAAGTC.
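As a quick sanity check on primer sequences such as those listed above, the following Python sketch computes the length, GC fraction, reverse complement, and a rough melting temperature for an oligonucleotide. The melting temperature uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a crude estimate for short oligonucleotides; it is an assumption introduced here for illustration, and no annealing temperatures are given in the text. The function names are likewise hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# The two hMAGE-B7 primers quoted in the text, written 5' to 3'.
primers = {
    "hMAGE-B7 primer 1": "AAAGAGCAACTGTGCCATTGG",
    "hMAGE-B7 primer 2": "ACTTTCATCTTACTGGTTTCAAG",
}
for name, seq in primers.items():
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

A check of this kind would also flag stray spaces or non-nucleotide characters in a transcribed primer, because the complement lookup raises a KeyError on any character other than A, C, G, or T.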
The 5′ ends of mMage-d1, -d2, and -d3 cDNAs were amplified by PCR from mouse brain mRNA using the Marathon cDNA Amplification Kit (Clontech, Palo Alto, CA). The 5′ ends of trophinin transcripts present in human endometrium were amplified using the 5′ RACE system from Life Technologies, Inc. Total RNA was extracted with Tripure (Roche Molecular Biochemicals) from human endometrium dissected from a surgical sample obtained at the early secretory phase. Poly(A)+ RNA was purified with the mRNA Isolation kit from Roche Molecular Biochemicals. cDNA synthesis was primed with the antisense primer 5′-TACAAGGCATGCCACCAAAGC, and two successive rounds of PCR amplification were performed using antisense primers 5′-AAATCTGCTCCAGGCCTGAG and 5′-AACTCTTCCCTTGTTAGGC, respectively. Then the amplified products were cloned into pCR2.1 (Invitrogen, Carlsbad, CA). The clones containing sequences of mMage-d3 exon 11 were identified by hybridization with oligonucleotide 5′-GACTTTCATCTTGCTAGTCTCG. The clones which did not hybridize were sequenced to determine their 5′ end.
Human MAGE Genes
We performed tblastn homology searches (23) with the hMAGE-D2 protein sequence (19) to identify all of the human MAGE sequences recorded in the databases of the National Center for Biotechnology Information. Most of the new MAGE sequences that were retrieved could be grouped in contigs. The new genes were classified on the basis of their sequence homologies and their chromosomal location (Fig. 1). Some were found to belong to previously published hMAGE subfamilies (hMAGE-A13 to A15; hMAGE-B7 to B17; hMAGE-C4 to C7 and hMAGE-D3 and -D4), whereas others defined new subfamilies that were named hMAGE-E to -K (Table 1).
An RT-PCR analysis of the expression of all these genes was performed on panels of normal and tumoral tissue samples (Fig. 2 and data not shown). Some were found to be silent in all tissues tested, including testis. These genes are probably pseudogenes comparable with the previously described hMAGE-A7 sequence (2) and are indicated by a “P” in the expression column of Table 1. Genes hMAGE-B10, hMAGE-B16, hMAGE-B17, and hMAGE-C4 displayed the expression pattern characteristic of the MAGE-A, -B, and -C genes, i.e., silent in all normal tissues with the exception of testis. In addition, hMAGE-C4 was found to be expressed in a small proportion of tumoral samples, suggesting that it could potentially encode tumor-specific antigens. By contrast, all hMAGE-D, hMAGE-E, hMAGE-F, hMAGE-G, and hMAGE-H genes were found to be expressed at various levels in many normal tissues, a “ubiquitous” expression pattern resembling that reported for hMAGE-D1 and -D2 (19, 20).
By comparing the sequences of the putative proteins encoded by these genes, we identified a stretch of ∼200 amino acids which we named the “MAGE conserved domain.” This domain corresponds to the only region of homology shared by all of the members of the family (Fig. 3,A). The rest of the protein sequences, and in particular the NH2-terminal domains, are completely different from one subfamily to the other (Fig. 4). The MAGE conserved domain is usually located close to the COOH termini of the proteins except in the hMAGE-D proteins, where it occupies a more central position. Intriguingly, the MAGE conserved domain is duplicated in hMAGE-E1 and hMAGE-E2 (Fig. 4).
Murine Mage Genes.
By database screening, we also retrieved a large number of murine Mage genes, most of which could be identified as orthologues of human MAGE sequences (Table 1). As shown in Fig. 5, the conservation of the MAGE domains between human and mouse differs considerably from one protein to the other. However, these domains are remarkably similar in the two species for the MAGE-D proteins (up to 99% amino acid identity between human and mouse for MAGE-D2; Fig. 3,B), and the three MAGE-D domains found in mouse are closer to their respective human orthologues than they are to each other (Fig. 5). These observations suggested that the MAGE-D proteins exert important and distinct functions and prompted us to analyze this subfamily in more detail.
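Identity figures such as those quoted above are derived from pairwise alignments of the MAGE conserved domains. The Python sketch below shows one common way to compute percent identity from two already aligned sequences. It assumes that gapped columns are excluded from the denominator, which is only one of several conventions and may differ from the calculation performed by the alignment software; the function name and the toy sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are ignored in this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (invented, not MAGE sequences).
print(percent_identity("MKLV-A", "MKIVQA"))  # 4 matches over 5 compared columns
```

Applying the same function to each orthologue pair and each paralogue pair would give the kind of comparison summarized in the text, in which each mouse MAGE-D domain is closer to its human orthologue than to the other mouse MAGE-D domains.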
The open reading frames of mMage-d1, -d2, and -d3 were amplified by RT-PCR from adult mouse brain total RNA and used as probes to isolate the corresponding genomic clones from a mouse PAC library. We also screened the same library with a human MAGE-D4 probe but retrieved only clones containing mMage-d1 and -d3 sequences, strongly suggesting that the mouse genome does not contain a hMAGE-D4 orthologue. The murine Mage-d genes all displayed a genomic structure that closely resembles that observed for the human MAGE-D2 gene (19). Each gene contains 13 exons, the main open reading frame covering exons 2 to 12, and most of the MAGE conserved domain being encoded by exons 5 to 11 (Fig. 6,A). Importantly, the existence of multiple exons encoding the MAGE protein seems to be a feature unique to the MAGE-D subfamily. Indeed, all of the other human and mouse genes that we identified yielded PCR products of identical sizes when amplified from either cDNA or genomic DNA (Fig. 2 and data not shown), suggesting that the sequence encoding their MAGE conserved domain was entirely contained within a single exon, as previously observed for the MAGE-A, -B, and -C genes.
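The reasoning behind the cDNA versus genomic DNA comparison can be stated as a small Python sketch: if the same primer pair gives products of the same size from both templates, no intron lies between the primers, whereas a longer genomic product indicates intronic sequence within the amplicon. The function name, the size-resolution parameter, and the example sizes are hypothetical; the text does not report product sizes or the resolution of the gels.

```python
def inferred_intron_bp(cdna_product_bp, genomic_product_bp, resolution_bp=20):
    """Total intron length inferred between two primers, or 0 if sizes match.

    resolution_bp is an assumed limit below which two product sizes
    cannot be distinguished on a gel.
    """
    difference = genomic_product_bp - cdna_product_bp
    if abs(difference) <= resolution_bp:
        return 0  # same apparent size: amplicon lies within a single exon
    if difference < 0:
        raise ValueError("genomic product shorter than cDNA product; check the amplification")
    return difference


# Invented sizes for illustration only.
print(inferred_intron_bp(500, 505))   # indistinguishable sizes
print(inferred_intron_bp(500, 1300))  # genomic product longer than cDNA product
```

The inference applies only to the region between the primers, and a very large intron could prevent amplification from genomic DNA altogether rather than yield a longer product.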
The homology between the three mMage-d proteins was found to be restricted to a region encompassing the MAGE domain and the 40 amino acids immediately downstream of it (Fig. 6,B). The NH2-terminal regions of the proteins appeared to be very different, and the COOH-terminal domain encoded by exon 12 was much longer in mMage-d3 than in the other mMage-d proteins. Data bank searches did not reveal any significant homology between known proteins and the NH2-terminal regions of mMage-d1, d2, or d3. By contrast, the long COOH-terminal domain of mMage-d3 was found to contain the entire sequence of trophinin, a previously described protein consisting essentially of decapeptide repeats (25, 26). This is in agreement with the recent identification of a human brain cDNA clone (27) that similarly contained MAGE-D3 and trophinin coding sequences in frame. Interestingly, when the trophinin repeats were omitted, the COOH-terminal parts of the mouse and human MAGE-D3 proteins could be aligned to the corresponding regions of MAGE-D1 and -D2 (Fig. 7).
Expression Pattern of the mMage-d Genes.
To determine the relative expression of the three murine Mage-d genes in different tissues and at different developmental stages, we performed a semiquantitative mRNA dot blot analysis using cDNA probes specific for each gene. Transcripts corresponding to mMage-d1 and -d2 could be detected in all of the tissues analyzed with this method, although at different levels. For instance, mMage-d1 appeared to be predominantly expressed in the adult brain (Fig. 8,A), whereas the expression of mMage-d2 was found to increase steadily during embryonic development, reaching a maximum just before birth (Fig. 8,A). By contrast, hybridization with a mMage-d3-specific probe gave significant signals essentially in the adult brain and in embryos (Fig. 8,A), although all tissues tested scored positive by RT-PCR (not shown). Importantly, a trophinin probe gave similar results (Fig. 8,A), and a Northern blot analysis of brain tissue detected a single band of about 7 kb that contained both mMage-d3 and trophinin sequences (Fig. 8,B). We therefore conclude that the mMage-d3 and trophinin exons are expressed as a single transcriptional unit in the mouse brain. In addition, transcripts containing MAGE-D3 and trophinin sequences could also be detected in various tissues by RT-PCR (not shown).
Trophinin was originally described in humans as a homophilic adhesion molecule specifically expressed in trophoblastic and endometrial cells and potentially involved in embryo implantation (25). To determine whether the trophinin transcripts present in human endometrium also contained hMAGE-D3 exons, we performed a RACE experiment on mRNA prepared from early secretory phase human endometrium. Most of the clones that were obtained (136 of 153) carried both exon 11 and 12 sequences, implying that the hMAGE-D3 and trophinin exons were predominantly expressed as a single transcriptional unit in human endometrium. However, preliminary data also suggested the existence of two discrete transcription start sites, one at the splicing site of exon 12 and one in intron 11 (Fig. 9). The possibility therefore remains that the trophinin gene can be transcribed independently of MAGE-D3 in endometrial cells.
The human MAGE, GAGE, and BAGE genes were originally described as completely silent in normal adult tissues, with the exception of testis and, for some of them, placenta (9). By contrast, these genes were expressed at a high frequency in various kinds of tumors. Therefore, the corresponding proteins represent attractive targets for cancer immunotherapy, because it can be assumed that immunized patients should not be tolerant to such tumor-specific antigens. However, some members of the MAGE family are expressed in normal tissues. This is the case for Necdin, the first member of the family to be identified (28, 29). More recently, we (19) and others (20, 30) have identified new members of the MAGE family that are ubiquitously expressed. In this paper, we report the existence of eight additional members of the human MAGE family whose expression is not restricted to tumors and male germ cells. None of the antigenic peptides encoded by the human MAGE-A and -B genes (9, 31) could be identified in any of the MAGE proteins that show expression in normal somatic tissues. We therefore conclude that immunized patients should not be tolerant to any of these antigens, and that the immune response triggered by the immunization should not have any autoimmune consequences on healthy tissues.
The MAGE-D genes contain 13 exons, 11 of which encode the protein. By contrast, all of the other MAGE genes share a less complex genomic structure, almost invariably characterized by a large terminal exon carrying the complete coding sequence. This suggests that one or several MAGE-D ancestor genes have generated the first member of other MAGE subfamilies by retroposition, a process that frequently occurs in mammalian genomes (32, 33). In addition, gene duplication has obviously contributed to the emergence of the multigenic MAGE subfamilies that we observe today. Some of these duplications have occurred recently, indicating an unusually rapid evolution. This is the case for the murine Mage-a genes, which are much closer to each other (up to 99% nucleotide identities in their coding sequences) than they are to their human orthologues. By contrast, the duplication events that produced MAGE-D1, -D2, and -D3 must be much older. Indeed, the NH2- and COOH-terminal regions that flank the MAGE conserved domain are completely different for each MAGE-D paralogue but are highly conserved between human and mouse orthologues. This clearly indicates that the MAGE-D genes have evolved independently for a long time before the phylogenic separation of the two species. Interestingly, repeat insertion appears to have played a major role during the evolution of the family. For instance, the long COOH-terminal domain of MAGE-D3 was most probably formed by serial duplications of decapeptide repeats, and the NH2-terminal domains of MAGE-C1 and MAGE-D1, which are also highly repetitive (7, 20), must also have undergone sequential duplication events.
The fact that three very different MAGE-D proteins were conserved during the evolution of mammals strongly suggests that these proteins exert important but distinct functions in this phylum. Importantly, searches in databases also identified MAGE-like genes in nonmammalian species. We identified a zebrafish MAGE gene with a structure similar to that of the mammalian MAGE-D genes (11 exons, 9 of which encode the MAGE conserved domain; data not shown). In addition, a single MAGE-like gene was identified in the genome of the fly Drosophila melanogaster (FlyBase accession no. FBgn0037481; Ref. 34). Surprisingly, however, we were unable to identify MAGE homologous sequences in the genome of the nematode Caenorhabditis elegans or in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, although MAGE sequences were identified in several plant species, including Arabidopsis thaliana (GenBank accession no. AF234632; Ref. 34).
Recently, a two-hybrid analysis identified rat MAGE-D1 as a binding partner for the p75 neurotrophin receptor, raising the possibility that it could be a component of its intracellular signaling pathway (21). Although a more refined mapping is clearly required, the available data seem to point to the MAGE conserved domain of MAGE-D1 as the region involved in p75 binding. Presumably, MAGE-D1 signaling to downstream targets could be mediated by a different region of the protein that would be specific to MAGE-D1. Therefore, if other MAGE-D proteins also interact with p75 or related receptors through their MAGE conserved domain, one can assume that these interactions could result in different intracellular responses. Alternatively, receptor binding could be mediated by a MAGE-D1-specific sequence and downstream signaling by the MAGE conserved domain. These issues could be clarified by performing two-hybrid experiments using each MAGE-D protein as bait.
As suggested above, the different MAGE-D proteins are unlikely to exert redundant functions because their NH2- and COOH-terminal domains are extremely variable. In this respect, the COOH-terminal part of MAGE-D3 is remarkable because it is identical to trophinin, a previously described protein thought to be involved in embryo implantation (25). In the mouse brain, a single mMage-d3 transcript is observed that also carries the trophinin sequence in frame. Translation of this transcript should generate a large protein composed of a transmembrane trophinin domain and an intracellular MAGE domain that could potentially be involved in an intracellular signaling pathway similar to that proposed for MAGE-D1. However, the size of the protein detected by anti-trophinin antibodies in brain corresponds to the size predicted for trophinin alone (26). This suggests that the physiological form of trophinin is devoid of most, if not all, MAGE-D3 sequences. Whether this discrepancy is attributable to a preferential initiation of translation at the trophinin ATG in exon 12 of MAGE-D3 or to a rapid processing of a large precursor protein remains to be investigated.
Most of the MAGE genes that exist today appear to be retrogenes derived from one or several MAGE-D ancestral genes. Retroposition usually results in the acquisition of a defective cDNA copy of the founder gene that degenerates into a pseudogene. However, inactivity is not always a retrogene’s fate, and it has been proposed that most intronless genes present in today’s eukaryotic genomes are functional retroposons that have lost their characteristic 3′ poly(A) stretches and flanking direct repeats because of their old age (32). The MAGE retrogenes obviously belong to this category. Necdin, which is highly conserved between man and mouse, is a candidate gene for the Prader-Willi syndrome. Its recent inactivation in the mouse germ line results in perinatal lethality, at least in some genetic backgrounds (35). MAGE-G1 is also strikingly conserved between man and mouse (91% amino acid identities in the MAGE conserved domain), suggesting that it also exerts important functions. By contrast, many other MAGE retrogenes are poorly conserved during evolution. However, despite their old age, many of them still contain an intact open reading frame and are transcribed in male germ cells. The possibility therefore remains that these genes encode proteins whose functions could be related to those of the MAGE-D ancestral gene(s). More interestingly, some of these retrogenes could have been recruited during evolution to acquire novel activities, a process referred to as exaptation (36).
Supported by the Fonds National de la Recherche Scientifique, Brussels, Belgium (to S. L.) and a fellowship from the Fonds pour la formation à la Recherche dans l’Industrie et l’Agriculture (to M. B.).
Internet address: http://www.ncbi.nlm.nih.gov/.
The abbreviations used are: RT-PCR, reverse transcription-PCR; RACE, rapid amplification of cDNA ends.
We thank Philippe Auquier and Maria Panagiotakopoulos for excellent technical assistance and Dr. Etienne Marbaix for the human endometrium surgical sample.