To the Editor:
The Perspective by Hassan et al. (1) is an excellent synopsis of the substantial studies by these authors and others that have identified mesothelin as a new target for tumor immunotherapy and diagnosis. We would like to comment on the mesothelin variants diagrammed in Fig. 2 of that article (1). The existence of mesothelin variants with divergent amino acid sequences has been promulgated in numerous publications, public sequence databases, and presentations at scientific meetings. Some authors use terms such as "mesothelin family proteins" (2, 3) to encompass variants reported on the basis of cDNA sequences. However, there is no published evidence for the existence of mesothelin variants at the protein sequence level, except those consistent with posttranslational proteolysis (4). Hassan et al. (1) acknowledge that the large majority of mesothelin sequences in the expressed sequence tag (EST) database correspond to the sequence originally reported by Kojima et al. (4), but Fig. 2 of Hassan et al. (1) may be interpreted as emphasizing the importance of mesothelin variants for cancer cell biology.
We have recently published a study (5) suggesting that a single mesothelin transcript predominates in human cell lines and tissues, corresponding to GenBank accession no. NM_005823 (ref. 4; the upper mesothelin protein diagrammed in Fig. 2B of ref. 1). All cDNA sequences derived from 17 individual human cell lines and tissue samples, including the HeLa cell line that was the source of mesothelin variant V-1 originally reported by Chang and Pastan (6) and by Hassan, Bera, and Pastan (1), corresponded to NM_005823 (4). Furthermore, all mesothelin cDNA sequences in our study (5) were consistent with available human genomic sequences, which does not appear to be the case for mesothelin V-1 (1, 6) in the region of codons 4–56. This apparent discrepancy in HeLa results might be due to acquired genomic variations in cell cultures maintained by different laboratories over several decades. Yet, it should be noted that we obtained essentially identical mesothelin sequences from 17 independent sources (5), suggesting that mesothelin V-1 is not commonly expressed by human cells.
Mesothelin V-2 (lower structure in Fig. 2B of ref. 1) is a proposed splice variant based on a partial cDNA sequence that retains intron 16. Because of the resulting frameshift, it would encode a unique COOH terminus and lack the glycosylphosphatidylinositol (GPI) anchor signal sequence, so the hypothetical protein would be secreted (soluble mesothelin; ref. 2). This transcript was detected at low levels in our study and was primarily associated with incompletely spliced nuclear RNA (5). These results suggest that "soluble mesothelin" transcripts exist primarily among heterogeneous nuclear RNA and are not likely to be efficiently expressed (translated) by either normal or malignant cells.
The above comments do not in any way diminish the significance of mesothelin for tumor immunotherapy and diagnosis, as proposed by several authors including Hassan et al. in their Perspective article (1). There is only a 5% amino acid sequence difference between the two reported GPI-anchored mesothelin variants (1, 5), which may have minimal impact on applications such as mesothelin-targeted vaccines or immunotoxin therapies. Likewise, the absence of a "soluble mesothelin" variant does not rule out the possible utility of circulating mesothelin as a diagnostic or prognostic marker. Mesothelin is a GPI-anchored membrane protein, and several pathways have been described for the release of GPI-anchored proteins from the cell surface (7). Carcinoembryonic antigen (CEA) is a prime example of a GPI-anchored protein that is overexpressed by many carcinomas and efficiently released into the circulation, so that serum CEA levels are used clinically as surrogate markers of disease progression in patients with CEA-expressing tumors. In this context, the precise mechanisms of mesothelin release into the circulation may be relatively unimportant for most proposed applications.
However, studies based on mesothelin mRNA expression may be adversely affected by incorrect assumptions about the sequence of the target mesothelin transcript. Examples include the choice of oligonucleotide reagents for gene microarray, quantitative real-time PCR, in situ hybridization, antisense DNA, or inhibitory RNA studies. In such cases, knowledge of the precise sequences of the transcripts responsible for mesothelin protein expression by tumor cells may prove critical. Until additional data are available, we suggest that the sequence reported by Kojima et al. (GenBank accession no. NM_005823; ref. 4) be regarded as the default for all investigations targeting mesothelin expression by human cells.
In Response:
Dr. Shaw and colleagues have emphasized that there is confusion in the literature about the nomenclature used to describe the mesothelin gene, its RNA transcripts, and the proteins encoded by these transcripts. This confusion is one of the reasons we felt it was important to write a review on the subject of mesothelin (1). Our group identified mesothelin by expression cloning using monoclonal antibody K1 and a cDNA library prepared from HeLa cells (2). The cDNA sequence of mesothelin is very similar to that of the cDNA encoding the megakaryocyte potentiating factor described by Kojima et al. (3). Since then, 150 sequences related to mesothelin have been deposited in various databases. Based on an analysis of these data, we concluded that the major mesothelin transcript encodes a protein of 622 amino acids. This is exactly the major transcript described in the paper by Muminova, Strong, and Shaw, published after submission of our review (4), so we are in complete agreement on this point. We chose to call the major transcript mesothelin, whereas they labeled it variant 1. We chose not to call the major transcript a variant because it was so abundant, and we reserved the term variant for minor transcripts. We hope that this difference in nomenclature will not cause further confusion.
Mesothelin V-1 is the sequence originally reported by our group in 1995 (GenBank accession no. U40434), and it contains a sequencing error in the region of codons 4–56, as noted by Shaw and colleagues. The difference between mesothelin and V-1 is an eight-amino-acid insertion (APRRPLPQ), as shown in Fig. 1 of ref. 4. This insertion is also present in another expressed sequence tag in the database (GenBank accession no. CB266931), which suggests that V-1 could be a minor but real splice variant of mesothelin. Mesothelin V-2 is supported by four expressed sequence tags in the database (GenBank accession nos. AA404695, AI469957, AA291488, and AA291639) and was also reported by Scholler et al. as a minor splice variant of mesothelin (5).
The complete sequence of the circulating protein reported by Scholler et al. (5) and described as the secretory form of mesothelin is unknown. They report that its NH2-terminal amino acid sequence, EVEKTACPSGKKAREIDES, is identical to that of mesothelin (in the region of codons 296–314), but their report did not describe its COOH-terminal sequence. We noted in our review that mesothelin V-2 could be the result of abnormal splicing. However, as Dr. Shaw points out, soluble mesothelin could also represent mesothelin that is shed from the cell surface. Isolation and sequencing of the soluble form of mesothelin in serum will determine whether the secretory form is encoded by variant 2 or represents mesothelin shed from the cell surface.
In conclusion, we agree with Dr. Shaw that the major transcript of the mesothelin gene in human cells and tissues encodes the upper mesothelin protein diagrammed in Fig. 2B of our review, which we referred to simply as mesothelin (1). We also agree that the mesothelin sequence with GenBank accession no. NM_005823 should be considered the default sequence for investigations targeting mesothelin (3).