Studies were conducted with the final goal of identifying genes of interest mapping to the chromosome region 16q23.3–24.1, an area commonly affected by allelic losses in breast cancer. To this end we generated a detailed physical map of the genomic region spanning between sequence-tagged site markers D16S518 and D16S516. To identify candidate genes, we used shotgun genomic sequencing as well as isolation and analysis of transcripts mapping to the area of interest. We identified and cloned a novel gene, the genomic structure of which spans the whole region of interest. We named this gene WWOX because it contains two WW domains coupled to a region with high homology to the short-chain dehydrogenase/reductase family of enzymes. The ORF of WWOX is 1245 bp long, encoding a 414-amino acid protein. This gene is composed of nine exons. We performed a mutation screening of WWOX exons in a panel of breast cancer lines, most of which are hemizygous for the 16q genomic region indicated. We found no evidence of mutations, thus indicating that WWOX is probably not a tumor suppressor gene. However, we observed that one case of homozygous deletion as well as two previously described translocation breakpoints map to intronic regions of this gene. We speculate that WWOX may span the yet uncharacterized common fragile site FRA16D region. In expression studies we found overexpression of WWOX in breast cancer cell lines when compared with normal breast cells and tissues. The highest normal expression of WWOX was observed in hormonally regulated tissues such as testis, ovary, and prostate. This expression pattern and the presence of a short-chain dehydrogenase/reductase domain and specific amino acid features suggest a role for WWOX in steroid metabolism. Interestingly, the presence of WW domains in the structure of WWOX indicates the likelihood that this protein physically interacts with other proteins.
The unique features of WWOX and its possible association with cancer processes make it an interesting target for further investigation.
Chromosomal and genomic abnormalities affecting chromosome 16q have frequently been reported in cytogenetic and allelotypic studies of various epithelial tumors. We and others have demonstrated that loss of heterozygosity (LOH) affecting the long arm of this autosome is often observed in breast carcinomas and preinvasive breast lesions (1, 2, 3, 4, 5). Other tumor types, such as prostate and hepatic carcinomas, also exhibit similar abnormalities (6, 7). Due to these findings, considerable research effort has been invested in an attempt to identify the putative tumor suppressor gene(s) that may reside in the distal portion of chromosome 16q (8, 9, 10). In a previous study we observed that one of the most commonly affected areas spanned the region between STS markers D16S515 and D16S504, with the most affected marker being D16S518 at 16q23.3–24.1 (11). The high incidence of LOH observed at preinvasive stages of tumor development led us to speculate that a candidate tumor suppressor gene or genes located at 16q23.3–24.1 may play an important role in early breast carcinogenesis (11).
In this report we describe the physical map of the region of interest between STS markers D16S518 and D16S516 and the cloning of a novel protein from within this area.
Materials and Methods
BAC Identification and Development of STSs and DNA Sequencing.
YAC and BAC clones spanning the region of interest were identified by PCR screening of STSs. BACs were isolated from BAC library CITB-HSP-C (Research Genetics, Huntsville, AL). To generate novel STSs for contig building, BAC ends were sequenced using SP6 and T7 vector primers.
Genomic DNA shotgun sequencing was performed using DNaseI BAC DNA digestion and subsequent cloning into pZErO-1 vector (Invitrogen). Inserts were amplified with vector-specific primers. Cycle sequencing reactions were performed using ABI PRISM BigDye Terminator cycle sequencing chemistry (Perkin-Elmer/Applied Biosystems) and analyzed on an ABI 377 automated fluorescent sequencer (Perkin-Elmer/Applied Biosystems). When necessary, clones were sequenced manually with 32P-labeled primers.
Isolation of Candidate cDNAs from the Region of Interest.
cDNA clones were isolated following a modification of the solution hybrid capture method described by Futreal et al. (12), using BAC clones from the region of interest as selector DNA and isolating cDNA clones from a human mammary gland cDNA library (5′-STRETCH; Clontech Laboratories, Inc.). All cloned cDNAs were sequenced and analyzed using the BLAST algorithm, searching all available GenBank human databases. The isolated cDNA clones were mapped back to the corresponding BAC (selector) DNAs and compared with the genomic DNA sequence.
WWOX cDNA Isolation and Exon-Intron Structure Determination.
A consensus sequence was generated by alignment of the primary cloned cDNA sequence and matching ESTs. From this sequence primers were prepared to isolate the full-length cDNA. Two independent clones were isolated from a placenta cDNA library panel (Rapid Screen, OriGene Technologies, Inc.). Additionally, a second strategy was followed using the 5′- and 3′-rapid amplification of cDNA ends PCR method on a human mammary gland cDNA library (Marathon-Ready; Clontech) according to the manufacturer’s protocol. The cDNAs isolated by this last method were cloned, sequenced, aligned, and compared with the clones isolated from the placenta library to determine the full-length cDNA. Primers for the 5′ and 3′ ends of WWOX cDNA were used as a first step to isolate additional BAC genomic clones. After the intron-exon junctions for a specific exon were determined, primers for the next exon were designed. The whole-length cDNA sequence was compared with the genomic sequence to determine the genomic structure of WWOX.
Protein Sequence Analysis.
The WWOX amino acid sequence was analyzed using the BLASTP and PSI-BLAST algorithms to search for matches or homologies in the GenBank protein databases. Protein family domains were identified using the Pfam domain models (PFAM: multiple alignments and profile HMMs of protein domains, release 4.3, The Pfam Consortium, http://pfam.wustl.edu/).
Northern Blot Analysis.
Northern blots using 2 μg of poly(A) RNA from breast cancer cell lines and normal human breast epithelium were prepared using standard procedures. The multitissue Northern blot was purchased from Clontech. A 1553-bp EcoRI restriction fragment of the WWOX clone, spanning the 5′ end and amino acid-coding region, was used as probe after labeling with [32P]dCTP using random priming (Prime It II; Stratagene, La Jolla, CA). The membranes were hybridized in Rapid-hyb buffer (Amersham), followed by washing according to the manufacturer’s protocol.
In Vitro Translation.
In vitro translation was performed using an in vitro transcription-translation reticulocyte lysate assay (TNT T7 Quick Coupled Transcription/Translation System; Promega) with full-length WWOX cDNA as template. [35S]Methionine-labeled products were analyzed by SDS-PAGE followed by phosphorimager detection.
Mutation Screening.
Genomic DNA isolated from a panel of 27 breast cancer cell lines was used to resequence each of the exons of WWOX. Primers for individual exon amplification and sequencing are specified in Table 1.
Results
General Characterization of the Chromosome 16q Region of Interest.
We have focused on the 16q23.3–24.1 chromosome region as a consequence of our previous LOH analyses (11) and an additional high-resolution allelotypic study using a panel of 27 breast cancer cell lines. The latter study showed a very high incidence of hemizygosity within the area of interest, affecting 70–80% of these cell lines, and one primary tumor case with a homozygous deletion (13).
We followed two main approaches to characterize the chromosomal region of interest to isolate the putative tumor suppressor gene(s). After building a YAC and BAC contig spanning the D16S518–D16S516 region, we used conventional shotgun sequencing and cDNA isolation.
We isolated numerous cDNA clones from the area using a solution hybrid cDNA capture method (12). Thirty-five cDNAs were isolated and sequenced, of which 17 matched previously isolated ESTs, and 18 showed no matches in any of the GenBank databases. All of the isolated cDNA clones were mapped back to the corresponding BAC DNAs, and their sequences were compared with the genomic DNA sequence to identify evidence of exon-intron structure. Only one of these cDNAs showed such features and is described in the following section (i.e., WWOX).
We also sequenced approximately 400,000 bp of the region covered by the overlapping BACs 112B7, 249B4, 286F3, and 36O22 (Fig. 1) including a continuous sequence of 96,371 bp (accession no. AF179633). This genomic sequence was also analyzed for matching EST clusters from GenBank databases. Of the numerous ESTs identified and analyzed, none showed evidence of ORF or exon-intron structure.
Isolation and Characterization of WWOX.
After sequencing this cDNA we isolated two independent full-length cDNA clones from a placenta cDNA library using specific PCR primers that spanned the transcript. These full-length cDNAs showed a consensus sequence 2264 bp long with a predicted ORF of 1245 bp, a 125-bp-long 5′ UTR, and a 3′ UTR 870 bp long with a polyadenylation signal AATAAA starting at position +2091 (cDNA reported to GenBank under accession no. AF211943). The putative start ATG codon is located within a strong Kozak sequence (TCAGCCatgG), which contains a highly conserved G residue (position +4) and purine (G residue at position −3; Ref. 14). An in-frame stop codon is present −30 bp from the predicted translation start site, indicating that the whole ORF was cloned. We named this gene WWOX for the reasons discussed in the following section.
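The ORF-length arithmetic and the Kozak context described above can be checked with a short Python sketch. This is an illustration only, not the authors' analysis: the function names are hypothetical, the 1245-bp ORF is assumed to include the stop codon, and positions are numbered with the A of ATG as +1 and no position 0.

```python
def protein_length_from_orf(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # The final codon is the stop codon and encodes no residue.
    return orf_bp // 3 - 1


def kozak_features(context: str) -> dict:
    """Report the bases at -3 and +4 relative to the first ATG in `context`.

    Assumed numbering: A of ATG is +1; the base immediately upstream is -1.
    """
    seq = context.upper()
    atg = seq.find("ATG")
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need >= 3 bases upstream and >= 1 base downstream of ATG")
    minus3 = seq[atg - 3]   # three bases upstream of the A
    plus4 = seq[atg + 3]    # the base right after the ATG triplet
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in "AG",
        "g_at_plus4": plus4 == "G",
    }


if __name__ == "__main__":
    print(protein_length_from_orf(1245))
    print(kozak_features("TCAGCCatgG"))
```

Applied to the reported values, the first function reproduces the 414-residue protein length, and the second confirms a G at both −3 and +4 in the quoted sequence.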
Next we determined the exon structure and exon-intron boundaries of WWOX and confirmed the chromosomal location of the gene to the region of interest. To this end various combinations of PCR primers were designed based on the cDNA sequence and then mapped back to the corresponding BACs. Subsequent sequencing of the predicted exons permitted the exon-intron boundaries to be established at the genomic DNA level (Fig. 1).
WWOX is composed of nine exons, ranging in size from 58 to 1060 bp (Table 1). On the basis of analysis of the promoter region (reported to GenBank under accession no. AF212843), we determined that the first exon is localized within a CpG island starting at position −660 and extending into the first intron at 292 bp from the ATG translation start site. This area shows a 63% content of C + G and 8% CpG, with the highest percentage within the area from −300 bp to ATG (68 and 11%, respectively).
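The C + G and CpG measures quoted above can be computed from any DNA string, as in the following minimal Python sketch. The exact definition of "% CpG" is not stated in the text, so the sketch simply counts CG dinucleotides and also reports the commonly used observed/expected CpG ratio; that ratio and the function names are assumptions added for illustration, and the example sequence is a made-up toy, not the AF212843 promoter.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G."""
    s = seq.upper()
    return (s.count("C") + s.count("G")) / len(s)


def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides (CG cannot overlap itself, so count() is exact)."""
    return seq.upper().count("CG")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cpg_count(s) * len(s) / (c * g)


if __name__ == "__main__":
    toy = "ACGCGTCGAT"  # hypothetical 10-bp sequence for demonstration
    print(gc_fraction(toy), cpg_count(toy), cpg_obs_exp(toy))
```

For a real analysis these functions would be run over sliding windows of the promoter sequence rather than over one short string.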
We also observed that the 3′ end of WWOX had high sequence homology to a previous GenBank entry of a human putative oxidoreductase (U13395, locus 9621).
Interestingly, we found that the physical map of WWOX spans the whole region of interest: we mapped exon 1 of WWOX to the BAC containing D16S518 and exon 9 and the 3′ UTR to the BAC containing D16S516 (Fig. 1). We estimate that WWOX spans a large genomic region of ∼1 Mb in size. Although the exact lengths of the intronic portions of this gene were not determined, we based this estimate on the known size of the YAC clones, the average size of BAC clones (∼150 kb), and our shotgun sequencing (described above), wherein the ∼400,000 bp corresponded to the intron 8 area of WWOX (Fig. 1). Interestingly, analysis of sequence contigs from this large intronic region allowed us to identify two previously described translocation breakpoints mapping to this same area. These translocation breakpoints, MM.1 and JJN3, have been described in multiple myeloma involving chromosomes 14 and 16, i.e., t(14;16)(q32;q23) (15). Further sequence analysis of this area also identified a pseudogene for ribosomal protein S3 matching sequence within this intronic region (AF179633). To our knowledge no other genes have been reported in this area (i.e., D16S518–D16S516), and we have not identified any other candidates.
WWOX Protein Structure.
The 1245-bp WWOX ORF encodes a 414-amino acid protein (Fig. 2). The BLAST and PSI-BLAST algorithms were used to search for matches in GenBank databases. Interestingly, the NH2 terminus of the putative WWOX protein showed homology to ubiquitin ligases such as NEDD4, YES-associated protein YAP65, and other WW domain-containing proteins (data not shown). Further amino acid sequence analysis using the PROSITE database identified two regions within the putative WWOX protein (amino acids 18–47 and 59–88) that have high homology to WW domain sequences. The first motif exhibits typical features of a WW domain; it is 26 amino acids long, with the two highly conserved tryptophan residues and one proline residue. In the second WW domain one tryptophan is replaced by a tyrosine residue; this is an alternative functional replacement, which is also found in other WW domain proteins (Fig. 2).
The amino acid sequence also revealed homology to numerous proteins known as members of the SDR family. The SDR family encompasses a wide variety of enzymes, which act on diverse hydroxy and keto substrates. The most conserved features of SDR proteins are two domains constituting the cofactor, GXXXGXG, and substrate, YXXXK, binding sites (16). Further WWOX amino acid sequence analysis identified both the coenzyme, NAD(H) or NADP(H), binding site GANSGIG at positions 131–137 and the potential substrate binding site YNRSK at positions 293–297 (Fig. 2).
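The two SDR consensus motifs described above translate directly into regular expressions. The sketch below is a hypothetical illustration in Python, not the PROSITE or Pfam search used in the study; the function name is invented, and the toy sequence is made up and merely embeds the two reported WWOX motifs (GANSGIG and YNRSK), so its coordinates do not correspond to the real protein.

```python
import re

# X in the consensus means "any residue", written as "." in a regex.
SDR_MOTIFS = {
    "cofactor": "G...G.G",   # GXXXGXG
    "substrate": "Y...K",    # YXXXK
}

# Hypothetical toy peptide, NOT the WWOX sequence.
TOY_SEQUENCE = "MAALGANSGIGFETAKSFALHYNRSKLA"


def find_motifs(protein: str, motifs: dict = SDR_MOTIFS) -> dict:
    """Return {motif name: [(start, end, matched text), ...]} with 1-based positions."""
    seq = protein.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(seq)
        ]
    return hits


if __name__ == "__main__":
    for name, found in find_motifs(TOY_SEQUENCE).items():
        print(name, found)
```

Because both patterns are short, chance matches are common in real proteins; in practice a hit is meaningful only when its position and spacing fit the rest of the SDR domain.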
Due to the presence of the WW domains and the homology to SDR, we named this novel protein WWOX. Analysis using the PSORT algorithm predicts that WWOX has no NH2-terminal signal peptide, and its localization is probably in the cytoplasm (17).
WWOX Expression Analysis.
Northern blot analysis with a probe derived from the 5′ end of WWOX revealed a 2.2-kb mRNA (Fig. 3), which is in agreement with the length of the cDNA clone (i.e., 2264 bp).
Analysis of the WWOX expression pattern in normal human tissues showed that expression was highest in testis, prostate, and ovary and significantly lower in the other examined tissues, including bulk breast tissues, which showed very low levels of expression (Fig. 3 A).
We also analyzed the expression of WWOX in normal mammary epithelial cells in culture and in breast cancer cell lines. All of the breast cancer lines analyzed showed higher WWOX expression than normal breast tissue and normal mammary epithelial cells (HME-87; Fig. 3, A and B). However, among the breast cancer lines analyzed, WWOX expression levels varied from relatively low in T47D and MDA-MB435 to high overexpression in ZR-75-1 and MCF-7 cells (Fig. 3 B). Of these cell lines, previous extensive allelotypic analysis using highly polymorphic STS markers allowed us to determine that T47D, ZR-75, MDA-MB435, MDA-MB453, SKBR3, and UACC812 were among the group of breast cancer lines with no evidence of STS heterozygosity along most or all of 16q (13). This indicated the high likelihood that they had lost all or a large portion of one of the original parental 16q arms, including the region spanned by WWOX. However, the putative hemizygous status of WWOX did not impede expression (Fig. 3 B). We also found no correlation between the estrogen receptor status of the breast cancer lines and the levels of WWOX expression.
To examine whether the translation of the ORF of WWOX produced the predicted protein, we used an in vitro transcription-translation system. SDS-PAGE analysis of the translated product revealed a single protein product of ∼46 kDa (Fig. 3 C). This agrees with the predicted molecular mass of WWOX based on its amino acid sequence (i.e., 46,676.8 Da).
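A predicted molecular mass of this kind is obtained by summing residue masses over the sequence and adding one water for the free termini. The Python sketch below shows the calculation under stated assumptions: it uses standard average residue masses, the function names are hypothetical, and the 110 Da mean residue mass in the quick estimate is a common rule of thumb rather than a value from the paper. Reproducing the reported 46,676.8 Da would require the full WWOX sequence (AF211943), which is not included here.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the terminal H and OH


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified linear peptide."""
    seq = protein.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def rough_mass_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Back-of-envelope estimate assuming a mean residue mass (rule of thumb)."""
    return n_residues * mean_residue_da / 1000.0


if __name__ == "__main__":
    print(average_mass("GANSGIG"))   # one of the motifs quoted in the text
    print(rough_mass_kda(414))       # quick estimate for a 414-residue protein
```

The quick estimate for 414 residues lands close to the ∼46 kDa band, which is the kind of consistency check the comparison in the text relies on.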
Mutation Screening in Breast Cancer Cell Lines.
As mentioned above, WWOX spans the whole chromosomal area of interest between STS markers D16S518 and D16S516. The high incidence of LOH affecting this region led us previously to speculate on the existence of a putative tumor suppressor gene within this area. Thus, to investigate whether WWOX is a target for mutations in breast cancer, we performed a mutation screening on a panel of 27 breast cancer cell lines. This panel was of particular interest because we have previously observed that these lines exhibit a high incidence of hemizygosity within the chromosome 16q area of interest (13); i.e., the majority of these cell lines had already lost one WWOX allele via chromosomal rearrangements. Only one breast cancer case displayed a homozygous deletion in this region (13). We have now determined that this deletion spanned part of the intron 8 region of WWOX from STS 249B4S to D16S3029 inclusive (see Fig. 1).
Each of the WWOX exons was amplified from the genomic DNA of each of the 27 breast cancer lines, and the products were sequenced. The intronic primers used for amplification and sequencing are detailed in Table 1.
We found no evidence of deletions or insertions in the examined DNAs. Two substitutions found appear to have a polymorphic rather than a mutational nature. The first, a C/T found at position −5 in the Kozak sequence, was observed in 50% of the tumor samples, but it was also observed that DNA isolated from normal mammary gland was polymorphic at this position. The second substitution, a G/A at position +534, results in an amino acid change, alanine to threonine, but because of the approximate frequency of 50% for either allele in the samples examined and because heterozygosity at this position was also found in normal DNA, we concluded that it also represents a polymorphism.
Discussion
In these studies we have outlined the physical map of a 16q chromosomal region commonly affected by abnormalities in breast and other cancers. The area spans from STS marker D16S518 to D16S516 in the 16q23.3–24.1 region. Additionally, we have cloned a novel gene mapping to this area, with unique characteristics. This gene, WWOX, contains two WW domains on the NH2 terminus of the protein and an SDR central domain. By analogy to other WW domain-containing proteins, the WW motifs should play a role in protein-protein interactions. The SDR domain is expected to bind low molecular weight ligands and cofactors, and the corresponding putative binding motifs have also been identified in WWOX.
The protein motif called WW or WWP was identified in very different types of proteins, including peptidyl-prolyl isomerases involved in mitosis regulation (18, 19), the cytoskeletal protein dystrophin (20), spliceosome-associated proteins (21), the ubiquitin-protein ligase NEDD4 (22), and signal-transducing protein YAP65 (23). This domain is characterized by the presence of very conserved proline and tryptophan residues (20, 24, 25). WW domains are known to interact with the proline-rich motifs of other proteins. Thus far, four different WW binding proline-rich motifs have been identified: PPXY (23), PPLP (26), PGM/PPR (21), and phosphoserine/phosphothreonine (27). At this point it is not possible to predict which type of proline motif the WWOX protein would have affinity for.
SDRs represent a wide spectrum of enzymes. The protein domain database (PROSITE) identifies >60 different proteins from bacteria, fungi, plants, and animals that belong to this family. These are typically enzymes that metabolize different alcohols, sugars, keto-acyls, retinoids, steroids, and other hydroxy and keto substrates. One important group among the SDR proteins is the family of hydroxy-steroid dehydrogenases. The average size of SDR enzymes is 250–300 amino acids. Although overall similarity between the different SDRs can be as low as 15–30%, a small conserved substrate binding motif, YXXXK, and a coenzyme binding motif, GXXXGXG, are characteristic for these proteins (16). Although the 414 amino acids of WWOX make it larger than the average SDR enzyme, the WWOX dehydrogenase domain exhibits the typical sequence features and distances between conserved motifs that are characteristic of SDR enzymes (28). To our knowledge WWOX is, as yet, the only protein described that contains both binding motifs for low molecular weight ligands and substrates and WW domains.
WWOX has one additional putative signature, which is a serine residue 12 amino acids upstream of the YNRSK substrate binding motif. This serine is at a nearly identical location to that observed in steroid dehydrogenases (usually position −13 from Tyr), which is suggested to play an important role in the catalysis of steroid substrates (28).
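The positional relationship described above, a serine a fixed distance upstream of the motif tyrosine, can be checked mechanically. The following Python sketch is a hypothetical illustration: the function name and the default search window of 11 to 15 residues are assumptions chosen to bracket the −12 and −13 offsets mentioned in the text, and the example peptide is a made-up toy rather than the WWOX sequence.

```python
def upstream_serines(protein: str, tyr_pos: int,
                     min_offset: int = 11, max_offset: int = 15) -> list:
    """List offsets (as negative numbers) of serines upstream of a tyrosine.

    `tyr_pos` is the 1-based position of the motif tyrosine. An offset of -12
    means the serine lies 12 residues N-terminal to that tyrosine.
    """
    seq = protein.upper()
    if not 1 <= tyr_pos <= len(seq) or seq[tyr_pos - 1] != "Y":
        raise ValueError("tyr_pos must point at a tyrosine (Y)")
    found = []
    for offset in range(min_offset, max_offset + 1):
        idx = tyr_pos - 1 - offset          # 0-based index of the candidate residue
        if idx >= 0 and seq[idx] == "S":
            found.append(-offset)
    return found


if __name__ == "__main__":
    # Toy peptide: a serine, 11 filler alanines, then the YNRSK motif.
    toy = "AAS" + "A" * 11 + "YNRSK"
    print(upstream_serines(toy, tyr_pos=15))
```

Allowing a small window rather than a single fixed offset reflects the "nearly identical" spacing noted in the text.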
At this stage it is difficult to predict the role that WWOX may play in the metabolism of the cell. Nevertheless, its unique features make it an interesting target for further investigation. Hydroxysteroid dehydrogenases and reductases usually show a wide tissue expression profile, although some enzymes of this family are tissue specific. Northern analysis has shown that WWOX transcripts are highly represented in hormonally active tissues, with testis showing the highest expression. This tissue specificity, in addition to the SDR domain features, leads us to speculate that the WWOX substrate for dehydrogenase and reductase activity is probably a steroid. Because WWOX has the ability to interact with other proteins via the WW domains, the possibility exists that this protein may play a role in steroid-receptor interaction regulation. It is also possible to predict, on the basis of amino acid sequence analysis, that WWOX may localize to the cytoplasm of cells (17).
We found no evidence of WWOX mutations in a variety of breast cancer lines, suggesting that WWOX is not a tumor suppressor gene. However, we found that WWOX is overexpressed in breast cancer cells when compared with normal tissues. It is interesting to note that the cell line with the highest WWOX mRNA expression is the MCF-7 line, which is characterized by its high dependence on estradiol for growth. This invites the speculation that perhaps WWOX plays a role in estradiol-estrogen receptor interaction regulation.
It is also intriguing that WWOX spans a chromosomal area characterized by a very high incidence of allelic loss and chromosomal rearrangements. Furthermore, we have mapped two previously described chromosomal breakpoints, MM.1 and JJN3, to the last intron of WWOX (Fig. 1). These specific 16q translocation breakpoints, t(14;16)(q32;q23), were previously described in multiple myeloma (15). Consequently, at least one of the alleles of WWOX should be truncated in some multiple myeloma lines. Hence, the potential role for WWOX inactivation in multiple myeloma needs to be investigated. In those myeloma studies it was also observed that additional translocation breakpoints such as KMS11 and ANBL6 (see Ref. 15) also map to the same region covered by YACs 933h2 and 972d3, hence, in very close vicinity of WWOX. Nevertheless, the putative oncogenic target for transcriptional dysregulation in the myeloma translocations was proposed to be the c-maf oncogene, which is located telomeric of WWOX and in the opposite 5′-3′ orientation (15). Due to those observations, we investigated whether the c-maf oncogene showed any expression alterations in the breast cancer lines shown in Fig. 3. In short, we found no abnormalities of c-maf expression when comparing breast cancer lines with normal breast cells and tissues (data not shown). It is also worth mentioning that translocations and deletions affecting chromosome 16q23 have been described as primary cytogenetic anomalies in several breast cancers (29, 30). In our studies, we found one breast cancer case with a homozygous deletion also mapping within the same intron 8 region in which the translocation breakpoints were mapped. Interestingly, the locus for the yet uncharacterized common fragile site FRA16D has been cytogenetically mapped to this very same chromosome region, 16q23.2–23.3 (31, 32).
All the evidence suggests that the whole genomic region spanned by WWOX and, in particular, the intron 8 region is an area prone to chromosomal fragility. Although highly speculative, this area could be the same as the aforementioned common fragile site FRA16D, because it displays features of genomic fragility similar to those observed in other better-characterized common fragile site loci (e.g., FRA3B; Refs. 31, 32).
Our identification of WWOX and its possible association with cancer processes make it an interesting gene that deserves further investigation.
Note Added in Proof
While this manuscript was in press, additional chromosome 16q genomic sequence was reported by the DOE Joint Genome Institute under GenBank Accession number AC009044. This information allowed us to confirm WWOX’s genomic structure for exons 1–5 and estimate the length of the corresponding introns as follows: intron 1 (8,538 bp); intron 2 (1,295 bp); intron 3 (5,141 bp); intron 4 (49,019 bp); and intron 5 (>80,000 bp).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by the Kleberg Foundation Fund Cancer Genetics Program, US Army Grant DAMD17-94-J-4078, and National Institute of Environmental Health Sciences Center Grant ES07784.
The abbreviations used are: LOH, loss of heterozygosity; STS, sequence-tagged site; BAC, bacterial artificial chromosome; YAC, yeast artificial chromosome; EST, expressed sequence tag; poly(A), polyadenylic acid; ORF, open reading frame; UTR, untranslated region; SDR, short-chain dehydrogenase/reductase.
|Exon/intron|Starting position in cDNA|Exon length (bp)|Acceptor splice site (a)|Donor splice site (a)|Intronic primers for mutation screening (5′-3′)|PCR product size (bp)|
(a) Exonic sequences are indicated in uppercase; intronic sequences are indicated in lowercase.
We are grateful to Dr. Michael C. MacLeod for useful comments and Michelle Gardiner for secretarial assistance.