Purpose: Patients with metastatic adenocarcinoma of unknown origin are a common clinical problem. Knowledge of the primary site is important for their management, but histologically, such tumors appear similar. Better diagnostic markers are needed to enable the assignment of metastases to likely sites of origin on pathologic samples.

Experimental Design: Expression profiling of 27 candidate markers was done using tissue microarrays and immunohistochemistry. In the first (training) round, we studied 352 primary adenocarcinomas, from seven main sites (breast, colon, lung, ovary, pancreas, prostate and stomach) and their differential diagnoses. Data were analyzed in Microsoft Access and the Rosetta system, and used to develop a classification scheme. In the second (validation) round, we studied 100 primary adenocarcinomas and 30 paired metastases.

Results: In the first round, we generated expression profiles for all 27 candidate markers in each of the seven main primary sites. Data analysis led to a simplified diagnostic panel and decision tree containing 10 markers only: CA125, CDX2, cytokeratins 7 and 20, estrogen receptor, gross cystic disease fluid protein 15, lysozyme, mesothelin, prostate-specific antigen, and thyroid transcription factor 1. Applying the panel and tree to the original data provided correct classification in 88%. The 10 markers and diagnostic algorithm were then tested in a second, independent, set of primary and metastatic tumors and again 88% were correctly classified.

Conclusions: This classification scheme should enable better prediction on biopsy material of the primary site in patients with metastatic adenocarcinoma of unknown origin, leading to improved management and therapy.

Most cancers present at their site of origin: that is, the primary tumor causes the symptoms that lead the patient to consult a doctor. Some 10% to 15% of cancers, however, present as metastases in solid organs, body cavities, or lymph nodes (1). Most of these secondary tumors are adenocarcinomas (1–4), for which the seven commonest primary sites are breast, colon, lung, ovary, pancreas, prostate, and stomach (1). The prognosis and therapy of patients with metastatic adenocarcinoma are linked to the site of origin, so these sites, and others, are investigated by clinical examination, radiology, and serum tumor markers (2–4). If no primary cancer is found, then the metastatic deposit is usually biopsied, to confirm the diagnosis of malignancy and to subtype the tumor. Unfortunately, adenocarcinomas metastatic from different locations have similar microscopic appearances, which confound identification of the primary site (5). Patients with metastatic adenocarcinoma of unknown origin make up around 3% of all cancer patients and this category is among the 10 most common malignancies (4).

Traditionally, cancer classification has been based on clinical criteria and histopathology, which is itself based on tissue and cell morphology. These phenotypic abnormalities, however, are underpinned by changes in gene expression, at the mRNA and ultimately protein levels (6). The latter can be exploited in tissue samples through immunohistochemistry, which is already the standard method for lymphoma categorization (7). Our aim was to establish a similar classification scheme for adenocarcinomas.

For diagnostic pathology, ideally we want to: use as few markers as possible; correctly predict origin for many tumors (high coverage—sensitivity); and misclassify few tumors (high accuracy—specificity; refs. 8, 9). A number of tumor markers are already in use for this purpose (10, 11), including cytokeratins 7 and 20 (CK7 and CK20) and thyroid transcription factor 1 (TTF1; refs. 12, 13). However, we wished first to ensure that all potential candidates were evaluated. In a previous study, we used hierarchical clustering to show that the molecular profiles of the seven most common adenocarcinomas are characteristic of their primary site, and furthermore, that this is maintained during metastasis (14). We then developed a comprehensive bioinformatic approach that identified key differences between primary sites (14).

In the present study, by profiling selected candidate site-specific markers, we aimed to develop a scheme for the immunohistochemical evaluation of adenocarcinomas that improves the prediction of primary sites. The classifier would need to be applicable in routine diagnostic practice and so its utility was validated in a second independent set of primary and metastatic tumors. High-throughput expression profiling at a histologic level was enabled by the use of tissue microarrays (15).

Ethical approval for the use of clinical material in this research was given by the North Glasgow University Hospitals National Health Service Trust.

Selection of candidate markers. Through bioinformatic analyses, we identified 61 candidate markers which appeared to be either specific to one or restricted to two or three primary sites (14). Of these, 14 were taken forward to tissue profiling: carcinoembryonic antigen, chromogranin A, clusterin, fatty acid binding protein, immunoglobulin-κ light chain, insulin, mammaglobin 1, pancreatic polypeptide, pepsinogen C, prostate-specific antigen, prostatic acid phosphatase, surfactants A and B, and trefoil factor 2. This was integrated with the profiling of 13 immunohistochemical markers which are clinically established or have more recently emerged from the literature: CA125, CA19-9, CDX2, CK7, CK20, estrogen receptor, gross cystic disease fluid protein 15 (GCDFP-15), hepatocyte, lysozyme, mesothelin, mucin 5AC, trefoil factor 1, and TTF1 (8, 9, 12, 13, 16–27).

Selection and assembly of tumors for tissue profiling. Our focus was on adenocarcinomas from the seven main sites. The sample numbers were proportional to their frequency of presentation with metastatic disease, rather than the overall incidence (1). The absolute numbers were chosen to yield statistical power, assuming marker sensitivity and specificity of at least 65% and 75%, respectively. The first round comprised 352 primary tumors in four tissue microarrays, with 261 adenocarcinomas from the main sites: 35 breast (28 ductal and 7 lobular), 47 colon, 46 lung, 28 ovary (18 serous and 10 mucinous), 53 pancreas, 18 prostate, and 34 stomach tumors (26 intestinal and 8 diffuse). Other tumors which arise at the same or nearby sites or enter the differential diagnosis were also included, as listed in Fig. 1A. The second round comprised 100 primary and 30 paired metastatic tumors, from the seven main locations only, in two tissue microarrays, with the numbers of primaries and metastases, respectively, given in brackets: 17 breast (10 + 3 ductal and 3 + 1 lobular), 22 colon (17 + 5), 26 lung (20 + 6), 13 ovary (7 + 2 serous and 3 + 1 mucinous), 27 pancreas (20 + 7), 7 prostate (all primary), and 18 stomach (10 + 4 intestinal and 3 + 1 diffuse).

Fig. 1.

A, table of immunohistochemistry results from the first round of tissue profiling for the 10 selected markers. The data for the remaining 17 markers are provided in Supplementary Table 1. The percentage positivity of each marker in each tumor type is given as a number and as background shading. B, the sensitivity and specificity of each marker. C, this diagnostic decision tree was built using the ranked specificities and sensitivities. To apply this, we start at the top. If the tumor is prostate-specific antigen–positive, then it is from prostate; if not, then we move down the tree, and so on. Pancreatic and gastric tumors mostly had to be grouped together. D-G, the results of applying the diagnostic table and decision tree, respectively, are presented in D and E for the first round data and F and G for the second round. In this format, termed a confusion matrix, correct predictions lie along the diagonal, shaded black; the incorrect predictions lie in the white areas. “Missing” tumors for the tree are those lacking any marker along its decision-making pathway, and for the table are those lacking at least five assessable markers. In addition, some tumors lacked any positive markers: these tumors have been called “negative” in the matrix for the diagnostic table and are regarded as errors when calculating predictive success; with the tree, such tumors fall into the “breast or stomach or pancreas” bin at the bottom. The metastatic tumors in the second round are shown in red. H, table of immunohistochemical results combined from both first and second rounds of tissue profiling.

Cases were identified from pathology records. Tissue microarrays were constructed using Beecher Instruments' Manual Tissue Arrayer and 0.6 mm punches (Silver Spring, MD). Test samples were arrayed in triplicate (28), alongside duplicate internal control cores. The tissue microarray blocks were sectioned onto slides using the Paraffin Tape-Transfer System, Instrumedics, Inc. (Hackensack, NJ). Representative sections were checked by H&E staining.

Immunohistochemistry. Twenty-six primary antibodies were used according to the supplier's instructions for dilution and antigen retrieval: 10 as detailed in Table 1, plus a further 16: CA19-9, clusterin, fatty acid binding protein, trefoil factor 2, and surfactant protein B from Novocastra Laboratories, Ltd. (Newcastle upon Tyne, United Kingdom); carcinoembryonic antigen, chromogranin A, hepatocyte, immunoglobulin, prostatic acid phosphatase, and surfactant protein A from Dako, Ltd. (Ely, United Kingdom); trefoil factor 1 and mucin 5AC from Lab Vision, Ltd. (Newmarket, United Kingdom); pepsinogen C from Abcam, Ltd. (Cambridge, United Kingdom); pancreatic polypeptide from Harlan Sera-Lab (Loughborough, United Kingdom); and insulin from Diagnostics Scotland (Edinburgh, United Kingdom).

Table 1.

Details of selected markers and antibodies used

Antibody target | Clone | Source | Dilution | Antigen retrieval
CA125 | Ov185:1 | Novocastra | 1:400 | Heat
CDX2 | AMT28 | Novocastra | 1:100 | Heat
CK7 | OV-TL | Dako | 1:500 | Heat
CK20 | Ks20.8 | Dako | 1:500 | Heat
Estrogen receptor | 6F11 | Novocastra | 1:50 | Heat
GCDFP-15 | 23A3 | Novocastra | 1:20 | Heat
Lysozyme | Polyclonal | Dako | 1:5,000 | Enzymatic
Mesothelin | 5B2 | Novocastra | 1:20 | Heat
Prostate-specific antigen | ER-PR8 | Dako | 1:100 | Heat
TTF1 | SPT249 | Novocastra | 1:50 | Heat

Immunohistochemistry was done by the standard indirect method, using Envision ChemMate from Dako, according to the manufacturer's instructions. The tissue microarray sections required longer dewaxing in xylene (20 minutes). Antigen retrieval was done enzymatically [0.1% trypsin/0.1% calcium chloride in Tris buffer (pH 7.6) at 37°C for 25 minutes] or with heat [0.005% EDTA in Tris buffer (pH 8.0) in a microwavable pressure cooker for 5 minutes]. Immunohistochemistry was done on a Dako Autostainer in the first round and manually by a different scientist in the second round.

In situ hybridization. IMAGE clones for mammaglobin 1 (109005) and pepsinogen C (10187) were obtained from the Medical Research Council Geneservice (Babraham, United Kingdom). Plasmid preparation was done and clones confirmed by sequencing. The template was linearized then digoxigenin-UTP-labeled riboprobes were generated by transcription with T7 or Sp6 RNA polymerase, using the DIG RNA Labelling Kit from Roche Diagnostics, Ltd. (Lewes, United Kingdom). In situ hybridization was done as previously described (29).

Microscopic assessment and scoring. The stained slides were assessed without knowledge of other clinical or pathologic data. For each sample, the three cores were scored individually and the highest figure used (30). The first round slides were scored separately by two observers (J. Dennis and K. Oien, a consultant histopathologist); cores with differing results were reviewed and agreement reached. The second round was assessed by one observer (K. Oien). The intensity of staining was scored on a categorical scale, from 0 to 3, where: 0 was absent; 1 was very weak, dubious staining; 2 was definite, mild, or moderate staining; and 3 was definite, strong staining (16, 26, 31). Only staining in tumor cells, in the appropriate nuclear or cytoplasmic location, was scored. Representative images of positive staining are presented in Supplementary Fig. 1.

Data analysis. The data were stored and analyzed using Microsoft Access. To identify subtle associations, we used Rosetta, available on-line at http://www.idi.ntnu.no/~aleks/rosetta (32). Rosetta is a software package implementing a supervised learning method, based on rough set theory and Boolean reasoning. It induces minimal decision rules, where each rule is constructed using the minimal information needed to discern objects. Sensitivity and specificity were calculated using standard formulae (16).
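As a minimal sketch of the standard formulae, assuming each tumor is recorded as its true primary site together with a positive or negative call for one marker, sensitivity and specificity for a target site could be computed as below. The function name and the toy records are illustrative only and are not taken from the study data.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    records: Iterable[Tuple[str, bool]], target_site: str
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of one marker for one target site.

    records: (true_site, marker_positive) pairs, one per tumor.
    """
    tp = fn = tn = fp = 0
    for site, positive in records:
        if site == target_site:
            if positive:
                tp += 1  # target-site tumor that stains
            else:
                fn += 1  # target-site tumor that does not stain
        elif positive:
            fp += 1      # tumor from another site that stains
        else:
            tn += 1      # tumor from another site that does not stain
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need tumors both from the target site and from other sites")
    # sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 10 lung and 20 colon tumors scored for one marker.
toy_records = (
    [("lung", True)] * 9 + [("lung", False)]
    + [("colon", False)] * 19 + [("colon", True)]
)
sens, spec = sensitivity_specificity(toy_records, "lung")
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity here corresponds to the percentage positivity of a marker in its "correct" tumor type, and specificity to the lack of positivity in all other types.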

Tissue profiling of candidate markers in primary adenocarcinomas. For the first round of expression profiling, we studied 352 primary tumor samples with 27 candidate markers: 25 by immunohistochemistry alone, 1 by in situ hybridization (mammaglobin 1) alone, and 1 by both (pepsinogen C). Staining was scored blindly: for analysis, we regarded only convincing staining (scores of 2 and 3) as positive (16, 26, 31).
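The scoring rule described in Materials and Methods (three cores per sample, each scored 0 to 3, highest score used) and the positivity threshold above (scores of 2 and 3) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and representing an unassessable core as `None` is a convention chosen here, not one described in the study.

```python
from typing import Optional, Sequence

POSITIVE_THRESHOLD = 2  # scores of 2 and 3 were regarded as positive


def sample_score(core_scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the highest score among assessable cores, or None if none are assessable."""
    assessable = [s for s in core_scores if s is not None]
    for s in assessable:
        if s not in (0, 1, 2, 3):
            raise ValueError(f"score out of range: {s}")
    return max(assessable) if assessable else None


def is_positive(core_scores: Sequence[Optional[int]]) -> Optional[bool]:
    """Classify a sample as positive (True), negative (False), or unassessable (None)."""
    score = sample_score(core_scores)
    return None if score is None else score >= POSITIVE_THRESHOLD


print(is_positive([0, 1, 2]))           # one convincing core is enough
print(is_positive([1, 1, 0]))           # very weak, dubious staining only
print(is_positive([None, None, None]))  # all cores missing
```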

Our aim was then to select the best markers, and the best combination in which to apply them: the ideal would be to have one highly sensitive and specific marker for each of the seven sites. First, we analyzed the database manually: some associations were obvious, such as prostate cancer and its close-to-ideal marker prostate-specific antigen. To identify more subtle associations, we used the Rosetta system, which produces minimal rules that describe and separate groups of objects and are ranked by specificity and sensitivity (32). To illustrate this process, for lung, candidate markers included TTF1 and surfactant proteins A and B. All three markers stained around 90% of lung cancers and thus were highly sensitive. Whereas TTF1 and surfactant A were essentially limited to lung tumors, surfactant B was also present in up to 13% of other adenocarcinomas and thus was less specific. TTF1 was selected as the lung marker on the basis of its strong and easily discriminated nuclear staining. In contrast with these lung candidates, the positive markers for breast, estrogen receptor and GCDFP-15, stained lower proportions of breast cancers but seemed to be independent, such that some tumors were positive with only one or the other; hence, both markers were selected (24).

This analysis led to a smaller panel of 10 markers: CA125, CDX2, CK7, CK20, estrogen receptor, GCDFP-15, lysozyme, mesothelin, prostate-specific antigen, and TTF1. The immunohistochemical results for these markers are presented in Fig. 1A. In breast, ovary and stomach, major tumor subtypes exist and were assessed: the expression profiles differed between subtypes only for ovary. The percentage positivity is given as a number and as background shading, for easy visual comparison. The sensitivity and specificity of each marker is given in Fig. 1B. Note that sensitivity is simply the number (or shading intensity) in the appropriate cell in Fig. 1A; and specificity is the (lack of) shading in the column for each marker outside the “correct” cell. Representative images of positive staining for the 10 selected markers are presented in Supplementary Fig. 1. The immunohistochemical results for the remaining 17 candidates are provided in Supplementary Table 1. In addition, the marker data have been summarized in a second graphic format, through the application of hierarchical clustering: the resulting heat-maps are shown in Supplementary Fig. 2.

As already indicated, a few markers, on their own, were diagnostic of a single primary site: prostate-specific antigen for prostate; and TTF1 for lung. Some markers, on their own, were diagnostic of a subset of tumors: estrogen receptor for breast or ovary; and CDX2 for gastrointestinal tumors (ovarian mucinous tumors excluded and discussed below). Other markers, in combination, were diagnostic of a single tumor site: estrogen receptor, mesothelin, and CA125 together indicate ovary. Some tumors had similar profiles that were difficult to separate: pancreas and stomach (between these two, CK7 negativity made pancreas unlikely; and CA125 positivity, especially with mesothelin positivity, made pancreas more likely).

Primary carcinomas which arise at the same sites or might enter the differential diagnosis of metastatic deposits were also studied: their expression profiles resembled the anatomically and developmentally closest “main” adenocarcinoma. Thus, endometrial adenocarcinoma was similar to ovarian cancer, and adenocarcinomas of the duodenal ampulla and bile ducts (cholangiocarcinoma) resembled pancreatic ductal tumors. Hepatocellular carcinomas expressed none of these markers: because the liver is commonly involved in metastatic spread, this facilitates the distinction between primary and secondary tumors.

Development and application of a diagnostic classification scheme. Pathologists are accustomed to using information in table format, where all data are considered simultaneously in making a diagnosis. However, we decided to also build a decision tree (Fig. 1C), using the ranked specificities and sensitivities of the markers. To apply this, we start at the top, with the most specific marker. If the tumor is prostate-specific antigen–positive, then it is from the prostate; if not, then we move down the tree, and so on. Pancreatic and gastric tumors mostly had to be grouped together. An alternative decision tree was built using the conventional statistical approach of recursive partitioning; it is shown for comparison in Supplementary Fig. 3, but was not selected for further use.
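The top-down logic of such a tree can be sketched as a simple cascade of marker tests. The sketch below is not the published tree of Fig. 1C. Only the first node (prostate-specific antigen indicating prostate) and the final "breast or stomach or pancreas" bin are taken from the text; the other nodes, their order, and all names in the code are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Ordered (marker, predicted site) nodes, most specific first.
ILLUSTRATIVE_TREE: List[Tuple[str, str]] = [
    ("PSA", "prostate"),  # from the text: the top node of the tree
    ("TTF1", "lung"),     # assumed position, for illustration only
    ("CDX2", "colon"),    # assumed position and label, for illustration only
]
FALLBACK_BIN = "breast or stomach or pancreas"  # bin at the bottom of the tree


def classify(
    profile: Dict[str, Optional[bool]],
    tree: List[Tuple[str, str]] = ILLUSTRATIVE_TREE,
) -> str:
    """Walk down the cascade and return the site of the first positive marker.

    profile maps marker name to True (positive), False (negative),
    or None (unassessable); an absent marker is treated as unassessable.
    """
    for marker, site in tree:
        result = profile.get(marker)
        if result is None:
            return "missing"  # a marker on the decision path could not be assessed
        if result:
            return site
    return FALLBACK_BIN       # no marker on the path was positive


print(classify({"PSA": False, "TTF1": True, "CDX2": False}))
print(classify({"PSA": False, "TTF1": False, "CDX2": False}))
```

Because each node is reached only after the more specific markers above it were negative, a less specific marker lower in the cascade is applied to a narrower set of candidate sites.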

We then returned to our first round data and attempted to predict the primary site using these classifiers. The decision tree was used as depicted (Fig. 1C). With the diagnostic table, in most cases, a single site could be assigned. In a few cases, however, two sites seemed equally likely, and agreement of either with the actual site was regarded as correct. The four common pairs were: stomach or pancreas, stomach or colon (CDX2 and/or CK20 positive but also mesothelin and/or CA125 positive), ovary or pancreas (CK7, mesothelin, and CA125 positive but negative for estrogen receptor), and breast or ovary (estrogen receptor and CK7 positive but negative for GCDFP-15 and either mesothelin or CA125).

The results for these first round predictions are presented in Fig. 1D and E. The primary site was correctly predicted in 88% with the table and in 87% with the tree. The incorrect predictions could be divided into four main types. First, with both methods, errors were commonly between the tumor pairs above, where one site alone had been predicted. Second, where diagnosis relies on a single highly specific marker, such as TTF1 in lung, the few negative cases are misclassified. Third, some tumors simply fail to stain with any marker: using the decision tree, such tumors fall into the “bin” at the bottom (breast or stomach or pancreas). Except for the lack of immunohistochemical staining, these tumors did not seem histologically distinct from the majority: for example, they were not particularly poorly differentiated. Fourth, ovarian mucinous tumors are an oddity. They are well-recognized to express gastric, colonic, and/or pancreatic markers and so it is not surprising that their expression profile overlaps with those of gastrointestinal tumors (33).
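The tallying behind a confusion matrix and a percentage-correct figure can be sketched as below. It follows the rule stated above that, when two sites seemed equally likely, agreement of either with the actual site counts as correct, and it treats a tumor with no positive marker as an error. The data structures, function names, and toy cases are hypothetical and do not reproduce the study results.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# One case = (actual primary site, set of predicted sites).
Case = Tuple[str, FrozenSet[str]]


def label(predicted: FrozenSet[str]) -> str:
    """Column label for the matrix; an empty prediction is called 'negative'."""
    return " or ".join(sorted(predicted)) if predicted else "negative"


def confusion_matrix(cases: Iterable[Case]) -> Counter:
    """Count tumors per (actual site, predicted label) cell."""
    return Counter((actual, label(predicted)) for actual, predicted in cases)


def percent_correct(cases: Iterable[Case]) -> float:
    """Percentage of cases whose actual site is among the predicted sites."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("no cases to score")
    hits = sum(1 for actual, predicted in case_list if actual in predicted)
    return 100.0 * hits / len(case_list)


toy_cases: List[Case] = [
    ("lung", frozenset({"lung"})),                    # correct single site
    ("stomach", frozenset({"stomach", "pancreas"})),  # correct as one of a pair
    ("ovary", frozenset({"pancreas"})),               # incorrect
    ("breast", frozenset()),                          # no positive marker: error
]
matrix = confusion_matrix(toy_cases)
accuracy = percent_correct(toy_cases)
print(matrix)
print(accuracy)
```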

Testing of classification scheme in an independent set of tumors. We then validated the diagnostic table and decision tree in a separate set of primary tumors and extended the analysis to metastatic tumors. Use of independent samples checks for “over-fitting” of the classification scheme to the original data. We constructed two tissue microarrays, containing 100 primary tumors and 30 paired metastases, from the seven main sites. The tissue microarrays were stained with the 10 markers, scored, and the primary site predicted using table and tree separately.

The results for these second round predictions are presented in Fig. 1F and G. The primary and metastatic tumor pairs had similar expression profiles, showing that our data from the primary tumors are applicable not only to other data sets but also to metastatic adenocarcinomas. Correct assignment was obtained again in 87% of primary and metastatic tumors with the table and in 89% with the tree: the figures for the metastases alone were 71% and 83%, respectively. Note that the metastatic set lacked tumors from the prostate, from which the primaries were always correctly classified: their absence alone lowers the success rate of the classifier. Most of the misclassified tumors were in pairs (i.e., a primary and its matched metastasis), with the incorrect predictions as described for the original data set. The immunohistochemical results from both rounds are combined in Fig. 1H.

Adenocarcinomas of unknown origin are a common clinical problem, so we are far from the first to address this diagnostic dilemma from a pathologic standpoint (8–11, 16, 19, 26, 34). However, over the past few years, a number of novel markers have emerged, including CK7, CK20, TTF1, CDX2, and mesothelin (12, 13, 17, 22, 23, 25). We believe that this is the first study to profile large numbers of these newer genes alongside established markers in a comprehensive manner, and to develop a rational classification scheme which has been successfully tested in an independent data set including metastatic tumor samples.

During this research, we performed both internal and external validation. The data were shown to be reproducible through repetition of staining and scoring, staining by different scientists and methods in the first and second rounds, and scoring by two assessors in the first round. The study design is comparable with the literature: the use of triplicate cores is regarded as standard (28), the number of missing or otherwise unassessable cores is low (3%; refs. 28, 31), and our scoring system is well established (16, 26, 31).

In general, our results agree with previous studies on the primary sites and individual markers tested here, as shown in Table 2 (8, 9, 12, 13, 16, 17, 19–27, 35–37), although for some markers (CK20 and CDX2), our sensitivities are at the lower end of the range. For CK20, most previous studies have used full tissue sections rather than tissue microarrays (13, 35, 36), and although there is generally concordance of staining between whole sections and tissue microarrays, expression levels of some genes are lower in tissue microarrays (31). Because the present classification is likely to be used primarily with small biopsy material, tissue microarrays arguably better reflect the situation in practice. For CDX2, Werling et al. (17) found widespread expression in the gut, with strong, uniform nuclear staining in colon, and weaker, more heterogeneous staining in stomach and pancreas, whereas we and others found that CDX2 was relatively specific for colon, with convincing staining absent from most gastric tumors, and in particular from all pancreatic adenocarcinomas (Fig. 1A and Table 2; ref. 25). This apparent discrepancy may simply indicate that our scoring criteria are more stringent, thus excluding heterogeneous, weak staining of non–colonic tumors, and illustrates the eternal difficulty of determining what threshold should be used for positivity (38).

Table 2.

Comparison of immunohistochemistry results from first round with the literature

Primary site | Prostate-specific antigen | TTF1 | GCDFP-15 | CDX2 | CK20 | CK7 | Estrogen receptor | Mesothelin | CA125 | Lysozyme
Breast | 0, 0 | 0, 0, 0 | 33, 52, 54, 62, 67, 72, 74, 77 | 0, 0, 0 | 0, 0, 0, 0, 0, 4, 7, 19 | 70, 83, 89, 93, 93, 96 | 32, 33, 58, 60, 63, 73, 77 | 3, 3, 6 | 6, 13, 13, 13, 23, 24 | 14
Colon | 0, 0 | 0, 0, 0 | 0, 0, 0, 0, 0, 0, 0, 9 | 83, 90, 99 | 65, 68, 73, 84, 88, 92, 92, 93, 100 | 4, 5, 6, 7, 9, 16, 23 | 0, 0, 0, 0, 2, 13 | 2, 4, 22 | 0, 4, 4, 9, 10, 13 | 53
Lung | 0, 9 | 66, 75, 91 | 0, 0, 0, 0, 4, 4, 6 | 0, 0, 2 | 0, 2, 8, 9, 10, 10 | 91, 96, 100, 100 | 0, 0, 3, 9, 11 | 22, 24, 39 | 20, 20, 20, 35, 39 | 43
Ovary | 0, 0 | 0, 0 | 0, 0, 0, 0, 4, 4, 6 | 0, 0, 0 | 0, 0, 0, 4, 19 | 83, 89, 91, 100, 100 | 4, 12, 34, 50, 83 | 94, 95, 100 | 61, 63, 80, 89, 91, 96 | 0
Pancreas | 0, 0 | 0, 0, 2 | 0, 0, 0, 2 | 0, 0, 32 | 0, 19, 35, 39, 44, 62 | 87, 92, 95, 96 | 0, 0, 0 | 47, 75, 86 | 48, 48, 53 | 51
Stomach | 0, 3 | 0, 3 | 0, 0, 0, 3 | 18, 20, 70 | 18, 30, 41, 50, 51, 54, 68 | 35, 38, 51, 60, 71 | 0, 0, 2 | 10, 21, 29 | 7, 9, 11 | 38, 85
Prostate | 96, 100 | 0, 0, 11 | 0, 10, 15 | 0, 1, 4 | 0, 0, 0, 21 | 0, 0, 12, 29 | 10, 11 | 0, 0, 0 | 0, 2, 2 | 6

NOTE: Data from the literature in regular font, from refs. (8, 9, 12, 13, 16, 17, 20–27, 35–37); our data in boldface. Diffuse staining taken as positive in refs. (13, 22, 23). Ovarian tumors are serous (or at least nonmucinous).

Prostate-specific antigen and TTF1 positivity in a few non–prostatic (mainly gynecologic) and non–lung tumors, respectively, was surprising, and emphasizes that the study was fully “blinded”. In fact, Dako's datasheet indicates that prostate-specific antigen is expressed outside the prostate, for example in endometrium. TTF1 positivity could represent a hitherto unrecognized neuroendocrine carcinoma component (12), but a more mundane explanation may apply: in the second round, for example, one tumor pair was submitted as pancreatic cancer with a lung metastasis, but both expressed TTF1, thus it is likely that the primary was in fact the lung cancer.

The use of decision trees is not yet established in pathology, but intuitively developed algorithms for this application have been described (8, 9). DeYoung et al. (8) emphasized that markers must be interpreted in a particular sequence for their predictive values to be maximized, so that the pathologist moves from the most specific to the least specific markers, as with our decision tree (Fig. 1C). In this way, as tumor sites are excluded, even relatively nonspecific markers further down the tree yield useful information (8).

In the development of any classification scheme, a figure of 88% correct in a data set with seven classes would be regarded as excellent. Clearly, this leaves some 12% of tumors which were not correctly classified. Likely reasons have already been described in Results, but note that, even in these cases, the list of likely primary sites has often been narrowed down. Importantly, our predictions were based purely on the numerical immunohistochemical scores, in isolation from both clinical information, such as gender, site of metastasis, and radiology, and the microscopic appearance of the tumor. In practice, both may be helpful. For example, on histology, the presence of “signet ring” cells suggests a gastric or breast origin, calcispherites suggest ovary and “dirty” necrosis suggests colon. The integration of such data may overcome some shortcomings of the present approach, for example, the difficulty in discriminating certain tumor pairs with immunohistochemistry alone. The question has been addressed by Sheahan et al. (5), who studied the contributions of morphology and clinical data to diagnosis. Two pathologists evaluated 100 metastatic adenocarcinomas from 10 primary sites. On morphology alone, the correct primary site was the first choice in 26% for both pathologists. Addition of gender and site of metastasis enabled the correct first choice in 50% and 55%, respectively (5).

These predictions improve with the use of immunohistochemistry, although surprisingly few comprehensive studies have been published. DeYoung et al. (8) studied over 2,800 epithelial malignancies (a wide range including germ cell tumors, mesotheliomas, and solid carcinomas) and found that a panel of 14 markers yielded the correct primary site in 67%. Brown et al. (26) studied metastatic adenocarcinomas from five sites (colon, breast, ovary, lung, and upper gastrointestinal tract). Using four markers (carcinoembryonic antigen, CA19-9, CA125, and BCA225), the primary site was correctly predicted in 66%; we have profiled three of these, but only CA125 was among the most informative. Lagendijk et al. (9) studied primary and metastatic adenocarcinomas from three sites (colon, breast, and ovary) and found that six markers (CK7, CK20, CA125, carcinoembryonic antigen, estrogen receptor, and GCDFP-15) yielded correct classification in 80% to 90%. Interestingly, our difficulty in first identifying and then separating out gastric and pancreaticobiliary cancers is a general experience (5, 26). In clinical practice, this is in fact not a significant problem. Diffuse (signet-ring cell) gastric cancers can be identified as such on histology, whereas the primary tumors of intestinal-type gastric cancers are generally visible at endoscopy; and if not already performed, radiologic assessment may also provide clarification. Thus, upper gastrointestinal and pancreatico-biliary cancers can generally be separated on clinical grounds once this differential diagnosis is raised on pathologic material.

It is important to establish that markers informative for the site of origin in a primary malignancy retain that utility in metastases. Our site-specific genes are essentially related to epithelial differentiation, rather than to tumor proliferation or invasion. For these markers, our current and previous data certainly suggest that the primary tumor and corresponding metastasis share similar profiles (14), and this has been the experience of others (9, 16). Clearly, further testing in biopsy samples from additional metastases of known origin and from metastases of unknown origin would be worthwhile.

The dissemination of any diagnostic classification to other laboratories clearly raises challenges already being faced with testing for therapeutic markers such as estrogen receptor in breast cancer (28). Categorization of a marker as positive or negative is influenced by technical issues such as specimen fixation, tissue processing, and immunohistochemical conditions including antibody dilution and pretreatment, as well as by interpretative aspects including the observer's threshold for positivity and other variables (8, 35). Because the decision tree is based on marker specificity and has generalized well to unseen cases (in the second round), it should be reasonably robust for transfer to other centers, and it is flexible so that new markers may be added (8). Of course, in practice, any immunohistochemical classification should be interpreted in the appropriate histologic and clinical context.
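The flexibility described above can be pictured with a minimal sketch, assuming that each marker has already been called positive or negative and that the classifier is an ordered list of rules. The rules shown are hypothetical placeholders based on the general marker specificities discussed in the cited literature; they are not the decision tree derived in this study, and the names `RULES` and `predict_site` are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A rule pairs a test on the marker calls with the primary site it suggests.
Rule = Tuple[Callable[[Dict[str, bool]], bool], str]

# Hypothetical, simplified rules for illustration only.
# They are NOT the published decision tree.
RULES: List[Rule] = [
    (lambda m: m.get("PSA", False), "prostate"),
    (lambda m: m.get("TTF1", False), "lung"),
    (lambda m: m.get("GCDFP15", False), "breast"),
    (lambda m: m.get("CDX2", False) and m.get("CK20", False), "colon"),
]


def predict_site(markers: Dict[str, bool],
                 rules: List[Rule] = RULES) -> Optional[str]:
    """Return the site of the first rule that fires, or None if none does."""
    for test, site in rules:
        if test(markers):
            return site
    return None


if __name__ == "__main__":
    # Markers missing from the dictionary are treated as negative.
    print(predict_site({"TTF1": True, "CK7": True}))
    print(predict_site({"CK7": True}))
```

In a structure of this kind, a new marker could be accommodated by appending or inserting another (test, site) pair, without altering the existing rules.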

Identification of the primary site in patients with metastatic adenocarcinoma is important for prognosis and rational therapy (2–4). The nihilism of the past has changed because of the emergence of relatively specific and increasingly effective chemotherapeutic regimens for metastatic adenocarcinomas from each site (39). Breast and prostate cancers may respond to hormonal manipulation, platinum- or taxane-based regimens are used with ovarian and lung tumors, 5-fluorouracil-based combinations are the mainstay of colorectal cancer management, and combination therapy with 5-fluorouracil and a platinum agent is the most widely used first-line regimen for gastric cancer. In contrast, gemcitabine alone is the treatment of choice for palliation of pancreatic cancer but would be regarded as suboptimal therapy for other tumors. Thus, prediction of the likely primary site enables the selection of a regimen optimal for antitumor activity, with its implications for palliation and possibly survival, while avoiding unnecessary toxicity.

For the investigation of patients with occult primary disease, it is generally accepted that immunohistochemical assessment of biopsy material is both useful and cost-effective, compared with exhaustive investigations using other modalities, predominantly radiology and endoscopy (2–4, 8, 39). We present here a validated approach to the immunohistochemical evaluation of small tumor samples that should enable the assignment of a likely site of origin for most patients with metastatic adenocarcinoma.

Grant support: Cancer Research UK through Departmental core funding, a Clinician Scientist Fellowship (K.A. Oien), and Program Grant (W.N. Keith); the University of Glasgow through the Florence Houston Bequest (J.L. Dennis); the European Union for an Access to Infrastructure Grant (K.A. Oien and J.L. Dennis through J. Komorowski); and Keystone Symposia for a Travel Grant (J.L. Dennis).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

We thank Rod K. Ferrier for technical help with in situ hybridization, Jane Hair for searches of pathology records, and Drs. Euan Stronach for performing hierarchical clustering and Jeffry Evans for oncologic advice.

1. Nystrom JS, Weiner JM, Heffelfinger-Juttner J, Irwin LE, Bateman JR, Wolf RM. Metastatic and histologic presentations in unknown primary cancer. Semin Oncol 1977;4:53–8.
2. Hillen HFP. Unknown primary tumours. Postgrad Med J 2000;76:690–3.
3. Varadhachary GR, Abbruzzese JL, Lenzi R. Diagnostic strategies for unknown primary cancer. Cancer 2004;100:1776–85.
4. Pavlidis N, Briasoulis E, Hainsworth J, Greco FA. Diagnostic and therapeutic management of cancer of an unknown primary. Eur J Cancer 2003;39:1990–2005.
5. Sheahan K, O'Keane JC, Abramowitz A, et al. Metastatic adenocarcinoma of an unknown primary site. A comparison of the relative contributions of morphology, minimal essential clinical data and CEA immunostaining status. Am J Clin Pathol 1993;99:729–35.
6. Liotta LA, Petricoin E. Molecular profiling of human cancer. Nat Rev Genet 2000;1:48–56.
7. Jaffe ES. Hematopathology: integration of morphologic features and biologic markers for diagnosis. Mod Pathol 1999;12:109–15.
8. DeYoung BR, Wick MR. Immunohistologic evaluation of metastatic carcinomas of unknown origin: an algorithmic approach. Semin Diagn Pathol 2000;17:184–93.
9. Lagendijk JH, Mullink H, van Diest PJ, Meijer GA, Meijer CJ. Immunohistochemical differentiation between primary adenocarcinomas of the ovary and ovarian metastases of colonic and breast origin. Comparison between a statistical and an intuitive approach. J Clin Pathol 1999;52:283–90.
10. Pecciarini L, Giulia Cangi M, Doglioni C. Identifying the primary sites of metastatic carcinoma: the increasing role of immunohistochemistry. Curr Diagn Pathol 2001;7:168–75.
11. Hammar SP. Metastatic adenocarcinoma of unknown primary origin. Hum Pathol 1998;29:1393–402.
12. Kaufmann O, Dietel M. Thyroid transcription factor-1 is the superior immunohistochemical marker for pulmonary adenocarcinomas and large cell carcinomas compared to surfactant proteins A and B. Histopathology 2000;36:8–16.
13. Tot T. Adenocarcinomas metastatic to the liver: the value of cytokeratins 20 and 7 in the search for unknown primary tumors. Cancer 1999;85:171–7.
14. Dennis JL, Vass JK, Wit EC, Keith WN, Oien KA. Identification from public data of molecular markers of adenocarcinoma characteristic of the site of origin. Cancer Res 2002;62:5999–6005.
15. Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998;4:844–7.
16. Perry A, Parisi JE, Kurtin PJ. Metastatic adenocarcinoma to the brain: an immunohistochemical approach. Hum Pathol 1997;28:938–43.
17. Werling RW, Yaziji H, Bacchi CE, Gown AM. CDX2, a highly sensitive and specific marker of adenocarcinomas of intestinal origin: an immunohistochemical survey of 476 primary and metastatic carcinomas. Am J Surg Pathol 2003;27:303–10.
18. Machado JC, Nogueira AM, Carneiro F, Reis CA, Sobrinho-Simoes M. Gastric carcinoma exhibits distinct types of cell differentiation: an immunohistochemical study of trefoil peptides (TFF1 and TFF2) and mucins (MUC1, MUC2, MUC5AC, and MUC6). J Pathol 2000;190:437–43.
19. Lagendijk JH, Mullink H, Van Diest PJ, Meijer GA, Meijer CJ. Tracing the origin of adenocarcinomas with unknown primary using immunohistochemistry: differential diagnosis between colonic and ovarian carcinomas as primary sites. Hum Pathol 1998;29:491–7.
20. Kaufmann O, Deidesheimer T, Muehlenberg M, Deicke P, Dietel M. Immunohistochemical differentiation of metastatic breast carcinomas from metastatic adenocarcinomas of other common primary sites. Histopathology 1996;29:233–40.
21. Loy TS, Quesenberry JT, Sharp SC. Distribution of CA 125 in adenocarcinomas. An immunohistochemical study of 481 cases. Am J Clin Pathol 1992;98:175–9.
22. Frierson HF Jr, Moskaluk CA, Powell SM, et al. Large-scale molecular and tissue microarray analysis of mesothelin expression in common human carcinomas. Hum Pathol 2003;34:605–9.
23. Ordonez NG. Application of mesothelin immunostaining in tumor diagnosis. Am J Surg Pathol 2003;27:1418–28.
24. Wick MR, Lillemoe TJ, Copland GT, Swanson PE, Manivel JC, Kiang DT. Gross cystic disease fluid protein-15 as a marker for breast cancer: immunohistochemical analysis of 690 human neoplasms and comparison with α-lactalbumin. Hum Pathol 1989;20:281–7.
25. Moskaluk CA, Zhang H, Powell SM, Cerilli LA, Hampton GM, Frierson HF Jr. Cdx2 protein expression in normal and malignant human tissues: an immunohistochemical survey using tissue microarrays. Mod Pathol 2003;16:913–9.
26. Brown RW, Campagna LB, Dunn JK, Cagle PT. Immunohistochemical identification of tumor markers in metastatic adenocarcinoma. A diagnostic adjunct in the determination of primary site. Am J Clin Pathol 1997;107:12–9.
27. Zamecnik J, Kodet R. Value of thyroid transcription factor-1 and surfactant apoprotein A in the differential diagnosis of pulmonary carcinomas: a study of 109 cases. Virchows Arch 2002;440:353–61.
28. Camp RL, Charette LA, Rimm DL. Validation of tissue microarray technology in breast carcinoma. Lab Invest 2000;80:1943–9.
29. Doughty SE, Ferrier RK, Hillan KJ, Jackson DG. The effects of ZENECA ZD8731, an angiotensin II antagonist, on renin expression by juxtaglomerular cells in the rat: comparison of protein and mRNA expression as detected by immunohistochemistry and in situ hybridization. Toxicol Pathol 1995;23:256–61.
30. Hsu FD, Nielsen TO, Alkushi A, et al. Tissue microarrays are an effective quality assurance tool for diagnostic immunohistochemistry. Mod Pathol 2002;15:1374–80.
31. Swierczynski SL, Maitra A, Abraham SC, et al. Analysis of novel tumor markers in pancreatic and biliary carcinomas using tissue microarrays. Hum Pathol 2004;35:357–66.
32. Hvidsten TR, Laegreid A, Komorowski J. Learning rule-based models of biological process from gene expression time profiles using gene ontology. Bioinformatics 2003;19:1116–23.
33. Tenti P, Aguzzi A, Riva C, et al. Ovarian mucinous tumors frequently express markers of gastric, intestinal, and pancreatobiliary epithelial cells. Cancer 1992;69:2131–42.
34. Ellis IO, Hitchcock A. Tumour marker immunoreactivity in adenocarcinoma. J Clin Pathol 1988;41:1064–7.
35. Tot T. Cytokeratins 20 and 7 as biomarkers: usefulness in discriminating primary from metastatic adenocarcinoma. Eur J Cancer 2002;38:758–63.
36. Chu P, Wu E, Weiss LM. Cytokeratin 7 and cytokeratin 20 expression in epithelial neoplasms: a survey of 435 cases. Mod Pathol 2000;13:962–72.
37. Park SY, Kim HS, Hong EK, Kim WH. Expression of cytokeratins 7 and 20 in primary carcinomas of the stomach and colorectum and their value in the differential diagnosis of metastatic carcinomas to the ovary. Hum Pathol 2002;33:1078–85.
38. Goldstein NS, Bassi D. Cytokeratins 7, 17, and 20 reactivity in pancreatic and ampulla of vater adenocarcinomas. Percentage of positivity and distribution is affected by the cut-point threshold. Am J Clin Pathol 2001;115:695–702.
39. Mintzer DM, Warhol M, Martin AM, Greene G. Cancer of unknown primary: changing approaches. A multidisciplinary case presentation from the Joan Karnell cancer center of Pennsylvania hospital. Oncologist 2004;9:330–8.
