Abstract
Background: Although tissue microarray (TMA) studies of histopathologic material have been frequently reported in studies of malignant diseases, the question of sample size (i.e., the diameter and the number of tissue cylinders investigated) has been rarely discussed. This study addresses the methodologic question of sample size in a variety of tumor types.
Material and Methods: Material from 29 cases of lung carcinoma (small cell, squamous cell, and adenocarcinomas) was examined immunohistochemically for Ki-67 and p53 expression in virtually constructed cylinders of different diameters. The influence of tissue sample size (i.e., different numbers of virtual cylinders) was also investigated. Results from Ki-67 evaluation were analyzed as a continuous variable, whereas p53 expression was scored. p53 evaluations based on scoring in cylinders versus scoring of whole sections were also compared. Furthermore, 10 cases of endometrial and breast carcinomas were evaluated for estrogen receptor, Ki-67, and HER2 by scoring up to five cylinders.
Results and Conclusions: Tissue cylinders of 0.6 and 1.0 mm diameters were compared and found equally informative about Ki-67 expression (intraclass correlation, 0.96). A statistical approach considering intraindividual and interindividual variation data is presented, indicating that in this specific setting three cylinders per case is an adequate sample size for TMA studies. Further sampling yields only a small gain in accuracy as determined by Ki-67 quantification and p53 scoring (κ-coefficient, 0.9). For endometrial and breast tissues, TMA scoring of three cylinders yielded excellent agreement (κ, >0.75) compared with whole-section scoring. (Cancer Epidemiol Biomarkers Prev 2009;18(7):2014–21)
Introduction
High-throughput techniques for molecular studies related to carcinogenesis, prognosis, or therapy in cancer have been introduced in recent years. These methods include assays at both the genomic and proteomic levels. One technique is tissue microarray (TMA), which makes it possible to study histopathologic material from a large number of different tissue samples within a limited experimental setting. TMA analysis can thus be done both on historical material with long follow-up and as part of ongoing clinical studies.
Twenty years ago, Battifora (1) described a technique that facilitated putting several tissue samples together in one tissue block, the multitumor (sausage) tissue block. However, the tissue samples were not placed in an orderly way and, therefore, finding a specific tissue in the block became difficult or even impossible. Consequently, this technique has mostly been used to construct control tissue blocks for immunohistochemistry (2). In 1998, Kononen et al. (3) presented the TMA technique, in which tissue cylinders (diameter, 0.6-4.0 mm) are punched from a donor block and inserted into a recipient block made of paraffin wax. The cylinders are arranged in a systematic pattern and are therefore easy to identify. The TMA technology has greatly facilitated retrospective studies of formalin-fixed and paraffin-embedded tissues. Large sets of tissues can now be put together in the same block and subjected to the same laboratory treatment, thus minimizing batch-to-batch variability and making the analysis less time-consuming and more cost-effective (4, 5). However, one should keep in mind that TMA is not intended for individual clinical diagnosis, tumor classifications, or grading within studies but has been designed to facilitate biomarker studies in large tissue materials (4).
A frequently voiced concern (6) is whether or not these small cylinders can be representative of large sets of tumor material. One crucial part of the TMA method is the careful selection of the tumor segment to be punched to ensure that the sample accurately represents tumor heterogeneity. Another question is whether or not the small amount of material in the tissue cylinder analyzed by TMA is enough to be representative of the whole tumor. Increasing the diameter of the tissue cylinder may do more damage to the donor tissue without substantially increasing the likelihood of detecting tumor heterogeneity. The solution may be to array several small cylinders from the donor tissue.
Heterogeneity and the number of cylinders needed to get a reliable result have been studied for several types of tumors (5, 7-9), resulting in a recommended range from one to four cylinders per specimen. To our knowledge, neither non–small cell lung carcinomas nor small cell lung carcinomas have been extensively studied in this context. Almost 50% of non–small cell lung carcinomas exhibit more than 1 of the major histologic types, namely squamous cell carcinoma and adenocarcinoma (10). In view of the heterogeneity of lung cancer tumors, it is vital to take extra precautions when constructing the TMA blocks in lung cancer research concerned with pathogenetic or prognostic factors and markers related to therapies.
Another factor to consider is the extent of the expression of antigens to be studied as well as the evaluation system. Proliferation (i.e., Ki-67 expression) is determined either as a continuous variable (7) or scored in categories (11). For other markers, such as p53 protein, human epidermal growth factor receptor 2 (HER-2), and estrogen receptor, a semiquantitative two-grade or three-grade scoring system is mainly used (12). p53, in particular, is a suitable model for methodologic studies of immunohistochemistry in lung carcinomas. In about 50% of lung neoplasms, the p53 gene is mutated, and an accumulation of mutant p53 protein is readily detected by immunohistochemistry (10). In the same way, a proliferation marker, such as Ki-67, is suitable due to the differences of proliferation activity within and between the different histologic types of lung carcinomas (10). A few other tissues and markers have also been included in the study (i.e., HER-2 and estrogen receptor), which are established biomarkers in endometrial and breast carcinomas.
The aim of this study was to investigate the diameter and number of cylinders needed to obtain representative histopathologic material. This question was addressed by using virtually constructed tissue cylinders from different tumor material, with emphasis on material from small cell lung carcinomas as well as non–small cell lung carcinomas.
Materials and Methods
Lung Cancer Specimens
Thirty cases, 10 of small cell carcinoma, 10 of primary lung adenocarcinoma, and 10 of squamous cell carcinoma, were included in the study. The tumor specimens were obtained during major surgical procedures, such as lobectomies and pneumonectomies. One tissue block per case was selected, the block with the largest tumor area. Non–small cell lung carcinoma specimens were consecutively collected during 2002 to 2004 from the pathology archives at the Department of Laboratory Medicine, Örebro University Hospital, Sweden. All available cases of small cell lung carcinoma specimens from 1987 to 2006 were identified. One case had to be excluded due to poor tissue morphology; no further cases from major surgery were available. The materials were anonymized before analysis.
Breast and Endometrial Carcinoma Specimens
Ten cases of breast carcinomas from 2007 to 2008 were selected based on HercepTest scores in the original histopathologic report and were reclassified (M.K.). Ten consecutive cases of endometrial curettages with adenocarcinoma were also included.
Immunohistochemistry
The tumors were originally formalin-fixed and paraffin-embedded, and had thereafter been stored at room temperature according to routine laboratory procedures.
Four-micrometer sections were cut onto Dako ChemMate capillary gap microscope slides (Dako) and placed at 60°C for 1 h. The slides were then subjected to immunohistochemistry as follows. After initial deparaffinization, antigen retrieval was carried out by microwaving the slides in a Tris-EDTA buffer (pH 9) at 650 W for 30 min. The immunohistochemistry was done with the use of a ChemMate Dako EnVision Detection kit (Dako) according to the manufacturer's instructions, with primary antibody incubation for 25 min at room temperature [Ki-67, clone MIB-1, 1:400; p53, clone DO-7, 1:700 (both from Dako); ER, clone SP1, from NeoMarkers]. HercepTests were done according to the manufacturer's protocol (Dako). p53 staining was done on lung cancer specimens, Ki-67 on lung cancer and endometrial specimens, estrogen receptor on endometrial and breast cancer specimens, and HER-2 on breast cancer specimens. Negative control slides were prepared by replacing the primary antibody with Dako ChemMate antibody diluent (Dako). Positive control sections were included.
Delineation of Tumor Extension
The delineation of the tumor area on each slide was carried out by a histopathologist (M.K.) on H&E-stained slides; this marking was then transferred to immunohistochemically stained slides to avoid measurement on normal tissue, intratumor hemorrhage, or necrotic areas. Within this marked area, five scattered spots were selected to represent both central and border parts of the tumor, mimicking the standard procedure for TMA construction (13).
Image Acquisition
With the use of the Leica QWin system (Leica), pictures were taken from the center of the spot to the periphery until areas corresponding to cylinders of 0.6 and 1.0 mm diameters had been collected. In total, 9 pictures per spot were obtained. This approach was used for the lung cancer specimens and the Ki-67 counting (see below).
For the scored markers (p53, estrogen receptor, HER-2, and Ki-67 on lung, breast, and endometrial tumors), one high-resolution picture corresponding to 0.6-mm diameter per spot was obtained as described above; thus, five pictures per case were procured.
Ki-67 Counting on Lung Cancer Specimens
Positive and negative cells were counted manually in the corresponding digitized color pictures. For each area (0.6 and 1.0 mm diameters), the percentage of positive cells was determined.
Each of the picture series, corresponding to a cylinder, was coded and analyzed without knowledge of the other picture series from the same slide.
Scoring of Immunohistochemistry
The p53 scoring on the coded picture series from lung tumor specimens, as described above, was done on digitized pictures corresponding to a cylinder of 0.6-mm diameter. Subsequently, a whole-section scoring was done by a microscopic evaluation of the whole section on a separate occasion. p53 positivity was scored as <10%, 10% to 30%, and >30% (12).
Ki-67 and estrogen receptor in breast and endometrial carcinomas were scored in a similar manner as the p53 scoring mentioned above. The cutoff values for the different markers were: <10%, 10% to 20%, and >20% for Ki-67; and <10% and ≥10% for estrogen receptor.
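The cutoff values above can be expressed as a small lookup, shown here as a minimal sketch. The function name `score_category`, the dictionary `CUTOFFS`, and the integer coding of the categories (0 for the lowest class) are illustrative assumptions and are not part of the study protocol.

```python
# Cutoffs as stated in the text; names and the 0/1/2 coding are illustrative assumptions.
CUTOFFS = {
    "p53": (10.0, 30.0),    # <10%, 10% to 30%, >30%
    "Ki-67": (10.0, 20.0),  # <10%, 10% to 20%, >20%
    "ER": (10.0,),          # <10% (negative), >=10% (positive)
}


def score_category(marker: str, percent_positive: float) -> int:
    """Return 0 for the lowest category, 1 for the next, 2 for the highest."""
    cutoffs = CUTOFFS[marker]
    category = 0
    # The lower cutoff is inclusive: exactly 10% leaves the lowest category.
    if percent_positive >= cutoffs[0]:
        category = 1
    # The upper cutoff, when present, is exclusive: exactly 30% (p53) stays in the middle class.
    if len(cutoffs) > 1 and percent_positive > cutoffs[1]:
        category = 2
    return category


if __name__ == "__main__":
    for marker, value in [("p53", 8.0), ("p53", 30.0), ("Ki-67", 25.0), ("ER", 10.0)]:
        print(marker, value, score_category(marker, value))
```

The boundary handling follows the wording of the cutoffs (for example, ≥10% is positive for estrogen receptor); any other convention would need to be stated explicitly.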
Slides stained with HercepTest were scored according to the manufacturer's recommendations (0-1 as negative, 2 as weakly positive, and 3 as strongly positive) on the coded picture series as mentioned above.
All immunohistochemistry was counted or scored by C.K. except for the HercepTest, which was scored by M.K.
Statistics
Agreement between the percentage of positive cells found from 0.6 mm cylinders and 1.0 mm cylinders was estimated by the intraclass correlation (14) and by Bland-Altman plots (15). The intraclass correlation is the preferred correlation coefficient for measuring agreement because it considers both the linear association between the measurements as well as their agreement. High values, close to 1.0, are optimal. The Bland-Altman plot is a suitable graphic device to show the differences between the pair-wise readings in relation to the mean of the pair-wise readings. The plot indicates if systematic deviations between the 0.6-mm and the 1.0-mm readings are present and, in addition, shows the limits of agreement [mean of difference ± 2 × SD (difference)] according to Bland and Altman. Both the intraclass correlation and the Bland-Altman plots were applied to all tumors together and then to each histologic type. Variations in measurements between subjects and within subjects were calculated and reported as SDs.
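As an illustration of the two agreement measures, the following minimal sketch computes an intraclass correlation and the Bland-Altman limits of agreement for paired readings. It assumes a one-way random-effects intraclass correlation for two measurements per subject; the text does not state which form of the coefficient was used, and the paired values in the example are hypothetical, not study data.

```python
from statistics import mean, stdev


def icc_oneway(pairs):
    """One-way random-effects ICC for two measurements per subject (an assumed form)."""
    n = len(pairs)
    k = 2  # measurements per subject (0.6-mm and 1.0-mm readings)
    subject_means = [(a + b) / 2 for a, b in pairs]
    grand_mean = mean(subject_means)
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def bland_altman(pairs):
    """Return (mean difference, lower limit, upper limit) using mean +/- 2 x SD."""
    diffs = [a - b for a, b in pairs]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    return mean_diff, mean_diff - 2 * sd_diff, mean_diff + 2 * sd_diff


if __name__ == "__main__":
    # Hypothetical paired Ki-67 percentages (0.6 mm, 1.0 mm); not study data.
    hypothetical_pairs = [(12.0, 11.0), (30.0, 29.5), (35.0, 36.0), (20.0, 18.0)]
    print("ICC:", icc_oneway(hypothetical_pairs))
    print("Bland-Altman:", bland_altman(hypothetical_pairs))
```

A mean difference near zero with narrow limits indicates no systematic deviation between the two diameters, and an intraclass correlation close to 1.0 indicates that between-subject variation dominates the disagreement between readings.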
The coefficient of variation was estimated as the within-subject SD divided by the mean value. Finally, some guidelines for choosing the optimal number of cylinders to be used in an applied setting were given by comparing 2 × SEM (i.e., the approximate length of a 95% confidence interval for the mean) with different precision criteria, both absolute and relative. SEM was given by the within-subject SD divided by the square root of the number of cylinders. We did calculations in which the number of cylinders was increased from 1 to 10. The relative precision criterion is suggested to be a fraction of the mean (16). The comparison of 2 × SEM with the precision criteria is visualized in a table and a figure for all tumors together as well as for subtypes.
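The calculation described above can be sketched as follows. The function names are illustrative assumptions; the example uses the within-subject SD (7.61) and mean (25.12) reported for all tumors with 0.6-mm cylinders.

```python
from math import sqrt


def two_sem(within_subject_sd: float, n_cylinders: int) -> float:
    """2 x SEM, where SEM = within-subject SD / sqrt(number of cylinders)."""
    return 2 * within_subject_sd / sqrt(n_cylinders)


def coefficient_of_variation(within_subject_sd: float, mean_value: float) -> float:
    """Within-subject SD as a percentage of the mean."""
    return 100 * within_subject_sd / mean_value


if __name__ == "__main__":
    sd_within, mean_value = 7.61, 25.12  # all tumors, 0.6-mm cylinders
    print("CV (%):", round(coefficient_of_variation(sd_within, mean_value), 1))
    for c in range(1, 11):  # number of cylinders increased from 1 to 10
        print(c, round(two_sem(sd_within, c), 2))
```

Because SEM falls with the square root of the number of cylinders, each additional cylinder contributes less precision than the previous one. Small differences from tabulated values can arise from rounding of the reported SD.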
p53 was evaluated with weighted κ-statistics with the use of quadratic weights (17, 18) to compare the scoring of different numbers of cylinders versus whole-section scoring. A κ-coefficient of 0.40 to 0.75 indicates fair-to-good agreement, whereas values >0.75 are considered to be excellent (17). All possible series of data for one to five cylinders were compared with whole-section scoring. When a series contained equal numbers of observations in two categories, the higher score was applied. When divergent scoring results were obtained within a series of three or more observations, the most frequent score was applied to the series.
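A minimal sketch of the series rule and of a quadratically weighted κ is given below. The function names and the integer coding of categories (0, 1, 2) are assumptions made for illustration. Two further assumptions are labeled in the code: ties among more than two categories are also resolved toward the highest score, and κ is returned as 1.0 when both ratings fall entirely in one and the same category, a case in which the statistic is otherwise undefined.

```python
from collections import Counter


def consensus_score(scores):
    """Most frequent score in a series; ties go to the highest tied score.

    Assumption: the same tie rule is used when more than two categories tie.
    """
    counts = Counter(scores)
    top = max(counts.values())
    return max(score for score, count in counts.items() if count == top)


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with disagreement weights (i - j)^2 / (k - 1)^2."""
    n = len(rater_a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        # Assumption: all ratings in one shared category is treated as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical scores (0 = <10%, 1 = 10-30%, 2 = >30%); not study data.
    whole_section = [2, 1, 0, 2]
    cylinder_series = [[2, 2, 1], [1, 1, 0], [0, 0, 0], [2, 1, 2]]
    from_cylinders = [consensus_score(s) for s in cylinder_series]
    print(quadratic_weighted_kappa(whole_section, from_cylinders, 3))
```

Quadratic weights penalize a disagreement of two categories four times as heavily as a disagreement of one category, which suits ordered scoring classes.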
Breast and endometrial carcinomas stained for Ki-67 and estrogen receptor were evaluated in the same way as p53. Data from HercepTest scoring were treated in the same way. To mimic diagnostic scoring, in which a single area of intense staining may determine the score for the whole section, a classification in which the highest score in a series determined the score was also included. This approach was also evaluated by κ-statistics.
Results
Lung Cancer
Cylinder Diameter. A total of 45 to 50 observations per type of tumor were recorded. The intraclass correlation for the percentage of Ki-67 positive cells between observations for the 0.6-mm-diameter and 1.0-mm-diameter cylinders was 0.96 for the material as a whole, which can be regarded as a very high value. The possible influence of the histopathologic type of tumor was also studied. Table 1 shows the mean proliferation percentage as well as correlation coefficient for each tumor type, and the overall values for all cylinders of 0.6 and 1.0 mm diameters, respectively. Thus, the histopathologic tumor type did not influence the correlation between cylinders of 0.6 and 1.0 mm diameters.
Subjects | No. of measured pairs (n) | Cylinder diameter (mm) | Mean | ICC (between 0.6 and 1.0 mm) | SD (all measurements) | Between-subjects SD | Within-subject SD (SD between cylinders) | CV* |
---|---|---|---|---|---|---|---|---|
All | 145 | 0.6 | 25.12 | 0.962 | 15.43 | 13.36 | 7.61 | 30.3% |
All | 145 | 1.0 | 24.64 | 0.962 | 15.35 | 13.81 | 6.59 | 26.7% |
ADCA | 50 | 0.6 | 13.63 | 0.923 | 10.42 | 8.07 | 6.41 | 47.0% |
ADCA | 50 | 1.0 | 11.78 | 0.923 | 8.74 | 6.81 | 5.35 | 45.4% |
SCC | 50 | 0.6 | 29.00 | 0.961 | 16.04 | 13.49 | 8.37 | 28.9% |
SCC | 50 | 1.0 | 28.96 | 0.961 | 15.85 | 13.71 | 7.63 | 26.3% |
SCLC | 45 | 0.6 | 33.58 | 0.913 | 11.45 | 8.08 | 7.93 | 23.6% |
SCLC | 45 | 1.0 | 34.12 | 0.913 | 10.26 | 7.71 | 6.60 | 19.3% |
NOTE: Variation in measurements for all measurements and divided into between-subjects and within-subject variations. SDs and CV (within-subject variation) are shown.
Abbreviations: ICC, intraclass correlation; CV, coefficient of variation; ADCA, adenocarcinoma; SCC, squamous cell carcinoma; SCLC, small cell lung carcinoma.
*CV = within-subject SD/mean × 100.
Another way to analyze the influence of cylinder diameter in the overall setting is to determine the relative influence of within-subject variation. The coefficient of variation was similar for the 0.6 and 1.0 mm observations, with values ranging from 19.3% (small cell lung cancer, 1.0 mm diameter) to 47.0% (adenocarcinoma, 0.6 mm diameter); data shown in Table 1. The differences between the histologic subtypes are large, with adenocarcinoma showing substantially higher values for the coefficient of variation than the two other subtypes.
To further analyze the possibility of a systematic difference between the 0.6-mm-diameter and 1.0-mm-diameter observations, a Bland-Altman plot was constructed. Figure 1 shows a small average difference of 0.5% between the observations from cylinders of different diameters for the overall material, and there is no indication of a systematically changing difference with average size. The limits of agreement are -8.1% to 9.1%, which are reasonably small as they cover ∼95% of the differences.
Together these data show that cylinders with a diameter of 0.6 mm are, with only a small margin, as informative and reliable as cylinders with a 1.0-mm diameter. For all tumor types as well as for the overall material, the between-subject variation exceeded the within-subject variation.
Number of Cylinders. Table 2 shows the influence of the number of cylinders on the length of a 95% confidence interval for the mean expressed as 2 × SEM. The data reveal that an increased number of cylinders results in a decreased SEM, with the largest change in SEM occurring between observations based on one and two cylinders. To facilitate the choice of a suitable number of cylinders to achieve a specified precision, the data from Table 2, with extended values for 6 to 10 cylinders, are given in Fig. 2. The figure, a simplified nomogram, can be used for both absolute and relative precision criteria. For an absolute criterion in which the length of the 95% confidence interval for the mean of c cylinders should not exceed the value given by the criterion, say 8 units, the sufficient number of cylinders, c, is read from the intersection of the horizontal line at 8 with the polygon for the SEM of the tumor type (see the figure for an explanation). For adenocarcinoma, this precision is accomplished with three cylinders; for all tumors, with four cylinders. If a relative precision criterion is used (e.g., a fraction of the mean), the same principle can be applied by interpolation between the horizontal lines. For example, one third of the mean of all tumors gives 8.37 as the precision, and the sufficient number of cylinders will be three.
Subjects and type of cylinder | 2 × SEM for 1 cylinder | 2 × SEM for 2 cylinders | 2 × SEM for 3 cylinders | 2 × SEM for 4 cylinders | 2 × SEM for 5 cylinders | Mean |
---|---|---|---|---|---|---|
All, 0.6 mm | 15.21 | 10.76 | 8.78 | 7.61 | 6.80 | 25.12 |
All, 1.0 mm | 13.19 | 9.32 | 7.61 | 6.59 | 5.90 | 24.64 |
ADCA, 0.6 mm | 12.83 | 9.07 | 7.41 | 6.41 | 5.74 | 13.63 |
ADCA, 1.0 mm | 10.70 | 7.57 | 6.18 | 5.35 | 4.79 | 11.78 |
SCC, 0.6 mm | 16.73 | 11.83 | 9.66 | 8.37 | 7.48 | 29.00 |
SCC, 1.0 mm | 15.26 | 10.79 | 8.81 | 7.63 | 6.83 | 28.96 |
SCLC, 0.6 mm | 15.85 | 11.21 | 9.15 | 7.93 | 7.09 | 33.58 |
SCLC, 1.0 mm | 13.20 | 9.33 | 7.62 | 6.60 | 5.90 | 34.12 |
NOTE: The data in this table are utilized in the nomogram of Fig. 2.
*An approximate 95% confidence interval for the mean of cylinders.
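Reading the number of cylinders from the tabulated 2 × SEM values for an absolute precision criterion can be sketched as a simple lookup. The dictionary below copies the 0.6-mm rows of Table 2; the names `TWO_SEM_06` and `cylinders_needed` are illustrative assumptions, and the lookup covers only the tabulated range of one to five cylinders.

```python
# 2 x SEM for 1 to 5 cylinders (0.6-mm rows of Table 2); names are illustrative.
TWO_SEM_06 = {
    "All": [15.21, 10.76, 8.78, 7.61, 6.80],
    "ADCA": [12.83, 9.07, 7.41, 6.41, 5.74],
    "SCC": [16.73, 11.83, 9.66, 8.37, 7.48],
    "SCLC": [15.85, 11.21, 9.15, 7.93, 7.09],
}


def cylinders_needed(two_sem_values, criterion):
    """Smallest tabulated number of cylinders with 2 x SEM <= criterion, else None."""
    for n_cylinders, value in enumerate(two_sem_values, start=1):
        if value <= criterion:
            return n_cylinders
    return None  # criterion not met within the tabulated range


if __name__ == "__main__":
    for subject, values in TWO_SEM_06.items():
        print(subject, cylinders_needed(values, criterion=8.0))
```

With an absolute criterion of 8 units, this lookup returns three cylinders for adenocarcinoma and four for all tumors, in line with the reading of the nomogram described in the text.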
Scoring of p53. p53 scoring showed an almost perfect correlation between whole-section scoring and scoring of cylinders as seen already with the inclusion of just one cylinder (Table 3). With the use of two cylinders, a single mismatch decreases the statistical representativity of the method. When three or more cylinders are included, there is almost total agreement between the scoring of cylinders and that of whole sections.
Tissue | Marker | % of positive cells | Whole section | 1 cylinder | 2 cylinders | 3 cylinders | 4 cylinders | 5 cylinders |
---|---|---|---|---|---|---|---|---|
Lung tumors | p53 | >30 | 52% | 52% | 53% | 52% | 52% | 52% |
Lung tumors | p53 | 10-30 | 17% | 17% | 16% | 16% | 17% | 17% |
Lung tumors | p53 | <10 | 31% | 31% | 31% | 32% | 31% | 31% |
Lung tumors | p53 | n | 29 | 145 | 290 | 290 | 145 | 29 |
Lung tumors | p53 | κ |  | 0.96 | 0.95 | 1.0 | 1.0 | 1.0 |
Endometrial carcinomas | ER | ≥10 | 100% | 100% | 100% | 100% | 100% | 100% |
Endometrial carcinomas | ER | <10 | 0% | 0% | 0% | 0% | 0% | 0% |
Endometrial carcinomas | ER | n | 10 | 50 | 100 | 100 | 50 | 10 |
Endometrial carcinomas | ER | κ |  | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Endometrial carcinomas | Ki-67 | >20 | 70% | 72% | 85% | 77% | 86% | 80% |
Endometrial carcinomas | Ki-67 | 10-20 | 20% | 24% | 14% | 10% | 14% | 20% |
Endometrial carcinomas | Ki-67 | <10 | 10% | 4% | 1% | 3% | 0% | 0% |
Endometrial carcinomas | Ki-67 | n | 10 | 50 | 100 | 100 | 50 | 10 |
Endometrial carcinomas | Ki-67 | κ |  | 0.61 | 0.57 | 0.64 | 0.59 | 0.69 |
Breast carcinomas | ER | ≥10 | 80% | 82% | 82% | 80% | 80% | 80% |
Breast carcinomas | ER | <10 | 20% | 18% | 18% | 20% | 20% | 20% |
Breast carcinomas | ER | n | 10 | 50 | 100 | 100 | 50 | 10 |
Breast carcinomas | ER | κ |  | 0.94 | 0.94 | 1.0 | 1.0 | 1.0 |
Breast carcinomas | Ki-67 | >20 | 50% | 40% | 50% | 50% | 50% | 50% |
Breast carcinomas | Ki-67 | 10-20 | 40% | 24% | 29% | 26% | 30% | 30% |
Breast carcinomas | Ki-67 | <10 | 10% | 26% | 21% | 24% | 20% | 20% |
Breast carcinomas | Ki-67 | n | 10 | 50 | 100 | 100 | 50 | 10 |
Breast carcinomas | Ki-67 | κ |  | 0.85 | 0.90 | 0.88 | 0.91 | 0.91 |
Breast carcinomas | HER-2 | Score 0-1 | 40% | 48% | 38% | 46% | 40% | 40% |
Breast carcinomas | HER-2 | Score 2 | 40% | 36% | 43% | 37% | 40% | 40% |
Breast carcinomas | HER-2 | Score 3 | 20% | 16% | 19% | 17% | 20% | 20% |
Breast carcinomas | HER-2 | n | 10 | 50 | 100 | 100 | 50 | 10 |
Breast carcinomas | HER-2 | κ |  | 0.78 | 0.85 | 0.79 | 0.82 | 0.82 |
Breast carcinomas | HER-2 (highest score applied) | Score 0-1 | 40% | 48% | 38% | 34% | 30% | 30% |
Breast carcinomas | HER-2 (highest score applied) | Score 2 | 40% | 36% | 43% | 46% | 50% | 50% |
Breast carcinomas | HER-2 (highest score applied) | Score 3 | 20% | 16% | 19% | 20% | 20% | 20% |
Breast carcinomas | HER-2 (highest score applied) | n | 10 | 50 | 50 | 100 | 100 | 10 |
Breast carcinomas | HER-2 (highest score applied) | κ | 0.78 | 0.78 | 0.87 | 0.87 | 0.90 | 0.91 |
NOTE: For whole-section scoring, n is the number of cases studied, and the percentage indicates the proportion of cases classified in each category. The corresponding data for one to five cylinders are also given, in which n is the total number of simulated observations for possible combinations of cylinders within the respective case, and the percentage value indicates the proportion of the simulated observations categorized in the respective category. The weighted κ-value indicates the level of agreement between whole-section scoring and the simulated series of observations within each case studied.
Abbreviation: ER, estrogen receptor.
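The numbers of simulated observations follow from enumerating every possible combination of cylinders within each case. A minimal sketch, assuming five cylinders per case as described in the Methods, is shown below; the function names are illustrative and the example scores are hypothetical.

```python
from itertools import combinations
from math import comb


def simulated_series(cylinder_scores, k):
    """All possible series of k cylinders drawn from one case."""
    return list(combinations(cylinder_scores, k))


def n_simulated(n_cases, k, cylinders_per_case=5):
    """Total number of simulated observations for series of k cylinders."""
    return n_cases * comb(cylinders_per_case, k)


if __name__ == "__main__":
    # Hypothetical scores of the five cylinders of one case; not study data.
    one_case = [2, 2, 1, 2, 0]
    print(len(simulated_series(one_case, 3)))
    print([n_simulated(29, k) for k in range(1, 6)])  # lung tumors, 29 cases
    print([n_simulated(10, k) for k in range(1, 6)])  # 10-case series
```

For 29 cases this gives 145, 290, 290, 145, and 29 observations for one to five cylinders, which corresponds to the n row for p53 in lung tumors.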
Breast and Endometrial Carcinoma Specimens
Ki-67 Scoring. In these tissues, Ki-67 was determined in MIB-1–stained slides as described above. Scoring was done in three classes (<10%, 10%-20%, and >20% positive cells). Whole-section scoring was compared with scoring of one to five cylinders per case. Data for all possible combinations of cylinders were generated and compared with whole-section scoring (Table 3), and the agreement was evaluated by weighted κ-statistics. The agreement was good in endometrial carcinomas (κ, 0.57-0.69) and excellent in breast carcinomas (κ, 0.85-0.91). In the endometrial carcinoma setting, an odd number of cylinders yielded slightly better κ-values.
Estrogen Receptor Scoring. Estrogen receptor positivity in tumor cell nuclei was scored in two classes: negative, <10% positivity; and positive, ≥10% positivity. Scoring of cylinders and of whole sections was compared as above (Table 3). For both tissues studied, the κ-values showed excellent agreement.
HER-2 Scoring. Whole-section scoring of HercepTest showed four negative cases (score 0-1), two weakly positive cases (score 2), and four strongly positive cases (score 3).
Discussion
The present study addresses the methodologic aspects of TMA construction. In lung cancer, the tissue material that is available for research differs due to divergent clinical handling of the various types of tumors. In non–small cell lung cancer, at least in cases of localized disease, specimens from lung cancer surgery are available, whereas in the case of small cell lung cancer, only diagnostic biopsies are collected before pharmacologic treatment is started. It is thus of importance to establish guidelines for whether and how TMA studies should be done on different types of specimens from lung cancer. For studies in which a limited amount of tissue is available, it is important to establish the smallest amount of tissue that can be regarded as representative for use in TMA, although this approach should not be considered for diagnostic purposes. Furthermore, because TMA construction could be a part of present and future clinical studies of new drugs for the treatment of lung cancer, the methodologic aspect on TMA-based research is of utmost concern.
Only a few previous studies have addressed the issue of tumor sampling and representativity of TMA in lung cancer. Biomarkers and whole-tumor sampling have been studied in non–small cell lung cancer. Biomarker expression within a single block seems to be representative of the whole tumor when compared with extensive cytologic sampling of the whole-tumor volume (19). TMA sampling of blocks has been reported for a few markers. Choi et al. (20) reported an excellent correlation between three-grade scoring of Fascin expression in a whole section and a 2-mm cylinder in lung adenocarcinomas. In another study, c-kit expression in small cell lung cancer was compared between whole-section scoring (i.e., determination of the percentage of immunopositive cells within 10 microscopic high-power fields) and TMA of up to four cylinders (21). When expression of c-kit was determined as negative or positive (>10% of tumor cells), the TMA analysis showed a high specificity for positive results but a rather low sensitivity compared with whole-section scoring. When c-kit expression was analyzed as a continuous variable, no correlation could be shown between TMA and whole section (21). However, c-kit analysis by immunohistochemistry may by itself have added further methodologic problems to the study beyond those posed by uncertainty about TMA (22). The present data show that tissue cylinders of 0.6 mm and 1.0 mm diameters give similar representativity in terms of the heterogeneity of the tissue. This observation is important in the context of TMA construction from small tissue samples; for example, biopsy tissue of small cell lung cancer. Because tumor heterogeneity is more frequent in non–small cell lung cancer than in small cell lung cancer (10), one possibility could be to further increase the cylinder size above 1.0 mm. However, our study did not address this possibility because such an approach would be more harmful to the tissue and also would not be possible to use on biopsy material.
Another approach to address the issue of tumor heterogeneity would be to take more cylinders from each tumor. Although this might solve one problem, the sole purpose of TMA is to be able to take a limited amount of sample and put the samples together in as few blocks as possible. This aim is not fulfilled if too many cylinders are taken out of each tumor. A number of studies have addressed the issue of how many cylinders are needed to give a representative material in a number of different tumors. Hoos et al. (7) found that three cylinders of 0.6-mm diameter were sufficient for human fibroblastic tumors, whereas Rubin et al. (8) determined that prostate tumors needed four cylinders, and Rosen et al. (5) reported that two cylinders of 1.0-mm diameter were representative for ovarian tumors. Zhang et al. (9) used a single cylinder with a 1.0-mm diameter for the detection of estrogen receptor, progesterone receptor, and HER-2 in breast tissues. The variation in the number and size of cylinders found to be optimal in these various studies highlights the importance of evaluating how much material is needed for the specific tumor type of interest. In lung cancer, our results indicate that three cylinders from each tumor should be included in TMA studies to fulfill a precision criterion of a reasonable practical value; that is, mean/3. If a continuous evaluation model is to be applied, such as in the case of proliferation markers, more extensive sampling is needed than if scored markers, such as p53, are used. The κ-coefficient values show excellent agreement in scores. However, in accordance with other reports (23), it may be an advantage to score an odd number of cylinders for each case due to the semiquantitative nature of the data set.
We have furthermore expanded the semiquantitative study to include another two malignancies: endometrial and breast carcinomas. Scoring of estrogen receptor in endometrial carcinomas showed, in agreement with others (23), excellent correlation between all combinations of cylinders and whole-section scoring. Semiquantitative scoring of Ki-67 expression, a true continuous variable, rendered good correlation values in both the endometrial and breast carcinoma settings. In the latter, excellent agreement was reached, most likely due to the low cylinder-to-cylinder variation, as has been observed in renal carcinomas (24).
In breast carcinomas, we have reported data for estrogen receptor, Ki-67, and HER-2, all markers showing excellent agreement between TMA and whole section as determined by κ-coefficients. Estrogen receptor, Ki-67, and HER-2 expression was scored applying cutoff values according to clinical routine. For estrogen receptor, a whole-section scoring of >10% positivity determines the case as positive in our clinical setting.7
Thus, accumulating data from one to five cylinders should reflect a cumulatively increased sampling of the whole section. In this context, a single hot spot of extensive positivity should not determine the overall evaluation. On the other hand, a single hot spot of extensive and intense HER-2 staining could very well render the whole section positive.8 Thus, a cumulative as well as a hot-spot approach was used to compare HER-2 TMA scoring to whole-section scoring. κ-coefficient analyses indicate that the latter approach, for this specific marker, would slightly increase the representativity of TMA analysis. Our observations are in concordance with other studies of steroid hormone receptors and HER-2 in breast carcinomas (24-26) comparing TMA data with whole-section studies. Furthermore, a recent study addressed tumor heterogeneity and TMA sampling for biomarkers (27), showing that a single TMA very well reflects the findings within the whole breast tumor. Thus, our observations from these tissues, including three different markers, are in concordance with our observations in lung carcinomas. This supports the view that our approach to determining TMA sampling is valid for tissues other than lung carcinomas and that for some markers, such as HER-2, a hot-spot approach may increase correlation to whole-section scoring.
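The difference between the cumulative and the hot-spot approach, and the way each can be compared with whole-section scoring by a κ-coefficient, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of the mean as the cumulative summary, the threshold of 2 on a 0 to 3 score scale (not a clinical cutoff), and the per-cylinder scores, which are invented so that the two rules differ and do not reproduce any result of this study.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one and the same category
    return (observed - expected) / (1.0 - expected)


def cumulative_call(cylinder_scores, threshold):
    """Case is positive if the mean score over all cylinders reaches the threshold."""
    return mean(cylinder_scores) >= threshold


def hot_spot_call(cylinder_scores, threshold):
    """Case is positive if any single cylinder reaches the threshold."""
    return max(cylinder_scores) >= threshold


# Hypothetical scores (0-3) for three cylinders per case, eight cases.
tma_scores = [
    [3, 3, 2], [0, 1, 0], [3, 0, 1], [1, 1, 2],
    [2, 3, 3], [0, 0, 0], [0, 3, 0], [1, 0, 1],
]
# Hypothetical whole-section calls for the same eight cases.
whole_section = [True, False, True, False, True, False, True, False]
THRESHOLD = 2  # illustrative only, not a clinical cutoff

cumulative = [cumulative_call(s, THRESHOLD) for s in tma_scores]
hot_spot = [hot_spot_call(s, THRESHOLD) for s in tma_scores]

kappa_cumulative = cohen_kappa(cumulative, whole_section)
kappa_hot_spot = cohen_kappa(hot_spot, whole_section)
print(f"cumulative: {kappa_cumulative:.2f}, hot spot: {kappa_hot_spot:.2f}")
```

In this invented data set, focal strong staining in a single cylinder is diluted by the cumulative rule but captured by the hot-spot rule, which is why the latter agrees better with the whole-section calls here. With real material, which rule agrees better would have to be determined marker by marker.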
Whether or not TMA sampling hampers the association of biological markers to clinical outcome data has been addressed in some studies (28-33). This does not seem to be the case for scored markers in prostate (28) and bladder carcinoma (29). TMA sampling might even increase the correlation to clinical outcome data as observed in a study of breast carcinoma by Torhorst et al. (30). In lung cancer, TMA studies have been able to show, for example, the prognostic impact of epidermal growth factor aberrations on the genomic as well as proteomic level (31).
We also present a new statistical approach for determining sample size in the construction of TMA (i.e., the number of cylinders per case). An increased number of cylinders will, of course, decrease the influence of intraindividual variation or, in other words, intraindividual tumor heterogeneity. However, in the TMA setting, an overall cost-effectiveness approach should be considered in which a rather large number of cases would compensate for the level of precision on the individual level.
Furthermore, we introduce a model in which preset levels of precision are used to determine the cutoff level for the number of cylinders. This precision criterion may be a relative precision criterion (16) determined as, for example, one half to one fifth of the mean, but the same approach could also be used with an absolute criterion; for example, ±10% for the proliferation rate setting.
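How a preset precision level translates into a number of cylinders can be sketched as follows. This is not necessarily the calculation used in this study. It is a simplified normal-approximation sketch that assumes the precision of a case mean based on n cylinders is the half-width of an approximate 95% confidence interval, z·s/√n, where s is the intraindividual (cylinder-to-cylinder) standard deviation. The function name `cylinders_needed`, the choice z = 1.96, and the numeric values for the mean and standard deviation are hypothetical.

```python
import math


def cylinders_needed(within_case_sd, criterion, z=1.96):
    """Smallest n with z * within_case_sd / sqrt(n) <= criterion.

    within_case_sd: intraindividual standard deviation of the marker
    criterion: allowed half-width, in the same units as the marker
    z: normal quantile (1.96 for roughly 95% coverage; an assumption)
    """
    if within_case_sd < 0 or criterion <= 0:
        raise ValueError("within_case_sd must be >= 0 and criterion > 0")
    n = math.ceil((z * within_case_sd / criterion) ** 2)
    return max(1, n)  # at least one cylinder is always taken


# Hypothetical proliferation data: mean index 30%, within-case SD 8 points.
mean_index = 30.0
within_sd = 8.0

# Relative criteria: one half, one third, and one fifth of the mean.
n_half = cylinders_needed(within_sd, mean_index / 2)
n_third = cylinders_needed(within_sd, mean_index / 3)
n_fifth = cylinders_needed(within_sd, mean_index / 5)

# Absolute criterion: +/-10 percentage points.
n_absolute = cylinders_needed(within_sd, 10.0)

print(n_half, n_third, n_fifth, n_absolute)
```

Because the required number of cylinders grows with the square of the ratio between the standard deviation and the criterion, tightening the criterion from one third to one fifth of the mean more than doubles the sampling effort in this hypothetical example, which illustrates why the criterion has to be weighed against the cost of constructing the array.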
Adding up these factors in TMA construction, one must keep in mind that all sampling is bound to introduce some kind of sampling error. Regardless of how extensive the histopathologic evaluation of a tumor may be, it reflects only a minute part of the tumor volume. However, in the daily practice of surgical pathology, this limited sampling of tumor material has proven to be of value in contributing prognostic and therapeutic information in patient care. In our approach to TMA construction and the evaluation of biomarkers, a number of factors have been discussed: cylinder diameter and number of cylinders for sampling of each individual case, as well as different approaches to data evaluation. Regardless of our choices in these variables, sampling errors will be introduced. We have focused on a way to define them in statistical terms and thus to keep them at an equal level when defining the sample size for each individual case. Furthermore, the extent of intraindividual sampling needs to be put into the perspective of the natural biological variation within the tumor group of the study at the interindividual level, which is further information provided by our statistical approach.
In conclusion, our approach to evaluating the validity of TMA analysis could be applied to any type of material, malignant or benign, surgical specimens or biopsies (5, 6, 32, 33). We have shown one way to address the issue of sample size in TMA construction, an issue that only a few studies have thus far addressed although TMA studies are frequently reported. Because TMA data will be of great importance for our future understanding of carcinogenesis as well as for the evaluation of biological markers in clinical studies, the validity of the method needs to be determined in every situation. This approach to TMA construction could be used as a method for further TMA-based studies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.