To the Editors: Robust and reversible biomarkers are needed for the successful completion of phase II prevention trials, an area where Fabian and colleagues have made enormous contributions. One potential marker is cell proliferation, measured by Ki-67 labeling. Khan et al. present data obtained by random periareolar fine-needle aspiration in 147 high-risk women, which they interpret as providing preliminary support for the use of Ki-67 as a response biomarker in prevention trials (1). They base this on an association between morphologically defined hyperplasia with atypia and the Ki-67 labeling index. These results are based on a single random periareolar fine-needle aspiration procedure, so a key attribute of a useful biomarker (reproducibility; ref. 2) remains untested.
Our experience at Northwestern University with ductal lavage for similar analyses shows both similarities to and differences from these data. In an ongoing phase II chemoprevention study with tamoxifen as the intervention agent, we found a mean Ki-67 labeling index in ductal lavage samples of 0.72% (based on results from 117 women with >100 cells at baseline lavage). Although we observed a good correlation of the Ki-67 labeling index with total epithelial cell yield (P = 0.001), no correlation was seen with cytological atypia. In agreement with Khan et al., we see a very strong correlation between cytology and total cell number. We also note with interest that when Khan et al. performed a multivariate analysis to predict Ki-67 expression, a significant correlation of Ki-67 with cytomorphology was seen only after cell number was excluded from the analysis. This, together with our own findings on the correlation of cell number with biomarkers such as estrogen receptor, cyclooxygenase-2, and Ki-67 expression, suggests that epithelial cell yield is at least as good a biomarker of response to preventive intervention as Ki-67 labeling.
We have compared the mean Ki-67 labeling index in women who did not take tamoxifen with that in women who accepted tamoxifen therapy, at baseline and 6 months later. In the no-intervention group, the Ki-67 labeling index dropped from 0.73% to 0.27% (P = 0.003), whereas in the tamoxifen group the corresponding values were 0.55% and 0.37% (P = 0.35; ref. 3). In phase Ia and Ib studies of women with hormone receptor–positive breast cancer, Fabian et al. found no significant difference in Ki-67 labeling, although proliferating cell nuclear antigen–labeled cells decreased significantly following arzoxifene intervention (4).
Considering these results together, we remain skeptical that parameters such as Ki-67 labeling and morphological atypia are robust and reproducible indicators of response.
In Response: In their letter to the editor, Drs. Bhandare and Khan question whether Ki-67 labeling and cytomorphologic evidence of atypia in cytology specimens are sufficiently robust biomarkers to be used as response end points in early phase prevention trials. Their skepticism is based on their experience with a limited number of ductal lavage specimens in which sampling was repeated over time.
Phase II prevention trials often rely on risk biomarker modulation to determine whether the intervention should be tested further in phase III trials, which generally use development of cancer as the primary end point. A risk biomarker needs to be measurable, biologically plausible, and attainable in the majority of the high-risk population with a minimally invasive procedure at a reasonable cost. A response biomarker must, in addition, be reproducible. Thus, in the absence of an effective intervention, assessment of a response biomarker should yield similar results over a short (6-12 month) interval. Unfortunately, many sources of variance interfere with reproducibility. These include (a) interpretive variance; (b) physiologic variance due to differences in hormonal or cytokine milieu; (c) technical variance due to differences in assay procedure, equipment, or personnel; and (d) sampling variance due to inability to obtain an adequate sample from the same areas or to extreme variation in the geographic distribution of the biomarker of interest. Although all these types of variance may be dealt with through a randomized double-blind study design, the larger the variance, the larger the number of subjects required to show modulation of the biomarker of interest (1). In general, interpretive and sampling variance may be greater for atypical cytology than for Ki-67, whereas physiologic and technical variance are generally higher for Ki-67 than for morphology.
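The point that larger variance requires more subjects can be made concrete with the standard normal-approximation sample-size formula for comparing mean biomarker change between two equal-sized arms: n per arm = 2 * (z_(1-alpha/2) + z_(power))^2 * sigma^2 / delta^2. The Python sketch below is illustrative only. The function name n_per_arm, the effect size, and the standard deviations are hypothetical, and an actual trial would use a design-specific calculation. Because n scales with sigma^2, doubling the standard deviation of a biomarker roughly quadruples the number of subjects needed.

```python
import math
from statistics import NormalDist


def n_per_arm(sd: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm for a two-arm comparison of mean change.

    Normal-approximation formula (hypothetical illustration, not a trial design):
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / delta**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical example: detect a 0.5-unit difference in biomarker change
# while the between-subject standard deviation grows.
for sd in (0.5, 1.0, 2.0):
    print(f"SD = {sd}: {n_per_arm(sd, delta=0.5)} subjects per arm")
```

With alpha = 0.05 and 80% power, the required number per arm rises from 16 to 63 to 252 as the standard deviation doubles twice.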
Hyperplasia with atypia in random periareolar fine-needle aspiration (RPFNA) has been shown in a prospective trial to be a powerful risk biomarker. Importantly, RPFNA stratifies women at increased risk based on the Gail model into moderate- and very high-risk groups (2). Despite these promising results, cytologic variance (improvement or worsening) in the placebo arm of a phase II chemoprevention trial was 46% using traditional morphologic categories, with 28% of women in the placebo group judged to have improved morphology (3). Most of this variance was attributed to interpretation, although sampling may also have played a role.
Although often multifocal and multicentric in high-risk women (4), atypia generally has a more patchy and restricted distribution than hyperplasia or expression of proliferation markers. RPFNA is designed to sample a field effect. Ki-67 has minimal interpretive variance and a more generalized distribution than clusters of atypical cells, so it might be a better response biomarker if it can be established that Ki-67 is a risk biomarker. Clearly, a prospective trial is required, but our preliminary finding that Ki-67 is associated with cytologic atypia corroborates the cross-sectional study of Shaaban et al. (5). Ki-67 clearly varies with the phase of the menstrual cycle. Its use in trials of young premenopausal women may therefore not be advisable unless women are resampled on the same or a very similar day of the cycle.
The expression of Ki-67 in benign tissue is low. Consequently, a minimum of 500 and an optimum of 1,000 to 2,000 epithelial cells should be counted to reduce sampling variance. Further, in a small phase II prevention study, only subjects with a relatively high Ki-67 labeling index (>1.5%-2%) should be enrolled if the goal is to detect favorable modulation with a relatively small number of subjects (6).
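Why counting more cells reduces sampling variance can be shown with the binomial standard error of a proportion, SE = sqrt(p(1 - p)/n). The Python sketch below assumes a hypothetical true labeling index of 1.5%, at the lower end of the suggested entry threshold. It also assumes that counted cells behave as independent draws. That assumption ignores clustering of proliferating cells and all interpretive, technical, and physiologic variance, so real-world variance would be larger. The function name labeling_index_se is hypothetical.

```python
import math


def labeling_index_se(p: float, n_cells: int) -> float:
    """Binomial standard error of a labeling index estimated from n_cells counted cells.

    Assumes each counted cell is an independent draw (hypothetical simplification).
    """
    return math.sqrt(p * (1 - p) / n_cells)


p = 0.015  # hypothetical true Ki-67 labeling index (1.5%)
for n_cells in (100, 500, 1000, 2000):
    se = labeling_index_se(p, n_cells)
    # Relative SE = SE / p expresses counting imprecision relative to the index itself.
    print(f"{n_cells:>5} cells: SE = {se:.4f}, relative SE = {se / p:.0%}")
```

Under these assumptions, the relative standard error falls from about 81% when 100 cells are counted to about 18% when 2,000 cells are counted.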
Although we agree with Drs. Bhandare and Khan that further studies with Ki-67 are required before it can be considered a validated risk biomarker, we do not agree that their experience with Ki-67 reproducibility in ductal lavage specimens necessarily translates to RPFNA specimens. Ductal lavage often samples material from a single duct, and there are multiple difficulties in repeatedly cannulating the same duct and harvesting epithelial cells from it over time (7). Furthermore, the low mean baseline level of Ki-67 in ductal lavage samples and the low minimum number (100) of ductal lavage epithelial cells required in Drs. Bhandare and Khan's study for Ki-67 assessment make the ductal lavage results referred to in their letter extremely susceptible to several types of variance.
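To illustrate how a low baseline index and a 100-cell minimum interact, the Python sketch below treats the 0.72% mean reported in the letter as if it were the true labeling index. It also assumes that cells are counted independently. Both are simplifying assumptions. Under them, the number of Ki-67-positive cells is binomial, and the probability of observing none is (1 - p)^n. The function name prob_no_positive_cells is hypothetical.

```python
def prob_no_positive_cells(p: float, n_cells: int) -> float:
    """Probability that none of n_cells independently counted cells is Ki-67 positive.

    Binomial model with a fixed true labeling index p (hypothetical simplification).
    """
    return (1 - p) ** n_cells


p = 0.0072  # mean baseline labeling index reported in the letter (0.72%), assumed to be the true value
for n_cells in (100, 500, 1000):
    expected = p * n_cells  # expected number of positive cells in the sample
    print(f"{n_cells:>5} cells: expected positives = {expected:.1f}, "
          f"P(zero positives) = {prob_no_positive_cells(p, n_cells):.3f}")
```

At 100 cells, fewer than one positive cell is expected, and roughly half of such samples (about 0.49) would contain no positive cells at all. A labeling index computed from so few events is therefore highly unstable.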