We conducted studies to determine the magnitude and sources of variability in androgen assay results and to identify laboratories capable of performing such assays for large epidemiological studies. We studied androstanediol (ADIOL), androstanediol glucuronide (ADIOL G), androstenedione (ADION), androsterone glucuronide (ANDRO G), androsterone sulfate (ANDRO S), dehydroepiandrosterone (DHEA), dehydroepiandrosterone sulfate (DHEA S), dihydrotestosterone (DHT), and testosterone (TESTO). A single sample of plasma was obtained from five postmenopausal women, five premenopausal women in the midfollicular phase of the menstrual cycle, and five women in the midluteal phase, divided into aliquots, and stored at −70°C. Four sets of two coded aliquots from each woman were then sent to participating labs for analysis at monthly intervals over 4 months.

Using the logarithm of assay measurements, we estimated the components of variance and three measures of reproducibility. The usual coefficient of variation is a function of the components that are under the control of the laboratory. The intraclass correlation between measurements for a given individual is the proportion of the total variability that is associated with individuals. The minimum detectable relative difference is important to evaluate study feasibility. Results suggest that a single sample of ADIOL G, DHEA, DHEA S, and ANDRO G (with two lab replicates per sample) can be used to discriminate reliably among women in a given menstrual phase or menopausal status. The results for DHT, TESTO, ADION, and ANDRO S are more problematic and suggest that the present measurement techniques should be used with care, especially with midluteal phase women. The results for ADIOL suggest that this assay is not yet ready for use in epidemiological studies.

Endogenous steroid hormones are believed to play a major role in breast cancer etiology, although a consensus does not yet exist about the precise endocrine patterns that maximize risk (1, 2, 3). Estrogens, especially estradiol, stimulate division of breast epithelial cells and have long been linked to the promotion and growth of breast cancer (4). More recently, androgens have also been postulated to be important in breast carcinogenesis, possibly as a source of estrogens, or by other mechanisms (3, 5, 6, 7). Epidemiological studies to explore the effects of endogenous androgens require reliable and accurate assays.

To help identify appropriate techniques and laboratories for measuring endogenous hormone in blood and urine samples collected in large epidemiological studies, the reproducibility of several capable laboratories was determined and compared. In earlier reports, Gail et al. (8) estimated the sources of variability and reproducibility of assays of estradiol, estrone, estrone sulfate, and progesterone in plasma from pre- and postmenopausal women; and Ziegler et al. (9) determined the reproducibility and validity of new measurement techniques for 2-hydroxyestrone and 16α-hydroxyestrone in urine. The present report presents similar results for nine androgens measured in plasma samples from women: ADIOL,² ADIOL G, ADION, ANDRO G, ANDRO S, DHEA, DHEA S, DHT, and TESTO. These include the ovarian and adrenal androgens previously analyzed in epidemiological studies of breast cancer, as well as other androgen metabolites of potential importance in breast cancer etiology. We have estimated assay variability over the time required to assay samples from a large epidemiological study by using four measurements spaced over 3 months. Because androgen relationships have been reported to differ for premenopausal and postmenopausal breast cancer, we present variability and reproducibility separately for follicular phase premenopausal women, luteal phase premenopausal women, and postmenopausal women.

Experimental Design.

Plasma was collected from 15 women who were volunteers from the National Cancer Institute. Five were in the midfollicular phase of the menstrual cycle (4–6 days after the start of menses; mean age, 40 years), five were in the midluteal phase (4–6 days before menses; mean age, 36 years), and five were postmenopausal (natural menopause; mean age, 56 years). None were taking exogenous hormones. Immediately after separation, plasma was stored at 4°C. Within 4 h of draw, the plasma was mixed, aliquoted, and stored at −70°C. Further details on the material acquisition and handling are provided by Gail et al. (8).

Each participating lab received four batches of samples, with one batch to be assayed at the beginning of each of 4 consecutive months. Each batch contained two aliquots from each of the 15 subjects. The identifying numbers for the 30 samples within each batch were randomly assigned, separately for each batch. Lab personnel were told only whether a sample was from a premenopausal or postmenopausal woman. Each aliquot was assayed in duplicate. Thus, this study provides information on assay variability among women, among days on which assays were performed, among aliquots, and among lab replicates, but it does not provide information on temporal variations in hormone levels within women.
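The blinded batch layout described above can be sketched in a few lines. This is only an illustration of the design, not the procedure actually used to label the samples: the function name, the seed, and the use of consecutive integers as sample codes are assumptions made for the example.

```python
import random

def make_batches(n_subjects=15, n_batches=4, aliquots_per_subject=2, seed=1):
    """Return one list per batch of (sample_code, subject, aliquot) tuples.

    Codes 1..30 are shuffled independently within each batch, so a code
    carries no information about the woman the aliquot came from.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    batches = []
    for _ in range(n_batches):
        samples = [(s, k)
                   for s in range(1, n_subjects + 1)
                   for k in range(1, aliquots_per_subject + 1)]
        codes = list(range(1, len(samples) + 1))
        rng.shuffle(codes)
        batches.append(sorted((code, s, k)
                              for code, (s, k) in zip(codes, samples)))
    return batches

batches = make_batches()
replicates = 2                                   # each aliquot assayed in duplicate
per_woman = len(batches) * 2 * replicates        # measurements per woman per lab
total = sum(len(b) for b in batches) * replicates
print(len(batches[0]), per_woman, total)         # 30 16 240
```

Under this layout each lab produces 16 measurements per woman for every hormone it assays, which is the number averaged in the grand means reported in the results.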

Laboratory Methods.

Four laboratories, two academic and two commercial, recognized for their skill and experience in measuring endogenous hormones, were invited and willing to participate in this study. Each lab was asked to use their standard assay procedures and to perform only those assays with which they had experience.

Laboratory 1.

Lab 1 assayed ADIOL G, ADION, DHEA, DHEA S, TESTO, and DHT in plasma. The assay for ADIOL G included organic extraction of the plasma to remove unconjugated 3α ADIOL and other unconjugated steroids, followed by incubation of the aqueous phase with β-glucuronidase. After enzyme hydrolysis and celite chromatography, the product of hydrolysis, ADIOL, was measured by RIA (10, 11). ADION was measured by extracting plasma with ethyl acetate (20%) in hexane, celite column chromatography, and RIA (12, 13, 14). DHEA was also determined by extraction with hexane:ethyl acetate (80:20), celite column chromatography, and RIA (15, 16, 17). DHEA S was measured by RIA after diluting the specimens 1:2500 with assay buffer (15, 18, 19). TESTO was measured by RIA, preceded by extraction of plasma with ethyl acetate (20%) in hexane and celite column chromatography (20, 21, 22). DHT was also measured by RIA involving ethyl acetate:hexane extraction and celite column chromatography (13, 23). Lab 1 reported the sensitivity of the assays to be 3 ng/dl for ADION, 25 ng/dl for ADIOL G, 15 ng/dl for DHEA, 5 μg/dl for DHEA S, 2 ng/dl for TESTO, and 5 ng/dl for DHT.

Laboratory 2.

Lab 2 measured ADIOL G, ADION, DHEA, DHEA S, TESTO, and DHT in plasma. ADIOL G was assayed using a method developed at lab 2. Plasma was extracted with a polar solvent. The dried extract was subjected to complete enzymatic hydrolysis, followed by extraction of free ADIOL with hexane:ethyl acetate and purification by high performance liquid chromatography. ADIOL in the purified eluate was quantitated by RIA. ADION was measured by extracting plasma with hexane:ethyl acetate, followed by RIA developed at lab 2. DHEA was also determined by extraction with hexane:ethyl acetate and RIA developed in lab 2. The assay for DHEA S was similar to that for DHEA, except that the initial step was the removal of sulfate by overnight hydrolysis with sulfatase. TESTO was measured by RIA after extraction and column chromatography according to the method of Furuyama et al. (24). DHT was measured by an RIA developed at lab 2. Plasma samples were first extracted with hexane:ethyl acetate, followed by treatment with a strong oxidizer to destroy all unsaturated steroids, and purification on alumina columns. Lab 2 reported the sensitivity of the assays to be 14 ng/dl for ADION, 10 ng/dl for ADIOL G, 20 ng/dl for DHEA, 5 μg/dl for DHEA S, 3 ng/dl for TESTO, and 2 ng/dl for DHT.

Laboratory 3.

Lab 3 assayed DHEA, DHEA S, ADION, TESTO, DHT, ADIOL, ADIOL G, ANDRO S, and ANDRO G. DHEA S was quantified by direct RIA after a 1000-fold dilution of the plasma sample with assay buffer (25). DHEA, ADION, TESTO, DHT, and ADIOL were measured by RIA after extraction of plasma with diethyl ether and subsequent purification by celite column chromatography (26, 27, 28, 29). ADIOL G was quantified directly in plasma using a validated commercial kit (Diagnostic Systems Laboratories, Webster, Texas; Ref. 30). ANDRO S and ANDRO G were measured after unconjugated steroids were removed by extraction with diethyl ether, and the remaining conjugated steroids were hydrolyzed using hydrochloric acid and β-glucuronidase to cleave the sulfate and glucuronide moieties, respectively (31). In both assays, the product of hydrolysis, androsterone, was quantified by RIA after extraction with ethyl acetate and purification by celite column chromatography. Lab 3 reported the sensitivity of the assays to be as follows: 2 μg/dl for DHEA S, 20 ng/dl for DHEA, 10 ng/dl for ADION, 4 ng/dl for TESTO, 2 ng/dl for DHT, 2 ng/dl for ADIOL, 5 ng/dl for ADIOL G, 3 ng/dl for ANDRO G, and 6 ng/dl for ANDRO S.

Laboratory 4.

Lab 4 performed measurements of ADION, DHEA, DHEA S, and TESTO in plasma. ADION was measured by carbon tetrachloride extraction of plasma followed by an RIA kit (ICN Biochemicals, Diagnostics Division). DHEA was measured by dichloromethane extraction and an RIA kit from Coat-A-Count. DHEA S was assayed directly in plasma using a double antibody RIA kit from ICN Biochemicals, Diagnostics Division. TESTO was also measured directly in plasma using an ICN RIA kit. The sensitivity of the assays, as reported in the kit documentation, was 0.1 ng/ml for ADION, 0.04 ng/ml for DHEA, 0.5 ng/ml for DHEA S, and 0.1 ng/ml for TESTO.
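Lab 4 quotes its sensitivities in ng/ml, whereas labs 1 to 3 quote theirs in ng/dl (μg/dl for DHEA S). As a convenience for comparing them, the short sketch below converts the lab 4 figures to ng/dl using only the identity 1 dl = 100 ml. The dictionary and function names are illustrative and not part of the original report.

```python
ML_PER_DL = 100  # 1 dl = 100 ml, so ng/ml * 100 = ng/dl

# Sensitivities reported in the kit documentation used by lab 4 (ng/ml).
lab4_sensitivity_ng_per_ml = {
    "ADION": 0.1,
    "DHEA": 0.04,
    "DHEA S": 0.5,
    "TESTO": 0.1,
}

def to_ng_per_dl(ng_per_ml):
    """Convert a concentration from ng/ml to ng/dl."""
    return ng_per_ml * ML_PER_DL

for hormone, value in lab4_sensitivity_ng_per_ml.items():
    print(f"{hormone}: {to_ng_per_dl(value):.0f} ng/dl")
```

On this scale the lab 4 values are 10 ng/dl for ADION, 4 ng/dl for DHEA, 50 ng/dl (0.05 μg/dl) for DHEA S, and 10 ng/dl for TESTO.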

Statistical Methods.

Measurements were analyzed on the natural logarithmic scale. This transformation reduces the dependence of the SD of the response on the mean so that variance can be assumed to be unrelated to subject. For each of the three groups of women (midfollicular, midluteal, and postmenopausal), a nested component of variance analysis was performed. Components were estimated for subjects (\(\varsigma^{2}_{\mathrm{a}}\)), month (\(\varsigma^{2}_{\mathrm{b}}\)), aliquots on the same day (\(\varsigma^{2}_{\mathrm{c}}\)), and replicates from the same aliquot (\(\varsigma^{2}\)). Letting \(z_{ijkl}\) denote the hormone measurement for woman i (i = 1, 2, 3, 4, 5) on analysis day j(i) (j = 1, 2, 3, 4), using aliquot k(ij) (k = 1, 2) and replicate l(ijk) (l = 1, 2), the statistical model is written as Eq. 1:

\[\log_{\mathrm{e}}(z_{ijkl}) = \mu + a_{i} + b_{j(i)} + c_{k(ij)} + \epsilon_{l(ijk)} \tag{1}\]

where \(\log_{\mathrm{e}}\) denotes the natural logarithm (base e). In the model, μ is the average level of the hormone on the logarithmic scale, and \(a_{i}\), \(b_{j(i)}\), \(c_{k(ij)}\), and \(\epsilon_{l(ijk)}\) are normal independent variates with means zero and variances \(\varsigma^{2}_{\mathrm{a}}\), \(\varsigma^{2}_{\mathrm{b}}\), \(\varsigma^{2}_{\mathrm{c}}\), and \(\varsigma^{2}\), respectively. Restricted maximum likelihood estimates of the variance components were obtained using the SAS procedure PROC VARCOMP (32). The procedure also provides estimates of the SE of the estimated variance components. Restricted maximum likelihood estimates cannot be less than zero, and they agree with the usual ANOVA estimates when all estimates are greater than zero.
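Because the restricted maximum likelihood estimates coincide with the usual ANOVA estimates when all of them are positive, the balanced nested design can be illustrated with a method-of-moments calculation. The sketch below is not the SAS analysis used in this study. It simulates data from the nested model with hypothetical SDs and then equates the nested mean squares to their expectations; the function names, the seed, and the simulated parameter values are assumptions made for the example.

```python
import random
from statistics import mean

def simulate(a=5, b=4, c=2, n=2, mu=4.0, sds=(0.5, 0.10, 0.05, 0.05), seed=0):
    """Simulate log-scale values y[i][j][k][l] from the nested model.

    sds holds hypothetical SDs for subject, day, aliquot, and replicate.
    """
    rng = random.Random(seed)
    sd_a, sd_b, sd_c, sd_e = sds
    y = []
    for _ in range(a):
        a_i = rng.gauss(0.0, sd_a)
        subject = []
        for _ in range(b):
            b_ij = rng.gauss(0.0, sd_b)
            day = []
            for _ in range(c):
                c_ijk = rng.gauss(0.0, sd_c)
                day.append([mu + a_i + b_ij + c_ijk + rng.gauss(0.0, sd_e)
                            for _ in range(n)])
            subject.append(day)
        y.append(subject)
    return y

def anova_components(y):
    """Method-of-moments variance components for a balanced nested design."""
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    aliq_means = [[[mean(reps) for reps in d] for d in s] for s in y]
    day_means = [[mean(d) for d in s] for s in aliq_means]
    subj_means = [mean(s) for s in day_means]
    grand = mean(subj_means)

    ms_a = b * c * n * sum((m - grand) ** 2 for m in subj_means) / (a - 1)
    ms_b = c * n * sum((day_means[i][j] - subj_means[i]) ** 2
                       for i in range(a) for j in range(b)) / (a * (b - 1))
    ms_c = n * sum((aliq_means[i][j][k] - day_means[i][j]) ** 2
                   for i in range(a) for j in range(b)
                   for k in range(c)) / (a * b * (c - 1))
    ms_e = sum((y[i][j][k][l] - aliq_means[i][j][k]) ** 2
               for i in range(a) for j in range(b)
               for k in range(c) for l in range(n)) / (a * b * c * (n - 1))

    # Equate mean squares to their expectations. These estimates can be
    # negative; REML would report zero for such a component instead.
    return {
        "subject": (ms_a - ms_b) / (b * c * n),
        "day": (ms_b - ms_c) / (c * n),
        "aliquot": (ms_c - ms_e) / n,
        "replicate": ms_e,
    }

print(anova_components(simulate()))
```

With only five women per group, the subject component in particular is estimated with considerable uncertainty, which is why the SEs of the components are reported alongside the estimates.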

Knowledge of the variance components allows a careful quantitative consideration of assay reproducibility. We use three measures of reproducibility derived from these components: the CV, the ICC coefficient, and the MDRD. The three measures are quite different, but each measure is useful depending on the application.

The common measure of reproducibility used by the labs is the CV, namely the population SD of a measurement divided by its mean. The components associated with day, aliquot, and replicate are the components that are under the control of the lab. An application of the δ method (33) shows that the sum of these components is a good estimate of the square of the CV (8). Because the labs all used two replicates, the CV expressed as a percentage is estimated by \(100(\hat{\varsigma}^{2}_{\mathrm{b}} + \hat{\varsigma}^{2}_{\mathrm{c}} + \hat{\varsigma}^{2}/2)^{1/2}\), where hats denote estimates of corresponding parameters. This CV incorporates the variation associated with day and may be much larger than a CV based only on variability due to aliquots and replicates on a single day. The validity of this approximation depends heavily on the model and particularly on the assumption that after logarithmic transformation the variance is unrelated to subject.
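The CV estimate above is straightforward to compute once the components are available. The following sketch uses hypothetical component values, not values from the tables, and also evaluates the exact CV of a log-normal variate, \(\{\exp(v) - 1\}^{1/2}\) for log-scale variance v, to indicate how close the approximation is when the lab variance is small. The function names are illustrative.

```python
import math

def cv_percent(var_day, var_aliquot, var_replicate, n_replicates=2):
    """Approximate CV (%) from the lab-controlled components on the log scale."""
    lab_variance = var_day + var_aliquot + var_replicate / n_replicates
    return 100.0 * math.sqrt(lab_variance)

def lognormal_cv_percent(log_variance):
    """Exact CV (%) of a log-normal variate with the given log-scale variance."""
    return 100.0 * math.sqrt(math.exp(log_variance) - 1.0)

# Hypothetical estimates of the day, aliquot, and replicate components.
var_b, var_c, var_e = 0.010, 0.004, 0.012
approx = cv_percent(var_b, var_c, var_e)                  # about 14.1
exact = lognormal_cv_percent(var_b + var_c + var_e / 2)   # slightly larger
print(round(approx, 1), round(exact, 1))
```

The two values differ only in the first decimal place for a lab variance of 0.02; the gap widens as the lab variance grows.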

The assay would not be useful if the differences in true values between subjects were not large compared to total assay variability. For this to be the case, the variability associated with subjects should be large compared to the variability under the control of a lab. It is appropriate then to compare \(\varsigma^{2}_{\mathrm{a}}\), the component associated with subjects, with the sum of all components. The ratio is close to unity when the biological component is large relative to the components associated with the lab. In fact, the ICC between measurements on different days from a given individual is exactly this ratio. If two replicates are used for each sample, the estimated ICC between two measurements on different days is ICC = \(\varsigma^{2}_{\mathrm{a}}/(\varsigma^{2}_{\mathrm{a}} + \varsigma^{2}_{\mathrm{b}} + \varsigma^{2}_{\mathrm{c}} + \varsigma^{2}/2)\). We express the ICC in percent by multiplying by 100. If \(\varsigma^{2}_{\mathrm{a}}\) is small, the ICC may not be near one, even when the CV is small. The ICC is of importance to the epidemiologist because it indicates the effect of measurement error on study results. Specifically, the slope from a regression of the log relative risk of disease on the log hormone assay level will be attenuated by a factor equal to the ICC. If the ICC is 0.90, there will be a downward bias that is slight, only 10%, but an ICC of <0.80 results in bias that may be important.
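A small numerical sketch may help to show why the ICC and the CV can disagree. The component values below are hypothetical; both scenarios share the same lab variance (and hence the same CV) but differ in the between-subject component. The function names are illustrative.

```python
def icc(var_subject, var_day, var_aliquot, var_replicate, n_replicates=2):
    """ICC between two measurements made on different days (as a fraction)."""
    lab_variance = var_day + var_aliquot + var_replicate / n_replicates
    return var_subject / (var_subject + lab_variance)

def attenuated_slope(true_slope, icc_value):
    """Expected regression slope after attenuation by measurement error."""
    return true_slope * icc_value

lab = (0.010, 0.004, 0.012)           # hypothetical day, aliquot, replicate
heterogeneous = icc(0.18, *lab)       # large between-subject variance
homogeneous = icc(0.005, *lab)        # small between-subject variance
print(round(heterogeneous, 2), round(homogeneous, 2))   # 0.9 0.2
print(round(attenuated_slope(1.0, heterogeneous), 2))   # 0.9
```

In the second scenario the lab performs exactly as well as in the first, yet four-fifths of any true slope would be lost, which is the situation reported below for some midluteal measurements.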

Assay variability can decrease the power of a study to detect a difference in hormone levels between cases and controls. Knowing the variance components allows one to determine the minimum difference that is reliably detected with a given number of cases and controls. Specifically, for a two-sided α = 0.05 level test, the minimum difference in average log assay values (\(\delta = \mu_{1} - \mu_{2}\)) detectable with power 0.90 is the solution to

\[\delta^{2} = (\varsigma^{2}_{\mathrm{a}} + \varsigma^{2}_{\mathrm{b}} + \varsigma^{2}_{\mathrm{c}} + \varsigma^{2}/2)(1/n_{1} + 1/n_{2})(1.96 + 1.282)^{2}\]

where 1.96 is the 97.5th percentile of the standard normal distribution, 1.282 is the 90th percentile, and \(n_{1}\) and \(n_{2}\) are the case and control sample sizes. From δ, one can calculate the minimum percentage difference detectable with power 0.90 as \(100\{\exp(\mu_{1}) - \exp(\mu_{2})\}/\exp(\mu_{2}) = 100\{\exp(\delta) - 1\}\). We call this quantity the MDRD. Usually an investigator has a sense of what differences exist and what sample sizes can be used, so the MDRD is useful. If \(\varsigma^{2}_{\mathrm{a}}\) is small, the value of δ and therefore the MDRD may be small even when the CV is large.
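The MDRD calculation can be sketched as follows. The component values are hypothetical and chosen so that the total variance is 0.5; the default sample sizes are the 300 cases and 600 controls used later in this report. The function and constant names are illustrative.

```python
import math

Z_ALPHA = 1.96    # 97.5th percentile of the standard normal (two-sided 0.05)
Z_POWER = 1.282   # 90th percentile (power 0.90)

def mdrd_percent(var_subject, var_day, var_aliquot, var_replicate,
                 n_cases=300, n_controls=600, n_replicates=2):
    """Minimum detectable relative difference (%) between cases and controls."""
    total = var_subject + var_day + var_aliquot + var_replicate / n_replicates
    delta = math.sqrt(total * (1 / n_cases + 1 / n_controls)) * (Z_ALPHA + Z_POWER)
    return 100.0 * (math.exp(delta) - 1.0)

# Hypothetical components summing to a total variance of 0.5.
print(round(mdrd_percent(0.48, 0.010, 0.004, 0.012), 1))   # 17.6
```

Because the between-subject component enters the total, reducing it lowers the MDRD even though it also lowers the ICC; the two measures therefore answer different questions about the same assay.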

For each hormone and laboratory, we examined graphs of grand means, daily means, and aliquot means and examined these quantities for stability over time. We also examined differences in variability and agreement of results among the lab assays. These graphs are presented for ADIOL G and for DHT. Graphs for other androgens are available upon request. Spearman rank correlations are used to measure concordance of grand means among lab assays. The estimated components of variance and SEs of the estimates are tabulated in the “Appendix.” The components of variance are used to obtain estimates of the CVs, ICCs, and MDRDs, which are compared among the labs. To calculate ICCs and CVs in Tables 1 and 2, we assume that the measurement used is the mean of the two logarithmic-transformed replicates. To calculate MDRD, we assume, in addition, that \(n_{1}\) = 300 cases and \(n_{2}\) = 600 controls are used; these numbers approximate the sample sizes available in an ongoing study of Asian-American women that motivated these assay reliability studies.
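For readers who wish to reproduce the concordance measure, a Spearman rank correlation is simply a Pearson correlation computed on ranks. The sketch below is a plain standard-library implementation that assigns average ranks to ties; the grand means for five women at two labs are made-up values for illustration only.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)

# Hypothetical grand means (log scale) for five women at two labs.
lab_a = [4.1, 4.5, 3.9, 5.0, 4.7]
lab_b = [3.8, 4.4, 4.0, 4.9, 4.3]
print(round(spearman(lab_a, lab_b), 2))   # 0.8
```

A rank correlation is insensitive to a constant multiplicative offset between labs, so it can be high even when the labs report systematically different levels.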

ADIOL G.

Fig. 1, a, b, and c displays the results for \(\log_{\mathrm{e}}\)(ADIOL G), respectively, among the five midfollicular phase women, five midluteal phase women, and five postmenopausal women. Each woman is represented by a different symbol. The leftmost symbols represent the grand means of the 16 measurements for each of the five women. The next symbols represent the daily means of the four measurements for each woman on analysis day 1, and connected to each of these symbols are symbols that represent the aliquot means of the two replicate readings from each of the aliquots on that day. Results for the four measurements for each woman on analysis days 2, 3, and 4 are then presented in a similar manner.

No consistent time trends were evident, although there was a decreasing trend in measurements from lab 2 for some midfollicular phase women (Fig. 1a). There was overlap of the women’s aliquot means from labs 1 and 2 in Fig. 1, a, b, and c; thus the ranks of the subjects’ aliquot means were not completely consistent over time. Results from lab 3 were more consistent in that aliquot means were well separated for women in all phases, except for minimal overlap of two for midfollicular and midluteal phase women.

The geometric means of all ADIOL G measurements were 151 ng/dl at lab 3, 74.4 ng/dl at lab 1, and 64.1 ng/dl at lab 2. These differences are not surprising because the labs do not correct for molecular weight differences, hydrolysis, and procedural losses in the same way. The correlations of the ranks of the subjects’ mean responses were 0.94 between labs 1 and 2, but only 0.88 between labs 1 and 3 and 0.80 between labs 2 and 3.
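Because the analysis is carried out on the logarithmic scale, the natural summary of level is the geometric mean, the exponential of the mean log value, and differences between labs are naturally read as ratios. The sketch below illustrates both points; the three-value sample is made up, whereas 151 and 74.4 ng/dl are the lab 3 and lab 1 geometric means quoted above. The function name is illustrative.

```python
import math

def geometric_mean(values):
    """Exponential of the mean of the natural logs of positive values."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Made-up measurements (ng/dl) to show the calculation.
print(round(geometric_mean([50.0, 100.0, 200.0]), 1))   # 100.0

# Ratio of the reported lab 3 and lab 1 geometric means for ADIOL G.
ratio = 151 / 74.4
print(round(ratio, 2), round(math.log(ratio), 2))       # 2.03 0.71
```

A roughly twofold ratio corresponds to a shift of about 0.71 on the log scale, which moves every subject's mean by the same amount and therefore leaves the rank correlations between labs unaffected.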

For midfollicular and midluteal women, the CVs ranged from 13 to 17% at labs 1 and 3 but were somewhat higher, about 30%, at lab 2 (Table 1). The CV was 16% for measurements from postmenopausal women at lab 1 and much higher at labs 2 and 3 (25–35%). The ICCs were all >80% and were >90% for labs 1 and 3. The estimated MDRDs were 14–18% using measurements from lab 1 and somewhat larger for labs 2 and 3 (Table 1). Estimates of individual variance components and their SEs are provided in Table A1 of the “Appendix.”

DHEA.

No definite time trends were evident (figure not shown). There was some overlap of aliquot means for subjects in all groups. The ranks of the subjects’ mean responses were highly correlated for all lab pairs. Correlations ranged from 0.98 to 0.99. The geometric means of all measurements of DHEA were 158.9, 199.5, 187.9, and 101.4 ng/dl at labs 1, 2, 3, and 4, indicating somewhat lower levels for lab 4.

The CVs were the smallest from lab 1, ranging from 8 to 9%, and somewhat larger from the other labs, ranging from 13 to 28% (Table 1). The CV was small for midluteal phase measurements from each lab. The ICCs were largest for measurements at lab 1, 97–99%, and somewhat smaller for other labs (87–94%). The MDRDs were smallest for midluteal phase measurements, 9–13%, and larger for other phases, 14–21%.

DHEA S.

No time trends were evident in measurements from any lab in any phase (figure not shown). There was some overlap in assay measurements from each lab for women in each phase. The ranks of the subjects’ mean responses were highly correlated (0.98–1.00) across labs. The mean levels of DHEA S, 85.3, 61.1, 95.1, and 76.6 μg/dl at labs 1, 2, 3, and 4, were statistically significantly different for all pairs of labs.

The CVs were lowest from lab 4 (7–10%), slightly higher from lab 1 (10–12%), and still higher at the other labs (11–19%; Table 1). The ICCs were >96% at labs 1 and 4 and slightly lower but still very high at labs 2 and 3 (92–95%). The MDRD ranged from 12 to 19%.

DHT.

There were no consistent time trends (Fig. 2, a-c). There was some overlap of aliquot means of midfollicular women, but there was substantial overlap of aliquot means of midluteal and postmenopausal women. In particular, there were large differences between aliquot means of midluteal women at labs 2 and 3. For all laboratories, the ranks of the subjects’ mean responses were highly correlated (0.97–0.99). The geometric mean levels of DHT were similar for labs 1 and 3 (8.53 and 8.22 ng/dl) and somewhat lower for lab 2 (6.56 ng/dl).

The CVs were smallest for midfollicular and midluteal measurements, 11–17%, and slightly larger for postmenopausal measurements, 18–21% (Table 1). The ICC was 93–99% for midfollicular measurements; slightly lower for postmenopausal measurements, 81–82%; and very low for midluteal measurements, 0–23%. The ICC was zero when the observed variability between subjects was very small compared to the variability associated with the lab, i.e., when the variability between subjects could be completely explained by the within-lab variability. The MDRDs were 10–16% for midfollicular and postmenopausal measurements, but only 3–4% for midluteal measurements. These very small MDRDs occurred because the total variability was quite small.

TESTO.

There were no clear time trends in the measurements for TESTO (figure not shown). The aliquot means from lab 3 were clearly separated for postmenopausal women; there was some overlap in the aliquot means from other labs. There was extensive overlap of aliquot means from all labs for midfollicular and midluteal women. The ranks of the subjects’ mean responses were highly correlated (0.90–1.00) for labs 1, 2, and 3. Correlations with lab 4 were smaller, −0.10 to 0.20 for midluteal and postmenopausal women and 0.80–0.90 for midfollicular women. Geometric mean levels of TESTO, 18.5, 16.0, 19.1, and 18.8 ng/dl, were significantly different at labs 1, 2, and 3.

The CVs differed by lab (Table 1). CVs for measurements from labs 1 and 3 were 9–14%, whereas those from labs 2 and 4 were 21–26%. ICCs ranged from 0.84 to 0.88 for midfollicular women, 0.55–0.75 for midluteal women, and 0.88–0.96 for postmenopausal women. The ICCs reflected the large variability of measurements between postmenopausal women and small variability of the measurements between midluteal women. The MDRDs were smaller for the midluteal measurements, 6–11%, and somewhat larger for midfollicular or postmenopausal measurements, 8–15%.

ADION.

No time trends were evident in the data for lab 1 (figure not shown). Although there was some overlap of aliquot means of the midfollicular and postmenopausal women, measurements at the high levels were clearly separated from those at lower levels. The grand means of the midluteal women had a narrow range, and there was greater overlap with poor separation. There was an increasing trend with time in means for all menstrual phases in the data from lab 2. There was also overlap and poor separation for all phases. The daily means from lab 3 showed substantial variability because measurements on day 3 were consistently lower than comparable measurements on other days. Measurements from lab 4 showed no time trends. There was no overlap of aliquot means and clear separation for midfollicular phase women. However, there was substantial overlap for midluteal women and some overlap for postmenopausal women with only the highest and lowest measurements clearly separated.

The ranks of the subjects’ mean responses for midluteal and postmenopausal women were highly correlated (0.90–1.00), but not the ranks for midfollicular women (0.30–0.90). The geometric means of ADION were 60.0, 62.9, 55.7, and 65.6 ng/dl and were significantly lower from labs 1 and 3 than from labs 2 and 4. Labs 2 and 3 had lower ICCs and higher CVs than labs 1 and 4 (Table 1). Estimated MDRDs were slightly larger at lab 2 than at other labs. At all labs, ICCs and MDRDs were much smaller for women in the midluteal menstrual phase than for women in the midfollicular and postmenopausal phases.

ANDRO G.

One lab provided measurements for ANDRO G levels. There were no definite time trends (figure not shown). Although there was some overlap of aliquot means for midfollicular phase women, the separation was clear. There was some overlap among the aliquot means for midluteal women, and only those with highest and lowest measurements were clearly separated. For postmenopausal women, there was considerable overlap, and only the woman with the lowest measurements was separated.

The estimated CVs were around 20% for midfollicular and midluteal women and somewhat higher, 33%, for postmenopausal women (Table 2). The ICCs were high, >85% (Table 2). The MDRD was about 14% for midluteal samples, but >20% for postmenopausal and midfollicular samples. Estimates of individual variance components and their SEs are provided in Table A2 of the “Appendix.”

ANDRO S.

There was considerable overlap in aliquot means for all groups (figure not shown). The estimated CVs (Table 2) were 25–30%. The ICCs were 88–92% for midluteal and postmenopausal measurements, but they were much lower, 64%, for midfollicular measurements. The MDRDs for midluteal and postmenopausal women were 22%, but only 10% for midfollicular women. The component of variance for subjects, \(\varsigma^{2}_{\mathrm{a}}\), was relatively small for these midfollicular women, resulting in small total variation and therefore a small MDRD.

ADIOL.

Only one lab carried out assays for ADIOL. No definite time trends were evident for any phase (figure not shown). There was great overlap of aliquot means for all groups. The range of aliquot means for a particular woman on 1 day can be large. The ICCs were only 12.1, 2.0, and 11.3% for midfollicular, midluteal, and postmenopausal women, respectively (Table 2). The CVs for the ADIOL assay were large, 43–79% (Table 2). Nevertheless, because the total variability in the data was small, the MDRDs were 10–21% (Table 2).

Epidemiological field studies that are planned to evaluate associations between serum hormone levels and risk of cancer may require that many samples be analyzed over a period of months or years. The degree of variability in hormone assay results should be small enough so that the assay is likely to detect differences between cases and controls. In this study, we have obtained data on components of variability in androgen assay results. Such data allow one to assess the reproducibility of these assays and the measurements for epidemiological studies.

ADIOL G, DHEA, DHEA S, DHT, TESTO, and ADION were assayed in several laboratories. There was variation in the mean assay levels among the participating labs, but the correlations of rankings of subjects’ mean results among the labs were high. The CVs were fairly high and did not vary widely by menstrual or menopausal states. The CVs for measurements from lab 1 were usually <15% but ranged to 20%, whereas those from lab 2 were usually <20% but ranged as high as 40%.

The ICC was 100 times the ratio of the biological variability among women to the total variability, including sources of variation associated with lab procedures. Values of ICC near 100 indicated that lab variability was small compared to biological variability. ICC values for ADIOL G, DHEA, and DHEA S exceeded 90% for lab 1 and lab 3 and usually exceeded 85% for labs 2 and 4. At all labs, the ICCs usually exceeded 80% for DHT, TESTO, and ADION in postmenopausal and midfollicular phase women. For midluteal phase women, the biological variability among women was small, and there were lower values of the ICC; specifically, they were <70% for TESTO and <22% for DHT and ADION.

Another way to assess the utility of these assays is to determine the minimal detectable relative difference in percent, MDRD, that can be detected in a case-control study. The comparison in this report was based on 300 cases and 600 controls, approximately the size of the study we are contemplating. For ADIOL G, DHEA, DHEA S, DHT, TESTO, and ADION, the MDRDs for this design were <20% at labs 1 and 4, and <30% at labs 2 and 3. The MDRDs were smallest for DHT, TESTO, and ADION, the assays for which the biological variability among women was particularly small.

Lab 3 was the only lab that volunteered to assay ANDRO G, ANDRO S, and ADIOL. For ANDRO G and ANDRO S, the CVs ranged from 18% to 33%, whereas the CVs for ADIOL were very high (43–79%). Levels of the latter hormone were very low (usually <100 pg/ml) in both premenopausal and postmenopausal women. The ICCs ranged from 86–97% for ANDRO G and 64–92% for ANDRO S, but were <15% for ADIOL. For each of these assays, the MDRDs ranged from 10 to 25% for a study with 300 cases and 600 controls.

The CV is useful for lab quality control, whereas the ICC and MDRD are more important in determining the feasibility of an epidemiological study. If the variation among subjects is large, the ICC may be large even if the CV is large. If the ICC is large, estimates of the slope of the log relative risk on log (hormone) will suffer little attenuation from lab measurement error, and required sample sizes will be minimally inflated from lab measurement error. On the other hand, the MDRD depends on all sources of variability. If variation among subjects is small, the MDRD may be small enough to justify an epidemiological study even if the CV is large, provided the ICC is not too small. Conversely, a study can be impeded by small values of ICC and large values of MDRD, even when the CV is small.

Estimates of the components of variance allow one to identify the aspects of lab procedures that lead to increased variability and to learn how to efficiently allocate resources to improve assay reproducibility. For example, if there were more variation among aliquots than among replicates, one might increase the number of aliquots and decrease the number of replicates. The total effort would not change, but the CV would decrease, the ICC would increase, and the MDRD would decrease. For those interested in design issues, the estimated components of variance and their SEs for each of the androgen assays are given in the “Appendix.”
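The allocation argument can be made concrete with a short calculation. The sketch assumes, as an extension of the model used here, that the reported value is the mean of k same-day aliquots with r replicates each, so that the lab variance is \(\varsigma^{2}_{\mathrm{b}} + \varsigma^{2}_{\mathrm{c}}/k + \varsigma^{2}/(kr)\); with k = 1 and r = 2 this reduces to the expression used for the CV and ICC above. The component values and function names are hypothetical.

```python
import math

def lab_variance(var_day, var_aliquot, var_replicate, n_aliquots, n_replicates):
    """Lab variance of the mean of n_aliquots aliquots x n_replicates replicates."""
    return (var_day
            + var_aliquot / n_aliquots
            + var_replicate / (n_aliquots * n_replicates))

def summarize(var_subject, var_day, var_aliquot, var_replicate, k, r):
    """Return (CV in percent, ICC in percent) for a k x r allocation."""
    lab = lab_variance(var_day, var_aliquot, var_replicate, k, r)
    return 100.0 * math.sqrt(lab), 100.0 * var_subject / (var_subject + lab)

# Hypothetical components with more aliquot than replicate variation.
components = dict(var_subject=0.20, var_day=0.005,
                  var_aliquot=0.020, var_replicate=0.005)

# Both allocations use two assay determinations per sample.
cv_1x2, icc_1x2 = summarize(k=1, r=2, **components)   # one aliquot, duplicate
cv_2x1, icc_2x1 = summarize(k=2, r=1, **components)   # two aliquots, single
print(round(cv_1x2, 1), round(icc_1x2, 1))
print(round(cv_2x1, 1), round(icc_2x1, 1))
```

For a fixed number of determinations the replicate term is unchanged, so the gain comes entirely from halving the aliquot component; when that component is negligible the two allocations are equivalent.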

Our study used aliquots from a woman whose blood was drawn on a single day, so our estimates of subject variation for a premenopausal or postmenopausal woman include both the between subject variation and the secular variation for a given woman in the middle of that phase. This reliability study design is entirely appropriate for the typical case-control study, which uses only one sample per subject. These data do not allow us to estimate the component of variation that corresponds to repeated blood samples taken over time from the same woman.

Lab 1 usually exhibited smaller CVs, higher ICCs, and smaller MDRDs than the other labs. Lab 3 also exhibited relatively small CVs, high ICCs, and small MDRDs while also providing results on more hormones than the other labs.

The present study used only five women in each menopausal or menstrual phase. Larger numbers of women would be desirable to estimate ICCs and other parameters with greater precision. This study provided valuable guidance, nonetheless, for designing epidemiological studies. These data suggest that a single sample (with two lab replicates per sample) of ADIOL G, DHEA, DHEA S, and ANDRO G can be used to discriminate reliably among women in a given menstrual phase or menopausal status. The results for DHT, TESTO, ADION, and ANDRO S are more problematic and suggest that the present measurement techniques should be used with care, especially with midluteal women. The results for ADIOL suggest that this assay is not yet ready for use in epidemiological studies.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

                
² The abbreviations used are: ADIOL, androstanediol; ADIOL G, ADIOL glucuronide; ADION, androstenedione; ANDRO G, androsterone glucuronide; ANDRO S, androsterone sulfate; DHEA, dehydroepiandrosterone; DHEA S, DHEA sulfate; DHT, dihydrotestosterone; TESTO, testosterone; CV, coefficient of variation; ICC, intraclass correlation; MDRD, minimum detectable relative difference.

1
Bernstein L., Ross R. K. Endogenous hormones and breast cancer risk.
Epidemiol. Rev.
,
15
:
48
-65,  
1993
.
2. Pike M. C., Spicer D. V., Dahmoush L., Press M. F. Estrogens, progestogens, normal breast cell proliferation, and breast cancer risk. Epidemiol. Rev., 15: 17-35, 1993.
3. Kuller L. H. The etiology of breast cancer: from epidemiology to prevention. Public Health Rev., 23: 157-213, 1995.
4. Key T. J. A., Pike M. C. The role of oestrogens and progestogens in the epidemiology and prevention of breast cancer. Eur. J. Cancer Clin. Oncol., 24: 29-43, 1988.
5. Stoll B. A., Secreto G. New hormone-related markers of high risk to breast cancer. Ann. Oncol., 3: 435-438, 1992.
6. Toniolo P. G. Endogenous estrogens and breast cancer risk: the case for prospective cohort studies. Environ. Health Perspect., 105 (Suppl. 3): 587-592, 1997.
7. Dorgan J. F., Stanczyk F. Z., Longcope C., Stephenson H. E., Jr., Chang L., Miller R., Franz C., Falk R. T., Kahle L. Relationship of serum dehydroepiandrosterone (DHEA), DHEA sulfate, and 5-androstene-3β,17β-diol to risk of breast cancer in postmenopausal women. Cancer Epidemiol. Biomark. Prev., 6: 177-181, 1997.
8. Gail M. H., Fears T. R., Hoover R. N., Chandler D. W., Donaldson J. L., Hyer M. B., Pee D., Ricker W. V., Siiteri P. K., Stanczyk F. Z., Vaught J. B., Ziegler R. G. Reproducibility studies and interlaboratory concordance for assays of serum hormone levels: estrone, estradiol, estrone sulfate and progesterone. Cancer Epidemiol. Biomark. Prev., 5: 835-844, 1996.
9. Ziegler R. G., Rossi S. C., Fears T. R., Bradlow H. L., Adlercreutz H., Sepkovic D., Kiuru P., Wahala K., Vaught J. B., Donaldson J. L., Falk R. T., Fillmore C. M., Siiteri P. K., Hoover R. N., Gail M. H. Quantifying estrogen metabolism: an evaluation of the reproducibility and validity of enzyme immunoassays for 2-hydroxyestrone and 16α-hydroxyestrone in urine. Environ. Health Perspect., 105 (Suppl. 3): 607-614, 1997.
10. Horton R., Hawks D., Lobo R. 3-α, 17-β androstanediol glucuronide in plasma: a marker of androgen action in idiopathic hirsutism. J. Clin. Invest., 69: 1203-1206, 1982.
11. Horton R., Imperato-McGinley J., Peterson R. Androstanediol glucuronide (3α-diol) in plasma is a unique marker in disorders of peripheral androgen production and action in male pseudohermaphroditism. In: M. Serio and L. Martini (eds.), Sexual Differentiation, pp. 261-273. New York: Raven Press, 1984.
12. Givens J. R. Hirsutism and hyperandrogenism. Adv. Intern. Med., 21: 221-247, 1976.
13. Abraham G. E., Maroulis G. B., Buster J. E. Effect of dexamethasone on serum cortisol and androgen levels in hirsute patients. Obstet. Gynecol., 47: 395-402, 1976.
14. Korth-Schutz S., Virdis R., Saenger P., Chow D. M., Levine L. S., New M. I. Serum androgens as a continuing index of adequacy of treatment of congenital adrenal hyperplasia. J. Clin. Endocrinol. Metab., 46: 452-458, 1978.
15. Buster J. E., Abraham G. E. Radioimmunoassay of plasma dehydroepiandrosterone sulfate. Anal. Lett., 5: 543-547, 1972.
16. de Peretti E., Forest M. G. Unconjugated dehydroepiandrosterone plasma levels in normal subjects from birth to adolescence in humans: the use of a sensitive radioimmunoassay. J. Clin. Endocrinol. Metab., 43: 982-991, 1976.
17. Nieschlag E., Loriaux D. L., Ruder H. J., Zucker I. R., Kirschner M. A., Lipsett M. B. The secretion of dehydroepiandrosterone and dehydroepiandrosterone sulfate in man. J. Endocrinol., 57: 123-134, 1973.
18. Korth-Schutz S., Levine L. S., New M. I. Dehydroepiandrosterone sulfate levels, a rapid test for abnormal adrenal androgen secretion. J. Clin. Endocrinol. Metab., 42: 1005-1013, 1976.
19. de Peretti E., Forest M. G. Pattern of plasma dehydroepiandrosterone sulfate levels in humans from birth to adulthood: evidence for testicular production. J. Clin. Endocrinol. Metab., 47: 572-577, 1978.
20. Kinouchi T., Pages L., Horton R. A specific radioimmunoassay for testosterone in peripheral plasma. J. Lab. Clin. Med., 82: 309-316, 1973.
21. Kirschner M., Bardin W. Androgen production and metabolism in normal and virilized women. Metabolism, 21: 667-688, 1972.
22. Rao P. N., Moore P. H. Synthesis of new steroid haptens for radioimmunoassay. I. 15-β-carboxyethylmercapto-testosterone-bovine serum albumin conjugate. Measurement of testosterone in male plasma without chromatography. Steroids, 28: 101-109, 1976.
23. Rao P. N., Khan A. H., Moore P. H. Synthesis of new steroid haptens for radioimmunoassay. III. 15-β-carboxyethylmercaptosteroid-bovine serum albumin conjugates. Specific antisera for radioimmunoassay of 5-α-dihydrotestosterone, 5-α-androstane-3-β,17-β-diol, and 5-α-androstane-3-α,17-β-diol. Steroids, 29: 171-184, 1977.
24. Furuyama S., Mayes D. M., Nugent C. A. A radioimmunoassay for plasma testosterone. Steroids, 16: 415-428, 1970.
25. Lobo R. A., Paul W., Goebelsmann U. Dehydroepiandrosterone sulfate as an indicator of adrenal androgen function. Obstet. Gynecol., 57: 69-73, 1981.
26. Lobo R. A., Goebelsmann U., Brenner P. F., Mishell D. R., Jr. The effects of estrogens on adrenal androgens in oophorectomized women. Am. J. Obstet. Gynecol., 142: 471-478, 1982.
27. Goebelsmann U., Horton R., Mestman J. H., Arce J. J. Male pseudohermaphroditism due to testicular 17-β-hydroxysteroid dehydrogenase deficiency. J. Clin. Endocrinol. Metab., 36: 867-879, 1973.
28. Goebelsmann U., Bernstein G. S., Gale J. A., Kletzky O. A., Nakamura R. M., Coulson A. H., Korelitz J. J. Serum gonadotropin, testosterone, estradiol and estrone levels prior to and following bilateral vasectomy. In: I. H. Lepow and R. Crozier (eds.), Vasectomy: Immunologic and Pathophysiologic Effects in Animals and Man, pp. 165-175. New York: Academic Press, 1979.
29. Serafini P., Ablan F., Lobo R. A. 5-α-Reductase activity in the genital skin of hirsute women. J. Clin. Endocrinol. Metab., 60: 349-355, 1985.
30. Narang R., Rao J., Savjani G., Peterson J., Gentzschein E., Stanczyk F. Z. Radioimmunoassay kit for the quantitative measurement of androstanediol glucuronide in unextracted serum. Presented at the 17th National Meeting of the Clinical Ligand Assay Society, April 10–13, 1991, Chicago, IL.
31. Matteri R. K., Stanczyk F. Z., Gentzschein E. E., Delgado C., Lobo R. A. Androgen sulfate and glucuronide conjugates in nonhirsute women with polycystic ovarian syndrome. Am. J. Obstet. Gynecol., 161: 1704-1709, 1989.
32. SAS Institute Inc. SAS/STAT User’s Guide, Version 6, 4th ed., Vol. 2. Cary, NC: SAS Institute Inc., 1989.
33. Rao C. R. Linear Statistical Inference and Its Applications, 2nd ed. New York: John Wiley, 1965.