Background: Metabolite levels within an individual vary over time. This within-individual variability, coupled with technical variability, reduces the power for epidemiologic studies to detect associations with disease. Here, the authors assess the variability of a large subset of metabolites and evaluate the implications for epidemiologic studies.

Methods: Using liquid chromatography/mass spectrometry (LC/MS) and gas chromatography/mass spectrometry (GC/MS) platforms, 385 metabolites were measured in 60 women at baseline and year 1 of the Shanghai Physical Activity Study, and observed patterns were confirmed in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

Results: Although the authors found high technical reliability (median intraclass correlation = 0.8), reliability over time within an individual was low. Taken together, variability in the assay and variability within the individual accounted for the majority of variability for 64% of metabolites. Given this, a metabolite would need, on average, a relative risk of 3 (comparing upper and lower quartiles of “usual” levels) or 2 (comparing quartiles of observed levels) to be detected in 38%, 74%, and 97% of studies including 500, 1,000, and 5,000 individuals. Age, gender, and fasting status factors, which are often of less interest in epidemiologic studies, were associated with 30%, 67%, and 34% of metabolites, respectively, but the associations were weak and explained only a small proportion of the total metabolite variability.

Conclusion: Metabolomics will require large, but feasible, sample sizes to detect the moderate effect sizes typical for epidemiologic studies.

Impact: We offer guidelines for determining the sample sizes needed to conduct metabolomic studies in epidemiology. Cancer Epidemiol Biomarkers Prev; 22(4); 631–40. ©2013 AACR.

Metabolomics is the assessment of small molecules (1), often defined to be only those molecules participating in cellular metabolism, within a given biologic system (2). Modern methods, such as nuclear magnetic resonance (NMR) and mass spectroscopy coupled with liquid chromatography or gas chromatography (3), can identify and quantify a large number of metabolites simultaneously within a biospecimen, capturing its metabolomic profile. These profiles have been used to predict the risk of diabetes (4, 5), diagnose prostate cancer (6), and identify biomarkers of Crohn's disease (7). While these initial studies have shown the potential of metabolomics, several important issues need to be resolved before considering metabolomics as a tool for large epidemiologic studies.

A common goal in epidemiology is to relate a “usual” (8) level of an exposure, such as blood pressure, vitamin D levels, or smoking status, with the risk of disease. Usual is an ambiguous term, but it might be loosely translated as the average level over the last month or, perhaps, year. To assess the potential association, epidemiologic studies often rely on only a single measurement in time as an estimate or surrogate for an individual's usual level. For characteristics that have large day-to-day variation or are measured with low technical reliability, the surrogate may poorly reflect the desired quantity. Given that it is the usual level that is likely to be associated with the disease, within-individual and technical variability will reduce the study's power to detect and quantify the tested association (9–11).

There is a potential concern that a single metabolomic profile may poorly reflect usual levels. Several metabolites are already known to vary within an individual over time. For example, vitamin D levels vary with the seasons (12), estrogen levels vary with the menstrual cycle in premenopausal women (13, 14), and aldosterone, cortisol, and renin levels follow a circadian rhythm (15). On a shorter time scale, carbohydrate, lipid, and amino acid levels in the blood respond to dietary patterns, spiking sharply in the postprandial period (1–2 hours after eating; ref. 16). However, recent studies have suggested that metabolomic profiles may be relatively stable (11, 17–20). Floegel and colleagues found that the median intraclass correlation (ICC), over a 4-month interval, of 163 serum metabolites measured by mass spectroscopy was 0.57 (20), and Nicholson and colleagues found that the stable proportion of biologic variation, over a similar period, for 38 annotated plasma metabolites measured by NMR was, on average, 0.68 (11), with similar ICCs for a larger number of spectral peaks. Here, we extend this research by studying a larger set of 385 metabolites, by including nonfasting samples so as to represent samples typically collected in epidemiologic studies, and by considering measurements separated by 1 year so as to capture the variability around persistent exposures, which are more likely to affect the risk of many diseases.

Our overarching goal is to provide key information needed to design metabolomics analyses in the context of large-scale epidemiologic studies. Our first objective is to estimate the within-individual, technical, and between-individual variability in 385 plasma metabolites measured by LC/MS and GC/MS, when samples are collected as part of an epidemiologic study. Higher between-individual variability is desirable, because that encompasses the measurable differences that can be associated with disease. We assess these 3 sources of variability in 184 individuals from the Shanghai Physical Activity (SPA) study and confirm our observations in a smaller subsample from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. Our second objective is to translate these estimates of variability into estimates of the study power (11) that can be expected for epidemiologic studies, with a specific focus on large case–control and case–cohort studies. While our conclusions are based on our results observed for LC/GC mass spectroscopy, our methodologic framework can evaluate other metabolomic platforms, such as NMR.

Studies and sample collection

The Shanghai Women's Health Study (SWHS) and Shanghai Men's Health Study (SMHS) are prospective cohort studies that include 74,943 women (ages 40–70 years at study baseline) and 61,582 men (ages 40–75 years at baseline) from 8 communities in Shanghai, China, between 1997 and 2006. The SPA study included a randomly selected subcohort residing in 2 communities (21, 22). Participants were each enrolled for 1 year and provided EDTA plasma samples at the beginning (T0) and end of the study year (T1). Samples were stored at −70°C (23). Our analysis includes all 106 women and 78 (out of 100) men who were enrolled in the first wave of recruitment, donated T1 plasma samples and have a valid Actigraph accelerometer measurement, a requirement for a complementary study of physical activity. The study included the 60 men with the most extreme levels of physical activity (30 high; 30 low) and 18 randomly selected men. The median age at T1 was 55 and 52 years for men and women, respectively and 55% of the women were postmenopausal.

Metabolite levels were measured for all 184 individuals at T1 and a randomly selected subset of 60 women at T0. Although fasting was not required, 6, 4, and 21 of these 60 women reported fasting during the morning of sample collection at both T0 and T1, at T0 only, and at T1 only, respectively. Two replicate samples, needed to assess technical variability, were measured for 8 of the T0 samples.

The PLCO screening trial is a large randomized trial, starting in 1993, which examines the effects of screening on cancer-related outcomes in the United States (24). Biologic specimens were collected under a uniform protocol and placed in long-term storage at −70°C in a common PLCO Biorepository at Frederick, MD (25). Our analysis focused on 254 individuals, collected as healthy age and gender-matched controls for 254 colorectal cancer cases. At baseline (T0), the median ages for the 143 men and 111 women were 65 and 63 years, respectively, and 98% of the women reported being postmenopausal. Metabolites were measured in serum samples from all 254 individuals at T0 and a randomly selected group of 30 individuals (14 women and 16 men) at year 1 (T1). To evaluate technical variability in PLCO, we used EDTA plasma samples collected as part of a separate pilot study that measured replicate samples collected during the fourth year (T4) of follow-up from 15 randomly selected healthy men. Previous studies have already shown that plasma and serum metabolite profiles behave similarly (26). Because of differences in the study populations, we do not conduct a combined analysis and instead use our observations from PLCO, which has fewer individuals with multiple measurements, to confirm the SPA results.

Metabolite measurement

Study samples were analyzed at the laboratory of Metabolon Inc. using ultra high-performance liquid-phase chromatography and gas chromatography coupled with mass spectrometry and tandem mass spectrometry, as described previously (27, 28). A nontargeted single extraction was used, followed by protein precipitation, to recover a diversity of metabolites. Relative quantities were obtained from mass spectroscopy peaks, and peaks were linked by informatics methods to metabolite identities. The list of measured metabolites includes, but is not limited to, amino acids, carbohydrates, fatty acids, androgens, and xenobiotics (Supplementary Table S1). Metabolites were individually normalized according to test-day.

Between-individual, within-individual, and technical variability

For each metabolite, we can estimate the variance across all measurements. We consider the variance of the transformed quantity, the log of the peak intensity, as it is the quantity most commonly used in association studies. After log-transformation, metabolites were approximately normally distributed (Supplementary Fig. S4). One goal is to decompose this total variance, |$\sigma _{\rm T}^2$|⁠, into 3 different components: the between subject variance, |$\sigma _{\rm B}^2$|⁠, which can also be considered the variance of the “usual” level in a population; the within subject variance, |$\sigma _{\rm W}^2$|⁠, which reflects the true year-to-year variability around the “usual” level within an individual; the technical variance or laboratory reproducibility, |$\sigma _{\rm E}^2$|⁠, which is the expected variance from 2 identical samples.

These 3 variance components can be combined into other quantities of interest.

The biologic variance (29, 30) is |$\sigma _{\rm B}^2 + \sigma _{\rm W}^2$|⁠.

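The technical ICC is the proportion of the total variation that is attributable to biologic variance, as opposed to random laboratory error: |${\rm ICC} = (\sigma _{\rm B}^2 + \sigma _{\rm W}^2)/(\sigma _{\rm B}^2 + \sigma _{\rm W}^2 + \sigma _{\rm E}^2)$|. The technical ICC is a common measure of laboratory accuracy or reproducibility.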

We denote the proportion of the population's biologic variability that is due to the variation across individuals by |$\pi _{{\rm BW}}^{\rm B} = \sigma _{\rm B}^2/(\sigma _{\rm B}^2 + \sigma _{\rm W}^2)$|.

The usual measurement “error,” or the variation around an individual's “usual” level, is |$\sigma _{\rm W}^2 + \sigma _{\rm E}^2$|. Larger values of this “error” in the usual measurement often imply lower power for an epidemiologic study to detect associations. Therefore, we want the proportion of total variability attributable to between subject differences, |$\pi _{\rm T}^{\rm B} = \sigma _{\rm B}^2/(\sigma _{\rm B}^2 + \sigma _{\rm W}^2 + \sigma _{\rm E}^2)$|, to be large. This proportion, |$\pi _{\rm T}^{\rm B}$|, has also been known as the ICC (11, 20) in previous literature, but we use |$\pi _{\rm T}^{\rm B}$| here to avoid confusion with the technical ICC.
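The relationships among these quantities can be checked numerically. The following is a minimal sketch, assuming the three variance components are already known; the function name and the example values (0.4, 0.4, 0.2) are illustrative and are not taken from the study.

```python
def derived_quantities(sigma_b2, sigma_w2, sigma_e2):
    """Combine between-subject, within-subject, and technical variances.

    All three inputs are variances of the log-transformed metabolite level.
    """
    biologic = sigma_b2 + sigma_w2        # biologic variance
    total = biologic + sigma_e2           # total variance
    return {
        "technical_icc": biologic / total,        # biologic share of total
        "pi_bw_b": sigma_b2 / biologic,           # between share of biologic
        "pi_t_b": sigma_b2 / total,               # between share of total
        "usual_measurement_error": sigma_w2 + sigma_e2,
    }


# Illustrative values only (not estimates from SPA or PLCO).
example = derived_quantities(0.4, 0.4, 0.2)
```

With these definitions, the proportion of total variability between subjects equals the product of the technical ICC and the between-subject share of biologic variability, so a metabolite needs both good laboratory reproducibility and good stability over time to score well.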

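We can estimate each of the 3 variance components, and the other relevant quantities, by using linear mixed models with the normalized log-transformed metabolite level, Y, as the outcome and random effects for subject, S, and year, T (nested within subject; ref. 9). Writing |$Y_{ijk}$| for replicate k from subject i in year j, the model is |$Y_{ijk} = \mu + S_i + T_{ij} + \varepsilon _{ijk}$| (Eq. 1), where |$S_i$|, |$T_{ij}$|, and |$\varepsilon _{ijk}$| are independent, mean-zero random terms with variances |$\sigma _{\rm B}^2$|, |$\sigma _{\rm W}^2$|, and |$\sigma _{\rm E}^2$|, respectively. Estimates of |$\pi _{\rm T}^{\rm B}$| in PLCO are based only on individuals with samples collected at years T0 and T1, whereas estimates of technical ICC are based only on the 15 individuals with samples collected at T4.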
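To make the role of each data source concrete, here is a rough moment-based sketch of how the three components are identified. It is not the mixed-model fit used in the paper: it assumes one measurement per subject at T0 and T1 plus a separate set of replicate pairs, and the function and argument names are hypothetical.

```python
def moment_variance_components(paired_years, replicate_pairs):
    """Rough moment estimates of (sigma_B^2, sigma_W^2, sigma_E^2).

    paired_years: list of (y_t0, y_t1) log-levels, one pair per subject.
    replicate_pairs: list of (y_a, y_b), two measurements of the same sample.
    """
    n = len(paired_years)
    mean_t0 = sum(y0 for y0, _ in paired_years) / n
    mean_t1 = sum(y1 for _, y1 in paired_years) / n

    # Only the subject effect is shared across years, so the covariance
    # between the T0 and T1 measurements estimates sigma_B^2.
    sigma_b2 = sum(
        (y0 - mean_t0) * (y1 - mean_t1) for y0, y1 in paired_years
    ) / (n - 1)

    # Var(Y_T1 - Y_T0) = 2 * (sigma_W^2 + sigma_E^2).
    diffs = [y1 - y0 for y0, y1 in paired_years]
    mean_diff = sum(diffs) / n
    within_plus_tech = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1) / 2

    # For replicates of one sample, E[(a - b)^2] = 2 * sigma_E^2.
    sigma_e2 = sum((a - b) ** 2 for a, b in replicate_pairs) / (
        2 * len(replicate_pairs)
    )

    # Truncate at zero, since moment estimates can come out negative.
    sigma_w2 = max(within_plus_tech - sigma_e2, 0.0)
    return max(sigma_b2, 0.0), sigma_w2, sigma_e2


# Toy data, chosen only so the arithmetic is easy to follow.
toy = moment_variance_components(
    [(0.0, 1.0), (2.0, 1.0), (4.0, 4.0)], [(1.0, 1.2)]
)
```

A mixed model uses all observations jointly and handles unbalanced data, so it is preferable in practice; the sketch only shows why repeat samples over time and laboratory replicates are both needed to separate the within-individual and technical components.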

Evaluating associations with age, fasting status, and gender

For each metabolite, we can further partition the sources of variation by expanding upon Eq. (1). We now consider covariates for subject i: |$G_i$| = gender, |$F_i$| = fasting status, and |$A_i$| = age quartile, with both gender and fasting status being binary, 0 or 1, variables.

We expand Eq. (1) by including fixed effects for age quartile, gender, and fasting status, which are respectively represented by α, γ, and ϕ in Eq. (2): |$Y_{ijk} = \mu + \alpha _{A_i} + \gamma _{G_i} + \phi _{F_i} + S_i + T_{ij} + \varepsilon _{ijk}$|, where |$S_i$|, |$T_{ij}$|, and |$\varepsilon _{ijk}$| are the subject, year-within-subject, and technical terms of Eq. (1). The subscripts, |$G_i$|, |$F_i$|, and |$A_i$|, indicate that we are including the fixed effect appropriate for subject i. By adjusting out these factors, we can assess the percentage of variability attributable to between-subject differences within specific demographic subcohorts.

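We can now define the proportion of unexplained variability attributed to between subject differences using Eq. (2) as |${\pi ^\prime} _{\rm T}^{\rm B} = {\sigma ^\prime} _{\rm B}^2/({\sigma ^\prime} _{\rm B}^2 + {\sigma ^\prime} _{\rm W}^2 + {\sigma ^\prime} _{\rm E}^2)$|, where the primed terms are the between-subject, within-subject, and technical variances estimated from Eq. (2).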

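We will also identify the variance attributable to age, gender, and fasting, denoted by |$\sigma ^2({\rm age})$|, |$\sigma ^2({\rm gender})$|, and |$\sigma ^2({\rm fasting})$|, to get a better idea of their overall influence on the metabolomic profile. Exact definitions of these variance components are provided in Appendix A, but we can now define the total variance as the sum of these 3 covariate variances and the between-subject, within-subject, and technical variances remaining in Eq. (2),

and examine the proportion of the total variance attributable to each of the 3 covariates, for example, |$\pi ({\rm Age}) = \sigma ^2({\rm age})/\sigma _{\rm T}^2$|, with |$\pi ({\rm Gender})$| and |$\pi ({\rm Fasting})$| defined analogously.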

Furthermore, we can assess whether the covariates are significantly associated with metabolite levels and obtain P values by conducting an ANOVA on the mixed models described by Eq. (2).

Global summaries

Until now, we have focused on statistics that describe the behavior of a single metabolite. For each metabolite, we have described calculating |$\hat \pi _{{\rm BW}}^{\rm B}$|⁠, |$\hat \pi _{\rm T}^{\rm B}$|⁠, and ICCest, our estimates for |$\pi _{{\rm BW}}^{\rm B}$|⁠, |$\pi _{\rm T}^{\rm B}$|⁠, and ICC, by fitting linear mixed models. However, we are also interested in statistics that can describe the global behavior of all metabolites that were reported in at least 90% of the samples. We will therefore report the proportion of these metabolites where the estimated parameters exceed 0.2, 0.5, or 0.8 and treat these proportions as estimates of the proportion of metabolites exceeding the corresponding threshold. As a global summary, we also estimate the proportion of these metabolites that are associated with age, gender, and fasting status by the maximum false discovery rate estimated among all metabolites.
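The threshold-based global summary described above amounts to a simple tally. A minimal sketch, assuming a list of per-metabolite estimates is available (the function name and the four toy values are hypothetical):

```python
def proportion_exceeding(estimates, thresholds=(0.2, 0.5, 0.8)):
    """Share of metabolites whose estimate exceeds each threshold."""
    n = len(estimates)
    return {t: sum(e > t for e in estimates) / n for t in thresholds}


# Toy estimates for four metabolites, for illustration only.
summary = proportion_exceeding([0.1, 0.3, 0.6, 0.9])
```

The same helper can be applied to the estimated technical ICCs or to either proportion of between-subject variability, which is how a three-row summary table of this kind could be assembled.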

Estimates of power

Our objective is to estimate the expected power for a case–control study focused on a single disease. Specifically, we estimate the average power, or the proportion of true metabolite-disease associations that are expected to be discovered, accounting for the 3 sources of variability and the testing of multiple metabolites.

We assume that a study will enroll n individuals, equally split between cases and controls, and use a t test with the appropriate Bonferroni-corrected significance threshold to test for an association between the disease and each metabolite. We then estimate the power to detect each metabolite given its variance components measured in SPA and an assumed effect size. Study-level power is the average of these values over all metabolites. We also consider the scenario where there are 1 to 5 samples per individual.

For purposes of interpretation and to enable comparisons with previously reported studies, we define the effect size for a given metabolite to be the relative risk of disease for an individual in the top quartile of the usual metabolite levels, as compared with the bottom quartile. Note, we still presume the metabolites are normally distributed and assume a t test is used in the study (Appendix B).
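The way measurement error lowers power can be sketched with a normal approximation to the two-sample t test. This is an illustration under stated assumptions, not the calculation in Appendix B: the effect is supplied directly as a case-control mean difference in standard deviations of the usual level (the mapping from quartile relative risk is not reproduced here), and all function and parameter names are hypothetical.

```python
from statistics import NormalDist


def power_two_sample(n_total, effect_usual_sd, pi_t_b, alpha):
    """Approximate two-sided power for a case-control mean comparison.

    effect_usual_sd: mean difference between cases and controls, in SD
        units of the usual level (assumed input).
    pi_t_b: proportion of total variance that is between subjects.
    """
    std_normal = NormalDist()
    # A single observed level has variance sigma_B^2 / pi_t_b, so the
    # standardized difference shrinks by sqrt(pi_t_b).
    observed_effect = effect_usual_sd * pi_t_b ** 0.5
    n_per_group = n_total / 2
    # Standard error of a difference in means is sqrt(2 / n_per_group).
    noncentrality = observed_effect * (n_per_group / 2) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(
        -noncentrality - z_crit
    )


def average_power(n_total, effect_usual_sd, pi_t_b_values, alpha):
    """Study-level power: the mean of per-metabolite power."""
    powers = [
        power_two_sample(n_total, effect_usual_sd, p, alpha)
        for p in pi_t_b_values
    ]
    return sum(powers) / len(powers)
```

Passing `alpha=0.05 / 385` corresponds to the Bonferroni correction for 385 metabolites. Averaging over the per-metabolite proportions, rather than plugging in their median, matters because power is a nonlinear function of the attenuated effect.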

Measurement/technical variability

Within the 252 SPA samples, there were 567 observed metabolites. Of those 567 metabolites, 385 were observed in at least 90% of all samples and 341 were observed in at least 95% of all samples. We consider only those 385 most common metabolites for the remainder of this article. Of those 385 metabolites, the identities of 254 had already been determined.

The majority of technical ICCs, a measure of the similarity between replicate samples, were high (Fig. 1A). With the SPA samples, the estimated ICCs for 57%, 85%, and 97% of the metabolites exceeded 0.8, 0.5, and 0.2, respectively (Table 1). The distribution of ICCs was similar for all categories of metabolites (Supplementary Table S2). The distribution of the coefficients of variation (CVs) is illustrated in Supplementary Fig. S3. When the analysis was repeated using the T4 samples from 15 men in PLCO, the distribution of estimated ICCs was nearly identical and is depicted by the red line in Fig. 1A. Among metabolites common to both studies, the reported ICCs were highly correlated (ρ = 0.48) but far from identical (Supplementary Fig. S1), as expected for 2 distinct populations, 1 from China and 1 from the United States, and given the limited sample size.

Figure 1.

The plots illustrate the distributions of the technical ICCs, a measure of laboratory variability (A) and |$\hat \pi _{\rm T}^{\rm B}$|⁠, a measure of between-individual variability (B). The x-axis represents the metabolite quantile ranking (e.g., 0.5 represents the median), the y-axis represents the actual ICC (A) or |$\hat \pi _{\rm T}^{\rm B}$| (B) quantile, and the curves (black for SPA and dashed red for PLCO) show the ICC (A) or |$\hat \pi _{\rm T}^{\rm B}$| (B) quantile for the specified metabolite quantile ranking [e.g., the median ICC is 0.84 and is illustrated by the height of the black curve in (A) being 0.84 above the quantile ranking of 0.5].

Table 1.

The percentage of SPA metabolites with the specified parameter (ICC, |$\pi _{{\rm BW}}^{\rm B}$|⁠, |$\pi _{\rm T}^{\rm B}$|⁠) exceeding each threshold (0.2, 0.5, 0.8)

Parameter                      Threshold 0.2   Threshold 0.5   Threshold 0.8
ICC                            97%             85%             57%
|$\pi _{{\rm BW}}^{\rm B}$|    93%             61%             23%
|$\pi _{\rm T}^{\rm B}$|       87%             36%             3.6%

NOTE: The first row lists the percentage of metabolites in the SPA study with an estimated ICC coefficient that exceeds thresholds of 0.2, 0.5, and 0.8. The second row lists the percentages of metabolites where the estimated proportion of biologic variability (⁠|$\pi _{{\rm BW}}^{\rm B}$|⁠) attributable to between-subject differences exceeds these same thresholds. The third row lists the percentages of metabolites where the estimated proportion of total variability (⁠|$\pi _{\rm T}^{\rm B}$|⁠) attributable to between-subject differences exceeds these same thresholds.

Within and between-individual variability

Given only a single measurement, a study's power to detect long-term epidemiologic associations tends to be higher when |$\pi _{\rm T}^{\rm B}$|⁠, the proportion of total variability attributed to between subject differences, is larger. The estimates of |$\pi _{\rm T}^{\rm B}$| were generally lower than the estimated ICCs, with only 3.6%, 36%, and 87% of metabolites having |$\hat \pi _{\rm T}^{\rm B}$| exceeding 0.8, 0.5, and 0.2, respectively (Fig. 1B and Table 1). The distribution of |$\hat \pi _{\rm T}^{\rm B}$| was not unimodal, with 23 identified and 13 unidentified metabolites having high values of |$\hat \pi _{\rm T}^{\rm B}$| above 0.7. The majority of these metabolites were in the biosynthesis pathway of androsterone or markers of specific dietary habits. The metabolites with the lowest values of |$\hat \pi _{\rm T}^{\rm B}$|⁠, among all metabolites with an ICCest > 0.8, were more heterogeneous and included multiple markers for episodically consumed foods (Supplementary Table S1).

Within specific age and gender demographic groups, study power will be limited by |${\pi ^\prime} _{\rm T}^{\rm B}$|⁠, the proportion of variability attributed to between subject differences after adjusting for the covariates in Eq. (2). Similar to the distribution of |$\hat \pi _{\rm T}^{\rm B}$|⁠, 3.1%, 31%, and 83% of metabolites had values |$\hat \pi {^\prime} _{\rm T}^{\rm B}$| exceeding 0.8, 0.5, and 0.2. When estimating |$\pi _{\rm T}^{\rm B}$| in a subgroup of only women and among only metabolites measured in women, results again were nearly unchanged with 3.7%, 33%, and 88% of metabolites having values of |$\hat \pi {^\prime} _{\rm T}^{\rm B}$| exceeding 0.8, 0.5, and 0.2. Even among the 36 metabolites with the highest estimates of |$\hat \pi _{\rm T}^{\rm B}$|⁠, many of which were strongly associated with age and gender, the proportion of variation attributable to between subject differences after these adjustments only decreased minimally (Table 2).

Table 2.

A list of the identified metabolites with the highest values of between-subject variability, |$\hat \pi _{\rm T}^{\rm B}$| (i.e., the lowest within-subject variability), among all metabolites

Columns 2 to 4 report |$\hat \pi _{\rm T}^{\rm B}$| from the unadjusted model, the age and gender adjusted (A.G.A.) model, and the women-only model; columns 5 and 6 report P values for age and gender.

Metabolite                              Unadjusted  A.G.A.  Women  P value (age)  P value (gender)
1,5-Anhydroglucitol (1,5-AG)            0.91        0.91    0.92   0.88           0.32
4-Androsten-3β,17β-diol disulfate 1     0.9         0.86    0.88   0.085          <0.0001
Pregnen-diol disulfate*                 0.9         0.87    0.89   0.018          <0.0001
DHEA-S                                  0.89        0.86    0.89   0.00039        <0.0001
4-Androsten-3β,17β-diol disulfate 2     0.85        0.82    0.86   0.076          <0.0001
Pyroglutamine                           0.83        0.68    0.74   <0.0001        <0.0001
Androsterone sulfate                    0.82        0.77    0.82   0.022          <0.0001
Andro steroid monosulfate 2             0.81        0.81    0.84   0.92           0.19
5α-Androstan-3β,17β-diol disulfate      0.8         0.7     0.72   0.0064         <0.0001
Epiandrosterone sulfate                 0.79        0.73    0.78   0.033          <0.0001
Pseudouridine                           0.78        0.68    0.8    <0.0001        0.43
Pregn steroid monosulfate               0.76        0.72    0.75   0.014          <0.0001
3-(4-Hydroxyphenyl)lactate              0.76        0.7     0.69   0.0044         <0.0001
21-Hydroxypregnenolone disulfate        0.76        0.74    0.77   0.13           0.00067
α-Hydroxyisovalerate                    0.76        0.72    0.67   0.4            <0.0001
C-Glycosyltryptophan                    0.74        0.63    0.75   <0.0001        0.15
Urate                                   0.74        0.72    0.73   0.14           <0.0001
Glutaroyl carnitine                     0.72        0.69    0.65   0.0037         0.00081
Creatine                                0.72        0.62    0.55   0.00026        <0.0001
3-Dehydrocarnitine                      0.72        0.71    0.71   0.87           0.32
1-Arachidonoylglycerophosphocholine     0.72        0.7     0.74   0.3            0.15
2-Hydroxybutyrate (AHB)                 0.71        0.71    0.71   0.43           0.24
Undecanoate (11:0)                      0.7         0.65    0.64   0.53           <0.0001

NOTE: Rows include metabolite name, |$\hat \pi _{\rm T}^{\rm B}$|⁠, the equivalent value from the age and gender adjusted (A.G.A) model, the equivalent from a female-only model, P value for the metabolite's association with age, and P value for the metabolite's association with gender.

We also measured |$\hat \pi _{\rm T}^{\rm B}$| in the individuals from PLCO. Although these measurements were from serum samples, the distribution of |$\hat \pi _{\rm T}^{\rm B}$| was nearly identical in this population (Fig. 1B), and there was high correlation (ρ = 0.49) when comparing the estimates of between-subject variability among metabolites common to both groups. Supplementary Fig. S2 confirms that those metabolites with high values of |$\hat \pi _{\rm T}^{\rm B}$| in SPA have similarly high values in PLCO.

Our study design permits the estimation of |$\pi _{{\rm BW}}^{\rm B}$|, the proportion of biologic variability that can be attributed to between-individual differences. Although these results are limited by our ability to distinguish technical and within-individual variability using only 8 replicate samples, the majority of natural variability seemed to be attributable to between-subject differences, with 23%, 61%, and 93% of the metabolites having estimated values of |$\pi _{{\rm BW}}^{\rm B}$| exceeding 0.8, 0.5, and 0.2, respectively (Table 1). Again, adjusting for age and gender did not alter our estimates much: 22%, 62%, and 92% of metabolites had estimates of |$\pi {^\prime} _{{\rm BW}}^{\rm B}$| exceeding 0.8, 0.5, and 0.2.

Age, gender, and fasting status

Those covariates suspected to have associations with metabolite levels were able to explain small, but statistically significant, proportions of the variation in many of the metabolites. We found that age, fasting status, and gender were correlated with 30%, 34%, and 67% of metabolites, respectively. Using the Bonferroni-adjusted α-level of 0.05/385, we find that 9.1%, 14.3%, and 7.3% of metabolites have a statistically significant association with age, fasting status, and gender, respectively. However, the proportion of the variability attributable to each covariate was small, explaining why |$\pi _{\rm T}^{\rm B}$| changed little after adjusting for covariates. Figure 2 shows the proportion of total variability attributed to these covariates.

Figure 2.

A–C, illustrate the distribution of |$\hat \pi ({\rm Age})$|⁠, |$\hat \pi ({\rm Fasting})$|⁠, and |$\hat \pi ({\rm Gender})$|⁠. The x-axis represents the metabolite quantile ranking (e.g., 0.5 is the median), the y-axis represents |$\hat \pi$|⁠, and the curves show the |$\hat \pi$| for the specified metabolite quantile ranking (e.g., the median |$\hat \pi$| is less than 0.02 for all 3 covariates).


Power

We quantified the effect size as the relative risk of disease when comparing individuals in the top and bottom quartiles of the usual metabolite level. However, when calculating power, we presumed a t test comparing cases and controls. Given this definition of effect size and the assumption that all measured metabolites are equally likely to be associated with the disease, a case–control study with a total of 500 individuals is expected to detect less than 1%, 38%, and 75% of the metabolites with a relative risk of 1.5, 3.0, and 5.0, respectively (Fig. 3A). Similarly, a study with 1,000 individuals should detect 3%, 74%, and 92%, and a study with 5,000 individuals should detect 55%, 97%, and 98% of metabolites with a relative risk of 1.5, 3.0, and 5.0. All estimates assume a conservative Bonferroni-adjusted α-level of 0.00013 = 0.05/385 (Fig. 3A and Table 3). Although these relative risks are larger than typically reported in epidemiologic studies, the naïve or observed relative risks would be lower and in line with typical values. When the true relative risks are 1.5, 3.0, and 5.0, the naïve relative risks are expected to be 1.3, 2.0, and 2.8.

Figure 3.

The curves show the proportion of metabolites likely to be detected in a case–control study as a function of effect size. Effect size is defined by the relative risk (RR, x-axis) of disease when comparing individuals within the highest quartile of the “usual” metabolite level, as compared with the lowest quartile. The top axis indicates the “naïve” relative risk that would be observed in the specified case–control study when not adjusting for measurement error. Each panel varies one parameter: sample size, α-level, or number of samples/individual. A, power for studies with 500 (black), 1,000 (red), or 5,000 (green) individuals (α-level = 0.00013 = 0.05/385). B, power for studies that define significance as a P value below a threshold or α-level of 0.00013 = 0.05/385 (red), 0.001 (orange), and 0.01 (brown; 1,000 individuals). C, power for studies with 1, 2, 3, or 5 distinct blood samples (1,000 individuals). All measured metabolites are assumed equally likely to be associated with disease.

Table 3.

The average power to detect associations by sample size (500, 1,000, 5,000) and relative risk (1.5, 3.0, 5.0) for a case–control study

N        Relative risk 1.5   Relative risk 3.0   Relative risk 5.0
500      <1%                 38%                 75%
1,000    2.9%                74%                 92%
5,000    55%                 97%                 98%

NOTE: The entries list the average power to detect associations between metabolites and disease in a case–control study that has 500, 1,000, and 5,000 individuals and where the metabolites have true relative risks of 1.5, 3.0, and 5.0. These true values correspond to naïve relative risks of 1.3, 2.0, and 2.8, respectively.

We will detect higher proportions of those metabolites that have higher values of |$\pi _{\rm T}^{\rm B}$|. Considering only those 36 metabolites with a |$\hat \pi _{\rm T}^{\rm B}$| of more than 0.7, a case–control study with 1,000 individuals should detect 25%, 50%, and 80% of metabolites with a true relative risk of 1.7, 1.9, and 2.2, respectively. Focusing on only the 287, 142, and 36 metabolites with a |$\hat \pi _{\rm T}^{\rm B}$| exceeding 0.3, 0.5, and 0.7 would be equivalent to setting the α-threshold at 0.00017, 0.00035, and 0.0014, respectively.
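The α-thresholds quoted above follow from dividing a family-wise error rate of 0.05 by the number of metabolites tested. A short check of that arithmetic (the helper name is hypothetical):

```python
def bonferroni_alpha(n_tests, family_wise_error=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_wise_error / n_tests


# Number of metabolites tested: all 385, then the subsets of 287, 142, 36.
thresholds = {n: bonferroni_alpha(n) for n in (385, 287, 142, 36)}
```

Testing fewer, more stable metabolites therefore relaxes the per-test threshold by roughly a factor of ten between the full panel of 385 and the subset of 36.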

If the most promising set of metabolites can be evaluated in a second stage of the study or if a complementary study can limit candidate pathways, requiring a family-wise error rate of 0.05 would be unnecessarily strict. If we raise the α-level to 0.001, a case–control study with 1,000 individuals should detect 25%, 50%, and 80% of metabolites with a true relative risk of 1.8, 2.1, and 2.8. The corresponding naïve relative risk would be respectively 1.5, 1.6, and 2.0. Figure 3B compares the power for studies with an α-threshold of 0.01, 0.001, and 0.00013.

Power would be improved by collecting multiple samples from each individual. Averaging over additional samples reduces the contribution of within-individual and technical variability to the measured level. Figure 3C illustrates the gains in power from taking 2, 3, or 5 samples throughout the year for a 1,000-subject study, where we assume that the correlation between any 2 measures is independent of the time separating them. Collecting a second sample increases the study's power by 1.84×, 1.15×, and 1.05× when the relative risk is 2, 3, and 4, respectively.
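The gain from repeated samples can be illustrated with the standard Spearman-Brown relation for the reliability of a mean of k measurements. This is a minimal sketch, assuming (as in the text) that any 2 measures from one person are equally correlated regardless of the time between them. The function name and the single-sample reliability of 0.5 are hypothetical and are not values from the study.

```python
def reliability_of_mean(icc_single, k):
    """Reliability (ICC) of the mean of k samples per person.

    Assumes every sample has the same between-person variance and the same
    independent noise (within-individual plus technical), so the noise
    variance of the mean is the single-sample noise variance divided by k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= icc_single <= 1.0:
        raise ValueError("icc_single must lie in [0, 1]")
    return k * icc_single / (1 + (k - 1) * icc_single)


if __name__ == "__main__":
    icc = 0.5  # hypothetical single-sample reliability, for illustration only
    for k in (1, 2, 3, 5):
        print(k, round(reliability_of_mean(icc, k), 3))
```

Under these assumptions the second sample adds the most reliability and each further sample adds less, in line with the shrinking gains shown in Figure 3C.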

Our objective was to assess the potential role of metabolomics in large epidemiologic studies, with a specific focus on case–control studies. We first showed that although LC/MS and GC/MS produced reliable and reproducible results, there was also considerable within-individual variability. Approximately 40% of the biologic variability, on average, could be attributed to variation occurring within an individual over time. Using our estimates of technical and within-individual variability, we then estimated the power for detecting metabolite-disease associations in epidemiologic studies.

Although we assume associations will be tested by a t test or linear regression, we quantify a metabolite's effect size by the relative risk comparing individuals within the top and bottom quartiles of the metabolite's distribution. We show the need for a large number of samples in case–control studies, and expect our figures relating relative risk to power, as well as the distributions of |$\hat \pi _{\rm T}^{\rm B}$|⁠, |$\hat \sigma ^2$|⁠, and ICCest used to calculate that relationship, to serve as a guide for studies considering metabolomic profiling of samples. If laboratory variability can be reduced, perhaps by using a targeted approach of only a few metabolites, similar levels of power could be achieved with fewer individuals.

Our results corroborate and expand upon previous studies measuring metabolomic variability and estimating power for epidemiologic studies (11, 20). Our median 1-year |$\hat \pi _{\rm T}^{\rm B}$| of 0.43 was similar to that of an earlier targeted analysis of 163 metabolites, which found a median 4-month |$\hat \pi _{\rm T}^{\rm B}$| (or ICC using their definition) of 0.57, and to that of an NMR analysis of 38 metabolites, which also found a median 4-month |$\hat \pi _{\rm T}^{\rm B}$| around 0.57 (0.68/1.19) after accounting for technical variability. Our |$\hat \pi _{\rm T}^{\rm B}$| values are likely lower, but perhaps more pertinent for planning epidemiologic studies, because of the 8 additional months between measurements and the fewer requirements imposed on study participants (e.g., no fasting). Unlike many metabolomics-focused studies that control for diet (11, 20) or behavior (31), our samples were collected as part of SPA and PLCO, and therefore the observed variability will likely be more similar to that reported in future epidemiologic studies. Even with our slightly lower |$\hat \pi _{\rm T}^{\rm B}$| values, we reached the same qualitative conclusion as Nicholson and colleagues (11) that studies will require large sample sizes, upwards of 1,000 subjects, to detect metabolomic associations. We have further expanded upon Nicholson and colleagues' results by relating power to relative risk, considering different significance thresholds, and discussing the power from repeat measurements.

Studies should plan for, but not be discouraged by, the potentially high intraindividual variability. Strong associations between usual exposure levels and disease risk have allowed previous epidemiologic studies to overcome imprecision of this magnitude. For example, in postmenopausal women, insulin and estradiol have |$\hat \pi _{\rm T}^{\rm B}$| of 0.68 and 0.59, respectively, over a span of 1 to 3 years (32, 33), but studies have nonetheless successfully detected their associations with breast cancer (34, 35). Similarly, for heart disease (36) and diabetes (4), studies have recently identified metabolites related to branched-chain amino acids as important predictors of disease risk, with ORs ranging from 2 to 4 for top versus bottom quartile comparisons. For smoking-related cancers, a comparison of high versus low cotinine levels would be expected to yield ORs of 20 or more (37).

Moreover, we showed that although a reasonably high proportion of metabolites were associated with age, fasting status, and gender, these 3 covariates only accounted for a small proportion of the total variability. Therefore, metabolomic profiles can still be useful for distinguishing risks within specific demographic cohorts, in that within-cohort variation is still high. Similarly, metabolomic profiles can still be useful even when epidemiologic studies did not impose dietary restrictions (e.g., fasting) before blood draws. These factors do little to affect our overall conclusions about detectable relative risks.

Our study had 5 main limitations. First, the SPA dataset only contained measurements on women at 2 time points, and therefore within-subject variability results are gender specific. However, similar results were seen in the PLCO dataset, which includes equal numbers of men and women. Second, we only calculated power for identifying disease associations with individual metabolites. There is also interest in identifying metabolic profiles, such as those created by partial least squares regression (PLS; ref. 38), that can differentiate cases and controls and be used as a diagnostic tool. Third, LC/MS and GC/MS platforms do not report the actual metabolite levels, but peak intensities, and our estimates of parameters can be slightly sensitive to scale. Fourth, we only had a total of 23 replicates for assessing technical variability. While this may introduce imprecision for estimating the |$\hat \sigma _{\rm E}^{\rm 2}$| and |$\hat \pi _{{\rm BW}}^{\rm B}$| for individual metabolites, their distributions, across all metabolites, should be more accurate. Also, our power calculations combined within-individual and technical variability, and therefore were based on the larger sample set of 60 samples repeated over time. Finally, the |$\hat \pi _{\rm T}^{\rm B}$| may be underestimated if storage at −70°C had a variable effect on samples or if the additional year of storage had a significant impact on biomarker levels (39–41).

Even given the limitations of the study, we were able to assess the magnitude of each of the 3 variance components, and estimate the power for large case–control studies. Because the likely relative risks will depend on the disease, time between sample collection, disease diagnosis, and specimen type, we do not offer any universal conclusion about the use of metabolomics in epidemiologic studies. We do, however, strongly suggest considering our analysis of power when planning such studies.

We discuss the “variance” attributable to age, fasting status, and gender. However, we caution against any literal interpretation of their values, which is our reason for the quotes. We define the “variance” as the proportion of the total variance that can be explained by that covariate in a population where all categories are equally represented (e.g., 50% men/50% women and 50% fasting/50% nonfasting) and there is 1 sample per individual. Note that the variances are highly dependent on how we chose to categorize the variables and their distribution within SPA.

Dropping the quotes, we now define the variances for the 3 covariates to be

where αk is the fixed effect for age quartile k, γ1 and γ0 are the fixed effects for gender, ϕ1 and ϕ0 are the fixed effects for fasting, and:

These variances and their corresponding proportions π(Age), π(Fasting), and π(Gender), provide a measure of the influence of these 3 covariates on metabolite profiles.
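The displayed definitions do not appear in this copy of the text. One form consistent with the verbal definition above (all categories equally represented, 1 sample per individual) is sketched below. The subscripted variance names and the mean age effect, written here as \bar{\alpha}, are assumed notation and may differ from the original equations.

```latex
\sigma^2_{\rm Age} = \frac{1}{4}\sum_{k=1}^{4}\left(\alpha_k - \bar{\alpha}\right)^2, \qquad
\sigma^2_{\rm Gender} = \frac{(\gamma_1 - \gamma_0)^2}{4}, \qquad
\sigma^2_{\rm Fasting} = \frac{(\phi_1 - \phi_0)^2}{4}, \qquad
\bar{\alpha} = \frac{1}{4}\sum_{k=1}^{4}\alpha_k
```

The two-category expressions are the variance of a variable that takes each of its 2 values with probability 1/2. Under the stated definition, each proportion π would be the corresponding variance divided by the total variance.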

This appendix provides the details for estimating power. If we define the effect of a metabolite in terms of its SD and the mean difference between cases and controls, this calculation would be trivial. The appendix is only needed because we choose to define the effect by the more interpretable relative risk. Again, the relative risk (RR) is defined as the probability of disease for an individual in the top quartile of the usual metabolite levels, as compared with an individual in the bottom quartile:

where D and X are random variables respectively indicating disease status and metabolite level, t0 is the threshold for the top quartile of X, and t1 is the threshold for the bottom quartile of X, i.e.:

We assume that the usual metabolite levels within cases and controls are each normally distributed with respective means at μRR/2 and −μRR/2 and a common variance, |$\sigma _{\rm B}^2$|. Thus Eq. (3) can be reformulated as

where Z is a normal variable with mean = 0 and variance = 1. We further assume that the disease has a prevalence of 0.1 and Eq. (4) can be reformulated as:

We can thus solve for μRR, t0, and t1 from the set of 3 equations earlier. For a given value of μRR, |$\sigma _{\rm T}^2$| and false-positive rate, α, the power for a case–control study is the probability that a chi-squared variable with noncentrality parameter |$n{\frac{{\mu _{{\rm RR}}^2}}{{\sigma _{\rm T}^2}}}$| and 1 degree of freedom (df) exceeds the 1 − α quantile of a central chi-squared distribution with 1 df.
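The 3 equations referred to above are not reproduced in this copy. A form consistent with the stated assumptions (normal usual levels with means ±μRR/2, common variance, disease prevalence 0.1, and quartile thresholds t0 and t1) is sketched below. This is a reconstruction under those assumptions, not the authors' typeset equations. Φ denotes the standard normal distribution function, that is, P(Z ≤ z).

```latex
\begin{aligned}
{\rm RR} &= \frac{P(D = 1 \mid X > t_0)}{P(D = 1 \mid X < t_1)}
          = \frac{1 - \Phi\!\left(\dfrac{t_0 - \mu_{\rm RR}/2}{\sigma_{\rm B}}\right)}
                 {\Phi\!\left(\dfrac{t_1 - \mu_{\rm RR}/2}{\sigma_{\rm B}}\right)}, \\[6pt]
0.25 &= 0.1\left[1 - \Phi\!\left(\frac{t_0 - \mu_{\rm RR}/2}{\sigma_{\rm B}}\right)\right]
      + 0.9\left[1 - \Phi\!\left(\frac{t_0 + \mu_{\rm RR}/2}{\sigma_{\rm B}}\right)\right], \\[6pt]
0.25 &= 0.1\,\Phi\!\left(\frac{t_1 - \mu_{\rm RR}/2}{\sigma_{\rm B}}\right)
      + 0.9\,\Phi\!\left(\frac{t_1 + \mu_{\rm RR}/2}{\sigma_{\rm B}}\right).
\end{aligned}
```

The second equality for RR follows from Bayes' rule, because P(X > t0) and P(X < t1) both equal 0.25 and so cancel together with the prevalence.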

We have defined the effect size for a given metabolite in terms of the “usual” metabolite level. The listed relative risks will therefore be substantially higher than those reported in previous epidemiologic studies that did not correct for measurement error. To assess whether the listed relative risks are reasonably in line with previous studies, we also report the naïve relative risk, RR′, or the uncorrected estimate. For each metabolite and each specified relative risk, we estimate the naïve relative risk by solving the following set of equations for RR′, t0′, and t1′, where μRR is calculated earlier and σB is replaced by σT:

We then estimate the average naïve relative risk for a given true relative risk across all metabolites.
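The appendix calculation can be sketched numerically as follows: solve for μRR given a target relative risk, evaluate power from the noncentral chi-squared distribution with 1 df, and obtain the naïve relative risk by replacing σB with σT. This is a minimal illustration using only the Python standard library, not the authors' code. All function names and the numeric inputs in the usage lines are hypothetical. The noncentrality parameter is passed in directly because this passage does not restate how n in the expression n·μRR²/σT² relates to the numbers of cases and controls.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quartile_cutpoints(mu, sigma, prevalence=0.1):
    """Cut-points t0 (top quartile) and t1 (bottom quartile) of the case/control mixture."""
    def mixture_cdf(x):
        cases = STD.cdf((x - mu / 2) / sigma)
        controls = STD.cdf((x + mu / 2) / sigma)
        return prevalence * cases + (1 - prevalence) * controls

    span = 10 * sigma + abs(mu)
    t0 = bisect(lambda x: mixture_cdf(x) - 0.75, -span, span)
    t1 = bisect(lambda x: mixture_cdf(x) - 0.25, -span, span)
    return t0, t1


def relative_risk(mu, sigma=1.0, prevalence=0.1):
    """Top-versus-bottom-quartile relative risk for a case/control mean difference mu."""
    t0, t1 = quartile_cutpoints(mu, sigma, prevalence)
    p_top = 1 - STD.cdf((t0 - mu / 2) / sigma)  # P(X > t0 | case)
    p_bottom = STD.cdf((t1 - mu / 2) / sigma)   # P(X < t1 | case)
    return p_top / p_bottom


def mu_for_relative_risk(rr, sigma_b=1.0, prevalence=0.1):
    """Mean difference that yields relative risk rr (rr > 1) on the usual-level scale."""
    # An upper bracket of 5 SDs keeps tail probabilities away from floating-point zero.
    return bisect(lambda m: relative_risk(m, sigma_b, prevalence) - rr,
                  0.0, 5 * sigma_b)


def power_from_ncp(ncp, alpha):
    """P(noncentral chi-squared, 1 df, exceeds the (1 - alpha) central quantile)."""
    z_crit = STD.inv_cdf(1 - alpha / 2)  # square root of the central quantile
    root = ncp ** 0.5
    return (1 - STD.cdf(z_crit - root)) + STD.cdf(-z_crit - root)


if __name__ == "__main__":
    sigma_b, sigma_t = 1.0, 1.5            # hypothetical SDs, not study estimates
    mu = mu_for_relative_risk(3.0, sigma_b)
    naive_rr = relative_risk(mu, sigma_t)  # same mu, sigma_B replaced by sigma_T
    print(mu, naive_rr, power_from_ncp(20.0, 0.05 / 385))
```

With 1 df, the noncentral chi-squared tail probability reduces to a pair of normal tail probabilities, which is why no special-function library is needed here. Averaging the naïve relative risk or the power over metabolites would then amount to repeating these calls with each metabolite's own variance estimates.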

No potential conflicts of interest were disclosed.

Conception and design: J.N. Sampson, X.O. Shu, R.Z. Stolzenberg-Solomon, A.W. Hsing, B.-T. Ji, R. Sinha, A.J. Cross, S.C. Moore

Development of methodology: J.N. Sampson, A.W. Hsing, R. Sinha, S.C. Moore

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): X.O. Shu, R.Z. Stolzenberg-Solomon, C.E. Matthews, A.W. Hsing, Y.T. Tan, B.-T. Ji, W.-H. Chow, Q. Cai, D.K. Liu, G. Yang, Y.B. Xiang, W. Zheng, R. Sinha, A.J. Cross, S.C. Moore

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.N. Sampson, S.M. Boca, X.O. Shu, R.Z. Stolzenberg-Solomon, C.E. Matthews, A.W. Hsing, Y.T. Tan, B.-T. Ji, R. Sinha, A.J. Cross, S.C. Moore

Writing, review, and/or revision of the manuscript: J.N. Sampson, S.M. Boca, X.O. Shu, R.Z. Stolzenberg-Solomon, C.E. Matthews, A.W. Hsing, Y.T. Tan, B.-T. Ji, W.-H. Chow, G. Yang, Y.B. Xiang, W. Zheng, R. Sinha, A.J. Cross, S.C. Moore

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y.T. Tan, B.-T. Ji, D.K. Liu, Y.B. Xiang, W. Zheng, R. Sinha, A.J. Cross, S.C. Moore

Study supervision: X.O. Shu, B.-T. Ji, R. Sinha

The authors thank Dr. Mitch Gail (National Cancer Institute) for valuable discussions.

This study is, in part, supported by the Intramural Research Program of the NIH and the Breast Cancer Research Stamp Fund, awarded through competitive peer review. SWHS was supported by R37CA070867.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, et al. HMDB: the human metabolome database. Nucleic Acids Res 2007;35:D521–6.
2. Nicholson JK, Wilson ID. Understanding ‘Global’ systems biology: metabonomics and the continuum of metabolism. Nat Rev Drug Discov 2003;2:668–76.
3. Dettmer K, Aronov PA, Hammock BD. Mass spectrometry-based metabolomics. Mass Spectrom Rev 2007;26:51–78.
4. Wang TJ, Larson MG, Vasan RS, Cheng S, Rhee EP, McCabe E, et al. Metabolite profiles and the risk of developing diabetes. Nat Med 2011;17:448–53.
5. Suhre K, Meisinger C, Döring A, Altmaier E, Belcredi P, Gieger C, et al. Metabolic footprint of diabetes: a multiplatform metabolomics study in an epidemiological setting. PLoS ONE 2010;5:e13953.
6. Abate-Shen C, Shen MM. Diagnostics: the prostate-cancer metabolome. Nature 2009;457:799–800.
7. Jansson J, Willing B, Lucio M, Fekete A, Dicksved J, Halfvarson J, et al. Metabolomics reveals metabolic biomarkers of Crohn's disease. PLoS ONE 2009;4:e6386.
8. Dodd KW, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, et al. Statistical methods for estimating usual intake of nutrients and foods: a review of the theory. J Am Diet Assoc 2006;106:1640–50.
9. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics 1982;38:963–74.
10. Carroll RJ. Measurement error in epidemiologic studies. Encyclopedia of Biostatistics. New York: John Wiley & Sons, Ltd.; 2005.
11. Nicholson G, Rantalainen M, Maher AD, Li JV, Malmodin D, Ahmadi KR, et al. Human metabolic profiles are stably controlled by genetic and environmental variation. Mol Syst Biol 2011;7:525.
12. Norman AW. Sunlight, season, skin pigmentation, vitamin D, and 25-hydroxyvitamin D: integral components of the vitamin D endocrine system. Am J Clin Nutr 1998;67:1108–10.
13. Lee SJ, Lenton EA, Sexton L, Cooke ID. The effect of age on the cyclical patterns of plasma LH, FSH, oestradiol and progesterone in women with regular menstrual cycles. Hum Reprod 1988;3:851–5.
14. Wallace M, Hashim YZ, Wingfield M, Culliton M, McAuliffe F, Gibney MJ, et al. Effects of menstrual cycle phase on metabolomic profiles in premenopausal women. Hum Reprod 2010;25:949–56.
15. Katz FH, Romfh P, Smith JA, Roper EF, Barnes JS, Boyd JB. Diurnal variation of plasma aldosterone, cortisol and renin activity in supine man. J Clin Endocrinol Metab 1975;40:125–34.
16. Secor S. Specific dynamic action: a review of the postprandial metabolic response. J Comp Physiol B 2009;179:146–52.
17. Kaplan RC, Ho GYF, Xue X, Rajpathak S, Cushman M, Rohan TE, et al. Within-individual stability of obesity-related biomarkers among women. Cancer Epidemiol Biomarkers Prev 2007;16:1291–3.
18. Kotsopoulos J, Tworoger SS, Campos H, Chung F-L, Clevenger CV, Franke AA, et al. Reproducibility of plasma, red blood cell, and urine biomarkers among premenopausal and postmenopausal women from the Nurses' Health Studies. Cancer Epidemiol Biomarkers Prev 2010;19:938–46.
19. Shah SH, Hauser ER, Bain JR, Muehlbauer MJ, Haynes C, Stevens RD, et al. High heritability of metabolomic profiles in families burdened with premature cardiovascular disease. Mol Syst Biol 2009;5:258–64.
20. Floegel A, Drogan D, Wang-Sattler R, Prehn C, Illig T, Adamski J, et al. Reliability of serum metabolite concentrations over a 4-month period using a targeted metabolomic approach. PLoS ONE 2011;6:e21103.
21. Peters TM, Moore SC, Xiang YB, Yang G, Shu XO, Ekelund U, et al. Accelerometer-measured physical activity in Chinese adults. Am J Prev Med 2010;38:583–91.
22. Peters TM, Shu X-O, Moore SC, Xiang YB, Yang G, Ekelund U, et al. Validity of a physical activity questionnaire in Shanghai. Med Sci Sports Exerc 2010;42:2222–30.
23. Zheng W, Chow W-H, Yang G, Jin F, Rothman N, Blair A, et al. The Shanghai Women's Health Study: rationale, study design, and baseline characteristics. Am J Epidemiol 2005;162:1123–31.
24. Prorok PC, Andriole GL, Bresalier RS, Buys SS, Chia D, David Crawford E, et al. Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 2000;21:273S–309S.
25. Hayes RB, Reding D, Kopp W, Subar AF, Bhat N, Rothman N, et al. Etiologic and early marker studies in the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 2000;21:349S–55S.
26. Yu Z, Kastenmüller G, He Y, Belcredi P, Möller G, Prehn C, et al. Differences between human plasma and serum metabolite profiles. PLoS ONE 2011;6:e21230.
27. Sreekumar A, Poisson LM, Rajendiran TM, Khan AP, Cao Q, Yu J, et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 2009;457:910–4.
28. Suhre K, Shin S-Y, Petersen A-K, Mohney RP, Meredith D, Wagele B, et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature 2011;477:54–60.
29. Lacher DA, Hughes JP, Carroll MD. Estimate of biological variation of laboratory analytes based on the third National Health and Nutrition Examination Survey. Clin Chem 2005;51:450–2.
30. Fraser CG, Petersen PH. Desirable standards for laboratory tests if they are to fulfill medical needs. Clin Chem 1993;39:1447–53.
31. Nieman DC, Gillitt ND, Henson DA, Sha W, Shanely RA, Knab AM, et al. Bananas as an energy source during exercise: a metabolomics approach. PLoS ONE 2012;7:e37479.
32. Kabat GC, Kim M, Caan BJ, Chlebowski RT, Gunter MJ, Ho GYF, et al. Repeated measures of serum glucose and insulin in relation to postmenopausal breast cancer. Int J Cancer 2009;125:2704–10.
33. Hankinson SE, Manson JE, Spiegelman D, Willett WC, Longcope C, Speizer FE. Reproducibility of plasma hormone levels in postmenopausal women over a 2–3-year period. Cancer Epidemiol Biomarkers Prev 1995;4:649–54.
34. Gunter MJ, Hoover DR, Yu H, Wassertheil-Smoller S, Rohan TE, Manson JE, et al. Insulin, insulin-like growth factor-I, and risk of breast cancer in postmenopausal women. J Natl Cancer Inst 2009;101:48–60.
35. James RE, Lukanova A, Dossus L, Becker S, Rinaldi S, Tjønneland A, et al. Postmenopausal serum sex steroids and risk of hormone receptor–positive and -negative breast cancer: a nested case–control study. Cancer Prev Res 2011;4:1626–35.
36. Wang Z, Klipfell E, Bennett BJ, Koeth R, Levison BS, DuGar B, et al. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature 2011;472:57–63.
37. Boffetta P, Clark S, Shen M, Gislefoss R, Peto R, Andersen A. Serum cotinine level as predictor of lung cancer risk. Cancer Epidemiol Biomarkers Prev 2006;15:1184–8.
38. Trygg J, Wold S. Orthogonal projections to latent structures (O-PLS). J Chemom 2002;16:119–28.
39. Shih WJ, Bachorik PS, Haga JA, Myers GL, Stein EA. Estimating the long-term effects of storage at −70°C on cholesterol, triglyceride, and HDL-cholesterol measurements in stored sera. Clin Chem 2000;46:351–64.
40. Rundle AG, Vineis P, Ahsan H. Design options for molecular epidemiology research within cohort studies. Cancer Epidemiol Biomarkers Prev 2005;14:1899–907.
41. Tworoger SS, Hankinson SE. Collection, processing, and storage of biological samples in epidemiologic studies: sex hormones, carotenoids, inflammatory markers, and proteomics as examples. Cancer Epidemiol Biomarkers Prev 2006;15:1578–81.

Supplementary data