Most proteomics studies examine one blood specimen per participant; however, it is unknown how well measures at one time point reflect an individual's long-term proteome pattern. Therefore, we examined the stability of the proteome over 3 years in postmenopausal women not taking hormones for at least 3 months using surface-enhanced laser desorption and ionization time-of-flight mass spectrometry. Using the Nurses' Health Study blood cohort, we randomly selected 60 women from a subset providing 2 to 3 blood samples over 3 years. Four different protein chip surfaces/plasma fractions were examined: unfractionated plasma on a CM10 and H50 chip, pH ≥ 9, plasma fraction on a CM10 chip, and the organic fraction on the H50 chip, all with a low- and high-energy transfer protocol. Participant and quality control samples were aligned to a reference sample and then peak intensity was assessed for all peaks identified in the reference sample. The average coefficient of variation (CV) of the peak intensity within conditions ranged from 16% (H50, organic, low protocol) to 63% (CM10, pH ≥ 9, high protocol). Generally, the CV and mean peak intensity of the quality control samples were inversely correlated (median −0.48). The mean intraclass correlation (ICC) within conditions ranged from 0.37 (H50, unfractionated, low protocol) to 0.68 (CM10, unfractionated, high protocol). For a signal-to-noise cutoff of 2.0, we observed 334 peaks, of which 241 (72%) had an ICC of ≥0.40. Although we observed a large range of CVs and ICCs, sufficient numbers of peaks had reasonable ICCs to suggest that protein peak reproducibility over 3 years was reasonable among postmenopausal women not taking hormones. (Cancer Epidemiol Biomarkers Prev 2008;17(6):1480–5)

Mass spectrometry techniques offer methods to analyze multiple proteins simultaneously (i.e., the proteome) in a high-throughput fashion and with a low sample volume. Potential applications of this technology include detection of cancers, such as ovarian cancer, that are difficult to identify at an early stage, biomarker discovery for disease risk in diagnostic and epidemiologic studies, and therapeutic response markers. However, critiques of this technology have noted that reproducibility has not been carefully evaluated within or across laboratories, and that sometimes markers have been identified in areas of the spectrum likely reflecting matrix-associated noise (1-3). Further, recent data have suggested that the type of collection tube for blood (e.g., serum, EDTA plasma, heparin plasma) can alter the observed proteomic profiles (4-7), and a delay in processing by as little as 4 hours substantially changes serum, but not EDTA or heparin plasma, profiles (4, 6). As recent commentaries have noted (1-3), careful quality assessment of the performance of these techniques is needed to improve their reliability and eventual usefulness for discovering disease markers.

Although large projects, including the HUPO Plasma Proteome Project, are exploring the effects of various blood collection methods on proteomic profiles, no studies have assessed within-person variability of plasma protein profiles over time. This information is important for understanding whether one measure of the proteome is reflective of longer-term patterns, and is one key prerequisite for determining whether variations in markers of risk are meaningful.

Large epidemiologic cohorts with prospectively banked blood specimens, such as the Nurses' Health Study (NHS), offer the opportunity to perform studies of sample collection and processing methods as well as assess within-person variation over time. This study had three research goals: (a) to assess laboratory variation in measurement of proteomic peaks, (b) to examine the effect of delayed processing of up to 48 hours on the plasma proteome, and (c) to evaluate the reproducibility of proteomic profiles in postmenopausal NHS women not taking postmenopausal hormones over a 3-year period, using surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF).

Nurses' Health Study

This study used samples from the NHS, which began in 1976, when 121,700 female registered nurses, 30 to 55 years of age and residing in 11 U.S. states, completed an initial questionnaire. The cohort has been followed using biennially mailed questionnaires, with follow-up rates consistently at ∼90%. In 1989 to 1990, heparin blood samples (two 10-mL tubes) from 32,826 NHS participants were obtained and transported via overnight courier with a cold pack to our laboratory; 97% of the samples arrived within 26 h of being drawn (8). A brief questionnaire asked for time of day and date of the blood draw, number of hours since the woman had last eaten before blood draw, and current medication use. On arrival, blood samples were centrifuged and aliquoted into plasma, WBC, and RBC components. Cryotubes have been stored in the vapor phase of liquid nitrogen freezers at less than −130°C since that time.

Mass Spectrometry

All samples for this study were run using a SELDI-TOF (Ciphergen, PBS II) platform. The detailed procedures of sample preparation and preprocessing for the different SELDI-TOF surfaces have been reported previously (9). Briefly, protein detection was based on a spectrum of signals generated when a plasma sample was spotted on a chip surface and subjected to energy transfer (low or high energy) by a nitrogen laser beam. The low-energy protocol was optimized to detect proteins up to ∼20,000 Da and the high-energy protocol was optimized for larger proteins. The laboratory was blinded to quality control (QC) status and to the identity of samples from the same individual; samples from the same participant were run in the same batch.

Assessment of Laboratory Variability

QC samples were obtained from two sources. Individual women (herein called donors) who participated in the 1989 to 1990 NHS blood collection were selected to serve as QCs if they had contributed an extra plasma vial (i.e., the woman sent three 10-mL heparin tubes as opposed to the two we requested). For this type of QC, we sent duplicate samples from individual donors in the subsequent assays. Additionally, two large QC pools were used; these were created shortly after the initial blood collection using discarded plasma from blood donation centers; one pool was of premenopausal women and another pool was of postmenopausal women. Multiple aliquots of each pool were included in subsequent assays.

To assess laboratory error, we calculated the coefficients of variation (CV) for each peak separately by QC donor/pool by dividing the SD by the mean peak intensity (10). Then, we averaged the CVs across the QC types. This method of CV calculation was used for all subsequent analyses of laboratory variation. CVs of <20% are considered desirable (10), although if the between-person variability is very large, higher CVs may be acceptable (11).
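As a rough illustration of the CV calculation described above, the following Python sketch computes the CV of one peak within each QC donor or pool and then averages across QC sources. The intensities, the dictionary keys, and the use of the sample SD (n − 1 denominator) are assumptions for illustration; the source does not state which SD estimator was used.

```python
from statistics import mean, stdev


def peak_cv(intensities):
    """CV (%) of one peak within one QC donor or pool: SD / mean * 100."""
    # stdev() is the sample SD (n - 1 denominator); this choice is an assumption.
    return 100.0 * stdev(intensities) / mean(intensities)


def average_cv(replicates_by_qc):
    """Average the per-source CVs of a single peak across QC donors and pools."""
    return mean(peak_cv(values) for values in replicates_by_qc.values())


# Hypothetical replicate intensities for one peak (not study data).
qc_peak = {
    "pool_premenopausal": [10.0, 12.0, 11.0],
    "pool_postmenopausal": [20.0, 22.0, 18.0],
    "donor_1": [5.0, 7.0],
}

overall_cv = average_cv(qc_peak)
print(f"Average CV for this peak: {overall_cv:.1f}%")
```

Repeating this for every peak in a data set gives the per-peak CVs that are summarized by chip surface, fraction, and laser protocol.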

Delayed Processing Study

To assess the effect of our blood collection methods on the reproducibility of proteomic profiles, we obtained heparin samples from male and female volunteers that were split into three equal parts. The first part was processed and frozen immediately after collection, whereas the second and third parts were shipped via overnight mail back to the laboratory (with a cold pack), where they were processed 24 and then 48 h after blood collection, thus mimicking the collection conditions of the NHS participants. We then compared profiles in samples processed and frozen immediately to those stored as heparinized whole blood for 24 or 48 h before processing.

For this aim, we included the 0-, 24-, and 48-h samples from 12 donors, in addition to a total of 12 QC samples from 2 plasma pools (2 replicates of each) and 4 individual donors (2 replicates of each), for a total of 48 samples. Samples were spotted in duplicate with no plasma fractionation on an H50 chip and run in one batch. We calculated the CV across the three processing method times within each donor and then averaged across all donors. Finally, we considered the number of shared peaks [e.g., peaks with the same mass/charge (m/z) ratio] across the three processing method times within donors.
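The shared-peak count can be sketched as follows, assuming each spectrum has already been reduced to a list of peak m/z values. The matching tolerance `rel_tol` (0.3% of the m/z value) and the example m/z lists are hypothetical; the source says only that shared peaks have the same m/z ratio and does not give a matching window.

```python
def shared_peaks(peak_lists, rel_tol=0.003):
    """Return m/z values from the first list that have a match in every other list.

    Two peaks count as the same when their m/z values differ by at most
    rel_tol * m/z. The tolerance is an assumed value, not taken from the study.
    """
    reference, *others = peak_lists
    shared = []
    for mz in reference:
        matched_everywhere = all(
            any(abs(mz - other_mz) <= rel_tol * mz for other_mz in peaks)
            for peaks in others
        )
        if matched_everywhere:
            shared.append(mz)
    return shared


# Hypothetical peak lists for one donor at 0, 24, and 48 h (not study data).
t0 = [2015.0, 4300.0, 6640.0, 8930.0]
t24 = [2016.0, 4302.0, 6645.0, 8935.0]
t48 = [2014.0, 4299.0, 8920.0]

shared_0_24 = shared_peaks([t0, t24])
shared_all = shared_peaks([t0, t24, t48])
print(len(shared_0_24), len(shared_all))
```

In this toy example, one of the four peaks seen at 0 and 24 h has no counterpart at 48 h, so the shared set shrinks when the 48-h sample is included.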

Within-Woman Reproducibility Study

To assess within-person stability over time, over 300 NHS women were asked to collect two additional blood samples over the following 2 to 3 years after the initial blood collection. We randomly selected a subset of 60 women for the current study who provided blood samples at baseline and 2 years later; this subset has been used previously to assess reproducibility over time of various hormonal markers (12). Selection criteria included that women were postmenopausal, had no prior diagnosis of cancer (except nonmelanoma skin cancer), the blood samples for both draws were collected in the morning hours after at least an 8 h fast, samples were processed by our laboratory within 24 to 30 h of collection, and women were not using postmenopausal hormones at any blood draw. At the first collection, women had not used postmenopausal hormones for at least 3 months before the blood draw; after this period, sex hormone levels, which are strongly affected by exogenous hormone use, have returned to pre-use concentrations (13, 14). Further, eligible women did not use postmenopausal hormones between the first and third blood draws.

The final assay set for the reproducibility study included 120 participant samples (2 samples each, i.e., year 1 and year 3 sample, from 60 women) and 45 QC samples from 2 plasma pools (12 replicates of each) and 11 individual donors (2 replicates of each). We examined four different protein chip surface conditions on the SELDI, with two batches for each condition: unfractionated plasma on a CM10 chip, unfractionated plasma on an H50 chip, the pH ≥ 9 plasma fraction on a CM10 chip, and the organic fraction on the H50 chip. Details of the mass spectrometry protocol together with the raw data are provided on a supplementary Web page.

We created a reference sample for this study by combining a small amount of plasma from each study participant and QC sample and stirring to create a homogeneous mixture. Initial peak detection was then based on the reference sample, resulting in a reference profile, obtained in duplicate. The reference profile was assessed for protein peaks above a 2.0, 2.5, and 3.0 signal-to-noise ratio cutoff, and was used to interrogate and align sample spectra in the study participants and QC samples. Thus, an aligned, quantitative matrix was generated for the reference peaks in the entire data set. All samples were run in duplicate; we averaged the peak intensities for each identified peak from the duplicate runs.

For each of the four protein surface chip conditions used, we obtained data for both the high- and low-energy protocols, using three signal-to-noise ratio cutoffs (2.0, 2.5, and 3.0), for a total of 24 data sets. All analyses were run separately by data set, that is, for each energy protocol, signal-to-noise ratio cutoff, and chip type/fractionation condition. The CVs for QC samples were determined as mentioned above (10). Additionally, we determined the mean peak intensity for each peak across all QC specimens. For each data set, we then calculated the Spearman correlation (10) between the CV of a peak and its mean peak intensity, and between the CV and the mass-to-charge (m/z) ratio of the peak. We also determined the percent of peaks in each data set with CVs ≤30% or ≤40%.
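The Spearman correlation between per-peak CV and mean peak intensity can be illustrated with a small, dependency-free sketch: rank both variables (averaging ranks for ties) and take the Pearson correlation of the ranks. The five example peaks below are hypothetical values, not study data, and the function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-peak summaries for one data set (not study data).
mean_intensity = [2.0, 5.0, 9.0, 14.0, 30.0]
cv_percent = [60.0, 45.0, 48.0, 25.0, 12.0]

rho = spearman(cv_percent, mean_intensity)
print(f"Spearman correlation between CV and mean intensity: {rho:.2f}")
```

A negative value, as in this toy example, corresponds to low-intensity peaks being measured less precisely.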

For the participant samples, we calculated the intraclass correlation (ICC) within-woman over time for each peak within each data set, using a mixed model with participant as the random variable (10); all peak intensities were natural log transformed for this analysis. This model estimated the within- and between-person variances for each peak. The ICC was calculated as the between-person variance divided by the total variance (10). For each peak, we also determined the mean peak intensity across all study participant specimens and, within data set, the number of peaks with at least a fair ICC (≥0.40; ref. 10). Finally, we calculated the Spearman correlation between the ICC, mean peak intensity, and the m/z ratio of the peak in each data set.
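A minimal sketch of the ICC for a single peak is given below. It does not fit the mixed model used in the study; instead it substitutes the one-way random-effects ANOVA (method-of-moments) estimator of the between- and within-person variances, which for balanced data such as two samples per woman is expected to give very similar estimates. The function name and example intensities are hypothetical.

```python
import math
from statistics import mean


def icc_oneway(samples_by_person):
    """ICC = between-person variance / total variance for one peak.

    samples_by_person: list of per-person lists of raw peak intensities,
    with the same number of samples for every person (balanced design).
    Intensities are natural log transformed first, as in the text.
    """
    groups = [[math.log(v) for v in values] for values in samples_by_person]
    n = len(groups)
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes the same number of samples per person")

    grand_mean = mean(v for g in groups for v in g)
    ms_between = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))

    # Method-of-moments variance components; negative estimates are set to 0.
    var_between = max((ms_between - ms_within) / k, 0.0)
    var_within = ms_within
    total = var_between + var_within
    return var_between / total if total > 0 else float("nan")


# Hypothetical intensities for two women, two samples each (not study data).
example = [[math.exp(0.0), math.exp(2.0)], [math.exp(2.0), math.exp(4.0)]]
print(f"ICC: {icc_oneway(example):.2f}")
```

An ICC near 1 means that most of the variation in a peak is between women rather than within a woman over time, which is what allows a single sample to stand in for the longer-term level.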

A pilot study using both QC donors and pools has been published previously (9) and consisted of 14 samples from seven donors, each with two blinded aliquots. The average CV between blinded replicates was 20.1% (range 12.9-26.7%), suggesting that reproducible values could be obtained from the NHS specimens.

Delayed Processing Study

Among plasma from healthy donors, we observed an overall CV of 23% across the three conditions of delayed processing of blood samples (0-, 24-, or 48-hour delay in processing). To better assess the effect of different processing delays on proteome stability, we calculated the CVs comparing 0- to 24-hour samples and 0- to 48-hour samples. Although 92% of the CVs when comparing samples with a processing time of 0 and 24 hours were ≤20%, only 80% of comparable CVs were ≤20% when examining the 0- and 48-hour samples. Furthermore, the number of shared peaks across the delayed processing times dropped by an average of 10% when including the 48-hour time point.

Within-Woman Reproducibility Study

NHS women in this study ranged in age from 51 to 68 years old (mean age 61 years) and were on average overweight (Table 1). Less than 30% of women reported past use of postmenopausal hormones or ever use of oral contraceptives. Use of other medications at blood collection was low. All samples were fasting, and, except one drawn around 11 a.m., were drawn between 6 a.m. and 10 a.m.

Table 1.

Characteristics at baseline blood collection of 60 postmenopausal women not taking postmenopausal hormones in the reproducibility study

| Characteristic | Mean (SD) |
|---|---|
| Age (y) | 61.2 (4.5) |
| Month of blood draw | February 1990 (3.6 mo) |
| Time of day of blood draw | 8:17 a.m. (1.1 h) |
| Body mass index (kg/m²) | 25.8 (5.3) |
| Age at menarche (y) | 12.5 (1.2) |
| Age at menopause (y) | 47.4 (5.9) |
| Years between menopause and blood draw | 13.9 (6.9) |
| Parity* | 3.6 (1.9) |

| Characteristic | n (%) |
|---|---|
| Past postmenopausal hormone use | 16 (26.7) |
| Ever used oral contraceptives | 17 (28.3) |
| Used steroids at blood draw | 0 (0.0) |
| Used thyroid medication at blood draw | 4 (6.7) |
| Used antidepressants at blood draw | 3 (5.0) |
| Family history of breast cancer | 9 (15.0) |
| Family history of ovarian cancer | 2 (3.3) |
*Among women who were parous.

We assessed laboratory variability among the QC samples in the reproducibility study sample set. Average CVs of peak intensity within data sets ranged from 16% (H50, organic fraction, low protocol) to 64% (CM10, pH ≥ 9 fraction, high protocol; Table 2). The intra-assay CVs were similar to the interassay CVs (data not shown); however, the former were based on fewer QC samples, as only 6 of 11 QC donors had their two replicates in the same batch. The mean CVs decreased when increasing the signal-to-noise ratio cutoff; however, these changes were relatively small. In general, there was a strong inverse correlation between the CV and mean peak intensity of the QC samples (median r across data sets = −0.48), whereas there was no consistent correlation between the CV and m/z ratio of the peak. The percentage of peaks with CVs ≤30% ranged from 3% to 96% (median = 30%) across the conditions; for CVs ≤40%, the range was 14% to 100% (median = 57%).

Table 2.

Summary of peak CVs calculated using 45 QC samples from 2 plasma pools and 11 other individual donors

| Laser protocol | Signal-to-noise cutoff | No. peaks | CV (%), mean (minimum-maximum) | Correlation between CV and mean QC peak intensities | Correlation between CV and m/z ratio | n (%) peaks, CV ≤30% | n (%) peaks, CV ≤40% |
|---|---|---|---|---|---|---|---|
| CM10 unfractionated | | | | | | | |
| Low | 2.0 | 87 | 49.0 (21.0-108) | −0.47 | −0.11 | 9 (10) | 35 (40) |
| Low | 2.5 | 73 | 49.1 (21.5-135) | −0.49 | −0.12 | 8 (11) | 26 (36) |
| Low | 3.0 | 57 | 47.9 (21.4-135) | −0.45 | 0.00 | 7 (12) | 24 (42) |
| High | 2.0 | 64 | 46.5 (21.2-98.6) | −0.78 | 0.26 | 11 (17) | 28 (44) |
| High | 2.5 | 46 | 39.7 (10.9-88.1) | −0.71 | 0.23 | 13 (28) | 25 (54) |
| High | 3.0 | 33 | 37.8 (12.4-66.6) | −0.70 | 0.25 | 9 (27) | 19 (58) |
| CM10 fractionated pH ≥9 | | | | | | | |
| Low | 2.0 | 39 | 56.3 (26.0-122) | −0.03 | −0.51 | 2 (5) | 8 (21) |
| Low | 2.5 | 30 | 56.7 (26.0-122) | 0.09 | −0.54 | 2 (7) | 7 (23) |
| Low | 3.0 | 22 | 56.9 (26.0-122) | −0.07 | −0.46 | 2 (9) | 5 (23) |
| High | 2.0 | 35 | 63.5 (27.4-115) | 0.06 | 0.19 | 1 (3) | 5 (14) |
| High | 2.5 | 21 | 62.1 (27.6-115) | 0.20 | 0.22 | 1 (5) | 3 (14) |
| High | 3.0 | 14 | 61.7 (27.3-115) | −0.06 | 0.26 | 1 (7) | 3 (21) |
| H50 unfractionated | | | | | | | |
| Low | 2.0 | 86 | 38.7 (17.1-102) | −0.35 | −0.30 | 27 (31) | 58 (67) |
| Low | 2.5 | 71 | 36.6 (17.9-91.8) | −0.39 | −0.07 | 26 (37) | 53 (75) |
| Low | 3.0 | 62 | 36.4 (16.2-83.6) | −0.43 | −0.07 | 22 (35) | 48 (77) |
| High | 2.0 | 73 | 39.3 (11.0-98.1) | −0.82 | 0.07 | 32 (44) | 41 (56) |
| High | 2.5 | 54 | 33.3 (11.1-87.1) | −0.82 | 0.13 | 32 (59) | 35 (65) |
| High | 3.0 | 43 | 30.4 (10.2-75.2) | −0.83 | 0.25 | 26 (60) | 29 (67) |
| H50 organic fraction | | | | | | | |
| Low | 2.0 | 49 | 18.4 (7.5-62.6) | −0.43 | −0.43 | 43 (88) | 46 (93) |
| Low | 2.5 | 38 | 18.7 (7.5-62.6) | −0.56 | −0.35 | 34 (89) | 36 (95) |
| Low | 3.0 | 28 | 16.0 (7.5-32.4) | −0.49 | −0.65 | 27 (96) | 28 (100) |
| High | 2.0 | 60 | 38.8 (8.3-89.8) | −0.79 | 0.29 | 24 (40) | 35 (58) |
| High | 2.5 | 37 | 33.3 (8.3-78.1) | −0.83 | 0.42 | 15 (41) | 28 (76) |
| High | 3.0 | 22 | 25.4 (8.3-52.3) | −0.75 | 0.19 | 14 (64) | 19 (86) |

For data sets where at least half of the CVs were 40% or less, we calculated ICCs for each peak within the data set. The mean ICC ranged from 0.37 (H50, unfractionated, low protocol) to 0.68 (CM10, unfractionated, high protocol; Table 3). Mean and median ICC values within a data set were similar, suggesting that the ICCs were roughly symmetrically distributed. For a signal-to-noise ratio cutoff of 2.0, we assessed 334 peaks, of which 241 (72%) had an ICC of ≥0.40. For a signal-to-noise ratio cutoff of 3.0, we assessed 188 peaks, of which 118 (63%) had an ICC of ≥0.40. The ICC was inversely correlated with the average peak intensity in each data set examined. Interestingly, in general, the ICC was inversely correlated with m/z ratio for the low-energy protocol data sets, but positively correlated with the m/z ratio for the high-energy protocol data sets. Adjustment for participant age, time of day of each blood draw, and date of blood draw did not notably change the ICC estimates (data not shown). Furthermore, the mean and median ICCs did not change after excluding peaks with CVs higher than 30% or 40% or when restricting to never postmenopausal hormone users (data not shown).
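The overall peak counts quoted above can be cross-checked by summing the per-data-set rows of Table 3. The short sketch below does this for the 2.0 and 3.0 signal-to-noise cutoffs; the pairs are copied from Table 3, while the variable and function names are illustrative only.

```python
# (No. of peaks, peaks with ICC >= 0.40) for each data set, copied from Table 3.
# Order: CM10 unfractionated high; H50 unfractionated low, high;
# H50 organic fraction low, high.
table3_rows = {
    2.0: [(64, 58), (88, 41), (73, 60), (49, 31), (60, 51)],
    3.0: [(33, 29), (62, 22), (43, 34), (28, 15), (22, 18)],
}


def tally(rows):
    """Return total peaks, peaks with ICC >= 0.40, and the rounded percentage."""
    total = sum(n_peaks for n_peaks, _ in rows)
    fair = sum(n_fair for _, n_fair in rows)
    return total, fair, round(100 * fair / total)


for cutoff, rows in table3_rows.items():
    total, fair, pct = tally(rows)
    print(f"S/N {cutoff}: {fair} of {total} peaks ({pct}%) with ICC >= 0.40")
```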

Table 3.

Summary of reproducibility of peaks over time as measured by the ICC, calculated using plasma from 60 women with two blood samples collected over 3 y

| Laser protocol | Signal-to-noise cutoff | No. of peaks | ICC, mean (SD) | ICC, median (10th-90th percentile) | Correlation between ICC and mean peak intensity | Correlation between ICC and m/z ratio | n (%), peaks with ICC ≥ 0.40 |
|---|---|---|---|---|---|---|---|
| CM10 unfractionated | | | | | | | |
| High | 2.0 | 64 | 0.68 (0.18) | 0.70 (0.41-0.92) | −0.45 | −0.13 | 58 (91) |
| High | 2.5 | 46 | 0.66 (0.17) | 0.68 (0.41-0.86) | −0.40 | −0.19 | 42 (91) |
| High | 3.0 | 33 | 0.64 (0.19) | 0.65 (0.40-0.88) | −0.37 | −0.24 | 29 (88) |
| H50 unfractionated | | | | | | | |
| Low | 2.0 | 88 | 0.41 (0.17) | 0.39 (0.22-0.65) | −0.31 | −0.24 | 41 (47) |
| Low | 2.5 | 71 | 0.38 (0.14) | 0.37 (0.21-0.56) | −0.34 | −0.01 | 29 (41) |
| Low | 3.0 | 62 | 0.37 (0.13) | 0.33 (0.22-0.56) | −0.29 | −0.03 | 22 (35) |
| High | 2.0 | 73 | 0.58 (0.18) | 0.58 (0.36-0.80) | −0.34 | 0.52 | 60 (82) |
| High | 2.5 | 54 | 0.54 (0.16) | 0.52 (0.35-0.75) | −0.26 | 0.52 | 44 (81) |
| High | 3.0 | 43 | 0.53 (0.15) | 0.55 (0.36-0.72) | −0.17 | 0.40 | 34 (79) |
| H50 organic fraction | | | | | | | |
| Low | 2.0 | 49 | 0.49 (0.18) | 0.50 (0.26-0.74) | −0.55 | −0.24 | 31 (63) |
| Low | 2.5 | 38 | 0.50 (0.21) | 0.48 (0.25-0.80) | −0.54 | −0.28 | 32 (84) |
| Low | 3.0 | 28 | 0.47 (0.22) | 0.42 (0.25-0.79) | −0.52 | −0.37 | 15 (54) |
| High | 2.0 | 60 | 0.65 (0.20) | 0.68 (0.36-0.88) | −0.52 | 0.35 | 51 (85) |
| High | 2.5 | 37 | 0.62 (0.22) | 0.62 (0.34-0.91) | −0.49 | 0.44 | 30 (81) |
| High | 3.0 | 22 | 0.57 (0.16) | 0.55 (0.39-0.78) | −0.42 | 0.50 | 18 (82) |

Although proteomic methods have been used extensively in medical research over the last several years, a number of methodologic issues surrounding their use are still under debate (1-3). Recent studies have reported on best practices for assay standardization, sample type, freeze-thaw cycles, and sample collection, processing, and storage (4-7, 9). For instance, our group has previously reported on intra-assay and interassay reproducibility of SELDI-TOF mass spectrometry using serum samples, as well as on the advantage of robotic automation and standardization of chip drying time (9). Other issues are less well studied. For example, delayed processing of samples is a common challenge in large epidemiologic studies that have participants spread over a wide geographic area. An additional source of variability not addressed in previous studies is variation within a person over time. For one blood sample to adequately reflect an individual's average proteomic profile, most protein peaks must be relatively stable over time; if this is true, it would be possible to identify disease markers with only one blood specimen. Thus, we examined laboratory variability, the effect of delayed processing on proteome peaks, and within-person variability over 3 years among postmenopausal women not using hormones in the Nurses' Health Study.

The first goal of this study was to assess laboratory variability within blinded replicates of QC samples. In general, there was a wide range of CVs across the various protein chip surface types and plasma fractions. In the reproducibility study, the average peak intensity CVs within a data set ranged from 16% to 64%. Interestingly, for most of the data sets, there was a strong inverse correlation between the peak CV and the mean peak intensity of the QC samples for that peak. This is expected given that mass spectrometry platforms have a certain amount of “noise” in the peak spectra, particularly in the low-intensity range, which has been shown to contain matrix-associated variability (1-3). This increase in assay variability at low peak intensities raises an important methodologic issue in choosing QC samples for proteomic profiling studies. For many biomarker studies, only one to three QC pools are included across the sample set (11). However, within each QC pool, only some peaks will exist at a high intensity. Thus, more QC pools (>10) are needed to increase the likelihood that at least some of the QC samples will have reasonably high peak intensities for all peaks. This will allow for a more accurate estimation of assay variability and removal of peaks with a very high CV from further analysis.

The second goal of this study was to examine the effect of delayed sample processing on proteomic profiles. We observed that a 24-hour delay in processing did not substantially affect proteomic profiles in heparin plasma. This is consistent with previous studies of this sample type, although for serum samples even a 4-hour processing delay can substantially alter protein profiles (4, 7). In our study, a 48-hour delay appeared to alter the observable proteins for at least some of the samples and/or peaks, as evidenced by fewer shared peaks between the 0- and 48-hour samples versus the 0- and 24-hour samples. Thus, epidemiologic studies should pilot and standardize their collection methods on proteomic platforms before proceeding with analyses as long delays in processing can alter at least some protein peaks.

The third goal was to examine the amount of within-person variability over time of proteomic profiles. When conducting this analysis, we excluded data sets with extremely high CVs to reduce error in our ICC measures. In general, the majority of peaks had at least fair ICCs. The ICCs in this study are likely underestimated, because the measure of total variability (the denominator of the ICC) includes the assay variability, which was somewhat high. The various sample fractions and chip surfaces differed in the proportion of peaks with at least fair ICCs (ranging from 35% to 91%). This suggests that certain chip types and plasma fractions may reflect plasma proteins with differing levels of reproducibility within person over time. Thus, future studies should consider including a reproducibility study from the population of interest within the primary study. This will allow the investigator to focus the analysis on peaks with high reproducibility within a person over time.
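The attenuation argument can be made concrete with a small numeric sketch. It assumes, as a simplification not stated in the source, that the total variance of a peak is the sum of independent between-person, within-person biological, and assay components; the variance values are hypothetical.

```python
def icc(var_between, var_within_bio, var_assay=0.0):
    """ICC = between-person variance / total variance (assumed additive components)."""
    return var_between / (var_between + var_within_bio + var_assay)


# Hypothetical variance components on the log scale (not study data).
var_between = 1.0
var_within_bio = 0.5
var_assay = 0.5

observed = icc(var_between, var_within_bio, var_assay)  # assay noise included
biological = icc(var_between, var_within_bio)           # assay noise removed

print(f"Observed ICC: {observed:.2f}; ICC without assay variability: {biological:.2f}")
```

Under these assumptions, any assay variance enlarges the denominator only, so the observed ICC can never exceed the ICC that would be obtained with a noise-free assay.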

We observed that plasma protein peaks with either a low or high m/z ratio tended to have higher ICCs. Although it is unclear why this might be, it is possible that peaks in the middle of the m/z range (1) are more likely to represent multiple proteins, which means the ICC would reflect the combined reproducibility of all the proteins in that peak, or (2) are less likely to be under strong homeostatic regulation of protein levels. However, it is not possible to definitively determine whether these or other factors are important in this observation. We also observed that the ICC was inversely correlated with the average peak intensity across the participant samples. Given that the CVs tend to be smaller for higher peak intensities, this finding was somewhat counterintuitive, and it is unclear whether this was a chance finding or has some biological significance.

The major strength of our study is the relatively homogeneous population of postmenopausal women not taking postmenopausal hormones and having had samples prospectively collected over 3 years, although our results can only be directly generalized to this population. Nonetheless, other unmeasured factors, such as changes in medication use or diet, may have increased the within-person variability over time, thus lowering the observed ICCs. However, populations in longitudinal epidemiologic studies rarely maintain exactly the same characteristics over time, so our results may better reflect within-person variation that would be observed in such studies. The primary limitation is the collection method used to obtain the samples in this study. Because NHS participants live across the entire United States and are nurses, we asked them to collect their own blood sample and mail it to our laboratory, where it was processed. This means that there was a delay in processing into the plasma, buffy coat, and RBC fractions. Although previous studies have recommended immediate processing (4), our pilot study suggested that delayed processing up to 24 hours was acceptable for a substantial number of proteins. We used samples that were processed within 24 to 30 hours of collection for the reproducibility study to minimize the effect of this issue on the estimated ICCs. We also specifically selected women whose two blood samples were collected in the morning and after fasting for at least 8 hours to reduce effects of circadian rhythms and eating on protein profiles. Another limitation is that we assessed a relatively small subset of the overall proteome because we did few fractionations and only examined peaks with a signal-to-noise ratio of at least 2.0. Future studies should examine additional protein subsets. Also, it is difficult to separate the effects of within-person variability from those of long-term storage in this study. However, few analytes seem to be altered by only a difference of 2 years of storage (11, 15), especially when kept at very cold temperatures (all samples were kept at less than −130°C). Further, the CVs were relatively high in the QC samples of the reproducibility study; although this is not optimal, the between-person variation in peak intensities is very large and seems to outweigh the assay variability for the majority of peaks.

It also should be noted that SELDI-TOF mass spectrometry has certain strengths and limitations. The major strength is that it lends itself to a high-throughput and technically straightforward application to large-scale epidemiologic studies. Limitations include primarily the inability to directly identify the proteins underlying the peak patterns, somewhat limited peak resolution, and issues related to interlaboratory reproducibility (9). However, the purpose of this study was not to directly compare the advantages and disadvantages of different proteomic platforms but rather to evaluate the stability of proteomic profiles over time, as a prerequisite for the design of major biomarker discovery studies.

In conclusion, to our knowledge, this is the first study to examine the reproducibility of plasma proteomic profiles within an individual over time. Our results suggest that plasma protein peak reproducibility over 3 years was reasonable for the majority of peaks among postmenopausal women not taking postmenopausal hormones, such that one sample may reflect the proteome pattern over time. We also observed that the laboratory variability of this method is somewhat higher than is desirable; however, this may be acceptable, particularly in discovery studies, given the wide between-person variability in most protein peaks. Further, delayed processing of up to 24 hours seems acceptable for heparin plasma. Future studies should consider including multiple QC and reproducibility samples in their design so that peaks with poor CVs or a low ICC can be eliminated.
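
As an illustration of the peak-filtering recommendation above, the following minimal sketch computes a CV from replicate QC intensities and an ICC from repeated participant samples for a single peak, and then applies both cutoffs. It is not the analysis code used in this study. It assumes a balanced design (the same number of repeated samples per person) and a one-way random-effects ANOVA estimator of the ICC; the function names and the CV cutoff of 0.30 are hypothetical, whereas the ICC cutoff of 0.40 follows the threshold reported in this article.

```python
from statistics import mean, stdev


def coefficient_of_variation(qc_intensities):
    """CV of one peak: sample SD divided by the mean of replicate QC intensities."""
    return stdev(qc_intensities) / mean(qc_intensities)


def icc_oneway(repeats):
    """One-way random-effects ICC for one peak (assumed estimator, balanced design).

    repeats: list of per-person lists, each holding k repeated intensities.
    """
    n = len(repeats)
    k = len(repeats[0])
    if any(len(row) != k for row in repeats):
        raise ValueError("each person must have the same number of samples")
    grand_mean = mean([x for row in repeats for x in row])
    person_means = [mean(row) for row in repeats]
    # Between-person and within-person mean squares from one-way ANOVA.
    msb = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(repeats, person_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def keep_peak(qc_intensities, repeats, max_cv=0.30, min_icc=0.40):
    """Retain a peak only if it passes both cutoffs (max_cv is a hypothetical value)."""
    return (
        coefficient_of_variation(qc_intensities) <= max_cv
        and icc_oneway(repeats) >= min_icc
    )
```

Studies in which the number of samples differs between participants, as it did here, would need an estimator that accommodates unbalanced data, such as variance components from a mixed model.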

No potential conflicts of interest were disclosed.

Grant support: NIH grants CA49449, CA119139, CA006516, and CA017352; the Harvard Medical School Center of Excellence Fund for Women's Health Research and the Ovarian Cancer Specialized Programs of Research Excellence grant P50 CA105009 (Career Development Award).

Note: S.S. Tworoger and D. Spentzos contributed equally and should be considered co-first authors. T.A. Liebermann and S.E. Hankinson contributed equally and should be considered co-senior authors.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank Mr. Christopher Murphy for his invaluable assistance in programming the statistical analysis.

1. Diamandis EP. Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst 2004;96:353–6.
2. Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst 2005;97:315–9.
3. Baggerly KA, Morris JS, Edmonson SR, Coombes KR. Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer. J Natl Cancer Inst 2005;97:307–9.
4. Banks RE, Stanley AJ, Cairns DA, et al. Influences of blood sample processing on low-molecular-weight proteome identified by surface-enhanced laser desorption/ionization mass spectrometry. Clin Chem 2005;51:1637–49.
5. Haab BB, Geierstanger BH, Michailidis G, et al. Immunoassay and antibody microarray analysis of the HUPO Plasma Proteome Project reference specimens: systematic variation between sample types and calibration of mass spectrometry data. Proteomics 2005;5:3278–91.
6. Rai AJ, Zhang Z, Rosenzweig J, et al. Proteomic approaches to tumor marker discovery. Arch Pathol Lab Med 2002;126:1518–26.
7. Tammen H, Schulte I, Hess R, et al. Peptidomic analysis of human blood specimens: comparison between plasma specimens and serum by differential peptide display. Proteomics 2005;5:3414–22.
8. Hankinson SE, Willett WC, Manson JE, et al. Plasma sex steroid hormone levels and risk of breast cancer in postmenopausal women. J Natl Cancer Inst 1998;90:1292–9.
9. Aivado M, Spentzos D, Alterovitz G, et al. Optimization and evaluation of surface-enhanced laser desorption/ionization time of flight mass spectrometry (SELDI-TOF with MS) reversed phase protein arrays for protein profiling. Clin Chem Lab Med 2005;43:133–40.
10. Rosner B. Fundamentals of biostatistics. Belmont (CA): Duxbury Press; 2005.
11. Tworoger SS, Hankinson SE. Use of biomarkers in epidemiologic studies: minimizing the influence of measurement error in the study design and analysis. Cancer Causes Control 2006;17:889–99.
12. Hankinson SE, Manson JE, Spiegelman D, Willett WC, Longcope C, Speizer FE. Reproducibility of plasma hormone levels in postmenopausal women over a 2-3-year period. Cancer Epidemiol Biomarkers Prev 1995;4:649–54.
13. Tworoger SS, Missmer SA, Barbieri RL, Willett WC, Colditz GA, Hankinson SE. Plasma sex hormone concentrations and subsequent risk of breast cancer among women using postmenopausal hormones. J Natl Cancer Inst 2005;97:595–602.
14. Jarvinen A, Kainulainen P, Nissila M, Nikkanen H, Kela M. Pharmacokinetics of estradiol valerate and medroxyprogesterone acetate in different age groups of postmenopausal women. Maturitas 2004;47:209–17.
15. Tworoger SS, Hankinson SE. Collection, processing, and storage of biological samples in epidemiologic studies: sex hormones, carotenoids, inflammatory markers, and proteomics as examples. Cancer Epidemiol Biomarkers Prev 2006;15:1578–81.