The mammographic appearance of the breast depends on its composition of radiolucent fatty tissue and more radiopaque epithelial or stromal tissue. Radiological grading schemes, such as the density/pattern classification by Wolfe (1) or assessment of the relative dense area, show a strong association with subsequent development of breast cancer. In fact, breast density is perhaps the strongest but least recognized risk factor for breast cancer: many studies have shown that women whose mammograms are composed of at least 50% “dense” area have a three to five times greater risk of breast cancer than women with less than 25% dense area (1, 2, 3, 4).

Breast density can be crudely graded on a subjective scale that takes into account both the quantitative (amount of density) and the qualitative nature of the density (diffuse or associated with ductal structures; Refs. 1, 5, 6). Qualitative methods have a limited number of density categories and can detect only very large changes in density; because of their subjectivity, they also show substantial intra- and interobserver variation (7). A more quantitative approach measures the area of dense breast as a proportion of the total projected breast area, or “mammographic density” (1, 8, 9). Mammographic density is expressed as percentage density (PD), defined as PD = (radiographically dense area)/(total breast area) × 100% on a scale from 0 to 100% (10).
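As a worked example with illustrative numbers (not taken from the study), a film with 30 cm² of dense tissue out of an 80 cm² projected breast area would give:

```latex
PD = \frac{\text{dense area}}{\text{total breast area}} \times 100\%
   = \frac{30\ \text{cm}^2}{80\ \text{cm}^2} \times 100\% = 37.5\%
```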

Mammographic density is not routinely quantified in research studies because current methods are time-intensive, manual, and require expert training. Nevertheless, quantitative measurement appears to be superior to qualitative categorical methods such as the Wolfe classification (5) and the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS; Ref. 11). A recent tamoxifen trial that measured breast density as a surrogate end point for breast cancer risk found that the most significant annual changes in breast density were observed with a quantitative measurement (12).

Although breast density has been shown to be a powerful indicator of cancer risk, there is no generally accepted method for training and validating individuals to perform the measurement. We also ask: does delineating the dense regions in a mammogram require a specialist in mammography? If not, the technique could become more widely available clinically by enlarging the pool of potential trainees. In this study, we attempted to train people with a formal education in radiology, with training in other fields of medicine, and with a nonmedical background to quantify mammographic density using a predefined training program. A secondary goal of the study was to demonstrate the association of the quantitative PD measurement with the risk of breast cancer.

### Subjects.

For 161 women ages 40 and older (64 with invasive breast cancer or ductal carcinoma in situ and 97 without breast cancer) who underwent screening mammography between April 1985 and December 1995 as part of the University of California-San Francisco (UCSF) Mobile Mammography Screening Program, we obtained one craniocaudal view of the right or left breast. For the 64 women who had been diagnosed with cancer, we selected the contralateral craniocaudal image without breast cancer taken at the time of diagnosis, or a contralateral craniocaudal image from a screening examination that preceded the diagnosis of breast cancer. Women with bilateral breast cancer were not included in the study. Women without invasive breast cancer or ductal carcinoma in situ were randomly selected from the UCSF Mobile Mammography Screening Program database, matched on calendar year of the screening examination and age of the breast cancer cases. Institutional review board approval and patient informed consent were obtained.

### Measurements.

All of the films were initially assessed by a radiologist with training in mammography and density reading (the gold standard reader, R.S-B.). Films were viewed directly on a standard radiology light box, and a wax pencil was used to outline the breast area and the breast densities. The films, with the wax pencil marks, were digitized on a Lumisys LumiScan 200 radiographic film digitizer (Kodak, Inc.) at a resolution of 200 × 200 μm², and PD was determined by measuring the total breast area and the number of pixels outlined in the dense regions using dedicated computer software. The software is based on the commercially available medical image-processing package MEDx (Version 3.31, Sensor Systems, Sterling, VA); extensions to this package were written in the open-source scripting language Tcl/Tk (Version 8.3, www.tcltk.com).

Using the gold standard PD measurement, films were stratified into deciles of PD. For every density decile, 10 noncancer and 10 cancer films, when available, were selected to be included in a validation data set, resulting in 60 cancer and 84 noncancer images with PD ranging from 0 to 100%.
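The decile-stratified selection described above can be sketched as follows. The film records here are synthetic and the function name is ours; the study drew from real screening films stratified by the gold-standard PD.

```python
import random

# Each illustrative record: (film_id, gold-standard PD in %, cancer status).
rng = random.Random(42)
films = [(i, rng.uniform(0, 100), i % 2 == 0) for i in range(500)]

def decile_stratified_sample(films, per_cell=10, seed=7):
    """Select up to `per_cell` cancer and `per_cell` noncancer films per PD decile."""
    rng = random.Random(seed)
    selected = []
    for d in range(10):  # deciles: 0-10%, 10-20%, ..., 90-100%
        lo, hi = d * 10, (d + 1) * 10
        in_decile = [f for f in films
                     if lo <= f[1] < hi or (d == 9 and f[1] == 100)]
        for cancer in (True, False):
            group = [f for f in in_decile if f[2] == cancer]
            rng.shuffle(group)
            selected.extend(group[:per_cell])  # fewer if not available in this cell
    return selected

validation_set = decile_stratified_sample(films)
```

With 10 cancer and 10 noncancer films per decile "when available," the validation set holds at most 200 films; the study's 60 cancer and 84 noncancer images reflect cells where fewer films were available.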

The wax pencil marks were erased from the validation set and films were redigitized without marks and patient identifiers. The digitized film files were transferred to CD-ROM for review by study readers.

The reading station program randomized the order of all films and consecutively displayed them with a default brightness/contrast setting on a high-resolution radiographic monitor. The reader was prompted to manually trace the breast contour using a polygonal drawing tool (clicking with the computer mouse inserts a polygon vertex at the cursor). After confirming the breast contour outline was correct, the reader proceeded to outline the dense areas of the breast using a “pencil” tool (the mouse cursor acts like the tip of a pencil). The number of dense regions was not limited and included zero for breasts that appeared to have no dense regions at all. PD was calculated as the ratio of the sum of all dense regions (overlaps are not counted twice) to the entire breast area. PD, and all drawn contours, were then stored in a database linked to the reader’s study identification number, film identification number, and reading session number. Fig. 1 shows a screenshot of the main program window during an analysis session.
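The PD computation described above (a union of traced dense regions, overlaps counted once, divided by the breast area) can be sketched with boolean pixel masks. The masks and numbers below are toy examples, not the study's digitized films; NumPy is assumed.

```python
import numpy as np

# Toy breast contour: an 80x80 region of a 100x100 image.
breast_mask = np.zeros((100, 100), dtype=bool)
breast_mask[10:90, 10:90] = True

# Two traced dense regions that partially overlap.
dense_region_1 = np.zeros_like(breast_mask)
dense_region_1[20:50, 20:50] = True
dense_region_2 = np.zeros_like(breast_mask)
dense_region_2[40:70, 40:70] = True

def percent_density(breast_mask, dense_regions):
    """PD = (pixels in the union of dense regions) / (pixels in the breast) * 100."""
    union = np.zeros_like(breast_mask)
    for region in dense_regions:
        union |= region & breast_mask  # clip each region to the breast outline
    return 100.0 * union.sum() / breast_mask.sum()

pd_value = percent_density(breast_mask, [dense_region_1, dense_region_2])
```

Taking the union before counting is what guarantees that overlapping pencil strokes are not counted twice.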

### Standardized Training and Study Readers.

We enrolled three radiologists with limited background in mammography (group RAD), two physicians with no background in radiology (group MD), and four nonphysicians with various backgrounds (group NMD): one medical physicist, one research assistant with no experience in radiology, and two research assistants whose radiological experience was limited to bone densitometry.

All of the readers were trained (by R.S-B.) in an hour-long training session in front of a light box (see Fig. 2). Mammography examinations ranging from very-low to very-high PD, and uniform to very-structured appearance were presented and discussed. After that, the readers were trained on the PD reading workstation (see Fig. 1). The readers could take as long as they wanted to read each film.

### Statistical Analysis.

Odds ratios for the association between breast density and breast cancer status were calculated in two ways. First, a fixed threshold of PD = 50% was chosen to discriminate between cancer and noncancer cases, and the odds ratio was calculated from the resulting contingency table. Second, to avoid the bias introduced by an arbitrary threshold, we performed unconditional logistic regression with PD as the predictor and cancer status as the outcome. To arrive at meaningful SDs, we calibrated reading results to the gold standard as:

$PD_{\mathrm{calibrated}} = PD \times \mathrm{slope} + \mathrm{intercept}$

with intercept and slope obtained from the regression of the particular reader against the gold standard, for use in the logistic regression. This technique is commonly used when comparing performance between readers or devices.
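The two-step analysis above can be sketched on synthetic data (the study's actual readings are not reproduced here, and Newton-Raphson stands in for whatever fitting routine was used): first calibrate a reader's PD to the gold-standard scale, then fit an unconditional logistic regression and express the odds ratio for a 1-SD difference in PD.

```python
import numpy as np

rng = np.random.default_rng(0)
gold = rng.uniform(0, 100, 200)                    # gold-standard PD readings
reader = 5 + 1.1 * gold + rng.normal(0, 5, 200)    # a reader's (biased) readings
# synthetic cancer status whose log-odds rise with density
status = rng.random(200) < 1 / (1 + np.exp(-(gold - 50) / 25))

# Calibration: regress gold on reader, so PD_calibrated = PD * slope + intercept.
slope, intercept = np.polyfit(reader, gold, 1)
calibrated = reader * slope + intercept

# Unconditional logistic regression (intercept + PD term) via Newton-Raphson.
X = np.column_stack([np.ones_like(calibrated), calibrated])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (status - p))

# Odds ratio for a change in PD of 1 SD: exp(beta_PD * SD).
or_per_sd = float(np.exp(beta[1] * calibrated.std()))
```

The last line is the conversion reported in Table 3: a per-unit logistic coefficient becomes an OR per SD by multiplying by the reader-specific SD before exponentiating.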

Table 2 shows generalized ANOVA results for the comparison of overall reader-group performance. The RAD group exhibited highest intraclass correlation, followed by the MD group and the NMDs. The same was true when only validated readers were considered, but because only one NMD was validated, intraclass correlation could not be calculated for this group.
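An intraclass correlation of the kind summarized in Table 2 can be sketched as a one-way (films-as-random-effects) ICC. The paper's exact ANOVA model is not specified, so this is an illustrative formulation with toy ratings, not a reproduction of the study's computation.

```python
import numpy as np

# ratings[i, j] = PD read by reader j of a group for film i (toy numbers).
ratings = np.array([
    [10.0, 12.0, 11.0],
    [40.0, 38.0, 41.0],
    [75.0, 70.0, 73.0],
    [25.0, 27.0, 24.0],
])

def icc_oneway(ratings):
    """One-way ICC(1,1) from between-film and within-film mean squares."""
    n, k = ratings.shape                  # n films, k readers
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)          # between films
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within films
    return (msb - msw) / (msb + (k - 1) * msw)

icc = icc_oneway(ratings)
```

Because the ICC needs at least two readers per group, it cannot be computed when only one NMD reader is validated, as noted above.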

Table 3 lists odds ratios resulting from the unconditional logistic regression for a difference in PD of 1 SD, calculated for each study reader (OR_SD). The OR_SD values were significant and ranged from 1.7 to 2.1, with some readers exceeding the gold standard reader (OR_SD = 1.9).

Interreader correlations among the study readers who correlated highly (r > 0.9) with the gold standard ranged from r = 0.82 to 0.94, showing that their readings also correlated well with each other.

A principal finding of this study is that nonradiologists could be validated to read mammographic density. We also found that readers whose PD readings were highly correlated with a gold standard could accurately discriminate between noncancer and cancer cases, with odds ratios of 1.9 to 2.4 per population-SD increase in PD. These results are similar to those reported by Boyd et al. (2), who used a six-category scale for semiquantitative assessment and found per-category risk increments of 1.4. Other groups have compared the extreme ends of the PD range. In a case-control study of 160 paired images, Wolfe et al. (1) found odds ratios of 4.3 (95% confidence interval, 1.8–10.4). In one of the definitive studies of breast density as an independent risk factor, with about 2000 controls and as many cases, Byrne et al. (3) found a steady increase in odds ratio with PD category when readers assigned films either to the 0%-PD category or to one of successive 25%-PD categories. Specifically, comparing the odds of breast cancer at PD = 0% with those at PD > 75% yielded a significant odds ratio of 4.5.

In other published studies, inter- and intrareader correlations range from 0.86 to 0.96, similar to the results reported here (13, 14). Jong et al. (13) found an overall correlation of 0.89 between two readers and noted that the type of dense tissue distribution (homogeneous, nodular, linear) had a strong influence. We did not see such an effect in this study: although we did not evaluate reproducibility by type of dense tissue, we retrospectively categorized the films by their density and observed no differences in reproducibility between categories. This might be of particular importance for study populations with a skewed or preselected mammographic density distribution.

A limitation of this study is that not all of the readers were available for a second reading; therefore, we could not present a complete picture of intra- and interreader variability. Although the unavailable readers could have exhibited a performance drop during a second read, three of the four readers who did finish a second reading maintained or even improved their performance. In addition, as noted above, the intraclass correlations are similar to those reported by others. Our reproducibility analysis did not consider redigitization. Although we do not expect a large effect from digitization, because routine digitizer quality assurance showed that the device is stable and maps film absorbance linearly to pixel gray-scale value, this step remains to be verified. Lastly, because we attempted to provide a full range of density values across all of the deciles, we most probably inflated the PD variance and, thus, improved the correlation coefficient. However, it is our view that this approach (stratified PD values in every decile) will allow others to reproduce our results.

Reader RAD1 in our study had PD readings highly correlated with the gold standard and high odds ratios predicting cancer risk on the first reading. When this reader was retested, he had a markedly lower correlation with the gold standard and slightly different slope and intercept. This suggests that PD readers for research studies may need to be monitored for consistent reading quality. Quantifying reproducibility is highly important because it influences the least significant difference that the breast density measure can detect with confidence. Continuous monitoring may be achieved by including films from the validation set with the study data so that, if necessary, a reader can be retrained, or removed if skills wane over time.

Of note, several readers achieved slightly, albeit nonsignificantly, higher odds ratios than the gold standard reader. This merely reflects that the gold standard itself is subjective but also suggests that the correlation criterion might need to be supplemented with a more objective one such as an odds ratio threshold. Automation would obviously alleviate the subjectivity problem. Efforts in this direction have recently been undertaken by a number of groups (15, 16, 17, 18) but are also based on maximization of the correlation to a human gold standard.


Supported in part by a research grant from Synarc, Inc. Parts of this paper were presented as an InfoRad exhibit at the Radiological Society of North America Annual Meeting 2000, Abstract 9320IMA-i, Title “A Mammographic Density Reading Service for Clinical Drug Trials.”


The abbreviations used are: PD, percentage (breast) density; RMSE, root mean square error; RAD (group), radiologists (with limited background in mammography); MD (group), physicians (with no background in radiology); NMD (group), nonphysicians.

Fig. 1.

Screenshot of mammographic breast density analysis program (outlines enhanced for this article).

Fig. 2.

Training session with the gold standard reader (R. S-B.).

Table 1

The correlation coefficient (r) is with respect to the expert reader. RMSE and intercept are shown in units of PD. Regression is given as: reader value = gold value × slope + intercept. The odds ratio (OR) to discriminate cancer status is based on a simple 50% PD threshold, with 95% lower (LCL) and upper (UCL) confidence limits. All of the intercepts and slopes were significantly (P < 0.05) different from 0 and 1, respectively.

| Reader | Reads | r | RMSE | Intercept | Slope | OR (LCL, UCL) |
|---|---|---|---|---|---|---|
| RAD1-YC | | 0.91 | | | 1.04 | 3.09 (1.55, 6.18) |
| RAD2-XG | | 0.94 | | −7 | 1.17 | 3.89 (1.57, 9.63) |
| RAD3 | | 0.92 | | 10 | 1.16 | 3.29 (1.36, 7.93) |
| MD1-BF150 | | 0.89 | | 17 | 0.94 | 1.86 (0.93, 3.70) |
| MD2 | | 0.93 | | −6 | 1.15 | 3.50 (1.65, 7.41) |
| NMD1 | | 0.93 | | | 1.14 | 2.87 (1.44, 5.71) |
| NMD2 | 2ᵃ | 0.70 | 14 | 31 | 0.73 | 4.25 (1.79, 10.09) |
| NMD3 | 3ᵃ | 0.81 | | 11 | 0.75 | 3.32 (1.32, 8.34) |
| NMD4 | 1ᵃ | 0.81 | 13 | −1 | 0.90 | 3.39 (1.48, 7.76) |
| GOLD | | | | | | 4.88 (2.31, 10.28) |

ᵃ Last read was performed but still did not pass the validation criteria.

Table 2

Intraclass correlations, r², by reader groupᵃ

| | RAD | MD | NMD |
|---|---|---|---|
| All readers | 0.98 (4) | 0.88 (12) | 0.81 (12) |
| Validated readers | 0.98 (4) | 0.88 (11) | — |

ᵃ The RMSE in units of PD is shown in parentheses.

Table 3

Logistic regression resultsᵃ

| Reader | Unit (SD, % PD) | OR (LCL, UCL) |
|---|---|---|
| MD1 | 21 | 1.68 (1.18, 2.40) |
| MD2 | 22 | 1.77 (1.24, 2.52) |
| NMD1 | 20 | 2.00 (1.39, 2.90) |
| Gold standard (R. S-B.) | 19 | 1.87 (1.33, 2.64) |

ᵃ Odds ratios (ORs) and their 95% confidence limits [lower (LCL) and upper (UCL)] for present cancer status, for a change in PD of 1 SD, for all validated readers (SD calibrated to the gold standard) and the gold standard. SD is reader specific.

We thank Drs. Yen Chen, Xiaoguang Cheng, Bo Fan, Gottfried Schaffler, and Jing Li; Cullen Meade; Kathy Cross; and Victor Torres for participating as readers and producing 2728 reading results. We are also grateful to Dr. Ying Lu for valuable statistical advice.

1. Wolfe J. N., Saftlas A. F., Salane M. Mammographic parenchymal patterns and quantitative evaluation of mammographic densities: a case-control study. Am. J. Roentgenol., 148: 1087-1092, 1987.
2. Boyd N. F., Byng J. W., Jong R. A., et al. Quantitative classification of mammographic densities and breast cancer risk: results from the Canadian National Breast Screening Study. J. Natl. Cancer Inst. (Bethesda), 87: 670-675, 1995.
3. Byrne C., Schairer C., Wolfe J. N., et al. Mammographic features and breast cancer risk: effects with time, age, and menopause status. J. Natl. Cancer Inst. (Bethesda), 87: 1622-1629, 1995.
4. Saftlas A. F., Wolfe J. N., Hoover R. N., et al. Mammographic parenchymal patterns as indicators of breast cancer risk. Am. J. Epidemiol., 129: 518-526, 1989.
5. Wolfe J. N. Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer (Phila.), 37: 2486-2492, 1976.
6. Saftlas A. F., Szklo M. Mammographic parenchymal patterns and breast cancer risk. Epidemiol. Rev., 9: 146-174, 1987.
7. Kerlikowske K., Grady D., Barclay J., et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. J. Natl. Cancer Inst. (Bethesda), 90: 1801-1809, 1998.
8. Brisson J., Merletti F., Sadowsky N. L., Twaddle J. A., Morrison A. S., Cole P. Mammographic features of the breast and breast cancer risk. Am. J. Epidemiol., 115: 428-437, 1982.
9. Boyd N. F., O’Sullivan B., Campbell J. E., et al. Mammographic signs as risk factors for breast cancer. Br. J. Cancer, 45: 185-193, 1982.
10. Byng J. W., Boyd N. F., Fishell E., John R. A., Yaffe M. J. The quantitative analysis of mammographic densities. Phys. Med. Biol., 39: 1629-1638, 1994.
11. Kopans D. B., D’Orsi C. J., Adler D. D., et al. Breast Imaging Reporting and Data System (BI-RADS), Ed. 3. American College of Radiology, Reston, VA, 1998.
12. Chow C. K., Venzon D., Jones E. C., Premkumar A., O’Shaughnessy J., Zujewski J. Effect of tamoxifen on mammographic density. Cancer Epidemiol. Biomark. Prev., 9: 917-921, 2000.
13. Jong R., Fishell E., Little L., Lockwood G., Boyd N. F. Mammographic signs of potential relevance to breast cancer risk: the agreement of radiologists’ classification. Eur. J. Cancer Prev., 5: 281-286, 1996.
14. Ursin G., Astrahan M. A., Salane M., et al. The detection of changes in mammographic densities. Cancer Epidemiol. Biomark. Prev., 7: 43-47, 1998.
15. Byng J. W., Yaffe M. J., Lockwood G. A., Little L. E., Tritchler D. L., Boyd N. F. Automated analysis of mammographic densities and breast carcinoma risk. Cancer (Phila.), 80: 66-74, 1997.
16. Boone J. M., Lindfors K. K., Beatty C. S., Seibert J. A. A breast density index for digital mammograms based on radiologists’ ranking. J. Digit. Imaging, 11: 101-115, 1998.
17. Heine J. J., Velthuizen R. P. A statistical methodology for mammographic density detection. Med. Phys., 27: 2644-2651, 2000.
18. Lou S. L., Fan Y. Automatic evaluation of breast density for full field mammography. SPIE Medical Imaging 2000, SPIE, San Diego, CA, 2000.