We sought to assess the reliability of information regarding the maternal history of cancer by comparing the medical records of 214 women with breast cancer, ages 26–59 years and diagnosed in 1974–1995, and of their controls with the medical records of their mothers. Subjects were members of Kaiser Permanente, Northern California, selected for a study of early-life predictors of breast cancer. For any type of cancer identified in the mother’s medical record, the proportion noted in the daughter’s medical record at least 6 months before the daughter’s diagnosis or reference date was 56% among cases and 32% among controls. The odds ratio for the association of maternal cancer history with breast cancer risk was 2.1 using the maternal record and 3.5 using the subject’s record. For a maternal history of breast cancer, the proportion noted in the subject’s record was 79% among cases and 57% among controls, and the odds ratios were 4.0 and 6.5, respectively. We believe that the case-control difference in missing information was attributable to higher utilization of breast cancer screening among cases. This study illustrates the need to assess the impact of screening differences on the ascertainment of information from the medical records.

The family history of cancer is an important covariate in etiological studies of breast and other cancers and can be obtained from the medical record if: (a) the study subject knew her family history; (b) she reported her family history to her health care provider; and (c) the provider recorded the information in the record. Previous estimates of the reliability of the family history of disease have compared information recorded in the medical record with self-reported information obtained during an interview (1, 2). As part of a case-control study of early-life predictors of breast cancer, we obtained information from the mothers’ as well as the subjects’ medical records. Thus, we were able to estimate the accuracy of medical-record-based maternal cancer history using the maternal record as the standard.

The data were collected for a case-control study designed to assess the relationship of childhood height, weight, and other characteristics with the risk of breast cancer. Cases and controls were selected from the membership of Kaiser Permanente, a large health maintenance organization that has operated in Northern California since 1946. Women whose breast cancer was diagnosed during 1974–1995 and who had joined Kaiser Permanente at age 12 years or younger were identified through the Kaiser Permanente Regional Cancer Registry, which reports directly to the Northern California Cancer Registry, a participant in the SEER program. Controls were sampled from the health-plan membership with frequency matching on age and birth year, and each was assigned a reference date so that the distribution of reference dates among the controls corresponded to the distribution of diagnosis dates among the cases. The controls were required to have been members of Kaiser Permanente on their reference date. Of the 214 cases and 214 controls included in the case-control study, 38 were excluded from the present analysis: 9 (6 cases, 3 controls) because they were adopted and 29 (19 cases, 10 controls) because they did not have a maternal medical record. Thus, 189 cases and 201 controls were included in the present analysis. The study was restricted to women who had first joined Kaiser Permanente at age 12 years or younger because the primary hypothesis of the study concerned early-life characteristics; as a consequence, maternal records were generally available in the health plan. This restriction, together with the fact that the earliest possible year of entry into the health plan was 1946, limited the cases and controls to a maximum age of 59 years. Trained medical record analysts abstracted from the subjects’ medical records the information recorded before the diagnosis/reference date on height, weight, family history, reproductive factors, hormone use, and history of benign breast disease.
We also obtained information from the parental medical records, including parental height, weight, and personal history of cancer.

The medical record analyst ascertained a “history of cancer in the mother” from notes in the subject’s chart made 6 months or longer before the case’s diagnosis date or the control’s corresponding reference date. Diagnosis year and cancer site were recorded if available. If there was no mention of family history in the subject’s medical record, her maternal history was coded as negative. From the mother’s medical record, the analyst ascertained the mother’s history of cancer before the diagnosis or reference date. This information could have come from self-reported information on a personal history of cancer or from records of the cancer diagnosis or treatment. Again, no mention of a personal history was coded as negative, and information on diagnosis date and cancer site was recorded if available. The mother’s cancer history was coded as “unknown” if more than 10 years separated the last chart entry from the diagnosis date (cases) or reference date (controls), or if the medical record analyst judged that minimal information was recorded in the medical record.
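
The coding rules above can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not the abstraction protocol used in the study: the function names are hypothetical, “6 months” and “10 years” are approximated as fixed day counts, and the analyst’s judgment that a chart contained minimal information is reduced to a boolean flag.

```python
from datetime import date, timedelta
from typing import List, Optional

# Assumption: day-count approximations of "6 months" and "10 years".
SIX_MONTHS = timedelta(days=183)
TEN_YEARS = timedelta(days=3653)


def code_subject_record(note_dates: List[date], reference_date: date) -> str:
    """Hypothetical coding of maternal cancer history from the daughter's chart.

    Only notes made at least 6 months before the diagnosis/reference date
    count; no qualifying mention is coded as negative.
    """
    cutoff = reference_date - SIX_MONTHS
    return "positive" if any(d <= cutoff for d in note_dates) else "negative"


def code_mother_record(cancer_dates: List[date],
                       last_entry_before_ref: Optional[date],
                       reference_date: date,
                       minimal_information: bool = False) -> str:
    """Hypothetical coding of the mother's own cancer history from her chart."""
    # "Unknown" if the last entry is more than 10 years old, or if the analyst
    # judged the chart too sparse (modeled as a flag). Treating a chart with no
    # entry before the reference date as unknown is an assumption.
    if (minimal_information or last_entry_before_ref is None
            or reference_date - last_entry_before_ref > TEN_YEARS):
        return "unknown"
    # No mention of a personal history of cancer is coded as negative.
    if any(d < reference_date for d in cancer_dates):
        return "positive"
    return "negative"


ref = date(1992, 6, 1)
print(code_subject_record([date(1990, 3, 1)], ref))
print(code_mother_record([date(1978, 5, 1)], date(1990, 1, 1), ref))
```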

Family history information from the subject’s record was cross-tabulated with personal history information from the mother’s record, separately for cases and controls. Odds ratios were computed using each of the two sources of family history information and compared to estimate the bias, and the κ statistic was computed to summarize the level of agreement between the sources (3). In further analyses, we examined whether case-control differences could be explained by differences in the distributions of: (a) the year of the mother’s diagnosis; (b) the mother’s age at diagnosis; or (c) the daughter’s reference date. We hypothesized that an earlier diagnosis of breast cancer in the mother would have allowed more time for the information to be transmitted to the daughter and to the provider and, with regard to reference date, that providers may have taken family histories more thoroughly in more recent years. These hypotheses were tested using the Wilcoxon rank-sum test (4) in the 91 subjects whose mothers had a history of cancer.
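
For one group (cases or controls), the cross-tabulation described above reduces to a 2 × 2 table. The sketch below, whose function and argument names are hypothetical, computes from such a table the proportion of maternal-record histories that also appear in the subject’s record and Cohen’s κ, the chance-corrected agreement statistic (3). It assumes the maternal record is the standard and that mothers of unknown status have already been excluded; the counts in the usage line are invented, not study data.

```python
from typing import Tuple


def agreement_2x2(both_pos: int, mother_only: int,
                  subject_only: int, both_neg: int) -> Tuple[float, float]:
    """Return (proportion noted, Cohen's kappa) for one group.

    Cells of the 2 x 2 table (maternal record x subject's record):
      both_pos     cancer in mother's record and noted in subject's record
      mother_only  cancer in mother's record, not noted in subject's record
      subject_only noted in subject's record, absent from mother's record
      both_neg     absent from both records
    """
    n = both_pos + mother_only + subject_only + both_neg
    # Share of maternal-record histories that reached the subject's record
    # (sensitivity of the subject's record against the maternal standard).
    proportion_noted = both_pos / (both_pos + mother_only)

    observed = (both_pos + both_neg) / n          # raw agreement
    p_mother = (both_pos + mother_only) / n       # positive in mother's record
    p_subject = (both_pos + subject_only) / n     # positive in subject's record
    # Agreement expected by chance if the two records were independent.
    expected = p_mother * p_subject + (1 - p_mother) * (1 - p_subject)
    kappa = (observed - expected) / (1 - expected)
    return proportion_noted, kappa


# Hypothetical counts for one group (not study data).
noted, kappa = agreement_2x2(both_pos=30, mother_only=20, subject_only=5, both_neg=120)
print(f"proportion noted = {noted:.2f}, kappa = {kappa:.2f}")
```

Because κ corrects for chance agreement, two groups can share a similar raw agreement yet differ in κ when the prevalence of a positive history differs between them.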

The women included in the study were relatively young: one-half were diagnosed in their 20s and 30s, and one-half were diagnosed during the earlier years of the study, 1974–1990. There were striking case-control differences in missing information on some variables, including marital status, alcohol use, reproductive history, and menopausal status (Table 1).

Fifty-seven (30%) of the mothers of cases and 34 (17%) of the mothers of controls had a history of cancer noted in their own medical records. Of these histories, 56% among cases and 32% among controls were also noted in the subject’s (daughter’s) medical record. The odds ratio for the association of any maternal cancer history with risk of breast cancer was 2.1 when calculated from the maternal medical record and 3.5 when the subject’s medical record was used as the source of information. κ, a measure of agreement between the subject’s and the maternal medical records, was 0.59 for cases and 0.42 for controls. In addition, of the 128 case mothers with no history of cancer in their own records, 4% were recorded as having had a history in the subject’s record; among the 162 corresponding control mothers, this proportion was 1% (Table 2). To evaluate whether the mothers of cases had been diagnosed earlier, we examined case-control differences in the mother’s age at diagnosis and in the year of her cancer diagnosis. The mean age at cancer diagnosis was 58 years among mothers of cases and 61 years among mothers of controls (P = 0.213). For both cases and controls, the median year of the mother’s diagnosis was 1978, and the median reference year was 1992.
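
The P values for these comparisons come from the Wilcoxon rank-sum test (4). A minimal pure-Python sketch of the test follows, using midranks for ties and the normal approximation with a tie-corrected variance and no continuity correction. The exact variant used in the original SAS analysis is not stated, so P values from this sketch may differ slightly from those reported. The function name is hypothetical, and the ages in the usage lines are invented for illustration, not study data.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W, z, two-sided P) where W is the rank sum of sample x.

    Midranks are assigned to ties; P uses the large-sample normal
    approximation with a tie-corrected variance (no continuity correction).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j + 2) / 2  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # every value tied: no evidence of a shift
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return w, z, p


# Invented ages at diagnosis, for illustration only.
case_mother_ages = [45, 52, 58, 61, 49, 66]
control_mother_ages = [55, 63, 70, 59, 68]
w, z, p = wilcoxon_rank_sum(case_mother_ages, control_mother_ages)
print(f"W = {w}, z = {z:.2f}, P = {p:.3f}")
```

For routine use, `scipy.stats.mannwhitneyu` implements the equivalent Mann-Whitney U test; its U statistic for the first sample equals W - n1(n1 + 1)/2.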

We also examined reports of breast cancer specifically, because this was the focus of our case-control study. Twenty-four (13%) mothers of cases and seven (3%) mothers of controls had a history of breast cancer noted in their medical records. Of these, 79% among cases and 57% among controls were also noted in the subject’s medical record (Table 3). This case-control difference in ascertainment changed the odds ratio for the association of maternal history of breast cancer with risk of breast cancer from 4.0 using the maternal record to 6.5 using the subject’s record. κ was 0.71 for cases and 0.66 for controls. The mean age at the mother’s breast cancer diagnosis was 57 years for cases, compared with 63 years for controls (P = 0.149). The median year of the mother’s diagnosis was 1981 for cases and 1983 for controls (P = 0.831), and the median reference year of the daughter was 1993 for cases and 1990 for controls (P = 0.340).
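
The maternal-record odds ratios can be reproduced from the counts given in the text. In this sketch (function name hypothetical), every subject whose mother had no recorded cancer is treated as unexposed. This assumption ignores the few mothers coded “unknown”, which does not change the first decimal place for any cancer: using the 128 case and 162 control mothers as the unexposed counts also gives 2.1. The subject-record odds ratios of 3.5 and 6.5 cannot be reproduced exactly here, because the full cross-tabulations are not reported in the text.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Cross-product odds ratio from a 2 x 2 table of exposure by case status."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Counts from the text, maternal record as the source: 189 cases, 201 controls.
# Assumption: all subjects without a recorded maternal cancer count as unexposed.
N_CASES, N_CONTROLS = 189, 201
any_cancer = odds_ratio(57, N_CASES - 57, 34, N_CONTROLS - 34)
breast_cancer = odds_ratio(24, N_CASES - 24, 7, N_CONTROLS - 7)

print(f"any maternal cancer:    OR = {any_cancer:.1f}")     # 2.1
print(f"maternal breast cancer: OR = {breast_cancer:.1f}")  # 4.0
```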

We sought to estimate how well the medical record captures a woman’s maternal history of cancer by comparing the subject’s and the mother’s medical records. The primary limitation of this study, which should be kept in mind when interpreting the results, concerns the generalizability of the sample. Our sample comprised younger women who were long-term members of a health maintenance organization and who were diagnosed with breast cancer during 1974–1995. The age distribution of these women at diagnosis may differ from that in other studies that use the medical record to ascertain family history. Also, having nearly complete access to medical records covering such a long period of a woman’s life is unusual.

With respect to missing information on covariates such as marital status, alcohol use, and presumably family history, indirect evidence leads us to believe that the case-control difference can be attributed to earlier identification of breast cancer among frequent users of health-plan services, who would have undergone more frequent breast cancer screening. Frequent users would also have had more opportunity to provide information on the covariates. Evidence for this comes from comparing the proportion of DCIS in our sample with that in the entire health plan, because DCIS is detected mainly through screening. In our sample of women diagnosed during 1974–1995, 20% of the cases had a diagnosis of DCIS, compared with 15% among cases region-wide in Kaiser Permanente Northern California during 1990–1995 (G. Husson, unpublished data). In other words, we believe that, had the rate of screening been higher among the controls, more of them would have been identified as cases, and more of them would have been identified as having a maternal history of cancer and had their marital status and alcohol use recorded. We believe that differences in the recording of information arising from knowledge of the subject’s cancer status could not explain the case-control difference, because we ascertained family history only if it was noted at least 6 months before the reference date. In addition, we re-abstracted charts to determine the extent to which missing information was attributable to error by the medical record abstractor, but the abstractor’s error rate was very low and did not account for the case-control difference. In our sample, a woman did not have to be a user of health-plan services to be identified as a case, because cases who did not use these services are reported to the Kaiser Permanente cancer registry through feedback from the local SEER registry.
Thus, it was not utilization per se that differed between cases and controls, but it was utilization that increased the probability that a woman’s breast cancer was diagnosed (i.e., breast cancer screening).
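
The argument that more complete recording among cases inflated the odds ratio is an instance of differential misclassification of exposure. The sketch below (function name hypothetical) computes the odds ratio one would expect to observe when maternal history is read from an imperfect source, given true prevalences and group-specific sensitivity and false-positive proportions. Plugging in the rounded figures reported above (prevalence 57/189 and 34/201 in the maternal record; 56% vs. 32% noted; 4% vs. 1% recorded without a maternal-record history) is only an approximation to the study data.

```python
def observed_odds_ratio(prev_cases: float, prev_controls: float,
                        se_cases: float, se_controls: float,
                        fp_cases: float, fp_controls: float) -> float:
    """Expected odds ratio when exposure is read from an imperfect source.

    prev_*: true exposure prevalence (maternal record taken as the standard)
    se_*:   share of true positives noted in the subject's record (sensitivity)
    fp_*:   share of true negatives recorded as positive (1 - specificity)
    """
    def odds(p: float) -> float:
        return p / (1 - p)

    # Probability of a positive entry in the subject's record, by group.
    obs_cases = prev_cases * se_cases + (1 - prev_cases) * fp_cases
    obs_controls = prev_controls * se_controls + (1 - prev_controls) * fp_controls
    return odds(obs_cases) / odds(obs_controls)


# Rounded figures from the text (an approximation to the study data).
p1, p0 = 57 / 189, 34 / 201
true_or = observed_odds_ratio(p1, p0, 1.0, 1.0, 0.0, 0.0)            # perfect recording
differential = observed_odds_ratio(p1, p0, 0.56, 0.32, 0.04, 0.01)    # cases better recorded
nondifferential = observed_odds_ratio(p1, p0, 0.56, 0.56, 0.04, 0.04)  # cases' accuracy in both groups
print(f"true {true_or:.2f}, differential {differential:.2f}, "
      f"nondifferential {nondifferential:.2f}")
```

With these rounded inputs the differential setting raises the odds ratio from about 2.1 to about 3.7, in the neighborhood of the 3.5 reported (the result is sensitive to rounding of the control percentages), whereas applying the cases’ recording accuracy to both groups pulls it toward 1 (about 1.7). This is the standard contrast between differential and nondifferential misclassification.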

In our sample of women ages 26–59 years, 30% of cases and 17% of controls had a maternal cancer history. At best, fewer than 60% of these cancers (56% among cases) were documented in the daughter’s medical record, although the percentage was higher for breast cancer. We believe that the case-control difference in the completeness of information in the subjects’ medical records was attributable to greater utilization, especially of breast cancer screening, among cases. Although we observed a case-control difference in the quantity of missing data, there was only a small case-control difference in the precision of the information.

We are not aware of any other studies comparing subjects’ medical records with the medical records of their mothers. However, subjects’ self-reported family history has been compared with their own medical records or with the medical records of their relatives. Kerber and Slattery (5) compared interview information with genealogical information among 125 Utah residents with colon cancer, ages 30–79 years at diagnosis during 1992–1995, and 206 matched controls. The genealogy records had been linked to the Utah Cancer Registry. They observed a κ of 0.73 among cases and 0.58 among controls. Two other studies of breast cancer compared the subjects’ interview information with the subjects’ medical records and, as would be expected, noted somewhat higher levels of agreement than we observed (1, 2).

As the medical record becomes more automated, it will be used more frequently in epidemiological studies. Knowledge of its limitations is crucial in both the design and interpretation of studies. This study allows researchers who obtain data from medical records to estimate the degree of misclassification resulting from missing information on family history. It should be kept in mind that the amount of missing information in our population may be different from that in other populations because these women are long-time members of the health maintenance organization. This study also emphasizes the need to assess the impact of past screening differences and health plan utilization differences on exposure ascertainment from the medical record in analytical studies of cancers for which screening is available.
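
Where the sensitivity and false-positive proportion of the subject’s record can be estimated, as in this study, observed counts can be back-corrected. The sketch below applies the standard correction for a misclassified binary exposure, separately within cases and controls so that differential error is allowed. It assumes both quantities are known without error; the function names and counts are hypothetical, and implausible inputs that would yield non-positive corrected cells are rejected.

```python
def corrected_exposed(observed_exposed: float, n: int,
                      sensitivity: float, false_pos_rate: float) -> float:
    """Back-calculate the true number exposed in one group.

    Expected observed count = Se * true + FP * (n - true); solve for true.
    false_pos_rate is 1 - specificity.
    """
    denom = sensitivity - false_pos_rate
    if denom <= 0:
        raise ValueError("sensitivity must exceed the false-positive rate")
    return (observed_exposed - false_pos_rate * n) / denom


def corrected_odds_ratio(obs_exp_cases: float, n_cases: int,
                         se_cases: float, fp_cases: float,
                         obs_exp_controls: float, n_controls: int,
                         se_controls: float, fp_controls: float) -> float:
    """Odds ratio after correcting each group separately (differential error allowed)."""
    a = corrected_exposed(obs_exp_cases, n_cases, se_cases, fp_cases)
    c = corrected_exposed(obs_exp_controls, n_controls, se_controls, fp_controls)
    b, d = n_cases - a, n_controls - c
    if min(a, b, c, d) <= 0:
        raise ValueError("inputs imply a non-positive corrected cell")
    return (a * d) / (b * c)


# Hypothetical example: 200 cases and 200 controls; 37 cases and 12.4 controls
# (expected counts) appear exposed in the subject's record.
print(corrected_odds_ratio(37, 200, 0.5, 0.05, 12.4, 200, 0.3, 0.02))
```

In this invented example the crude odds ratio from the observed counts is about 3.4, and the correction recovers the value of about 2.4 implied by the true counts (60 of 200 cases, 30 of 200 controls) used to generate them.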

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.


Supported by National Cancer Institute Grant K07 CA 70969.


The abbreviations used are: SEER, Surveillance, Epidemiology, and End Results; DCIS, ductal carcinoma in situ.


G. Husson, unpublished data.

1. Cook L. S., White E., Schwartz S. M., McKnight B., Daling J. R., Weiss N. S. A population-based study of contralateral breast cancer following a first primary breast cancer (Washington, United States). Cancer Causes Control.
2. Horwitz R. I. Comparison of epidemiologic data from multiple sources. J. Chronic Dis.
3. Stokes M. E., Davis C. S., Koch G. G. Categorical Data Analysis Using the SAS System, pp. …–100. SAS Institute Inc., Cary, NC.
4. Cody R. P., Smith J. K. Applied Statistics and the SAS Programming Language, Ed. …, pp. …–130. Prentice-Hall Inc., Englewood Cliffs, NJ.
5. Kerber R. A., Slattery M. L. Comparison of self-reported and database-linked family history of cancer data in a case-control study. Am. J. Epidemiol.