Abstract
In conjunction with a pooled analysis of risk factors for advanced adenomas [adenomas with severe dysplasia, carcinoma in situ (CIS), and intramucosal carcinoma], we undertook a reliability study on the pathological diagnosis of advanced adenomas. We assessed intraobserver agreement (using Kappa (κ) as the measure of agreement) across two time periods 10 years apart with a single pathologist and interobserver agreement (using Kappa) between two pathologists rating the same slides concurrently. The study pathologists were blinded to the original case classification. We used the slides of 190 colorectal adenomatous polyp cases (104 originally diagnosed as advanced adenomas, 86 adenomas without advanced lesions) from a colonoscopy-based case-control study conducted in New York City between 1986 and 1988. We also assessed conditional agreement for 71 slides of advanced adenomas from four adenoma case-control studies conducted in different geographic regions of the United States in the 1990s. Intra- and interobserver agreement was only fair to moderate on the classification of both histological type (villous, tubulovillous, and tubular: intraobserver κ = 0.28; 95% confidence interval (CI), 0.17–0.39; interobserver κ = 0.48; 95% CI, 0.33–0.62) and degree of dysplasia (none/mild, moderate, severe, CIS, and intramucosal: intraobserver κ = 0.20; 95% CI, 0.12–0.28; interobserver κ = 0.42; 95% CI, 0.29–0.55). Using broader, rather than finer, classifications for degree of dysplasia substantially improved the reliability (interobserver agreement for high-grade dysplasia (including severe dysplasia, CIS, and intramucosal carcinoma) versus low-grade dysplasia: κ = 0.69; 95% CI, 0.55–0.83). These findings suggest that future epidemiological studies of advanced adenomas should use broad categories, such as high-grade versus low-grade dysplasia, include central review of all slides, and take measurement error into account in sample size calculations.
Introduction
Although most colorectal carcinomas are thought to arise from colorectal adenomas (1, 2, 3), most adenomas, which are quite common, do not progress to invasive carcinoma (3, 4). Therefore, the study of advanced adenomas (those with severe dysplasia, CIS, and intramucosal carcinoma), which have a greater likelihood of progressing to cancer, is necessary for understanding risk factors for progression along the adenoma-carcinoma sequence. Determination of the reliability in classifying these lesions is critical to such analyses.
In conjunction with a pooled analysis of risk factors for advanced colorectal adenomas, we undertook a study to assess the reliability of pathologists in classifying severe dysplasia, CIS, and intramucosal carcinoma. These three types of lesions, referred to collectively here as advanced adenomas, are distinct from, and intermediate in risk between, purely benign adenomas and clinically important invasive cancer. Pathologists generally differ in the nomenclature they apply to advanced adenomas (e.g., some use only the term severe dysplasia; others use CIS; relatively few use intramucosal carcinoma). Generally, pathologists use the term severe dysplasia as a synonym for CIS to describe abnormal cellular growth confined to the intraepithelial layer and not extending into the lamina propria; they use the term intramucosal carcinoma to describe abnormal cellular growth that extends into the lamina propria but does not invade the submucosa (5).
Materials and Methods
Study Subjects.
We first considered all of the cases of advanced adenomas and a random sample of nonadvanced adenomas identified in a case-control study of adenomatous polyps conducted in three NYC colonoscopy-based practices from 1986 to 1988 (6, 7). We were able to collect slides to review for 88% of the 215 eligible subjects. These included 104 cases of advanced adenomas and 86 nonadvanced adenomas.
Second, we reviewed slides from 71 subjects identified as having CIS or severe dysplasia from four endoscopy-based case-control studies of adenomatous polyps (University of Minnesota, University of North Carolina, Wake Forest University, and University of Southern California). Slides were not available for 11 (13.4%) of the eligible cases from these four study sites. Three of these studies used subjects attending colonoscopy clinics (University of Minnesota, University of North Carolina, and Wake Forest University); the other (University of Southern California) recruited subjects undergoing sigmoidoscopy screening. Eligibility and characteristics of these study subjects are described in detail elsewhere (8, 9, 10, 11).
Study Design.
We assessed intraobserver agreement [the same pathologist (Pathologist A) rating on two different occasions] and interobserver agreement [two different pathologists (Pathologists A and B) rating within the same time interval] using H&E-stained slides from subjects with adenomatous polyps. Two pathologists, blinded to the original classification as well as to each other’s results, classified the specimens by histological type (tubular, tubulovillous, villous) and by degree of dysplasia (none or mild, moderate, severe, CIS, and intramucosal carcinoma). Degree of dysplasia was also examined using the WHO classification: (a) low-grade dysplasia (none, mild, or moderate); and (b) high-grade dysplasia (severe dysplasia, CIS, or intramucosal carcinoma). The two pathologists did not confer over case definitions before the start of the study because our aim was to estimate the reliability for cases combined retrospectively from several different hospitals, i.e., reliability under a more typical scenario for epidemiological studies that pool data from several sites.
Intraobserver Agreement.
The main analysis was conducted using subjects from the NYC study. The same pathologist (Pathologist A) from the original study reviewed slides from subjects with advanced and nonadvanced adenomas blinded to the original diagnosis. We assessed the agreement between two ratings that were made 10 years apart (1988 and 1998). To complement these analyses, we also assessed intraobserver agreement in another group of NYC subjects (n = 89). Several months to 1 year separated these two ratings, which were performed during the original study period 1986–1988.
Interobserver Agreement.
Two pathologists (A and B) rated the same group of subjects (n = 99) in 1998. These 99 subjects represented a subset of the 190 subjects assessed for intraobserver agreement; ∼50% had advanced adenomas and 50% did not. All of the ratings were performed blinded to the original classification as well as to the rating of the other pathologist.
Conditional Agreement.
In addition to the NYC sample, we requested all of the available slides for the cases diagnosed with advanced adenomas at each of the other four study sites (University of Minnesota, University of North Carolina, Wake Forest University, and University of Southern California). Each of these studies used a uniform pathology review for case diagnosis, and intramucosal carcinoma was not a separate category in their review. Pathologist A reviewed all of the slides from these 71 cases. Because slides for the adenoma controls were not available for these four sites, only conditional agreement was estimated for these cases, i.e., given that the study site classified the cases as advanced adenomas, what percentage were similarly classified by Pathologist A?
Comparison with Community Pathologists.
A final comparison was made between the original case classifications made by Pathologist A in 1988 and the diagnoses and classifications made in 1988 by “community” pathologists in the NYC study (n = 318). These pathologists were unidentified members of the university hospitals affiliated with the colonoscopy-based practices.
Statistical Methods.
We used the Kappa statistic (12, 13), which reflects the agreement between two measurements (e.g., two observers or a single observer across two time periods) after removing chance agreement, as a measure of reliability. A value close to 1 represents almost perfect agreement, whereas values close to or below 0 represent poor agreement. A useful scale for the interpretation of the Kappa estimate was developed by Landis and Koch (13): 0.81–1.00, “almost perfect”; 0.61–0.80, “substantial”; 0.41–0.60, “moderate”; 0.21–0.40, “fair”; 0.00–0.20, “slight”; and <0.0, “poor” agreement. We computed the weighted Kappa when the classification scheme had more than two categories; greater weight was given to differences in nonadjacent categories than to differences in adjacent categories (12). Conditional agreement is reported using percentages. In addition to these analyses, we examined whether the magnitude of agreement for a given pathological characteristic differed by the level of another pathological characteristic.
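As an illustration of the statistic described above, the following minimal sketch computes the unweighted Kappa from a square table of counts and maps it onto the Landis and Koch scale. The function names `cohen_kappa` and `landis_koch` are ours, not part of the original analysis, and the counts are the 1988 versus 1998 histology ratings reported in Table 1.

```python
def cohen_kappa(table):
    """Unweighted kappa for a square table of counts
    (rows: first rating, columns: second rating)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n                    # observed agreement
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa):
    """Verbal label from the Landis and Koch scale quoted in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Table 1 counts: rows are the 1988 rating, columns the 1998 rating
# (tubular, tubulovillous, villous).
table1 = [[55, 9, 0],
          [52, 54, 3],
          [2, 10, 1]]

kappa = round(cohen_kappa(table1), 2)
print(kappa, landis_koch(kappa))  # 0.28 fair
```

The label is applied to the Kappa rounded to two decimals because the scale itself is stated to two decimals.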
Results
Intraobserver Agreement.
Tables 1 and 2 report the results for intraobserver agreement for histological type and degree of dysplasia. The Kappa statistic for histological classification (Table 1; κ = 0.28; 95% CI, 0.17–0.39) was in the range of fair agreement (13). Collapsing the histological categories into any villous component (tubulovillous and villous) versus tubular improved the intraobserver agreement (κ = 0.36; 95% CI, 0.24–0.47).
The Kappa statistics for degree of dysplasia are reported in Table 2. Less agreement was seen, as expected, when all five categories of the degree of dysplasia were used (κ = 0.20; 95% CI, 0.12–0.28). The intraobserver agreement was higher when examining high-grade versus low-grade dysplasia (κ = 0.32; 95% CI, 0.19–0.46). Intraobserver agreement was also assessed within 1 year of the original classification in 1988 among 89 subjects (data not shown); results for histology and degree of dysplasia yielded similar findings to those reported in Tables 1 and 2.
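The effect of collapsing categories can be reproduced directly from the counts in Table 2. The sketch below is illustrative only: the helper names `cohen_kappa` and `collapse` are ours, and the grouping follows the low-grade (none or mild, moderate) versus high-grade (severe, CIS, intramucosal) definition given in the Methods.

```python
def cohen_kappa(table):
    """Unweighted kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def collapse(table, groups):
    """Merge categories: groups[i] is the new index of old category i."""
    k = max(groups) + 1
    merged = [[0] * k for _ in range(k)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            merged[groups[i]][groups[j]] += count
    return merged


# Table 2 counts: rows are the 1988 rating, columns the 1998 rating
# (none or mild, moderate, severe, CIS, intramucosal).
table2 = [[8, 13, 4, 1, 1],
          [9, 16, 12, 2, 0],
          [1, 13, 8, 1, 1],
          [2, 19, 12, 9, 6],
          [2, 6, 11, 6, 27]]

two_grade = collapse(table2, [0, 0, 1, 1, 1])  # 0 = low grade, 1 = high grade

print(round(cohen_kappa(table2), 2))     # 0.2
print(two_grade)                         # [[46, 20], [43, 81]]
print(round(cohen_kappa(two_grade), 2))  # 0.32
```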
Interobserver Agreement.
Tables 3 and 4 report the results for the interobserver agreement of histological type and degree of dysplasia. The interobserver agreement for histological type was moderate (Table 3; κ = 0.48; 95% CI, 0.33–0.62; weighted κ = 0.53; 95% CI, 0.40–0.66). Collapsing the histological categories into any villous component (tubulovillous and villous) versus tubular improved the interobserver agreement (κ = 0.65; 95% CI, 0.50–0.80).
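The weighted Kappa can be sketched in the same way. The text does not state which weighting scheme was used; the example below assumes linear disagreement weights (a disagreement of two categories counts twice as much as a disagreement of one), which is our assumption and not a documented detail of the analysis. Under that assumption the counts in Tables 1 and 3 give values that round to the weighted estimates reported for those tables. The function name `linear_weighted_kappa` is ours.

```python
def linear_weighted_kappa(table):
    """Weighted kappa with linear disagreement weights |i - j| (an assumed scheme)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Mean observed disagreement, weighted by distance between categories.
    observed = sum(abs(i - j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    # Mean disagreement expected by chance from the marginal totals.
    expected = sum(abs(i - j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1 - observed / expected


# Table 3 counts: rows are Pathologist B, columns Pathologist A
# (tubular, tubulovillous, villous).
table3 = [[45, 6, 0],
          [10, 23, 1],
          [1, 10, 1]]

# Table 1 counts: rows are the 1988 rating, columns the 1998 rating.
table1 = [[55, 9, 0],
          [52, 54, 3],
          [2, 10, 1]]

print(round(linear_weighted_kappa(table3), 2))  # 0.53
print(round(linear_weighted_kappa(table1), 2))  # 0.32
```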
The interobserver agreement for degree of dysplasia (Table 4) using four categories was also moderate (κ = 0.42; 95% CI, 0.29–0.55; weighted κ = 0.59; 95% CI, 0.47–0.70). Collapsing into two groups of high-grade dysplasia versus low-grade dysplasia improved the agreement substantially (κ = 0.69; 95% CI, 0.55–0.83). If we assume Pathologist A is the “gold standard,” this level of agreement corresponds to a sensitivity for high-grade dysplasia of 75% and a specificity of 92.7%.
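The sensitivity and specificity quoted above follow from the two-category counts in Table 4 once Pathologist A is treated as the reference rating. The short sketch below only restates that arithmetic; the variable names are ours.

```python
# Two-category counts from Table 4, keyed as (Pathologist B rating, Pathologist A rating).
counts = {
    ("low", "low"): 51,
    ("low", "high"): 11,
    ("high", "low"): 4,
    ("high", "high"): 33,
}

# Column totals for Pathologist A, the assumed "gold standard".
a_high = counts[("high", "high")] + counts[("low", "high")]  # 44
a_low = counts[("low", "low")] + counts[("high", "low")]     # 55

# Share of A's high-grade cases that B also called high grade.
sensitivity = counts[("high", "high")] / a_high
# Share of A's low-grade cases that B also called low grade.
specificity = counts[("low", "low")] / a_low

print(f"sensitivity = {sensitivity:.1%}")  # sensitivity = 75.0%
print(f"specificity = {specificity:.1%}")  # specificity = 92.7%
```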
Conditional Agreement.
Agreement in classifying cases from four study centers was assessed using 71 cases diagnosed with either severe dysplasia or CIS. Pathologist A classified 82% of these 71 cases as severe dysplasia, CIS, or intramucosal carcinoma.
Comparison with Community Pathologists.
The interobserver agreement for histological type was fair (κ = 0.31; 95% CI, 0.23–0.39; and weighted κ = 0.35; 95% CI, 0.27–0.43). The interobserver agreement for degree of dysplasia using five categories was slight (κ = 0.05; 95% CI, 0.02–0.09; and weighted κ = 0.14; 95% CI, 0.08–0.19). Collapsing these groups into high-grade versus low-grade dysplasia improved the agreement only modestly (κ = 0.14; 95% CI, 0.07–0.20).
Interactions among Reliability Estimates.
We further explored differences in reliability estimates between the two study pathologists by other pathological characteristics. Interobserver agreement for dysplasia was very similar across histological subtype and size of adenomas (range for κ, 0.56–0.70). In contrast, interobserver agreement for histological classification was higher for adenomas with low-grade dysplasia versus high-grade dysplasia (κ = 0.61 versus κ = 0.34) and for small (<1 cm) adenomas as opposed to large (≥1 cm) adenomas (κ = 0.55 versus κ = 0.26).
Discussion
We found that intraobserver and interobserver agreement for degree of dysplasia for colorectal adenomas was fair to moderate. Intraobserver agreement was generally lower than interobserver agreement; this may reflect the fact that 10 years elapsed between the observer’s ratings and, in particular, that scientific thinking about the topic of advanced adenomas has changed. For example, some advocate finer distinctions for investigations of biological alterations and broader distinctions, such as low- and high-grade dysplasia, for clinical purposes (14). Interobserver agreement was dramatically improved by using the broader categories of high- and low-grade dysplasia (κ = 0.69; 95% CI, 0.55–0.83).
Estimates for both intra- and interobserver agreement were more similar for histological subtype (tubular, tubulovillous, and villous) and were in the fair to moderate range. Collapsing histological categories into any villous versus only tubular improved both the intra- and interobserver reliability. Intraobserver agreement was lower than interobserver agreement, but again this could reflect the passage of time and changes in rating criteria. Using weights to reflect changes between categories did little to affect the Kappa estimate, primarily because the largest differences in agreement occurred between adjacent categories.
Overall, we found a high degree of conditional agreement (82%) when agreement was defined broadly as high-grade dysplasia (severe dysplasia, CIS, intramucosal carcinoma). These data argue for combining cases of severe dysplasia, CIS, and intramucosal carcinoma into a single case group, because the terminology is used differently across study sites. Agreement would be substantially reduced if one were to combine cases identified solely by community pathologists and not through a central review. For example, in contrast to these data from four centers, which used uniform pathological review, only slight agreement was found when comparing the central review with community pathologists.
We also examined whether agreement for one pathological characteristic differed by the level of another characteristic. Such interactions have not been previously evaluated but are important to consider given the strong correlation between size, degree of dysplasia, and histological subtype within an adenoma (15, 16, 17). Although we found no differences in the level of agreement for dysplasia (high- versus low-grade) by size or histological subtype of adenoma, agreement of histological subtype was lower for large adenomas and for adenomas with high-grade dysplasia. Large size, villous histology, and high-grade dysplasia all increase the malignant potential of adenomas (5, 15, 16, 17, 18, 19, 20). If these data reflect true differences, they argue for focusing on dysplasia as opposed to histological subtype, because the reliability for dysplasia may not depend on the other two characteristics.
Although these data indicated only a fair-to-moderate amount of agreement in the classification of colorectal adenomas, they are nevertheless consistent with three earlier studies examining the degree of dysplasia in colorectal adenomas (21, 22, 23). The first of these studies (21) rated 100 adenomas and reported interobserver agreement ranging from 0.04 to 0.28 and intraobserver agreement ranging from 0.33 to 0.45. The second study (22) rated 56 adenomas and reported intraobserver agreement ranging from 0.68 to 0.81 and interobserver agreement ranging from 0.35 to 0.57. The third study (23) rated 187 adenomas and reported intraobserver agreement ranging from 0.31 to 0.91 and interobserver agreement ranging from 0.30 to 0.36 for degree of dysplasia. Disagreement in classifying dysplasia is by no means unique to colorectal neoplasia. Studies of breast (24), cervix (25), prostate (26), and oral noninvasive lesions (27) report Kappa statistics in the range of 0.2–0.45 for agreement in degree of dysplasia. The implication of these levels of disagreement for epidemiological research needs to be considered, especially in the context of the rise in cancer screening, which leads to increased identification of early lesions such as CIS.
What are the clinical and epidemiological implications of this magnitude of measurement error? Many clinicians argue logically that because the adenoma is usually removed irrespective of such fine classifications, such fine distinctions are not important. Nevertheless, many data exist to support the fact that size, histological subtype, and degree of dysplasia are related to the risk of adenoma recurrence and, in particular, subsequent cancer (19, 20). A reliable distinction among histological subtypes and degree of dysplasia is, therefore, needed to recommend targeted follow-up strategies for patients with adenomas more in line with future risk (21). However, our data reveal the many obstacles that confront the use of such classifications in a clinical setting. First, only histological subtype and size of adenoma are usually reported by community pathologists. Second, the reliability estimates reported here pertain to two academic pathologists who are highly specialized in the field of minimally invasive tumors. Thus, reliability in a clinical setting would likely be substantially lower (see “Results” for our comparison with community pathologists).
Disagreement in pathological classification also has implications for epidemiological analyses. To understand whether subjects with advanced adenomas differ from the majority of adenoma subjects without advanced adenomas, analyses comparing these groups need to be undertaken. Measurement error can, therefore, hamper detection of such differences. In sum, these data suggest that reliably distinguishing between severe dysplasia, CIS, and intramucosal carcinoma may not be feasible in research studies. The data thus support grouping together advanced adenoma cases with severe dysplasia, CIS, and intramucosal carcinoma into a single case group for epidemiological studies. Such studies require central review, however, because most advanced adenomas are not classified as such by community pathologists and, even among a small set of pathologists, the use of these terms can differ.
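To illustrate how misclassification of this magnitude can hamper detection of differences between groups, the following sketch computes the odds ratio that would be observed when case status (advanced versus nonadvanced) is misclassified independently of exposure. All inputs other than the sensitivity and specificity are hypothetical values chosen for illustration: a true odds ratio of 2.0, an exposure prevalence of 30% among nonadvanced adenomas, and equal group sizes. The function name `observed_odds_ratio` is ours, and the sensitivity and specificity are those obtained when Pathologist A is treated as the reference.

```python
def observed_odds_ratio(true_or, p_exposed_nonadvanced, sensitivity, specificity,
                        n_advanced=100, n_nonadvanced=100):
    """Expected odds ratio when case status is misclassified regardless of exposure."""
    p0 = p_exposed_nonadvanced
    odds1 = true_or * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)  # exposure prevalence among true advanced cases

    # Expected numbers labeled "advanced": true positives plus false positives.
    adv_from_adv = sensitivity * n_advanced
    adv_from_non = (1 - specificity) * n_nonadvanced
    # Expected numbers labeled "nonadvanced": false negatives plus true negatives.
    non_from_adv = (1 - sensitivity) * n_advanced
    non_from_non = specificity * n_nonadvanced

    exposed_adv = adv_from_adv * p1 + adv_from_non * p0
    unexposed_adv = adv_from_adv * (1 - p1) + adv_from_non * (1 - p0)
    exposed_non = non_from_adv * p1 + non_from_non * p0
    unexposed_non = non_from_adv * (1 - p1) + non_from_non * (1 - p0)

    return (exposed_adv / unexposed_adv) / (exposed_non / unexposed_non)


# Hypothetical scenario: true odds ratio 2.0, 30% exposure among nonadvanced adenomas.
or_observed = observed_odds_ratio(2.0, 0.30, sensitivity=0.75, specificity=0.927)
or_perfect = observed_odds_ratio(2.0, 0.30, sensitivity=1.0, specificity=1.0)
print(f"observed OR with misclassification: {or_observed:.2f}")
print(f"observed OR with perfect classification: {or_perfect:.2f}")
```

Under these assumptions the observed odds ratio lies between 1 and the true value, which is why a study sized for the true effect may be underpowered once measurement error is present.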
The abbreviations used are: CIS, carcinoma in situ; NYC, New York City; CI, confidence interval.
Table 1 Intraobserver agreement for histological type, 1988 versus 1998 ratings (a)

1988 rating | 1998: Tubular | 1998: Tubulovillous | 1998: Villous | Total
---|---|---|---|---
Tubular | 55 | 9 | 0 | 64
Tubulovillous | 52 | 54 | 3 | 109
Villous | 2 | 10 | 1 | 13
Total | 109 | 73 | 4 | 186 (b)

(a) κ = 0.28; 95% CI, 0.17–0.39. Weighted κ = 0.32; 95% CI, 0.22–0.42.
(b) The totals do not equal the number of cases reviewed because of missing information.
Table 2 Intraobserver agreement for degree of dysplasia, 1988 versus 1998 ratings

Five categories (a)

1988 rating | 1998: None or mild | 1998: Moderate | 1998: Severe | 1998: CIS | 1998: Intramucosal | Total
---|---|---|---|---|---|---
None or mild | 8 | 13 | 4 | 1 | 1 | 27
Moderate | 9 | 16 | 12 | 2 | 0 | 39
Severe | 1 | 13 | 8 | 1 | 1 | 24
CIS | 2 | 19 | 12 | 9 | 6 | 48
Intramucosal | 2 | 6 | 11 | 6 | 27 | 52
Total | 22 | 67 | 47 | 19 | 35 | 190

Two categories (b)

1988 rating | 1998: Low-grade dysplasia (c) | 1998: High-grade dysplasia (d) | Total
---|---|---|---
Low-grade dysplasia | 46 | 20 | 66
High-grade dysplasia | 43 | 81 | 124
Total | 89 | 101 | 190

(a) κ = 0.20; 95% CI, 0.12–0.28. Weighted κ = 0.38; 95% CI, 0.29–0.47.
(b) κ = 0.32; 95% CI, 0.19–0.46.
(c) Includes none, mild, and moderate dysplasia.
(d) Includes severe dysplasia, CIS, or intramucosal carcinoma.
Table 3 Interobserver agreement for histological type, Pathologist A versus Pathologist B (a)

Pathologist B | Pathologist A: Tubular | Pathologist A: Tubulovillous | Pathologist A: Villous | Total
---|---|---|---|---
Tubular | 45 | 6 | 0 | 51
Tubulovillous | 10 | 23 | 1 | 34
Villous | 1 | 10 | 1 | 12
Total | 56 | 39 | 2 | 97 (b)

(a) κ = 0.48; 95% CI, 0.33–0.62. Weighted κ = 0.53; 95% CI, 0.40–0.66.
(b) The totals do not equal the number of cases reviewed because of missing information.
Table 4 Interobserver agreement for degree of dysplasia, Pathologist A versus Pathologist B

Four categories (a)

Pathologist B | Pathologist A: None or mild | Pathologist A: Moderate | Pathologist A: Severe or CIS | Pathologist A: Intramucosal | Total
---|---|---|---|---|---
None or mild | 13 | 15 | 1 | 0 | 29
Moderate | 4 | 19 | 9 | 1 | 33
Severe or CIS | 0 | 3 | 16 | 7 | 26
Intramucosal | 0 | 1 | 1 | 9 | 11
Total | 17 | 38 | 27 | 17 | 99

Two categories (b)

Pathologist B | Pathologist A: Low-grade dysplasia (c) | Pathologist A: High-grade dysplasia (d) | Total
---|---|---|---
Low-grade dysplasia | 51 | 11 | 62
High-grade dysplasia | 4 | 33 | 37
Total | 55 | 44 | 99

(a) κ = 0.42; 95% CI, 0.29–0.55. Weighted κ = 0.59; 95% CI, 0.47–0.70.
(b) κ = 0.69; 95% CI, 0.55–0.83.
(c) Includes none, mild, and moderate dysplasia.
(d) Includes severe dysplasia, CIS, or intramucosal carcinoma.
Acknowledgments
We thank Dr. Heidrum Rotterdam for her expertise in pathological review of cases and Dr. Robert Sandler for contributing cases to this study.