Abstract
Background: We assessed the performance and validity of cytology in the Finnish screening program by considering high-grade neoplasia and cervical cancer (CIN3+) rates as detected in the program and by reevaluating cases observed after a negative screening test.
Methods: This retrospective study included 915 screen-detected CIN3+ cases and 421 cases observed after a negative screen. Randomized and blinded reevaluation of potential false-negative screening tests covered 345 archival case smears from women without a referral to colposcopy, as well as 689 control smears for estimating performance and validity measures.
Results: The false-negative rate at the cutoff of low-grade squamous intraepithelial lesion or worse was 35% (95% confidence interval, 30-40%). In the subpopulation with original screening result of Pap I, the false-negative rate was 23% (18-28%). Sensitivity of screening laboratory rereading for detecting low-grade lesions or worse as atypical was 75% (67-82%) and specificity 93% (91-94%). Reproducibility of specific cytologic diagnoses was only fair. False negatives constituted 11% of all CIN3+ diagnoses in the screened population; those false negatives with an original Pap I screening result constituted 5%.
Conclusions: Although screen failures in the form of diagnostic false negatives occur within the Finnish screening program, their effect on cancer incidence is fairly small and cannot be readily decreased without sacrificing the high specificity of screening or without high incremental costs. Feedback for the screening laboratories is needed, however, to improve the reproducibility of cytologic diagnoses to optimize the burden of intensified follow-up and treatment of precancerous lesions. Cancer Epidemiol Biomarkers Prev; 19(2); 381–7
Introduction
Cervical cancer remains an important cause of morbidity and mortality worldwide, with an estimated global incidence of half a million and nearly a quarter of a million deaths annually (1). In Finland, population-based screening with conventional Pap smears has resulted in a nearly 80% decrease in cervical cancer incidence, from 15/100,000 woman years in 1963 to 4/100,000 in 2007 (2, 3). In most other countries, the effect of screening has been smaller, prompting the call for systematic audits of the different aspects of screening (4-8).
Population-based cervical cancer screening has high preventive potential but performance and detection rates of precancerous lesions vary considerably between programs. Reported diagnostic false-negative rates for cervical cytology range from 15% to 63% in different study settings, illustrating potentially differential validity as well as difficulty in nonblinded retrospective assessment of the Pap smear (9-17). There is variation also in specificity and related positive predictive value, but previous rereading studies have not used controls together with potential false-negative smears and have thus not assessed these validity problems (7, 18). The difficulties stem from the lack of an unbiased reference method, as screening primarily targets largely reversible precursors to prevent the future development of cancer. This is the strength of cervical cancer screening but at the same time renders direct evaluation problematic.
The purpose of the current study was to assess the performance and validity, specifically the false-negative rate, sensitivity, specificity, and reproducibility, of the cytologic diagnosis in cytologic screening in Finland and establish a procedure for continuing quality control. We have used clinical outcome to choose potential false-negative screening smears together with control smears from the healthy screening population and, unlike in any earlier study, blinded laboratory rereading and an expert panel to establish final cytologic results.
Materials and Methods
The cervical cancer screening program under study invites all women between the ages of 30 and 60 y by personal letter for a conventional Pap smear at 5-y intervals. Some municipalities have extended the range by also including women ages 25 and 65 y (19).
Screening invitations and visits, including histologically confirmed findings, are registered in the Mass Screening Registry of the Finnish Cancer Registry. The Cancer Registry itself includes all reported cervical cancer, cervical intraepithelial neoplasia grade 3 (CIN3), and adenocarcinoma in situ (AIS) cases. Practising physicians and pathologists are requested to report these diagnoses directly to the Cancer Registry. In addition, death certificate diagnoses are obtained yearly and additional data are subsequently requested of those cases not previously reported (∼1% of all cancers; ref. 19).
The current study is based on a linkage between screening and cancer registry records over the period of 1990-1999, which involved 2.3 million visits among 1.3 million women. Six of the laboratories active throughout the study period participated with archives representing at least 5 of the 10 y covered by this study. These laboratories are located in Helsinki, Kotka, Kuopio, Oulu, Turku, and Vaasa and represent a good geographic cross-section of Finland. The study material amounted to a total of 828,514 screen samples.
Audit cases were defined as cervical cancers and CIN3/AIS lesions (CIN3+) diagnosed during 1990-1999 among women with Pap class I or class II [equal to reactive changes or atypical squamous cells of undetermined significance (ASC-US) in Bethesda 2001, and borderline changes in British terminology] at the previous screen without a referral for further assessment. Four hundred twenty-one such cases were identified. These included symptomatic cases, cases diagnosed through opportunistic screening, and, in the event of a preceding borderline screening test, cases diagnosed during the intensified follow-up recommended by the program. The participating laboratories were asked to locate the corresponding smears along with two control smears (the one preceding and the one following the case smear in sequence in their smear archives). Controls were thought particularly useful for the assessment of specificity of rereading. Three hundred forty-five (82%) of the requested case smears were eventually retrieved along with 689 controls (Fig. 1).
The smears were collected at the Mass Screening Registry, arranged into a random sequence, and blinded to the status of case or control. The rereading was done in the original laboratories as well as in a reference laboratory and recorded using the standard Bethesda 2001 nomenclature (20). When these two reading results were clearly discrepant (Table 1), the final cytologic diagnosis was assessed by an expert panel. This involved 312 of 1,034 smears (180 case smears and 132 controls). The panel consisted of five members: two from the original laboratory (a cytotechnician and a cytopathologist), two from the reference laboratory, and an external cytopathologist experienced in screening. The panel worked blinded to the case history in question and each member reported his or her findings independently. Where the independent assessments were discrepant, the final result was determined by discussion until a unanimous verdict was reached. Because the Bethesda system accounts for squamous and glandular cells separately, a smear can have two diagnoses. For statistical and presentational reasons, only the higher-grade atypia was considered; when the grades were equal, squamous cell lesions took precedence over glandular lesions.
| First laboratory reading | Second laboratory reading |
|---|---|
| Atypical glandular cells | No atypical glandular cells |
| Atypical cells | No atypical cells |
| HSIL | LSIL or lower |
| ASC-H | LSIL or lower |
| Carcinoma | HSIL or lower |
| LSIL | ASC-US or lower |
| Adenocarcinoma or AGC-FN | AGC-NOS |
Abbreviation: AGC-FN, atypical glandular cells, favor neoplasia.
Distributions of the cytologic diagnoses, detailed and grouped according to clinical relevance, were tabulated separately for case and control smears and for the different phases of rereading. Sensitivity and specificity were calculated for the rescreening results from the different sets of laboratories (screening and reference) with the final cytology as the gold standard. These validity measures were estimated with 95% confidence intervals (95% CI) assuming a binomial distribution. Cumulative results of final cytology were tabulated by case and control, as well as by original screening Pap classification (Pap class I or class II); 95% CIs are also reported. The false-negative rate was calculated as the proportion of case smears with a final cytologic diagnosis warranting referral for colposcopy. Reproducibility was assessed by cross-tabulating the cytologic diagnoses of screening and reference laboratories. Linearly weighted and unweighted κ values for interobserver reliability were calculated with 95% CIs.
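The false-negative rate and its confidence interval can be illustrated with a short sketch. The text says only that a binomial distribution was assumed; the Wilson score interval used here is one common choice and is an assumption, not necessarily the method the authors applied. The function name `wilson_ci` is hypothetical, and the counts are those reported for final cytology of LSIL+ in Table 3.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Counts from Table 3: smears with a final cytology of LSIL+ (warranting referral).
cases_lsil, cases_total = 120, 345
controls_lsil, controls_total = 19, 689

# False-negative rate: share of case smears whose final cytology warranted referral.
fn_rate = cases_lsil / cases_total
fn_low, fn_high = wilson_ci(cases_lsil, cases_total)

# Same proportion among control smears, for comparison.
ctrl_rate = controls_lsil / controls_total
ctrl_low, ctrl_high = wilson_ci(controls_lsil, controls_total)

print(f"Case smears LSIL+:    {fn_rate:.1%} ({fn_low:.1%}-{fn_high:.1%})")
print(f"Control smears LSIL+: {ctrl_rate:.1%} ({ctrl_low:.1%}-{ctrl_high:.1%})")
```

For these two rows the interval comes out close to the reported 30-40% and 1.8-4.3%; for very small subgroups other binomial intervals can differ noticeably, so the choice of method matters there.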
Results
The attendance rate of the study population in organized screening during the observed time period was 72% (Fig. 1), with a range between laboratories of 66% to 80%. On average, 0.8% of the screened women per screening round were referred for colposcopy. In addition, 5.4% of screened women had borderline cytologic findings, which, according to routine practice, warranted a repeat smear after 12 months instead of after the normal 5 years. Of a total of 6,949 referrals, 915 cases of CIN3+ were detected, 87 of which were invasive cancers. Cases with a preceding negative screen, thus qualifying for the rereading phase, amounted to 421, 115 of which were invasive cancers. Smears for 345 (82%) of these cases, including 54 (47%) of the invasive cancers, were retrieved for rereading. Of the retrieved case smears, 230 (67%) were originally classified as Pap I and the rest as Pap II. The mean time interval from negative smear to CIN3+ diagnosis was 31 months, with a range of 1 to 109 months.
There was a large difference in the rates of high-grade squamous intraepithelial lesion (HSIL) in the case material between the rereading phases, with proportions of 7.5%, 13%, and 19% according to screening laboratory, reference laboratory, and final cytology, respectively. Of 345 case smears, 164 (48%) were classified as normal according to final cytology (Table 2). No smears in this study were deemed unreadable due to poor smear quality.
| Cytologic diagnosis | Screening laboratory | Reference laboratory | Final cytology |
|---|---|---|---|
| Case smears | | | |
| Ca | 0.9 (3) | 0.6 (2) | 0.6 (2) |
| AGC-FN | 2.6 (9) | 4.3 (15) | 2.0 (7) |
| HSIL | 7.5 (26) | 13.0 (45) | 19.4 (67) |
| ASC-H | 4.3 (15) | 3.5 (12) | 8.1 (28) |
| LSIL | 3.8 (13) | 7.0 (24) | 4.6 (16) |
| AGC-NOS | 4.6 (16) | 6.7 (23) | 4.1 (14) |
| ASC-US | 11.0 (38) | 16.2 (56) | 13.6 (47) |
| Normal | 65.2 (225) | 48.7 (168) | 47.5 (164) |
| Total | 100.0 (345) | 100.0 (345) | 100.0 (345) |
| Control smears | | | |
| Ca | 0.0 (0) | 0.0 (0) | 0.0 (0) |
| AGC-FN | 0.1 (1) | 0.9 (6) | 0.4 (3) |
| HSIL | 0.1 (1) | 0.6 (4) | 0.4 (3) |
| ASC-H | 0.4 (3) | 1.7 (12) | 1.2 (8) |
| LSIL | 0.4 (3) | 0.7 (5) | 0.7 (5) |
| AGC-NOS | 0.7 (5) | 4.1 (28) | 1.5 (10) |
| ASC-US | 5.4 (37) | 10.7 (74) | 8.1 (56) |
| Normal | 92.7 (639) | 81.3 (560) | 87.7 (604) |
| Total | 100.0 (689) | 100.0 (689) | 100.0 (689) |
NOTE: Values are percentages (numbers). Horizontal lines represent cutoff points for referral (LSIL+) and intensified follow-up (ASC-US+).
Abbreviation: Ca, carcinoma.
In rereading, ASC-US+ was detected in 35% of the audit cases by the screening laboratories, 51% by the reference laboratory, and 52% according to the final cytologic result (Table 2; Fig. 2). The corresponding rates among controls were 7.3%, 19%, and 12%. The proportion of case smears that were classified as low-grade squamous intraepithelial lesion or worse (LSIL+, warranting referral) by final cytology, thus satisfying our criteria for false negatives, amounted to 35% (30-40%). For controls, the corresponding proportion was 2.8% (1.8-4.3%).
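The ASC-US+ proportions quoted above follow directly from the counts of smears read as normal in Table 2. The sketch below shows that derivation; the dictionary layout and the function name `asc_us_plus_rate` are illustrative choices, not part of the study.

```python
# Numbers of smears read as normal in each rereading phase, taken from Table 2.
normal_counts = {
    "case": {"screening": 225, "reference": 168, "final": 164},
    "control": {"screening": 639, "reference": 560, "final": 604},
}
totals = {"case": 345, "control": 689}

def asc_us_plus_rate(group: str, phase: str) -> float:
    """Share of smears read as ASC-US or worse, i.e. every smear not read as normal."""
    return 1 - normal_counts[group][phase] / totals[group]

for group in ("case", "control"):
    for phase in ("screening", "reference", "final"):
        rate = asc_us_plus_rate(group, phase)
        print(f"{group:8s} {phase:10s} {rate:6.1%}")
```

Rounded to the precision used in the text, these rates agree with the reported 35%, 51%, and 52% for cases and 7.3%, 19%, and 12% for controls.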
Assuming a false-negative rate of 35% to hold for all 421 identified cases with a preceding negative screen, false-negative screening results accounted for 11% (147 of 1,336) of the total observed CIN3+ cases in the screened population. On the other hand, many of the CIN3+ cases with a preceding Pap II screening result were diagnosed within the intensified screening activity done after borderline cytologic findings. The estimate of the false-negative rate of only those CIN3+ cases diagnosed after a Pap I result in the program was 23%. These cases accounted for 5% of the total observed CIN3+ cases.
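The 11% figure is a short extrapolation, spelled out here as a sketch. It uses the rounded 35% rate quoted in the text and applies it to all 421 cases, including those whose smears were not retrieved, which is the assumption stated above; the variable names are illustrative.

```python
screen_detected = 915   # CIN3+ cases detected through referral within the program
after_negative = 421    # CIN3+ cases diagnosed after a Pap I/II screen without referral
fn_rate = 0.35          # false-negative rate at the LSIL+ cutoff, as quoted in the text

# All CIN3+ cases observed in the screened population.
all_cin3_plus = screen_detected + after_negative

# Extrapolate the reread rate to every case with a preceding negative screen.
estimated_false_negatives = round(fn_rate * after_negative)

share = estimated_false_negatives / all_cin3_plus
print(f"{estimated_false_negatives} of {all_cin3_plus} = {share:.0%}")
```

This reproduces the 147 of 1,336 (11%) given in the text. The 5% figure for Pap I cases cannot be recomputed the same way because the number of Pap I cases among all 421 is not reported.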
There was no difference in the false-negative rates according to case severity (CIN3/AIS or invasive cancer). For those case smears that were originally classified as normal (Pap I), the rate of ASC-US+ in final cytology was 39%, whereas it was 80% in those originally classified as borderline (Pap II; Table 3).
| Smears | HSIL+, n (%) | 95% CI | LSIL+, n (%) | 95% CI | ASC-US+, n (%) | 95% CI | Normal+, n (%) |
|---|---|---|---|---|---|---|---|
| Case smears | 76 (22.0) | (18.0-26.7) | 120 (34.8) | (29.9-40.0) | 181 (52.5) | (47.2-57.7) | 345 (100.0) |
| Pap II | 43 (37.4) | (29.1-46.5) | 68 (59.1) | (50.0-67.7) | 92 (80.0) | (71.7-86.3) | 115 (100.0) |
| Pap II, Cx Ca | 6 (42.9) | (21.3-67.7) | 9 (64.3) | (38.4-83.7) | 14 (100.0) | (78.2-100.0) | 14 (100.0) |
| Pap II, CIN3/AIS | 37 (36.6) | (27.9-46.4) | 59 (58.4) | (48.6-67.6) | 78 (77.2) | (68.1-84.3) | 101 (100.0) |
| Pap I | 33 (14.3) | (10.4-19.5) | 52 (22.6) | (17.7-28.4) | 89 (38.7) | (32.6-45.1) | 230 (100.0) |
| Pap I, Cx Ca | 7 (16.7) | (8.4-30.7) | 10 (23.8) | (13.5-38.6) | 16 (38.1) | (25.0-53.3) | 42 (100.0) |
| Pap I, CIN3/AIS | 26 (13.8) | (9.6-19.5) | 42 (22.3) | (17.0-28.8) | 73 (38.8) | (32.2-46.0) | 188 (100.0) |
| Control smears | 6 (0.9) | (0.4-1.9) | 19 (2.8) | (1.8-4.3) | 85 (12.3) | (10.1-15.0) | 689 (100.0) |
| Pap II | 2 (5.7) | (1.8-18.7) | 8 (22.9) | (12.1-39.2) | 19 (54.3) | (38.1-69.6) | 35 (100.0) |
| Pap I | 4 (0.6) | (0.2-1.6) | 11 (1.7) | (1.0-3.0) | 66 (10.1) | (8.0-12.6) | 654 (100.0) |
Sensitivity of rereading was estimated as the proportion of screening, or reference, laboratory positives relative to the positives according to the final cytologic results (Table 4). At a threshold of ASC-US relative to final cytology of LSIL+, the sensitivity of the screening laboratories in rereading was 75% (67-82%) with a specificity of 93% (91-94%). Sensitivity with a threshold of LSIL+ was 50% and specificity 99%. The reference laboratory was more sensitive overall at considerable expense of specificity.
| Rereading threshold | Final cytology HSIL+ | Final cytology LSIL+ | Final cytology ASC-US+ |
|---|---|---|---|
| Sensitivity, screening laboratories | | | |
| ASC-US+ | 83 (73-90) | 75 (67-82) | 58 (52-64) |
| LSIL+ | 61 (50-72) | 50 (41-58) | 27 (22-33) |
| HSIL+ | 43 (32-54) | 28 (21-36) | 15 (11-20) |
| Sensitivity, reference laboratory | | | |
| ASC-US+ | 90 (82-96) | 88 (81-93) | 86 (81-90) |
| LSIL+ | 71 (60-80) | 70 (62-78) | 44 (38-50) |
| HSIL+ | 60 (48-70) | 41 (33-50) | 26 (20-31) |
| Specificity, screening laboratories | | | |
| ASC-US+ | 89 (87-91) | 93 (91-94) | 98 (97-99) |
| LSIL+ | 97 (96-98) | 99 (99-100) | 100 (99-100) |
| HSIL+ | 100 (99-100) | 100 (99-100) | 100 (100-100) |
| Specificity, reference laboratory | | | |
| ASC-US+ | 76 (73-78) | 79 (77-82) | 90 (87-92) |
| LSIL+ | 93 (91-95) | 97 (96-98) | 99 (98-100) |
| HSIL+ | 98 (96-98) | 98 (97-99) | 99 (99-100) |
Cross-tabulation of the specific cytologic diagnoses made on rereading in the screening laboratories against those of the reference laboratory revealed only fair agreement (Table 5). Unweighted κ was 0.24 (0.19-0.29). The unweighted κ value for management recommendation, grouped as normal follow-up, intensified follow-up for ASC-US, and referral for colposcopy for LSIL+, was 0.30. The linearly weighted κ for this three-class comparison was 0.38. Reproducibility of the ASC-H (atypical squamous cells, cannot exclude HSIL) diagnosis was particularly poor. Both glandular cell classes were also poorly reproduced.
Rows: screening laboratory reading. Columns: reference laboratory reading.

| Screening laboratory | Normal | ASC-US | AGC-NOS | LSIL | ASC-H | HSIL | AGC-FN | Ca | Total |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 676 | 98 | 38 | 13 | 13 | 19 | 7 | 0 | 864 |
| ASC-US | 30 | 20 | 6 | 6 | 4 | 4 | 5 | 0 | 75 |
| AGC-NOS | 9 | 3 | 2 | 1 | 1 | 2 | 3 | 0 | 21 |
| LSIL | 1 | 1 | 0 | 6 | 1 | 6 | 1 | 0 | 16 |
| ASC-H | 9 | 3 | 2 | 1 | 1 | 2 | 0 | 0 | 18 |
| HSIL | 3 | 4 | 0 | 1 | 3 | 12 | 3 | 1 | 27 |
| AGC-FN | 0 | 1 | 3 | 0 | 1 | 3 | 2 | 0 | 10 |
| Ca | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 3 |
| Total | 728 | 130 | 51 | 29 | 24 | 49 | 21 | 2 | 1,034 |
NOTE: Cohen's unweighted κ value is 0.24 (0.19-0.29). When grouped into three categories corresponding to normal follow-up procedure, intensified follow-up, and immediate referral, the value of unweighted κ is 0.30 (0.24-0.36) and linearly weighted κ 0.38 (0.32-0.44).
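The κ values in the note can be recomputed from the Table 5 counts. The sketch below is a plain-Python implementation of Cohen's κ with optional linear weights; the function names `kappa` and `collapse` are hypothetical. The three-class grouping is an assumption inferred from the referral and follow-up cutoffs described for Table 2: ASC-US and AGC-NOS are treated as intensified follow-up, and LSIL, ASC-H, HSIL, AGC-FN, and Ca as referral.

```python
# Table 5 counts. Rows: screening laboratory; columns: reference laboratory.
# Category order: Normal, ASC-US, AGC-NOS, LSIL, ASC-H, HSIL, AGC-FN, Ca.
TABLE_5 = [
    [676, 98, 38, 13, 13, 19, 7, 0],
    [30, 20, 6, 6, 4, 4, 5, 0],
    [9, 3, 2, 1, 1, 2, 3, 0],
    [1, 1, 0, 6, 1, 6, 1, 0],
    [9, 3, 2, 1, 1, 2, 0, 0],
    [3, 4, 0, 1, 3, 12, 3, 1],
    [0, 1, 3, 0, 1, 3, 2, 0],
    [0, 0, 0, 1, 0, 1, 0, 1],
]

def kappa(table, linear_weights=False):
    """Cohen's kappa for a square agreement table, optionally linearly weighted."""
    k = len(table)
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear weights give partial credit for near misses; otherwise exact match only.
        if linear_weights:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * rows[i] * cols[j] for i, j in cells) / n ** 2
    return (observed - expected) / (1 - expected)

def collapse(table, groups):
    """Merge categories; groups[i] is the merged class index of original category i."""
    m = max(groups) + 1
    merged = [[0] * m for _ in range(m)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            merged[groups[i]][groups[j]] += count
    return merged

# Assumed grouping: 0 = normal, 1 = intensified follow-up, 2 = referral.
MANAGEMENT = [0, 1, 1, 2, 2, 2, 2, 2]
three_class = collapse(TABLE_5, MANAGEMENT)

kappa_specific = kappa(TABLE_5)
kappa_three = kappa(three_class)
kappa_three_weighted = kappa(three_class, linear_weights=True)
print(round(kappa_specific, 2), round(kappa_three, 2), round(kappa_three_weighted, 2))
```

Under this grouping the three values round to the reported 0.24, 0.30, and 0.38. The confidence intervals are not reproduced here because the source does not say how they were derived.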
Discussion
The current study, in which we calculated the screen-detected CIN3+ cases and reevaluated the smears of women diagnosed following a negative screen, showed that only a small proportion of CIN3+ and particularly of cervical cancer cases were due to screen failure. These results confirm previous conclusions that the low disease burden in Finland is mainly due to organized screening and that further prevention of cervical cancer with new screening methods and technologies has a fairly small potential for additional effect.
We found that the cytologic false-negative rate was 35%, as defined by final cytology of LSIL+, among smears preceding diagnosis of CIN3+ but in which no referral had been recommended. The corresponding false-negative rate in the Pap I subpopulation was lower at 23%. An additional 18% of all case smears had cytologic changes of undetermined significance [ASC-US and atypical glandular cells, not otherwise specified (AGC-NOS)]. Previously reported false-negative rates range from 15% to 63% in comparable studies (9-17). The 48% of case smears that were normal according to final cytology probably represent sampling errors but can also include cases where neoplasia has developed rapidly after the collection of the screening sample.
Assuming that the false-negative rate in the final study material would hold for all the audit cases identified, false-negative screening results accounted for 11% of the total observed CIN3+ cases in the screened population. Those CIN3+ cases diagnosed after a Pap I screening result within the program and with an LSIL+ diagnosis in rereading constituted 5% of all CIN3+ cases. Furthermore, not every treated CIN3 lesion is a prevented cancer case. An early Finnish study estimated that 28% to 39% of high-degree (severe) dysplasia and carcinoma in situ cases combined will progress to invasive cancer (21). Progression rates estimated in a study of untreated CIN3 cases in New Zealand (22) and in a study using data from the British Columbia screening program (23) also fall within this range (31% and 38%, respectively). Because the average duration of the precancer phase is 10 to 12 years before progression to invasive cancer, many of the CIN3/AIS cases with a preceding negative screen would not have progressed before the next scheduled invitation. Taking this natural history into account, it is likely that the proportion with a false-negative diagnosis is lower among the progressive CIN3/AIS lesions than among all such lesions.
Of 421 CIN3+ cases identified for rereading, 115 were cervical carcinomas (0.01% of screening samples). Eighty-two percent of all requested case smears were recovered for reevaluation but only 47% of those from invasive cancer cases. An explanation might be that on diagnosis of cervical cancer, the treating hospital often requests any preceding smears, sometimes failing to return them to the screening archives. Validity measures and false-negative rates did not vary with the eventual histologic diagnosis, however.
In the literature, sensitivity of a single Pap smear has been reported to range from 30% to 87% (24). Our estimate for the original screening laboratories during the rereading phase of the study was 75% at the lowest cutoff of ASC-US and 50% at the cutoff of LSIL among “true” LSIL+ cases. Validity measures for the reference laboratory differed significantly from those of the screening laboratories. This highlights the difficulty of finding cytologic truth by using just a single laboratory as gold standard.
Reproducibility of the cytologic diagnoses was fair at best when using the Bethesda 2001 classification system. This was true also when analyzing results using only three outcome categories and the linearly weighted κ value as indicator. Specificity variation between laboratories was important, leading to local variations in adverse effects and cost of screening but not necessarily in effectiveness. The difficulty of diagnosing glandular lesions with the conventional Pap smear seems to be another important issue for poor reproducibility. To achieve and maintain consistent screening quality, it is necessary to continue the evaluation and feedback process between and with the laboratories.
Knowledge that the rereading material contained a large proportion of potential false negatives, differing in this respect from normal screening material, was a possible source of bias. These pre-expectations could have influenced the evaluation by enhancing sensitivity and decreasing specificity, although this effect may be assessed by using controls, randomization, and blinding. The LSIL+ rates in general screening and in the rereading of controls were similar (0.8% versus 1.0%), suggesting that the level of bias was low. Another study method would be to embed the study smears in day-to-day screening while keeping the laboratory personnel unaware of their participation in a study. A panel working blinded and making individual judgements before a consensus discussion was assumed, however, to be the most valid cytologic gold standard and better than a single assessor.
Improving the sensitivity of screening by changing diagnostic criteria is possible in principle but hardly feasible, considering the small gain on offer in exchange for a large loss of specificity or, in the case of double reading of smears, large incremental costs. We conclude that both the sensitivity and the specificity of the cytologic laboratories involved in the program are satisfactory, with very low cancer incidence and low referral rates achieved over the last few decades. Still, attention should be paid to the reproducibility of cytologic diagnoses, as well as to ensuring high attendance.
In a study by Bulk et al. (10), cytologic rereading of originally normal smears preceding a diagnosis of CIN2/3 resulted in an upgrade to borderline/mild dyskaryosis or worse (corresponding to ASC-US+) in 33% of cases. This figure primarily compares with the 39% ASC-US+ results according to final cytology of those case smears in our study that were originally classified as Pap I. In the Dutch study, there were no controls and, hence, no specificity calculations, which probably leads to overestimation of the false-negative cytology rate. They also reported high-risk human papillomavirus (hrHPV) test positivity in 80% of these case smears, which suggests significantly higher sensitivity compared with cytology. Again, no specificity calculations were made. The careful cytologic evaluation of hrHPV positives that Bulk et al. recommend could best be implemented in a primary HPV screening program combined with cytology triage (25). An experimental screening design is ongoing in the Finnish program (26, 27). A systematic cytology triage with careful independent evaluation phases within the cytology laboratory and in a panel can be recommended based on our experiences from the current study.
Considering the large variation in current cervix cancer burden between countries and in historical trends, regular and comprehensive audits, as recommended in the current quality assurance guidelines for cancer screening programs, should be done to monitor cervical cancer screening programs. To optimize the performance and validity of cytologic screening, register-based feedback and audits could also be included in the certifications and accreditation standards of screening units and laboratories.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We thank Kaija Halonen and Sirkku Vuorma from the Mass Screening Registry of the Finnish Cancer Registry for help with the preparation and handling of the register data; Kaija Knuuti, Merja Herttola-Hämäläinen, and Tuomo Timonen from the Department of Pathology at the University Hospital of Helsinki for providing facilities and participation in the rereading; and the cytotechnologists from the screening laboratories that participated in the panel sessions: Päivi Kelkka, Anna-Maija Korhonen, Päivi Häkkinen, Benita Nyqvist, Regina Lääkkö, Eila Hanelius, and Berit Hällfors.
Grant Support: European Commission and the Finnish Cancer Organisation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.