Background:

Human papillomavirus (HPV)-based screening is rapidly replacing cytology as the cervical screening modality of choice. In addition to being more sensitive than cytology, it can be done on self-collected vaginal or urine samples. This study will compare the high-risk HPV positivity rates and sensitivity of self-collected vaginal samples using four different collection devices and a urine sample.

Methods:

A total of 620 women referred for colposcopy were invited to provide an initial stream urine sample collected with the Colli-Pee device and take two vaginal self-samples, using either a dry flocked swab (DF) and a wet dacron swab (WD), or a HerSwab (HS) and Qvintip (QT) device. HPV testing was performed by the BD Onclarity HPV Assay.

Results:

A total of 600 vaginal sample pairs were suitable for analysis, and 505 were accompanied by a urine sample. Similar positivity rates and sensitivities for CIN2+ and CIN3+ were seen for DF, WD, and urine, but lower values were seen for QT and HS. No clear user preferences were seen between devices, but women found urine easiest to collect, and were more confident they had taken the sample correctly. The lowest confidence in collection was reported for HS.

Conclusions:

Urine, a DF swab, and WD swab all performed well and were well received by the women, whereas the Qvintip and HerSwab devices were less satisfactory.

Impact:

This is the first study to compare five self-sampling methods in the same women taken at the same time. It supports wider use of urine or vaginal self-sampling for cervical screening.

Self-sampling for high-risk human papillomavirus (hrHPV) has been recognized as an alternative to clinician collected samples for cervical screening. In the United Kingdom, the uptake of cervical screening has been declining in recent years. The most recent quarterly figures (March 31, 2019) show 70.2% of women in England ages 25–49 years were screened adequately within the previous 3.5 years, and 76.4% for women ages 50 to 64 years within the previous 5.5 years (1).

The introduction of primary screening by hrHPV testing presents an opportunity for underscreened women to be offered self-sampling at home as an alternative to a speculum examination in a health care setting (2, 3). To date, self-sampling has been suggested primarily for those who are underscreened or who have not been screened. However, given a preference for self-sampling by most women, accumulating data that it is as sensitive for HPV testing as clinician taken samples and the potential for major savings for health service, self-sampling has the potential to become the first choice for screening in all women (4–6).

An updated meta-analysis using a range of assays in a range of settings found that PCR-based assays in screening populations had similar sensitivities for self-sampling compared with clinician-taken samples (ref. 7, Table 1), with sensitivities of about 96% for CIN2+ and CIN3+ for both methods, and specificities for <CIN2 of about 79%. Similar results were seen in a large randomized trial (8). In general, it has been suggested that the specificity of self-collected vaginal samples may be lower than for clinician-taken cervical samples due to passenger hrHPV found in the vagina (9), but this is less clearly established. Increasing interest in self-sampling for HPV has led to the development of a number of kits, and dozens of devices are currently commercially available (10). These include brush-type devices and different types of swabs. Some use a liquid specimen transport medium and some are transported dry. The latter has obvious advantages in terms of safety for home use, especially when small children are present, and also fewer restrictions for posting. A number of kits for urine collection are also available (https://www.amazon.co.uk/Medical-Specimen-Collection-Containers/b?ie=UTF8&node=6285950031). However, none of these provide an easy way to collect initial stream urine, which has been shown to be important (11), and they do not contain a stabilization medium, which has to be added separately to avoid DNA degradation.

Cost and simplicity of use are also important factors—especially in low and middle income countries, where self-sampling may prove more cost effective and be the only practical option. In this study, we compared simple swabs against more complex and expensive devices, and also those transported dry versus wet in a liquid transport medium. As vaginal collection has an invasive element we also included urine testing, which is noninvasive and may be more acceptable to some women or cultures, but has been less extensively studied. A systematic review of 14 studies found that urine testing had good accuracy for the presence of cervical HPV, and that initial stream urine yielded more accurate results than random or midstream sampling (12). In a recent study in a referral population, we found similar performance of the Trovagene test and five other HPV assays in clinical cervical samples, but a lower sensitivity in a urine sample (13). Three other small studies with matched urine, self-collected vaginal and clinician taken cervical samples in a referral population have been reported. Two (14, 15) found similar sensitivities for CIN2+ between all three samples, but the third found a lower sensitivity for urine (16). All used an Evalyn (or similar Viba) brush for the vaginal samples. Urine was collected in a sterile pot in two of these studies (14, 17) before adding a preservative, and the third used the Colli-Pee device (15). Different HPV assays were used in the different studies. Here we compare the performance of four vaginal sampling devices, including wet and dry transport methods, and an initial stream urine sample in a referral population using the Becton Dickinson Onclarity HPV test. We also conducted additional analyses to examine the effect of adjusting for sample cellularity and the use of different positivity thresholds.

Study population and procedures

In all cases, an initial stream urine sample was requested as the first sample, using the Colli-Pee collection kit (Novosanis NV), which collects a 20 mL sample of which 7 mL is a prefilled urine conservation medium (UCM). Two vaginal samples were provided subsequently. Women were randomized to use either the dacron-based digene Female Swab Specimen Collection Kit (Qiagen GmbH) placed in 1 mL liquid specimen transport medium (STM; denoted as wet dacron—WD below) and the Copan FLOQswab (Copan Diagnostics Inc) as a dry sample (dry flocked—DF) or the Qvintip Kit (Aprovix AB; QT) and the HerSwab (Eve Medical) both as dry samples (HS; the sampling devices are illustrated in Supplementary Fig. S1). The order of sampling was also randomized, leading to four equal sized groups. Samples were stored locally at 4°C–8°C before being transported to the laboratory within 2 weeks of collection.

Recruitment took place between April 2016 and August 2017. New referrals to the Royal London Hospital colposcopy clinic (London, UK) who had a positive screening result were sent information about the study and an offer to join it with their appointment letter. This was also sent to women under observation for prior abnormalities, provided that their most recent abnormal smear test was within the last 3 years. Pregnancy or ablation/excision treatment to the cervix within 3 years were exclusion criteria. All samples were taken prior to colposcopic examination. Cervical biopsies were only taken where clinically indicated. As well as having a negative biopsy, a negative colposcopy with satisfactory evaluation of the transformation zone was considered to indicate no disease even without a biopsy. Patient outcomes were collected for 1 year after enrollment, and this period would normally include a follow-up visit for women with low-grade disease at first visit. Histology was reported by hospital pathologists and reviewed internally according to NHS Trust quality control practices. Where cytology or histopathology results were regraded following local review the reviewed result was used.

Women were also asked to complete a simple survey evaluating their preference for a vaginal sample versus urine, their preferred vaginal sampling device, discomfort with sample collection, confidence in sampling, and the ease or difficulty of each collection method.

Ethical approval

The study was reviewed and approved in February 2016 by the London City Road & Hampstead Research Ethics Committee (Reference 15/LO/2134), and was conducted in accordance with the principles set out in the Declaration of Helsinki. Written informed consent was obtained from all participating women. The trial was registered with the ISRCTN Registry (ISRCTN68980717).

Laboratory methods

All samples were tested in the Wolfson Institute of Preventive Medicine laboratory (London, United Kingdom). On receipt, the WD swab samples, which were transported in STM, were frozen at −20°C until further processing. The dry DF, HS, and QT samples were placed into 15 mL tubes, 8 mL of PreservCyt solution (Hologic) was added and the tubes were vortexed thoroughly at maximum speed to resuspend the cells into the solution. These samples were then split into 700 μL aliquots. The WD samples, collected in 1 mL of STM, were split into 100 μL aliquots before freezing. All the aliquots were kept frozen until tested. Urine was refrigerated at 4°C and split into 2 mL vials within 7 days of receipt before being frozen at −20°C.

HPV testing

All samples were tested for hrHPV using the BD Onclarity HPV Assay. This assay is based on simultaneous PCR amplification and detection of target DNA using amplification primers and fluorescent-labeled detector probes. An internal human β-globin (HBB) gene control was used to measure sample cellularity to judge sample adequacy. The assay detects 14 high-risk HPV types overall: six types are individually genotyped (16, 18, 31, 45, 51, and 52) and the other types are reported in groups (33, 58), (56, 59, 66), and (35, 39, 68). The manufacturer's predefined threshold for positivity was a Ct value ≤38.3 for HPV16 and ≤34.2 for all other genotypes and the β-globin internal control. Use of a Ct threshold of either 34.2 or 38.3 for all genotypes was also explored.

For DF, HS, and QT, 500 μL of sample was added to an LBC Diluent tube. A total of 50 μL of the WD sample was added to a tube containing the cervical brush diluent medium. For urine, 2 mL was added to empty LBC Diluent tubes. The Onclarity software produces an “indeterminate” result when no HPV is detected and the HBB internal control did not meet the amplification threshold, indicating insufficient cellularity of the specimen. Samples producing invalid results were repeat tested once. Twenty samples were indeterminate a second time and excluded from the analyses. One urine sample was of insufficient cellularity to provide a valid result.
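The positivity and adequacy rules described above can be sketched in code. The Python fragment below is a minimal illustration only and is not the logic of the Onclarity software: the function name `classify_sample`, the channel labels, and the use of `None` for "no amplification" are assumptions made for the example. Only the two Ct cutoffs are taken from the text.

```python
from typing import Dict, Optional

HPV16_CT_CUTOFF = 38.3  # predefined threshold for HPV16 (from the text)
OTHER_CT_CUTOFF = 34.2  # threshold for other genotypes and the HBB control


def classify_sample(hpv_ct: Dict[str, Optional[float]],
                    hbb_ct: Optional[float]) -> str:
    """Return 'positive', 'negative' or 'indeterminate' for one sample.

    hpv_ct maps a channel label (hypothetical naming, e.g. '16', '33/58')
    to its Ct value, or None when no amplification was detected.
    hbb_ct is the Ct of the beta-globin internal control, or None.
    """
    for channel, ct in hpv_ct.items():
        if ct is None:
            continue
        cutoff = HPV16_CT_CUTOFF if channel == "16" else OTHER_CT_CUTOFF
        if ct <= cutoff:
            return "positive"
    # No HPV detected: the result only counts as negative if the internal
    # control amplified, i.e. the sample contained enough cells.
    if hbb_ct is not None and hbb_ct <= OTHER_CT_CUTOFF:
        return "negative"
    return "indeterminate"


# Hypothetical samples: the same Ct of 37.0 is positive for HPV16 but not
# for HPV18, and a weak internal control makes the result indeterminate.
print(classify_sample({"16": 37.0}, hbb_ct=25.0))
print(classify_sample({"18": 37.0}, hbb_ct=25.0))
print(classify_sample({"18": 37.0}, hbb_ct=36.0))
```

Applying a single cutoff to every genotype, as in the exploratory analyses, would amount to setting both constants to the same value.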

Statistical methods

The Ct values were compared between devices using the unpaired Wilcoxon test. Comparisons of the numbers of women who were hrHPV positive were based on binomial statistics. The Fisher exact test was used when any of the four expected cell values was ≤10, but otherwise χ² normal approximations were used.
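The rule for choosing between the Fisher exact test and the χ² approximation can be illustrated as follows. This is a sketch under stated assumptions rather than the analysis code used in the study (which was run in R): the function names are hypothetical, the χ² statistic is computed without a continuity correction, and the Fisher test uses the common two-sided definition that sums all tables no more probable than the observed one.

```python
import math


def expected_counts(a, b, c, d):
    """Expected cell counts of a 2x2 table under independence."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]


def chi2_p(a, b, c, d):
    """Pearson chi-square P value (1 df, no continuity correction)."""
    exp = expected_counts(a, b, c, d)
    stat = sum((o - e) ** 2 / e for o, e in zip([a, b, c, d], exp))
    return math.erfc(math.sqrt(stat / 2))  # chi-square tail area for 1 df


def fisher_p(a, b, c, d):
    """Two-sided Fisher exact P value via the hypergeometric distribution."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def compare_positivity(pos1, n1, pos2, n2):
    """Apply the rule in the text: Fisher if any expected count is <= 10."""
    a, b, c, d = pos1, n1 - pos1, pos2, n2 - pos2
    if min(expected_counts(a, b, c, d)) <= 10:
        return "fisher", fisher_p(a, b, c, d)
    return "chi2", chi2_p(a, b, c, d)


# Counts back-calculated from the percentages in Table 1 (an assumption):
# HS 65.1% of 284 is about 185 positive; WD 71.7% of 300 is about 215.
print(compare_positivity(185, 284, 215, 300))
```

With these back-calculated counts the χ² branch is selected and the P value should be close to the 0.090 reported for HS versus WD in Table 1.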

Agreement in hrHPV positivity between devices was calculated using both an unadjusted approach and also an approach adjusted for cellularity, using the kappa statistic. The direction of discordance (+/− vs. −/+) was calculated using the McNemar OR. Sensitivity for CIN2+ and CIN3+ and specificity for <CIN2 were calculated using standard approaches. Ct values adjusted for cellularity were obtained by first subtracting the internal control Ct value from the raw HPV Ct value. Because of its extensive prior use, WD was selected as the reference device, and a further additive adjustment was made to give the same overall positivity rate as for the raw values for this device at the predefined genotype-specific Ct cutoff (≤38.3 for HPV16 and ≤34.2 for other genotypes) for the primary analysis, and at the 34.2 and 38.3 cutoffs for all genotypes in the secondary analyses. The same additive adjustment was made for all other devices, except for urine, where cellularity was generally higher and a cellularity adjustment was not explored. Bootstrapping was used to estimate the confidence intervals (CI) for the kappa statistic and the difference between the raw and adjusted kappa statistics. A small preliminary study of DNA stability was undertaken using aliquots from 60 women (30 sampled with DF and 30 with HS) comparing baseline levels of DNA and nucleic acid using Nanodrop and RNAse to those of matched aliquots stored at 25°C for 1 or 2 weeks. No significant differences were seen with either assay.
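The agreement measures named above can be sketched as follows. This is an illustrative Python fragment, not the study's R code: the paired results are invented for the example, the function names are hypothetical, and the percentile bootstrap is an assumption, since the text does not say which bootstrap interval was used.

```python
import random


def cohen_kappa(pairs):
    """Kappa for paired binary results [(device1_pos, device2_pos), ...]."""
    n = len(pairs)
    agree = sum(1 for x, y in pairs if x == y)
    p_obs = agree / n
    p1 = sum(1 for x, _ in pairs if x) / n
    p2 = sum(1 for _, y in pairs if y) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    if p_exp == 1.0:
        # Degenerate case (all results identical); treated as perfect
        # agreement here, which is a convention chosen for this sketch.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_or(pairs):
    """Ratio of +/- to -/+ discordant pairs (undefined if no -/+ pairs)."""
    plus_minus = sum(1 for x, y in pairs if x and not y)
    minus_plus = sum(1 for x, y in pairs if y and not x)
    return plus_minus / minus_plus


def bootstrap_kappa_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa, resampling women with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        cohen_kappa([rng.choice(pairs) for _ in pairs]) for _ in range(n_boot)
    )
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical data: 60 +/+, 25 -/-, 9 +/- and 6 -/+ pairs.
pairs = ([(True, True)] * 60 + [(False, False)] * 25
         + [(True, False)] * 9 + [(False, True)] * 6)
print(cohen_kappa(pairs), mcnemar_or(pairs), bootstrap_kappa_ci(pairs))
```

Resampling whole pairs keeps the two results from each woman together, which is what preserves the paired structure of the comparison.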

All statistical analyses were performed in R (v3.6.3).

A total of 620 women consented to take part, but 20 were excluded as not eligible and the primary analysis group comprised the remaining 600 women (Fig. 1). Of these, 505 women also provided an initial stream urine sample of between 8 and 20 mL, inclusive of 7 mL UCM preservative.

Figure 1.

Study flowchart for Predictors 5.1 trial. WD, digene Female Swab Specimen Collection Kit; DF, FLOQswab; QT, Qvintip; HS, HerSwab; 1st, first sample collected; 2nd, second sample collected.

Median age was 29 years [interquartile range (IQR): 27–33, range 21–63], and 330 (53.2%) were aged less than 30 years, 261 (42.1%) were aged 30–44 years, and 29 (4.7%) were aged 45 or older. Each woman provided two vaginal self-samples (Fig. 1). Age at enrollment, referral cytology reading, time since most recent abnormal cytology and previous treatment were similar across all four randomization groups (Supplementary Table S1).

There were 66 cases of CIN2 and 68 cases of CIN3+ within the 1-year follow-up period. This included two stage 1A1 squamous cell carcinomas (SCC), one stage 1B SCC, and one stage 1A1 adenocarcinoma. A cross-tabulation of histology and cytology results is shown in Supplementary Table S2.

hrHPV Positivity

The HPV assay could not be performed on 20 samples (HS = 16; QT = 4) and one urine sample due to low cellularity (Fig. 1). These samples all came from different women. There were no significant differences in hrHPV positivity rates according to collection order for any of the devices (Supplementary Table S3), and results from first and second samples have been combined for all analyses. HPV positivity rates for all hrHPV types combined are shown in Table 1. At the predefined HPV test positivity cutoff of Ct ≤ 38.3 for HPV16 and ≤34.2 for all other genotypes, positivity of 71.7% was seen for WD and DF, with nonsignificantly higher values for urine (76.0%; P = 0.17) and nonsignificantly lower rates for QT (69.3%; P = 0.52) and for HS (65.1%; P = 0.09). Agreement on the same sample between WD and DF was high (kappa = 0.801, 95% CI: 0.777–0.826), but lower between QT and HS (kappa = 0.753, 95% CI: 0.723–0.779) and lower still between urine and the four vaginal samples (kappa ranging from 0.568–0.646), largely due to the higher positivity rate with urine (Supplementary Table S4). Positivity by genotype is shown in Supplementary Table S5. There were no statistically significant or obvious differences in genotype-specific positivity by device, although type 45 was lower for HerSwab and Qvintip, and the positivity for types 35, 39, and 68 was higher for urine and Qvintip using the primary cutoffs without a cellularity adjustment.

Table 1.

hrHPV positivity rates by device (combined samples). HPV positive is defined as HPV16 with Ct ≤ 38.3 and/or ≤ 34.2 for all other genotypes. A total of 20 samples could not be evaluated (HS = 16; QT = 4).

All samples

| Device | N | Positive (%) (95% CI) | P-value vs. WD |
|---|---|---|---|
| WD | 300 | 71.7 (66.2–76.7) | – |
| DF | 300 | 71.7 (66.2–76.7) | 1.00 |
| QT | 296 | 69.3 (63.7–74.5) | 0.52 |
| HS | 284 | 65.1 (59.3–70.7) | 0.090 |
| Urine | 504 | 76.0 (72.0–79.7) | 0.17 |

The highest sensitivity for CIN3+ was obtained with WD (91.2%). Sensitivities for CIN3+ for the other devices were as follows: urine (89.7%), DF (88.2%), QT (81.8%), and HS (77.4%), none of which were significantly lower than WD (Table 2A; Supplementary Fig. S2). A similar order was seen for CIN2+, and again no differences were significant. Increasing the cutoff to ≤38.3 for all HPV types gave higher sensitivities for all devices (Table 2C). The order of sensitivities between the devices was similar to before, but the differences were smaller, especially for HS and no differences were significant. However, there was a substantial loss of specificity for <CIN2 for all devices, for example, dropping from 34.2% to 26.3% for WD in this referral population. Slightly but not significantly better agreement between the different devices in terms of overall hrHPV positivity was seen at the higher 38.3 cutoff (Supplementary Table S4).

Table 2.

Sensitivity and specificity analysis for hrHPV by device at various cutoffs. A) 38.3 cutoff for HPV16 with the other HPV types at 34.2, B) all HPV types at 34.2 cutoff, and C) all HPV types at 38.3 cutoff. Two sets of sensitivity and specificity results are given: those for the raw values and, in brackets, those adjusted for cellularity (not calculated for urine). Here N denotes the number of samples and n the number with CIN2+ or CIN3+.

A) 38.3 HPV16, 34.2 all other genotypes

| Device | N | CIN2+ n | CIN2+ sensitivity (adjusted) | CIN3+ n | CIN3+ sensitivity (adjusted) | <CIN2 specificity (adjusted) |
|---|---|---|---|---|---|---|
| WD | 298 | 70 | 90.0 (85.7) | 34 | 91.2 (88.2) | 34.2 (32.9) |
| DF | 298 | 70 | 88.6 (84.3) | 34 | 88.2 (88.2) | 33.8 (37.7) |
| QTᵃ | 296 | 63 | 82.5 (82.5) | 33 | 81.8 (84.8) | 34.3 (35.3) |
| HSᵃ | 284 | 61 | 82.0 (85.2) | 31 | 77.4 (87.1) | 39.5 (36.9) |
| Urine | 502 | 115 | 87.0 | 58 | 89.7 | 27.4 |

B) 34.2 all genotypes

| Device | N | CIN2+ n | CIN2+ sensitivity (adjusted) | CIN3+ n | CIN3+ sensitivity (adjusted) | <CIN2 specificity (adjusted) |
|---|---|---|---|---|---|---|
| WD | 298 | 70 | 85.7 (84.3) | 34 | 88.2 (85.3) | 34.6 (34.2) |
| DF | 298 | 70 | 84.3 (82.9) | 34 | 85.3 (85.3) | 34.6 (38.6) |
| QTᵃ | 296 | 63 | 77.8 (82.5) | 33 | 72.7 (84.8) | 38.2 (37.9) |
| HSᵃ | 284 | 61 | 68.9 (75.4) | 31 | 61.3 (77.4) | 41.3 (38.3) |
| Urine | 502 | 115 | 77.4 | 58 | 84.5 | 29.5 |

C) 38.3 all genotypes

| Device | N | CIN2+ n | CIN2+ sensitivity (adjusted) | CIN3+ n | CIN3+ sensitivity (adjusted) | <CIN2 specificity (adjusted) |
|---|---|---|---|---|---|---|
| WD | 298 | 70 | 92.9 (92.9) | 34 | 94.1 (94.1) | 26.3 (26.3) |
| DF | 298 | 70 | 92.9 (92.9) | 34 | 94.1 (94.1) | 26.8 (28.9) |
| QTᵃ | 296 | 63 | 92.1 (90.5) | 33 | 90.9 (93.9) | 24.5 (25.9) |
| HSᵃ | 284 | 61 | 90.2 (91.8) | 31 | 90.3 (93.5) | 30.5 (32.9) |
| Urine | 502 | 115 | 93.9 | 58 | 93.1 | 19.6 |

ᵃNote that for the adjusted sensitivity and specificity for CIN2+ and <CIN2, there is one fewer record for both Qvintip and HerSwab (N = 295 and 283, respectively). This is due to missing internal control values.
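As a worked check of how the entries in Table 2 arise, the short Python sketch below recomputes three WD values. The counts of test-positive and test-negative women are not given in the text; they are back-calculated here from the reported percentages and denominators, so they should be read as assumptions made for illustration.

```python
def sensitivity(true_positives: int, diseased: int) -> float:
    """Proportion of women with disease who test hrHPV positive."""
    return true_positives / diseased


def specificity(true_negatives: int, non_diseased: int) -> float:
    """Proportion of women without disease who test hrHPV negative."""
    return true_negatives / non_diseased


# WD, raw values. Denominators are from Table 2 (N = 298, CIN2+ n = 70,
# CIN3+ n = 34); numerators are back-calculated assumptions.
wd_non_diseased = 298 - 70  # women with <CIN2

sens_cin3 = sensitivity(31, 34)                     # primary cutoff
spec_primary = specificity(78, wd_non_diseased)     # primary cutoff
spec_38_all = specificity(60, wd_non_diseased)      # 38.3 for all genotypes

print(f"CIN3+ sensitivity, primary cutoff: {100 * sens_cin3:.1f}%")
print(f"<CIN2 specificity, primary cutoff: {100 * spec_primary:.1f}%")
print(f"<CIN2 specificity, 38.3 for all:   {100 * spec_38_all:.1f}%")
```

Under these assumed counts, raising the cutoff to 38.3 for all genotypes reclassifies 18 women with <CIN2 as positive on WD, which is the loss of specificity described in the text.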

Cellularity adjustment

The number of cells collected in vaginal samples by the different devices and the urine sample were compared using the Ct values for the β-globin internal control as a proxy for cell number (Fig. 2). The greatest number of cells was observed for DF [median Ct = 24.3 (IQR:23.4–25.5)], followed by urine [median Ct = 24.6 (IQR:23.4–26.2)], QT [median Ct = 25.3 (IQR:24.0–27.2)], WD [median Ct = 25.7 (IQR:25.2–26.8)], and HS [median Ct = 26.1 (IQR:23.9–30.0)]. DF provided more cells than QT and HS (P < 0.0001 for both tests). A Ct value ≤34.2 for β-globin was used to determine sample adequacy, and samples with values above this level were deemed inadequate unless a positive value was seen for some HPV type. On average HS samples provided the smallest median number of cells and 16 of 300 samples (5.3%) failed to meet this sample adequacy measure. Four QT samples (1.3%) also failed to meet this requirement, as did one urine sample, but all other samples were deemed adequate (Fig. 1). The largest variation in cellular yield based on β-globin Ct values was observed for samples collected with HS and the smallest with WD.
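To give a rough sense of scale for these Ct differences, the sketch below converts a difference in β-globin Ct into an approximate fold difference in template. It assumes ideal PCR efficiency (a doubling of product per cycle), which is not stated in the text, and it ignores the different resuspension and input volumes used for each device, so the result is only an order-of-magnitude guide. The function name is hypothetical.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 2.0) -> float:
    """Approximate amount of template in sample A relative to sample B.

    Assumes the product multiplies by `efficiency` each cycle (2.0 means
    perfect doubling), so each Ct unit corresponds to one such factor.
    """
    return efficiency ** (ct_b - ct_a)


# Median beta-globin Ct values reported in the text: DF 24.3, HS 26.1.
ratio = fold_difference(24.3, 26.1)
print(f"DF vs. HS: about {ratio:.1f}-fold more control template")
```

Because the relation is exponential, the wide IQR for HS (23.9 to 30.0) corresponds to a much larger spread in cell yield than the Ct scale alone suggests.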

Figure 2.

Cellularity of samples by device type. The cellularity is represented by the Ct values of the internal control (β-globin). A lower Ct value indicates higher cellularity, and the vertical scale is ordered by decreasing Ct values. Ct values >34.2 without HPV positivity were deemed to be inadequate.

The hrHPV positivity rate was also examined after adjustment for cellularity for the vaginal devices. This was particularly relevant for HS, because of its generally low and variable cellularity. After adjustment, its overall positivity rate and its sensitivity for CIN2+ at the predefined genotype-specific cutoff both increased, the latter from 82.0% to 85.2%, which was comparable with WD and DF (Table 2A). Supplementary Table S4 indicated that agreement between pairs of vaginal devices at the predefined type-specific cutoff was nonsignificantly improved after cellularity adjustment (WD vs. DF kappa 0.803 before vs. 0.858 after; QT vs. HS 0.739 before vs. 0.850 after).
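The cellularity adjustment can be sketched as follows. This Python fragment is one possible reading of the method and not the authors' implementation: it assumes a single HPV Ct value per sample and a single cutoff (the study's primary analysis used genotype-specific cutoffs), it assumes every Ct is numeric, and it places the calibrated boundary midway between two reference samples to avoid ties at the cutoff. All names and the example values are hypothetical.

```python
def calibrate_offset(ref_hpv_ct, ref_ctrl_ct, cutoff):
    """Additive constant that keeps the reference device's positivity rate.

    After subtracting the control Ct, the constant is chosen so that the
    same number of reference samples fall at or below the cutoff as did
    with the raw HPV Ct values.
    """
    raw_positive = sum(1 for h in ref_hpv_ct if h <= cutoff)
    relative = sorted(h - c for h, c in zip(ref_hpv_ct, ref_ctrl_ct))
    if raw_positive == 0:
        boundary = relative[0] - 1.0       # no reference sample positive
    elif raw_positive == len(relative):
        boundary = relative[-1] + 1.0      # every reference sample positive
    else:
        # Midpoint between the last positive and first negative sample;
        # tied relative values at this point would break the exact match.
        boundary = (relative[raw_positive - 1] + relative[raw_positive]) / 2
    return cutoff - boundary


def adjusted_ct(hpv_ct, ctrl_ct, offset):
    """HPV Ct relative to the internal control, shifted by the offset."""
    return hpv_ct - ctrl_ct + offset


# Hypothetical reference (WD) samples: raw HPV Ct and beta-globin Ct.
cutoff = 34.2
ref_hpv = [30.0, 33.0, 36.0, 40.0]
ref_ctrl = [24.0, 26.0, 25.0, 27.0]
offset = calibrate_offset(ref_hpv, ref_ctrl, cutoff)

# A hypothetical low-cellularity sample from another device: negative on
# the raw Ct (35.0 > 34.2) but positive once cellularity is accounted for.
low_cell = adjusted_ct(35.0, 30.0, offset)
print(offset, low_cell, low_cell <= cutoff)
```

The adjustment preserves the reference device's positivity rate rather than the status of each sample, so individual reference samples can change status if their ordering differs between raw and relative Ct values.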

Questionnaire results

Of the 600 women, 504 provided an adequate urine sample and 454 of these replied regarding device preference (see supplement for the questionnaire). Within this group, 46% preferred one of the two vaginal devices, 32% preferred urine, and 22% expressed no preference (Supplementary Table S6, where further details by device are given). In general, the vaginal device used first was preferred to the one used second, with the average preference for the first device being 29% versus 17% for the second sample (P < 0.0001). Preference for the urine sample was similar to that for the first vaginal sample. More women expressed no device preference in the WD/DF group, but Table 3 shows when restricted to women who expressed a preference between vaginal devices, there was a clear preference for DF over WD (71.4%; P = 0.003), but no clear preference between QT and HS (53.4% preferred HS; P = 0.48). Similar vaginal sampling device preferences were seen when the 95 women who did not provide a urine sample and one who had an inadequate sample were included. No correlation was seen between “quite difficult” or “very difficult” ease of use and cellularity ≥ 30 Ct for either QT or HS (P > 0.5 for both). There was no clear preference for device according to age. Neither was time since last urination predictive of preference.

Table 3.

Device preference by randomization group (row percentages), among 454 of 504 women with adequate urine sample who provided preference information, ignoring sample order. Results are split according to whether a woman was in the WD/DF group or QT/HS group.

Device preference

| Randomization group | N | First vaginal device | Second vaginal device | Urine | No preference |
|---|---|---|---|---|---|
| WD/DF | 222 | WD: 22 (9.9%) | DF: 55 (24.8%) | 80 (36.0%) | 65 (29.3%) |
| QT/HS | 232 | QT: 61 (26.3%) | HS: 70 (30.2%) | 65 (28.0%) | 36 (15.5%) |
| Total | 454 | Either vaginal device: 208 (45.8%) | | 145 (31.9%) | 101 (22.2%) |

Ease of use was recorded on a five-point scale: very easy, quite easy, neither easy nor difficult, quite difficult, and very difficult. No effect of sample order was seen in this reply and combined results are shown in Table 4A. The HS device emerged as the device least easy to use with 12.6% indicating it was quite difficult or very difficult (P < 0.003 for all comparisons), with no other significant differences.

Table 4.

A) Women's reported ease of use of five collection devices, and B) their confidence of correctly using devices, among 504 women with adequate urine sample, ignoring sample order of vaginal device use.

A) Ease of using device

| Device | N | Very easy | Quite easy | Neither easy nor difficult | Quite difficult | Very difficult | Excluded | Overall total |
|---|---|---|---|---|---|---|---|---|
| WD | 247 | 128 (51.8%) | 91 (36.8%) | 19 (7.7%) | 9 (3.6%) | 0 (0.0%) | | 249 |
| DF | 247 | 155 (62.8%) | 71 (28.7%) | 12 (4.9%) | 8 (3.2%) | 1 (0.4%) | | 249 |
| QT | 254 | 119 (46.9%) | 87 (34.3%) | 35 (13.8%) | 13 (5.1%) | 0 (0.0%) | | 255 |
| HS | 254 | 108 (42.5%) | 89 (35.0%) | 25 (9.8%) | 29 (11.4%) | 3 (1.2%) | | 255 |
| Urine | 501 | 304 (60.7%) | 146 (29.1%) | 29 (5.8%) | 17 (3.4%) | 5 (1.0%) | | 504 |
B) Confidence in correctly using devices

| Device | N | Very confident | Fairly confident | Not confident | Excluded | Overall total |
|---|---|---|---|---|---|---|
| WD | 247 | 97 (39.3%) | 132 (53.4%) | 18 (7.3%) | | 249 |
| DF | 247 | 109 (44.1%) | 113 (45.7%) | 25 (10.1%) | | 249 |
| QT | 254 | 88 (34.6%) | 146 (57.5%) | 20 (7.9%) | | 255 |
| HS | 253 | 92 (36.4%) | 125 (49.4%) | 36 (14.2%) | | 255 |
| Urine | 500 | 308 (61.6%) | 170 (34.0%) | 22 (4.4%) | | 504 |

A major difference in confidence of correct use across devices was seen (Table 4B), with the greatest confidence seen for urine (61.6% as very confident and 4.4% not confident, P < 0.0001 for very confident, all comparisons) and the least confidence for HS (36.4% very confident and 14.2% as not confident, P < 0.03 for not confident for all comparisons except DF). Lower numbers expressing high confidence were also seen for QT than for WD or DF, but these were of marginal significance.

The focus of this study was to compare four different devices for vaginal self-sampling between themselves and also a urine sample, for HPV positivity, user preference, and confidence in collection. We have reported separately results for CIN2+ and CIN3+, because CIN2 is less likely to progress and has greater variability in diagnosis (16, 17) and CIN3 has been shown to have greater reproducibility (18).

Use of the ≤34.2 cutoff, as we previously have done for clinical samples, led to a clinically relevant reduction in sensitivity, but when a higher Ct cutoff was used for HPV16 (≤38.3), as recommended by the manufacturer and used previously (19), sensitivities were similar to those seen in our own and other previous studies (7, 20–23). However, as seen in Table 2, specificity was strongly dependent on the positivity threshold, and large reductions in specificity were seen when a 38.3 threshold was used for all genotypes. Sensitivity can be fairly accurately measured in a referral population, as it focuses on the diseased population. However, specificity is more problematic, as this is highly dependent on the HPV positivity rate, and passenger HPV infections not related to disease and likely to regress naturally are known to be common. While useful information on relative specificity of different tests and devices can be obtained, as was our primary goal here, a study in a screening population is required to assess absolute specificity. We are planning such a study of self versus clinician samples taken at the same visit in a screening population, which has not been previously undertaken on a large scale. In addition to evaluating the sensitivity and specificity in the intended use population, such a study will also examine preference issues in such a population, where the issues could be different than in a referral population, as women are already committed to having screening and other procedures.

High sensitivity is particularly important for self-sampling, especially for women who have declined previous invitations for a clinician taken test, as this may be a rare opportunity to screen them. Thus, technical issues, such as dilution amounts, and less restrictive thresholds for positivity that increase sensitivity, need to be explored. In general, one could imagine three outcomes from a self-sample: (i) HPV negative—return to routine screening, (ii) strongly HPV positive based on higher risk genotypes, methylation profile, viral load and/or other markers identifiable in the self-sample, which would lead to immediate colposcopy, and (iii) weaker positives, based on lower risk genotypes (e.g., 56, 59, 68), lack of methylation or other biomarkers, in which case the next step could be a clinically taken sample where cytology could also be performed, or a repeat self-sample at, say 3–12 months.

When not adjusted for cellularity, the WD swab transported in liquid STM and the dry transported DF gave the highest hrHPV positivity, overall, and for CIN2+ and CIN3+. Women also expressed a preference for the DF swab over the WD device, but a limitation of the study design did not allow for direct comparisons of these devices with QT or HS. Urine samples gave similar sensitivities to WD and DF, and were well received by the women, but specificities were lower. The HS performed least well, apparently largely due to the very large variability in cellularity of the sample between women. For 20 (13%) women, no cells were detected with HS. When adjusted for cellularity, the results were more similar to the other devices. However, women were least confident that they had provided a satisfactory sample with this device. The Evalyn brush has been used in several other studies with good results, but in two previous studies in minority populations (23, 24), we found women were less confident in using it, and so in this study we chose to put more emphasis on simpler devices, which we found the women preferred. Although not included here, we note that the manufacturer of the Evalyn device also supplies a cheaper Viba brush which has the same brush head attached to a simple shaft handle and has reported equivalent performance to clinician-collected samples (25).

At a pragmatic level, the DF swab appeared to be the device of choice, based on high detection rates, more favorable cost, and the ability to transport the sample dry. Dry transport is especially important for home collection where small children may be present and for transport via the post, as the STM used for liquid transport contains guanidinium hydrochloride, which is known to be an irritant. We acknowledge a design weakness in that only WD versus DF, or QT versus HS, were compared in the same woman. Ideally, we would have randomized all possibilities so that each woman had an equal chance of receiving any two of the four possible vaginal samplers. However, this presented logistic challenges and a danger of errors in a busy clinic and was not implemented, although full randomization of devices between women was achieved.

The urine sample collected using the Colli-Pee device was well received by the women, and gave good results with high sensitivity, but lower specificity. In contrast, in a recent study (25) where urine was collected by a different method, lower sensitivity but higher specificity was seen for urine than for self-collected vaginal samples. The liquid UCM preservative medium used here is nontoxic, which makes it suitable for home use, although urine is a body fluid and hence a biohazard.

In summary, there were clear differences in performance and patient preference among the five collection methods. Both WD and DF gave good results, and there was very good agreement in HPV positivity. Urine had a similar sensitivity to these two, but its specificity was lower. An important advantage of DF is that it can be transported dry. Sensitivity was very dependent on the positivity threshold, and use of a higher Ct cutoff for HPV16 improved results. However, these results need confirmation in a routine screening population, and specificity needs evaluation in that context. The role of triage markers such as genotyping and methylation also requires more study.
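The dependence of sensitivity on the positivity threshold can be illustrated with a minimal sketch. It assumes a sample is called positive when it amplifies with a Ct value at or below the chosen cutoff; the `Sample` class, the `sens_spec` function, and all Ct values below are hypothetical and are not data or code from this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    ct: Optional[float]  # Ct value; None means no amplification (hypothetical data)
    cin2_plus: bool      # True if histology showed CIN2 or worse


def sens_spec(samples, ct_cutoff):
    """Return (sensitivity, specificity) for CIN2+ at a given Ct cutoff.

    Assumption: a sample is positive if it amplified with Ct <= ct_cutoff.
    """
    tp = fp = fn = tn = 0
    for s in samples:
        positive = s.ct is not None and s.ct <= ct_cutoff
        if s.cin2_plus:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: four women with CIN2+ and four without.
samples = [
    Sample(24.0, True), Sample(29.5, True), Sample(33.0, True), Sample(None, True),
    Sample(31.0, False), Sample(36.0, False), Sample(None, False), Sample(None, False),
]

for cutoff in (30.0, 34.0, 38.0):
    sens, spec = sens_spec(samples, cutoff)
    print(f"Ct cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example, raising the cutoff from 30 to 34 captures one more true positive at the cost of one false positive, which is the general trade-off that any threshold choice has to balance.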

L. Cadman reports grants from Becton Dickinson, Qiagen, and GeneFirst; non-financial support from Novosanis and Eve Medical, during the conduct of the study. L. Ashdown-Barr reports grants from Becton Dickinson, Qiagen, and GeneFirst; non-financial support from Novosanis and Eve Medical, during the conduct of the study. J. Cuzick reports personal fees from Hologic, grants from Qiagen, Genera Biosystems, Hologic, GeneFirst, and Trovagene; personal fees from Qiagen, Becton Dickinson, Genera Biosystems, and Merck during the conduct of the study. No disclosures were reported by the other authors.

L. Cadman: Resources, supervision, validation, writing–original draft, project administration, writing–review and editing. C. Reuter: Investigation. M. Jitlal: Formal analysis, validation, writing–review and editing. M. Kleeman: Investigation, writing–review and editing. J. Austin: Data curation, writing–review and editing. T. Hollingworth: Supervision, writing–review and editing. A.L. Parberry: Resources. L. Ashdown-Barr: Resources, data curation, writing–review and editing. D. Patel: Resources, data curation, software, writing–review and editing. B. Nedjai: Resources, supervision, validation, investigation, project administration, writing–review and editing. A.T. Lorincz: Conceptualization, resources, supervision, methodology, writing–review and editing. J. Cuzick: Conceptualization, data curation, formal analysis, supervision.

Funding was provided by Cancer Research UK programme grant no. C569/A16891 to J. Cuzick. Assay kits and an additional financial contribution were provided by Becton Dickinson. Colli-Pee urine collection kits and HerSwab kits were donated by Novosanis and Eve Medical, respectively. Commercial companies had no role in study design, data collection, analyses, or writing of this article but were allowed to see the article prior to submission for publication.

We thank the staff of the gynecology outpatients department, The Royal London Hospital, for their invaluable help. We are very grateful to all the women who took part in the study.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Public Health England. Cervical screening standards data report 1 April 2018 to 31 March 2019; 2020. PHE publications gateway number GW-968. Available from: https://phescreening.blog.gov.uk/2020/01/14/cervical-screening-standards-report/.
2. Freeman M, Waller J, Sasieni P, Lim AW, Marlow LA. Acceptability of non-speculum clinician sampling for cervical screening in older women: a qualitative study. J Med Screen 2018;25:205–10.
3. Haguenoer K, Sengchanh S, Gaudy-Graffin C, Boyard J, Fontenay R, Marret H, et al. Vaginal self-sampling is a cost-effective way to increase participation in a cervical cancer screening programme: a randomised trial. Br J Cancer 2014;111:2187–96.
4. Forrest S, McCaffery K, Waller J, Desai M, Szarewski A, Cadman L, et al. Attitudes to self-sampling for HPV among Indian, Pakistani, African-Caribbean and white British women in Manchester, UK. J Med Screen 2004;11:85–8.
5. Waller J, McCaffery K, Forrest S, Szarewski A, Cadman L, Austin J, et al. Acceptability of unsupervised HPV self-sampling using written instructions. J Med Screen 2006;13:208–13.
6. Darlin L, Borgfeldt C, Forslund O, Hénic E, Hortlund M, Dillner J, et al. Comparison of use of vaginal HPV self-sampling and offering flexible appointments as strategies to reach long-term non-attending women in organized cervical screening. J Clin Virol 2013;58:155–60.
7. Arbyn M, Smith SB, Temin S, Sultana F, Castle P, Collaboration on S-S, et al. Detecting cervical precancer and reaching underscreened women by using HPV testing on self samples: updated meta-analyses. BMJ 2018;363:k4823.
8. Polman NJ, Ebisch RMF, Heideman DAM, Melchers WJG, Bekkers RLM, Molijn AC, et al. Performance of human papillomavirus testing on self-collected versus clinician-collected samples for the detection of cervical intraepithelial neoplasia of grade 2 or worse: a randomised, paired screen-positive, non-inferiority trial. Lancet Oncol 2019;20:229–38.
9. Belinson JL, Hu S, Niyazi M, Pretorius RG, Wang H, Wen C, et al. Prevalence of type-specific human papillomavirus in endocervical, upper and lower vaginal, perineal and vaginal self-collected specimens: implications for vaginal self-collection. Int J Cancer 2010;127:1151–7.
10. Othman NH, Zaki FHM. Self-collection tools for routine cervical cancer screening: a review. Asian Pac J Cancer Prev 2014;15:8563–69.
11. Vorsters A, Van den Bergh J, Micalessi I, Biesmans S, Bogers J, Hens A, et al. Optimization of HPV DNA detection in urine by improving collection, storage, and extraction. Eur J Clin Microbiol Infect Dis 2014;33:2005–14.
12. Pathak N, Dodds J, Zamora J, Khan K. Accuracy of urinary human papillomavirus testing for presence of cervical HPV: systematic review and meta-analysis. BMJ 2014;349:g5264.
13. Cuzick JM, Cadman L, Ahmad AS, Ho L, Terry G, Kleeman M, et al. Performance and diagnostic accuracy of a urine-based human papillomavirus assay in a referral population. Cancer Epidemiol Biomarkers Prev 2017;26:1053–9.
14. Sargent A, Fletcher S, Bray K, Kitchener HC, Crosbie EJ. Cross-sectional study of HPV testing in self-sampled urine and comparison with matched vaginal and cervical samples in women attending colposcopy for the management of abnormal cervical screening. BMJ Open 2019;9:e025388.
15. Leeman A, Del Pino M, Molijn A, Rodriguez A, Torné A, de Koning M, et al. HPV testing in first-void urine provides sensitivity for CIN2+ detection comparable with a smear taken by a clinician or a brush-based self-sample: cross-sectional data from a triage population. BJOG 2017;124:1356–63.
16. Silver MI, Gage JC, Schiffman M, Fetterman B, Poitras NE, Lorey T, et al. Clinical outcomes after conservative management of cervical intraepithelial neoplasia grade 2 (CIN2) in women ages 21–39 years. Cancer Prev Res 2018;11:165–70.
17. Rohner E, Rahangdale L, Sanusi B, Knittel AK, Vaughan L, Chesko K, et al. Test accuracy of human papillomavirus in urine for detection of cervical intraepithelial neoplasia. J Clin Microbiol 2020;58:e01443-19.
18. Darragh TM, Colgan TJ, Cox JT, Heller DS, Henry MR, Luff RD, et al. Members of the LAST project work groups. The lower anogenital squamous terminology standardization project for HPV-associated lesions: background and consensus recommendations from the College of American Pathologists and the American Society for Colposcopy and Cervical Pathology. Arch Pathol Lab Med 2012;136:1266–9716.
19. Stoler MH, Wright TC Jr, Ferenczy A, Ranger-Moore J, Fang Q, Kapadia M, et al. Routine use of adjunctive p16 immunohistochemistry improves diagnostic agreement of cervical biopsy interpretation: results from the CERTAIN study. Am J Surg Pathol 2018;42:1001–9.
20. Szarewski A, Mesher D, Cadman L, Austin J, Ashdown-Barr L, Ho L, et al. A comparison of seven tests for high grade cervical intraepithelial neoplasia in women with abnormal smears: the predictors 2 study. J Clin Microbiol 2012;50:1867–73.
21. Cuzick J, Cadman L, Mesher D, Austin J, Ashdown-Barr L, Ho L, et al. Comparing the performance of six human papillomavirus tests in a screening population. Br J Cancer 2013;108:908–13.
22. Cuzick J, Ahmad AS, Austin J, Cadman L, Ho L, Terry G, et al. A comparison of different human papillomavirus tests in PreservCyt versus SurePath in a referral population-PREDICTORS 4. J Clin Virol 2016;82:145–51.
23. Szarewski A, Cadman L, Ashdown-Barr L, Waller J. Exploring the acceptability of two self-sampling devices for human papillomavirus testing in the cervical screening context: a qualitative study of Muslim women in London. J Med Screen 2009;16:193–8.
24. Cadman L, Ashdown-Barr L, Waller J, Szarewski A. Attitudes towards cytology and human papillomavirus self-sample collection for cervical screening among Hindu women in London, UK: a mixed methods study. J Fam Plann Reprod Health Care 2015;41:38–47.
25. Szarewski A, Ambroisine L, Cadman L, Austin J, Ho L, Terry G, et al. Comparison of predictors for high-grade cervical intraepithelial neoplasia in women with abnormal smears. Cancer Epidemiol Biomarkers Prev 2008;17:3033–42.