To the Editor: By investigating the accuracy and reproducibility of short tandem repeat assays, Tran et al. address an important issue in epidemiologic research. In their analyses, sequencing and sizing analysis yielded an exact match on repeat number for fewer than 10% of samples. Their conclusion that “… laboratory analysis of dinucleotide STR may not be as reliable as originally thought” is, therefore, very generous. For their sample size, one would normally expect 100% concordance between two assays. This kind of result should have prompted the authors to investigate the reason for the discrepancies.
There are several likely explanations for the high discrepancy between the two assays that the authors apparently did not address:
Because of the stutter bands in sizing assays, the data interpretation can be difficult (1). However, the second chromatogram in Fig. 2 shows a clear example of a 187/189 heterozygote and not a 187 homozygote.
Direct sequencing of an amplified PCR fragment is a poor choice when what is needed is an allele-sizing assay. As Fig. 2 shows, there is substantial, and unavoidable, stutter in the PCR. Sequencing this mix of fragments will inevitably lead to an apparent heterozygous call.
The stutter bands are usually shorter than the original template. By using the number of repeats with unambiguous sequence calls as the allele size, the authors misclassified alleles as shorter than they really were; for example, if the true repeat length is 20, then the most prominent stutter band has 19 repeats. Sequencing will produce unambiguous calls only for 19 repeats. Furthermore, to allow proper comparisons, the authors would have been best advised to use the same method for scoring short and long alleles.
On any gel, DNA fragments usually run within ±10% of their true size. In addition to size, the migration is affected by secondary structure and charge. This applies to the fragment of interest as well as the size standards. Therefore, before launching into a genotyping project, it is essential to establish the correlation between apparent fragment length in a sizing assay and the number of repeats for each system individually. This is best done by cloning individual alleles and sequencing the plasmids without prior PCR amplification. Stutter during sequencing may still be an issue but is less of a problem than adding stutter in sequencing to stutter in PCR.
Because a population is never exclusively heterozygous for polymorphisms, such as the insulin-like growth factor-I CA repeat, the authors might reasonably have been expected to question their sequencing results.
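Two of the points above can be made concrete with a short sketch: calibrating apparent fragment size against sequenced repeat number, and checking how many homozygotes a population would be expected to contain. This is only an illustration under stated assumptions. The calibration pairs and genotypes are invented for the example and are not data from Tran et al., the function names are hypothetical, a straight-line fit is assumed to be adequate over the allele range, and the homozygote estimate assumes Hardy-Weinberg proportions, which the points above do not state.

```python
from collections import Counter

def fit_size_to_repeats(calibration):
    """Least-squares line mapping apparent size (bp) to repeat number.

    calibration: (apparent_size_bp, repeat_count) pairs, e.g. from cloned,
    sequenced alleles run in the same sizing assay.
    """
    n = len(calibration)
    mean_x = sum(x for x, _ in calibration) / n
    mean_y = sum(y for _, y in calibration) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in calibration)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in calibration)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def repeats_from_size(size_bp, slope, intercept):
    """Convert an apparent fragment size to the nearest whole repeat number."""
    return round(slope * size_bp + intercept)

def expected_homozygosity(genotypes):
    """Expected homozygote fraction, assuming Hardy-Weinberg proportions.

    Computed as the sum of squared allele frequencies.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

# Hypothetical calibration: apparent sizes run 0.4 bp "long" on this gel.
calibration = [(183.4, 17), (185.4, 18), (187.4, 19), (189.4, 20)]
slope, intercept = fit_size_to_repeats(calibration)

# Hypothetical genotypes, written as pairs of repeat numbers.
genotypes = [(19, 19), (19, 20), (19, 21), (20, 21)]
homozygote_fraction = expected_homozygosity(genotypes)
```

If an assay reports far fewer homozygotes than `expected_homozygosity` suggests for the observed allele frequencies, that is a signal to re-examine the calls before drawing conclusions.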
A reassessment of the data that takes into account the source of errors inherent in these assays may result in more meaningful conclusions regarding associations between insulin-like growth factor-I genotype and breast cancer risk.
The comments made by Jeannette Bigler and colleagues are valuable and helpful, as one of them reiterates the point that we intended to emphasize in our report: the determination of dinucleotide short tandem repeats using DNA sizing analysis is somewhat subjective when the differences between homozygous and heterozygous genotypes are not obvious in the chromatogram. The difference between the call made by Jeannette Bigler and that made by one of our investigators, who did the sizing analysis without any information on the specimen, is a perfect example of the inconsistency in determining dinucleotide short tandem repeat genotypes by sizing analysis. Our comparison of sequencing results with those of sizing analysis also indicates that some heterozygous samples may be misclassified as homozygous by the sizing analysis. Another interesting finding of our comparison is that the sequencing results matched the sizing results well for the short alleles; the discrepancies mainly came from the long alleles. Thus, the explanations provided in the letter will not change our findings even if we reassess our laboratory data accordingly.
The second point that we would like to convey in our article is that when using DNA sizing analysis to determine short tandem repeat genotype, one should be aware of the limits of the method in terms of resolution (i.e., the minimal size of nucleotide difference that the method can detect). Although sizing analysis is a perfect method for short tandem repeats with larger repeat units (trinucleotide or larger), the method may not have adequate resolution for a single dinucleotide difference. Under well-controlled laboratory conditions and careful performance of the method by experienced staff, we believe sizing analysis is capable of distinguishing a single nucleotide difference. However, in large-scale epidemiologic studies, many stringent laboratory conditions are compromised and DNA samples are highly heterogeneous in terms of their quality due to variations in specimen collection, storage, processing, and handling. Under these situations, whether the method still maintains the same resolution is uncertain and needs reassessment.
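The dependence of resolution on sizing precision can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that the sizing error for a fragment is normally distributed with zero mean and a standard deviation of `sigma_bp` base pairs; neither this error model nor the example values come from the study. Under that assumption, a fragment is assigned to a neighbouring allele whenever the error exceeds half the repeat unit.

```python
import math

def miscall_probability(sigma_bp, repeat_unit_bp=2):
    """Probability that a measured size lies closer to a neighbouring allele.

    Assumes zero-mean normal sizing error with standard deviation sigma_bp
    (an illustrative model, not a measured property of any assay).
    """
    half_step = repeat_unit_bp / 2
    # P(|error| > half_step) for a normal distribution.
    return math.erfc(half_step / (sigma_bp * math.sqrt(2)))

# Hypothetical precisions: a well-controlled run versus a degraded one.
p_dinucleotide_good = miscall_probability(0.5, repeat_unit_bp=2)
p_dinucleotide_poor = miscall_probability(1.0, repeat_unit_bp=2)
p_trinucleotide_poor = miscall_probability(1.0, repeat_unit_bp=3)
```

In this model the miscall rate for a dinucleotide repeat rises steeply as precision degrades, whereas a trinucleotide repeat tolerates the same loss of precision better, which is the concern raised above for heterogeneous samples.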
Although sequencing analysis is not a method of choice for short tandem repeat genotyping, comparison of the results between these methods does reveal several interesting findings. As a result, questions are raised as to whether sizing analysis is sensitive enough to identify a single dinucleotide difference, and if not, whether misclassification generated by the method will affect study results and in which direction.