Use of uracil-DNA glycosylase enzyme to reduce DNA-related artifacts from formalin-fixed and paraffin-embedded tissues in diagnostic routine

Detection of somatic mutations is mandatory for defining therapy in precision oncology. However, somatic mutation detection protocols typically use DNA from formalin-fixed and paraffin-embedded (FFPE) tumor tissues, which can yield nonreproducible sequence artifacts, especially C:G > T:A transitions. In recent studies, pretreating DNA with uracil-DNA glycosylase (UDG), an enzyme involved in base excision repair, significantly reduced the number of DNA artifacts detected by next-generation sequencing (NGS) and other methods, without affecting the capacity to detect real mutations. This study aimed to evaluate the effect of UDG enzymatic pretreatment on the number of DNA sequencing artifacts from FFPE tumor samples, to improve the accuracy of genetic testing in the molecular diagnostic routine. We selected 12 FFPE tumor samples (10 melanoma, 1 lung, and 1 colorectal tumor sample) with different storage times. We compared the sequencing results of NGS libraries for a 16-gene hotspot panel prepared from UDG-treated and untreated DNA. UDG-treated samples showed large reductions in the total number of transitions (mean reduction of 80%) and in the transition/transversion ratio (mean reduction of 75%). In addition, most sequence artifacts had a low variant allele frequency (VAF < 10%) and were eliminated by UDG treatment. Including UDG enzymatic treatment before multiplex amplification in the NGS workflow significantly decreased the number of artifactual variants detected in FFPE samples. Thus, adding this step to the current methodology should improve the rate of true mutation detection in the molecular diagnostic routine.


Background
Next-generation sequencing (NGS) is the term used to describe numerous modern sequencing technologies that have advanced genomics and revolutionized biomedical research and clinical practice [1]. In oncology, a genetic variant can be used as a biomarker when it influences disease diagnosis or prognosis or predicts sensitivity to specific treatments [2]. Correct and precise mutation detection is an essential step in precision medicine [3]. Although the benefits of NGS are clear, the complexity of these methods and their dependence on DNA integrity demand an evolving set of standards to ensure testing quality [4].
Formalin fixation and paraffin embedding of solid tumor tissues is standard practice in clinical pathology because it preserves cellular morphology (a necessary condition for further pathological analyses) and allows long-term sample storage at room temperature. DNA from formalin-fixed and paraffin-embedded (FFPE) tumor tissues is frequently used to detect somatic mutations, define the tumor profile, and select the proper targeted therapy in cancer treatment [5]. However, such DNA is often extensively degraded, which reduces the efficiency of polymerase chain reaction (PCR) amplification of DNA templates for subsequent sequencing. Moreover, DNA from FFPE samples contains sequence artifacts that may interfere with the detection of true gene mutations and increase the likelihood of false-positive mutation calls [6,7].
Sequence artifacts can arise from several types of DNA damage generated in FFPE tissues, including deamination of cytosine to uracil and deamination of 5-methylcytosine (5-mC) to thymine [7]. Hydrolytic deamination of cytosine or 5-mC in DNA, which contributes significantly to spontaneous mutations, occurs spontaneously in aqueous environments and is influenced by temperature [8]. In living cells, uracil lesions in DNA are removed by uracil-DNA glycosylase (UDG/UNG). Base lesions mispaired with guanine, including thymine in T:G mismatches, are repaired by either methyl-CpG-binding domain protein 4 (MBD4) or thymine-DNA glycosylase (TDG); all of these enzymes belong to the base excision repair (BER) pathway [9]. However, in vitro, where there are no repair enzymes and lesions remain unrepaired, deamination of cytosine or 5-mC leaves a uracil or thymine mispaired with guanine. During amplification, DNA polymerase then incorporates an adenine opposite the uracil or thymine, producing C:G > T:A transitions [6].
To address this source of sequence artifacts, which can hamper mutational analysis and precision medicine, UDG enzymatic treatment before PCR amplification has been proposed for the NGS workflow. Studies of UDG treatment have shown significant reductions in the frequency of C:G > T:A artifactual mutations, with no effect on the ability of NGS to detect real mutations [3,6,10,11]. The aim of this work was therefore to evaluate whether pretreatment with a commercial UDG enzyme reduces the number of sequencing artifacts in DNA extracted from FFPE tumors, to improve the genetic testing used in the molecular diagnostic routine.

Sample preparation
Twelve FFPE tissue samples were used in this study: 10 from patients with melanoma, 1 from a patient with lung adenocarcinoma, and 1 from a patient with colorectal cancer. A pathologist analyzed the FFPE tissues histologically to assess the percentage of tumor cells and mark tumor areas. Tumor regions were manually dissected from unstained slides. Genomic DNA was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. DNA quantity and quality were assessed using a NanoDrop 1000 spectrophotometer and the Qubit dsDNA HS kit (Thermo Fisher Scientific, Wilmington, DE, USA).

Gene mutation analysis
Tumor somatic mutations were investigated by targeted sequencing using a custom Ion AmpliSeq™ Panel (Thermo Fisher Scientific) containing hotspot regions of 16 genes frequently mutated in solid tumors (BRAF, CDH1, EGFR, ERBB2, HRAS, IDH1, IDH2, JAK2, KIT, KRAS, MET, NRAS, PDGFRA, PIK3CA, RET, and ROS1). Multiplex amplification was performed with 10 ng of DNA using the Ion AmpliSeq Library Kit 2.0, and sequencing was performed on the Ion Proton platform (Thermo Fisher Scientific). To assess the effect of UDG treatment before multiplex amplification, 15 ng of DNA was incubated with 0.5 μL (1 unit/μL) of a commercial UDG enzyme (Thermo Fisher Scientific) for 30 min at 37°C, and the enzyme was then inactivated for 5 min at 95°C.
Sequencing reads were mapped and variants were called using the Torrent Suite Browser and the Torrent Variant Caller (TVC) (Thermo Fisher Scientific). A variant was considered a somatic mutation if the variant allele was present in more than 2% of the reads at a minimum coverage depth of 100×. Called variants were imported, annotated, and filtered in VarSeq software (Golden Helix). The transition/transversion (Ts/Tv) ratio was calculated considering all identified variants.
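The filtering thresholds and Ts/Tv calculation described above can be illustrated with a short Python sketch. It is not the TVC or VarSeq implementation: the `Variant` record, the function names, and the example calls are hypothetical, and only single-nucleotide substitutions are handled. A transition is a substitution within the same base class (purine to purine or pyrimidine to pyrimidine); any other substitution is a transversion.

```python
from __future__ import annotations

from dataclasses import dataclass

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


@dataclass
class Variant:
    """Hypothetical minimal call record; real callers report many more fields."""
    ref: str        # reference base (single-nucleotide substitutions only)
    alt: str        # alternate base
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the position

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth


def passes_filter(v: Variant, min_vaf: float = 0.02, min_depth: int = 100) -> bool:
    """Keep a call if VAF is more than 2% and coverage depth is at least 100x."""
    return v.depth >= min_depth and v.vaf > min_vaf


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine substitutions are transitions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(variants: list[Variant]) -> float:
    ts = sum(is_transition(v.ref, v.alt) for v in variants)
    tv = len(variants) - ts
    return ts / tv if tv else float("inf")


# Illustrative (made-up) calls, not data from this study.
calls = [
    Variant("C", "T", 30, 200),  # VAF 15%: kept, transition
    Variant("G", "A", 5, 150),   # VAF 3.3%: kept, transition
    Variant("G", "T", 20, 100),  # VAF 20%: kept, transversion
    Variant("C", "T", 1, 300),   # VAF 0.3%: removed (not above 2%)
    Variant("A", "G", 40, 80),   # removed (depth below 100x)
]
kept = [v for v in calls if passes_filter(v)]
print(len(kept), ts_tv_ratio(kept))  # 3 2.0
```

Because artifactual C:G > T:A changes are transitions, they inflate the numerator of this ratio, which is why a drop in Ts/Tv after UDG treatment is a useful summary of artifact removal.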

Statistics
Descriptive statistics were used to report the absolute and relative numbers of each mutation type (transitions and transversions) and the percentage reduction after UDG treatment. The numbers of transitions and transversions and the Ts/Tv ratio were compared between treated and untreated samples using the paired t-test (and nonparametric tests) with 95% confidence intervals in GraphPad software (V5). Variant allele frequency (VAF) values of detected variants were compared between treated and untreated samples using the unpaired t-test. Results were considered statistically significant at p < 0.05.
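The paired comparison above tests whether the within-sample differences between untreated and UDG-treated libraries have a mean of zero. A minimal sketch of the paired t statistic using only Python's standard library follows; the counts are invented for illustration, and dropping incomplete pairs (as would apply to a sample whose treated library produced no reads) is an assumption of this sketch. Converting t to a p-value requires the t distribution with n - 1 degrees of freedom, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
from __future__ import annotations

import math
import statistics


def complete_pairs(untreated, treated):
    """Drop pairs where either value is missing (e.g. a failed library)."""
    return [(u, t) for u, t in zip(untreated, treated) if u is not None and t is not None]


def paired_t(untreated, treated) -> tuple[float, int]:
    """Paired t statistic for the within-sample differences (untreated - treated)."""
    pairs = complete_pairs(untreated, treated)
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [u - t for u, t in pairs]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up transition counts for four samples; None marks a failed treated library.
untreated = [10, 12, 14, 20]
treated = [8, 9, 10, None]
t, df = paired_t(untreated, treated)
print(round(t, 3), df)  # 5.196 2
```

Pairing matters here because samples differ widely in their baseline number of artifacts; the test looks only at each sample's own change.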

Results
To evaluate whether UDG treatment reduces sequencing artifacts, we sequenced a 16-gene panel using UDG-treated and untreated DNA obtained from 12 FFPE tumor samples previously known to harbor high levels of transitions in routine mutation analysis. The median number of variants was 82 for untreated samples (range: 2-338) and 12 for treated samples (range: 3-37) (Table 1).
Samples showed significantly fewer transitions after UDG treatment (mean 18.8 ± 10.3) than without enzyme treatment (mean 157.7 ± 99.1; p < 0.0007), with an average reduction of 80% (range: 21-95%) (Table 1). UDG treatment had no consistent effect on the number of transversions (p = 0.3774; average reduction of 5.5%): some samples presented extra transversions after UDG treatment, whereas others showed fewer (Table 1). This variation probably reflects other sequencing-associated factors unrelated to cytosine deamination. Ts/Tv ratios were higher in untreated than in UDG-treated samples (p < 0.0007), and the decrease in the Ts/Tv ratio ranged from 2 to 95% (mean 75%) (Table 1). One sample (A43) showed a much smaller reduction in the Ts/Tv ratio (2%) than the other samples. When this sample was excluded, the mean reduction in the Ts/Tv ratio was 83% (data not shown). For one sample (A20), multiplex amplification after UDG treatment failed completely, and sequencing yielded no mapped reads.
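The per-sample reductions reported above, and the effect of excluding an outlier sample, can be reproduced with simple arithmetic. The sketch below assumes the reduction is computed per sample as (untreated - treated) / untreated and then averaged; the sample names and counts are hypothetical, not the Table 1 values.

```python
def percent_reduction(untreated: float, treated: float) -> float:
    """Relative reduction (%) from the untreated to the UDG-treated value."""
    if untreated == 0:
        raise ValueError("untreated value must be non-zero")
    return 100.0 * (untreated - treated) / untreated


def mean_reduction(pairs, exclude=()):
    """Average of per-sample reductions, optionally leaving out named samples."""
    values = [percent_reduction(u, t) for name, (u, t) in pairs.items() if name not in exclude]
    return sum(values) / len(values)


# Hypothetical (untreated, treated) transition counts for three samples.
counts = {"S1": (200, 20), "S2": (100, 30), "S3": (50, 49)}
print(mean_reduction(counts))                   # 54.0
print(mean_reduction(counts, exclude={"S3"}))   # 80.0
```

The mean of per-sample reductions need not equal the reduction of the mean counts: with the mean transition counts reported above, (157.7 - 18.8) / 157.7 ≈ 88%, whereas the average per-sample reduction (range 21-95%) is 80%.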
To verify that artifactual variants had lower allele frequencies than true variants, we compared the VAFs of untreated and UDG-treated samples (Fig. 1a and b), excluding the pair for sample A20. Most sequence artifacts had a low VAF (< 10%), and most low-VAF variants were eliminated after UDG treatment. The mean VAF was 10.5% for untreated samples vs. 36% for UDG-treated samples (p < 0.0001; Fig. 1a). We also analyzed the VAF distribution of each pair of untreated and UDG-treated samples (Fig. 1b). In all sample pairs, the mean VAF increased significantly after enzyme treatment (p values ranging from 0.0001 to 0.0276).
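The VAF comparison above can be sketched for a single sample pair: summarize the VAFs before and after treatment and list the calls that disappear. Treating a call that vanishes after UDG treatment as a candidate artifact is an assumption that mirrors the reasoning in the text rather than a formal classification rule; the function names and variant IDs are hypothetical, and matching by ID presumes identical call naming in both runs.

```python
from __future__ import annotations

import statistics


def vaf_summary(vafs: list[float], low_cutoff: float = 10.0) -> dict:
    """Number of calls, mean VAF (%), and fraction of calls below the low-VAF cutoff."""
    low = sum(v < low_cutoff for v in vafs)
    return {"n": len(vafs), "mean_vaf": statistics.mean(vafs), "frac_low": low / len(vafs)}


def lost_after_treatment(untreated: dict[str, float], treated: dict[str, float]) -> dict[str, float]:
    """Calls present before but absent after UDG treatment, with their untreated VAFs."""
    return {key: vaf for key, vaf in untreated.items() if key not in treated}


# Hypothetical calls for one sample pair: variant ID -> VAF (%).
untreated = {"var1": 3.0, "var2": 4.5, "var3": 6.0, "var4": 8.5, "var5": 48.0}
treated = {"var3": 6.0, "var5": 48.0}

print(vaf_summary(list(untreated.values())))     # {'n': 5, 'mean_vaf': 14.0, 'frac_low': 0.8}
print(vaf_summary(list(treated.values())))       # {'n': 2, 'mean_vaf': 27.0, 'frac_low': 0.5}
print(lost_after_treatment(untreated, treated))  # {'var1': 3.0, 'var2': 4.5, 'var4': 8.5}
```

Removing mostly low-VAF calls raises the mean VAF of the remaining set, which is the pattern described for every sample pair above.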
To visualize differences between true and artifactual variants, we aligned and visually inspected the sequencing reads from untreated and UDG-treated samples. Figure 2 shows sequencing alignments of the KRAS gene from two representative samples, A11 and K2568. A11 is a melanoma sample in which a KRAS c.38G > A (p.Gly13Asp) variant was detected with 8.8% VAF before UDG treatment. K2568 is a colorectal tumor sample in which a KRAS c.35G > A (p.Gly12Asp) variant was detected with 13.0% VAF before treatment. After UDG treatment, only the variant in sample K2568 was still detected, indicating a true mutation. In sample A11, the variant was no longer detected, consistent with UDG removal of the uracil-containing templates that gave rise to it, which confirms it as an artifact.

Discussion
Use of NGS techniques has revolutionized the practice of personalized oncology. Identification of real somatic variants, especially in driver genes such as EGFR in lung tumors, KRAS in colorectal cancer, and BRAF in melanoma, is a crucial step for defining the correct molecular targeted therapy [3]. Here, we evaluated the effect of UDG enzymatic pretreatment on sequencing artifacts in DNA from FFPE tumor samples, to improve the genetic testing used in the molecular diagnostic routine. Our results showed that UDG enzymatic pretreatment eliminated most of the sequence artifacts that appeared at a frequency lower than 10% in a 16-gene hotspot panel.

Formalin is a formaldehyde-based fixative solution that is frequently used to preserve tumor biopsy samples for long-term storage. After fixation, tissue samples are usually embedded in paraffin, which promotes tissue preservation and provides a platform for tissue sectioning. DNA from FFPE solid tumor tissues is routinely analyzed for somatic mutations to select patients for specific molecular targeted therapies in cancer treatment [5]. Fixation preserves tissue ultrastructure and cellular morphology by inducing several types of chemical interactions between adjacent macromolecules, including DNA, within the tissue sample. However, it can also damage DNA through different mechanisms [7]: (i) extensive DNA fragmentation, which increases with longer storage time and with lower pH due to formaldehyde oxidation in unbuffered solutions; (ii) protein-DNA, DNA-DNA, and DNA-formaldehyde crosslinking, which creates adducts; (iii) the formation of abasic sites (AP-sites) in the presence of water and/or reduced pH, which releases a free base and leaves a gap; and (iv) the deamination of cytosine to uracil or of 5-mC to thymine, especially in CpG dinucleotides (where cytosine is commonly methylated).
Indeed, several recent studies have identified uracil lesions as a major source of sequence artifacts in FFPE DNA [3,6,10,11].
In living cells, BER is the main pathway for correcting nonbulky lesions, such as those produced by oxidation, alkylation, and deamination, as well as abasic sites (AP-sites) and single-strand DNA breaks, thereby preventing their mutagenic effects [12]. DNA glycosylases initiate BER by cleaving the N-glycosidic bond between the damaged base and its deoxyribose, producing an AP-site that is further processed by other BER enzymes, ending with insertion of the correct nucleotide [9].
UDGs are monofunctional glycosylases belonging to a conserved family of DNA repair enzymes that initiate the BER pathway. They remove uracil from both single- and double-stranded DNA, with greater affinity for single-stranded DNA [13], leaving an AP-site. After cleavage, UDG appears to remain bound to the AP-site. Because AP-sites are highly mutagenic and cytotoxic, this binding may play a protective role in vivo until the subsequent BER enzymes act [13]. Thymine arising from 5-mC deamination, in turn, is preferentially excised from T:G mispairs by TDG and MBD4, and is then replaced by cytosine through the BER pathway [14].
In vitro, where there are no repair enzymes and lesions formed during the experimental process cannot be repaired, deamination of cytosine leaves guanine mispaired with uracil, and deamination of 5-mC leaves guanine mispaired with thymine. In both cases, DNA polymerase can incorporate an adenine opposite the uracil or thymine, producing a C:G > T:A transition after PCR amplification. These C:G > T:A changes can be either intrinsic to the sample before isolation (i.e., biological) or artifacts of methodological steps, including DNA isolation, PCR amplification, and/or sequencing [6]. A recent study showed that most publicly available datasets carry signatures of DNA damage, leading to erroneous calls in at least one third of the G-to-T variant reads. This corresponds to almost one incorrect call per cancer gene, confounding the identification of real somatic mutations [10].
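The sequence of events above, from deamination to a fixed C:G > T:A change after two rounds of copying, can be traced with a toy Python model. The functions are hypothetical illustrations, not a simulation of real PCR chemistry: strand polarity is ignored and only cytosine-to-uracil deamination is modeled. Deamination of 5-mC would place a thymine directly at the same position and lead to the same outcome.

```python
# Base pairing used during synthesis; the polymerase places A opposite U.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, pos: int) -> str:
    """Hydrolytic deamination of cytosine to uracil at one position (toy model)."""
    if strand[pos] != "C":
        raise ValueError("this model only deaminates cytosine")
    return strand[:pos] + "U" + strand[pos + 1:]


def replicate(template: str) -> str:
    """Synthesize the complementary strand base by base.

    Strand polarity is ignored: position i of the product pairs with
    position i of the template.
    """
    return "".join(PAIR[base] for base in template)


top = "ACGT"                    # position 1 is a C paired with G
damaged = deaminate(top, 1)     # "AUGT": U now sits where the C was
cycle1 = replicate(damaged)     # "TACA": A incorporated opposite U
cycle2 = replicate(cycle1)      # "ATGT": T where the original strand had C
print(damaged, cycle1, cycle2)  # AUGT TACA ATGT
```

After the second copy, the position that originally held a C:G pair holds T:A, which the variant caller reports as a C > T (or G > A) transition.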
To address this problem, some researchers have begun to incorporate a commercial UDG enzymatic treatment into their NGS protocols before the PCR amplification step. UDG recognizes uracil in DNA and excises it by cleaving the N-glycosidic bond, generating an AP-site without breaking the sugar-phosphate backbone. The resulting AP-sites are susceptible to hydrolytic cleavage at the elevated temperatures used in PCR cycling, which fragments the uracil-containing strands so that they no longer serve as templates. This additional step significantly reduces the frequency of C:G > T:A artifacts without affecting the capacity of NGS to detect real mutations [3,6,10,11]. By contrast, although the thymine lesions generated by deamination of 5-mC can be removed from double-stranded DNA by MBD4 and TDG in cells [14], no methodology has yet been described that uses these enzymes to reduce sequence artifacts in FFPE DNA in vitro [7].
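A toy Python model of the UDG step described above shows why removing uracil-containing templates restores the original sequence. For illustration, the model assumes that every AP-site is cleaved during thermal cycling so that the damaged strand yields no product; in practice cleavage need not be complete. `udg_treat`, `first_cycle`, and the other names are hypothetical, and strand polarity is ignored.

```python
from __future__ import annotations

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}
AP_SITE = "_"  # placeholder for an abasic site left by UDG


def replicate(template: str) -> str:
    """Complementary strand, position by position (strand polarity ignored)."""
    return "".join(PAIR[base] for base in template)


def udg_treat(strand: str) -> str:
    """UDG excises uracil, leaving AP-sites; other bases, including T, are untouched."""
    return strand.replace("U", AP_SITE)


def amplifiable(template: str) -> bool:
    """Assume AP-sites are cleaved by heat during PCR, so such strands give no copy."""
    return AP_SITE not in template


def first_cycle(duplex: list[str], use_udg: bool) -> list[str]:
    templates = [udg_treat(s) for s in duplex] if use_udg else list(duplex)
    return [replicate(t) for t in templates if amplifiable(t)]


# Duplex in which the top-strand C at position 1 was deaminated to U.
duplex = ["AUGT", "TGCA"]
print(first_cycle(duplex, use_udg=False))  # ['TACA', 'ACGT']: 'TACA' carries the artifact
print(first_cycle(duplex, use_udg=True))   # ['ACGT']: only the undamaged strand is copied
```

Because the sketch acts only on uracil, a thymine produced by 5-mC deamination would still be copied, matching the point above that UDG does not address that class of artifact.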
A recent study showed that subclonal KRAS mutations with very low VAF (< 3%) detected in FFPE samples of metastatic colorectal carcinoma may be artifactual, reinforcing the notion that UDG pretreatment of DNA is a mandatory step for identifying the true mutations that govern the choice of a therapeutic compound [15]. Similarly, our results showed that most sequence artifacts had a low VAF (< 10%) and that most of them were eliminated after UDG treatment. Moreover, consistent with published data, UDG pretreatment produced a large reduction in the number of transitions (mean reduction of 80%).

Fig. 2 Sequencing alignments of true and artifactual variants in the KRAS gene. Two samples (A11 and K2568), untreated and treated with UDG enzyme, were selected as representative samples of the identified variants. The artifactual variant c.38G > A in melanoma sample A11 was eliminated by UDG treatment. The true variant c.35G > A in colorectal cancer sample K2568 was maintained after UDG treatment. Because KRAS is encoded on the minus strand, both variants appear as cytosine (C) to thymine (T) changes in the nucleotide sequence.
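The strand note in the Fig. 2 caption can be made concrete: a coding-strand (c.) change in a minus-strand gene appears as the complementary change on the genomic plus strand, which is the orientation in which reads are typically displayed against the reference. The helper names below are hypothetical, and the sketch converts alleles only; mapping c. positions to genomic coordinates would require transcript annotation.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def to_plus_strand(ref: str, alt: str, gene_strand: str) -> tuple[str, str]:
    """Express coding-strand (c.) alleles as genomic plus-strand alleles.

    Only the alleles are converted; positions are not.
    """
    if gene_strand == "+":
        return ref, alt
    if gene_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError("gene_strand must be '+' or '-'")


# KRAS is on the minus strand, so c.38G>A and c.35G>A both read as C>T on the plus strand.
print(to_plus_strand("G", "A", "-"))  # ('C', 'T')
```

This is also why a G > A change in KRAS is still consistent with the C:G > T:A deamination signature discussed above.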
Several reports have consistently shown that UDG enzymatic pretreatment reduces mutation artifacts in fragmented or degraded DNA. Nevertheless, one study showed that with low DNA input (30 ng of DNA from FFPE tumor tissue or 10 μL of cell-free DNA obtained from liquid biopsies), UDG treatment decreased PCR sensitivity enough to hamper discrimination between artifactual and true mutations. Thus, some caution should be exercised when using UDG pretreatment because of its potentially deleterious effects under some conditions [16]. Notably, in one of our samples (A20), UDG treatment led to failure of the amplification reaction, possibly because of increased DNA degradation combined with a low amount of starting material.

Conclusions
Here, we showed that pretreatment with UDG enzyme before multiplex amplification in the NGS workflow significantly decreased the number of artifactual variants, especially transitions, detected in FFPE samples. Our results suggest that including this additional step should improve the rate of true mutation detection in the molecular diagnostic routine.