Germline Cancer Genetic Testing Data Automation
The Department of Pediatric Hematology/Oncology stores the results of germline cancer genetic testing in many Excel files spread across multiple folders. Until now, staff have collected the relevant results by opening and reviewing these files by hand. CAAI is writing an automation script that pulls the necessary results from every file into one summary file, which removes the need to review each file individually and saves time.
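Locating the workbooks is the first step of such a script. The sketch below assumes the results live as `.xlsx` or `.xls` files somewhere under a single root folder and walks every subfolder with Python's standard `pathlib`. The function name `find_excel_files` is hypothetical and not taken from CAAI's script.

```python
from pathlib import Path


def find_excel_files(root: str) -> list[Path]:
    """Recursively collect Excel workbooks under `root`, sorted for stable output."""
    root_path = Path(root)
    # Match both modern (.xlsx) and legacy (.xls) workbooks in every subfolder.
    patterns = ("*.xlsx", "*.xls")
    files = {p for pattern in patterns for p in root_path.rglob(pattern) if p.is_file()}
    # Skip Excel's temporary lock files (e.g. "~$results.xlsx") left by open workbooks.
    return sorted(p for p in files if not p.name.startswith("~$"))
```

Keeping each returned path alongside the rows read from that workbook lets a summary point back to its source file. Reading the sheets themselves would need a third-party library such as openpyxl or pandas, which this sketch leaves out.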
First, the script filters the data down to what is needed for the final output. It keeps only mutations that are classified as potentially pathogenic or whose classification is unclear, and only those with a population frequency of 1% or less. Next, to enrich these mutations and resolve uncertainty about their classifications, the script queries the ClinVar API for each remaining mutation and extracts the variant's classification and the certainty of that classification. Finally, the results, combining data from the original gene panels with the ClinVar data, are aggregated into a single file, with a column identifying the patient ID for each mutation. Researchers can use this output to quickly identify potentially pathogenic mutations across a large number of patient files, and because the output also records the file path of each source file, they can easily cross-reference any result with the original data. This keeps the automated output easy to use and interpret.
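The filtering and aggregation steps can be sketched with the Python standard library. This is a minimal illustration, not the project's actual code: the `Variant` record and its field names, the classification labels in `KEEP_CLASSIFICATIONS`, and writing the summary as CSV rather than Excel are all assumptions. Population frequency is also assumed to be stored as a fraction, so the 1% cutoff is `0.01`.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional

# Hypothetical labels for "potentially pathogenic" and "unclear" classifications;
# the real gene panels may word these differently.
KEEP_CLASSIFICATIONS = {"pathogenic", "likely pathogenic", "uncertain significance"}
# Population frequency cutoff of 1%, assuming frequencies are stored as fractions.
MAX_POPULATION_FREQUENCY = 0.01


@dataclass
class Variant:
    """One variant row from a patient's gene panel (hypothetical field names)."""
    patient_id: str
    gene: str
    variant: str
    classification: str
    population_frequency: Optional[float]  # None if the panel reports no frequency
    source_file: str  # path of the workbook the row came from


def passes_filters(v: Variant) -> bool:
    """Keep potentially pathogenic or unclear variants with frequency <= 1%."""
    if v.classification.strip().lower() not in KEEP_CLASSIFICATIONS:
        return False
    # Assumption: a missing frequency is treated as rare and kept for review.
    if v.population_frequency is None:
        return True
    return v.population_frequency <= MAX_POPULATION_FREQUENCY


def write_summary(variants: Iterable[Variant], out_path: str) -> int:
    """Write every variant that passes the filters to one CSV; return the row count."""
    columns = [f.name for f in fields(Variant)]
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for v in variants:
            if passes_filters(v):
                writer.writerow(asdict(v))  # None is written as an empty cell
                written += 1
    return written
```

Because every row carries both `patient_id` and `source_file`, a researcher can filter the summary by patient and go straight back to the original workbook. Keeping rows with no reported frequency is a conservative choice in this sketch; dropping them instead is a one-line change in `passes_filters`.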
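The ClinVar lookup could go through NCBI's public E-utilities service, which exposes ClinVar as the `clinvar` database: `esearch` returns record IDs for a search term, and `esummary` returns a summary of each record. ClinVar expresses the certainty of a classification as a review status. The sketch below assumes the project uses E-utilities (the text only says "the ClinVar API") and that each summary record stores its classification under `germline_classification` or, in older responses, `clinical_significance`. The function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str) -> str:
    """Build an esearch URL that finds ClinVar record IDs matching `term`."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": "clinvar", "term": term, "retmode": "json"})


def esummary_url(ids: list[str]) -> str:
    """Build an esummary URL for one or more ClinVar record IDs in a single request."""
    return f"{EUTILS}/esummary.fcgi?" + urlencode({"db": "clinvar", "id": ",".join(ids), "retmode": "json"})


def extract_classification(summary: dict, uid: str) -> tuple[Optional[str], Optional[str]]:
    """Return (classification, review_status) for one record in an esummary response."""
    record = summary.get("result", {}).get(uid, {})
    # Assumption: newer responses use "germline_classification", older ones
    # "clinical_significance"; both are read via "description" and "review_status".
    block = record.get("germline_classification") or record.get("clinical_significance") or {}
    return block.get("description"), block.get("review_status")


def lookup(term: str) -> list[tuple[Optional[str], Optional[str]]]:
    """Query ClinVar live (needs network access); one tuple per matching record."""
    with urlopen(esearch_url(term)) as resp:
        ids = json.load(resp)["esearchresult"]["idlist"]
    if not ids:
        return []
    with urlopen(esummary_url(ids)) as resp:
        summary = json.load(resp)
    return [extract_classification(summary, uid) for uid in ids]
```

Only `lookup` touches the network; the URL builders and `extract_classification` can be checked offline against saved responses. NCBI limits clients without an API key to three requests per second, so a script checking many variants should pause between calls or batch several IDs into one `esummary` request, as `esummary_url` allows.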
This project is currently in progress.