AI Model May Improve RNA Sequencing Research

Rendong Yang, PhD, associate professor of Urology, was senior author of the study published in Nature Communications. 

Scientists in the laboratory of Rendong Yang, PhD, associate professor of Urology, have developed a new large language model that can interpret transcriptomic data in cancer cell lines more accurately than conventional approaches, as detailed in a recent study published in Nature Communications.  

Long-read RNA sequencing technologies have transformed transcriptomics research by detecting complex RNA splicing and gene fusion events that have often been missed by conventional short-read RNA-sequencing methods.  

Among these technologies is nanopore direct RNA sequencing (dRNA-seq), which can sequence full-length RNA molecules directly and produce more accurate analyses of RNA biology. However, previous work suggests this approach may generate chimera artifacts, in which multiple RNA sequences incorrectly join to form a single RNA sequence, and so limit the reliability and utility of the data.

“If we cannot distinguish this artifact versus the true gene fusion event, that will severely affect your downstream quantification of gene expression values or the gene isoforms. Sometimes you may miss a particular gene isoform due to this chimeric artifact, which can affect how you determine the gene fusion drivers for a particular disease,” Yang said. 

To address this challenge, Yang’s team developed a new genomic large language model, called DeepChopper, to facilitate the detection and removal of chimera artifacts in dRNA-seq data.

“Leveraging recent advances in large language models that can interpret complex genetic patterns, DeepChopper processes long genomic contexts with single-nucleotide resolution. This capability enables precise identification of adapter sequences within base-called long reads, facilitating the detection and removal of chimera artifacts in dRNA-seq data,” the authors wrote.
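As a rough illustration of the chopping step described in that quote, the Python sketch below assumes a model has already labeled every base of a read as adapter (1) or not (0). It then cuts the read wherever a run of adapter bases appears in the middle. This is not DeepChopper's actual code or API: the function names `adapter_runs` and `chop_read`, the 0/1 label format, and the `min_len` and `min_segment` thresholds are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def adapter_runs(labels: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive 1-labels.

    Runs shorter than min_len are ignored; they are treated here as
    labeling noise rather than a real adapter (an assumption of this sketch).
    """
    runs = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new run of adapter bases begins
        elif label != 1 and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the read.
    if start is not None and len(labels) - start >= min_len:
        runs.append((start, len(labels)))
    return runs


def chop_read(seq: str, labels: List[int],
              min_len: int = 3, min_segment: int = 1) -> List[str]:
    """Split a read at predicted adapter spans and drop the adapter bases."""
    if len(seq) != len(labels):
        raise ValueError("seq and labels must have the same length")
    segments = []
    prev = 0
    for start, end in adapter_runs(labels, min_len):
        if start - prev >= min_segment:
            segments.append(seq[prev:start])
        prev = end  # skip past the adapter itself
    if len(seq) - prev >= min_segment:
        segments.append(seq[prev:])
    return segments


# Toy chimeric read: two fragments joined by a made-up internal adapter.
read = "ACGTACGT" + "AAAAAA" + "GGCCTTAA"
per_base_labels = [0] * 8 + [1] * 6 + [0] * 8
print(chop_read(read, per_base_labels))
```

In this toy case the read is split back into its two original fragments. A real tool would also have to carry base-quality scores along with each segment and decide how to handle very short leftovers, which this sketch leaves out.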

To validate their tool, the investigators applied DeepChopper to direct RNA sequencing data from a prostate cancer cell line, which further revealed the prevalence of chimera artifacts.

“We demonstrated that these artifacts significantly impact transcriptomic analysis by complicating gene fusion detection, transcript annotation, and alternative splicing studies,” the authors wrote.  

The approach, Yang said, could be used to identify false-positive RNA biological events in cancer samples.   

“That could be important to understand how underlying genomic instabilities occurred biologically rather than due to a technical artifact,” Yang said. “We can think about ways to utilize AI and large language models to address that rather than using a traditional rule-based approach.”   

Yangyang Li, a student in the Driskill Graduate Program in Life Sciences (DGP), was lead author of the study. Qi Cao, PhD, the Anthony J. Schaeffer, MD, Professor of Urology, was a co-author of the study.  

Yang and Cao are members of the Robert H. Lurie Comprehensive Cancer Center of Northwestern University.  

This work was supported in part by National Institutes of Health grants R35GM142441, R01CA259388, R01CA256741, R01CA278832 and R01CA285684.
