Gene Mutations and Motifs Detection for Coronavirus in Biological Sequences of COVID-19 using Deep Learning Models

Gugulothu, Praveen

Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/3486

Full metadata record

DC Field	Value	Language
dc.contributor.author	Gugulothu, Praveen	-
dc.date.accessioned	2025-10-28T09:22:06Z	-
dc.date.available	2025-10-28T09:22:06Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/3486	-
dc.description	NITW	en_US
dc.description.abstract	In bioinformatics and computational biology, DNA genome sequence analysis covers a broad range of research problems, such as identifying homology between sequences, recognising intrinsic features, detecting mutations, revealing genetic diversity, and tracing species evolution. Sophisticated sequencing technologies produce enormous volumes of DNA sequence data, which makes the sequences harder to analyse. Genomic data is growing much faster than the rate at which sequences can be analysed, so there is a pressing need for faster sequence analysis algorithms. Analysis of genome sequences is useful in disease detection, drug development, agriculture, and forensics. Our solution to this problem is a Convolutional Neural Network (CNN) that can handle very large DNA sequences using COVID-19 feature extraction. Given the fast spread of the disease, detecting coronavirus disease 2019 (COVID-19) is one of the world’s primary concerns. According to recent figures, there have been over 1.6 million confirmed instances of COVID-19, and the disease is spreading rapidly to numerous nations throughout the world. An analysis of the global incidence and distribution of COVID-19 is presented. We introduce a deep CNN that can distinguish between the original (non-augmented) dataset and the augmented dataset, both of which were used for the assessment. We compiled a variety of coronavirus datasets, including those for MERS-CoV, SARS-CoV, NL63, Alpha-CoV, BetaCoV-1, HKU1-CoV, and 229E-CoV, from NCBI and GISAID. Each dataset is annotated with its accession number and contains nucleotides in FASTA format. In this study, we compiled a positive and negative dataset consisting of 1582 samples with varying genome sequence lengths. With one-hot encoding, every categorical variable is transformed into its own feature with a binary value of either 1 or 0.
Thus, in one-hot encoding, every nucleotide is represented by a four-dimensional one-hot vector; for example, the letters "A," "C," "G," and "T" are encoded as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1), respectively. Using the top ten most sick coronavirus sequences as a guide, we trained the proposed CNN module to detect underlying patterns associated with the virus. The learned convolutional filters produce motifs. The activation values for all sub-sequences of the 20th filter are less than 0.047075363, and the highest activation value obtained is close to 1.086. Advanced Deep Learning Method for COVID-19 Point Mutation Rate Optimisation by Coot-Lion: Preventing disease or tailoring treatment to an individual’s needs both depend on an accurate diagnosis. Although disease detection from DNA sequences is safe, the enormous quantity of sequences greatly increases the processing time. Consequently, computational approaches are suggested to enhance diagnostic precision and expedite the diagnostic procedure. Genetic disorders occur when an organism’s DNA becomes aberrant as a result of mutations in exons. Our new Deep Quantum Neural Network (DQNN), called LBCA-based Deep QNN, is built on the Lion-based Coot algorithm. It can forecast the COVID-19 virus using the DNA biological sequence pattern and the rates of point mutations. First, the genome sequences undergo feature extraction. This process extracts specific features from the genome sequences, such as CpG-based features and numerical mapping for integer and binary data. Additionally, numerical mapping is applied using the Fourier transform to generate features for skewness, kurtosis, and peak-to-average power ratio. To obtain the entropy feature, we also use K-mer extraction. We determined the K-group for point mutations in COVID-19 for both the 200- and 400-genome sequence learning sets.
Afterwards, we also focused on COVID-19 DNA sequence repeats of bi-character and tri-character types, among others, and put forward a DNA sequence clustering model called "ERSIT-GRU" (Exponential Robust Scaling-Identity Tanh-Gated Recurrent Unit) to detect COVID-19 DNA sequence repeats in large datasets. To address challenges such as the dataset being small, imbalanced, and affected by FASTA quality issues, the dataset was preprocessed in stages using multiple techniques to provide a useful training dataset. Genetic disorders manifest in organisms when there is an aberration in their genetic composition as a result of exon mutations. The technique uses the Trie data structure to forecast disease severity by counting the occurrences of repeat patterns in exons. Because the database is small, the suggested method can only forecast the condition of a small number of diseases, although it does so effectively and quickly based on pattern frequency. There is an immediate need to discover other patterns that produce varied diseases in order to solve the problem of a small number of pathogenic patterns. The genetic code carries information that affects how fast and efficient translation is. In this extensive study of coronaviruses (CoVs) of both human and zoonotic origin, we compare and contrast their codon usage bias, relative errors in insertion and substitution, mutation rates in COVID-19, DNA motif sequence, size, feature extraction based on base frequency, dimer count, and feature extraction based on size. The evolutionary relationship between seven coronaviruses can be shown by the Harris Hawks Optimisation (HHO) model analysis that we present. There have been many attempts to fix DNA-based errors using tandem repeats.
Age, symptoms, and chromosomes all play a role in the different patterns that correlate with normal, pre-mutated, and diseased frequencies. Tandem repeat analysis has identified the ATXN2, DMPK, ATN1, and JPH3 genes, among others, that are involved with disease state. The pattern frequency allows us to predict the disease’s progress and treat it at an early stage. The proposed model achieved the highest scores on evaluation metrics such as accuracy, precision, recall, and F1 score.	en_US
dc.language.iso	en	en_US
dc.subject	Coronavirus	en_US
dc.subject	Convolutional neural network	en_US
dc.title	Gene Mutations and Motifs Detection for Coronavirus in Biological Sequences of COVID-19 using Deep Learning Models	en_US
dc.type	Thesis	en_US
Appears in Collections:	Computer Science and Engineering
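
The one-hot encoding of nucleotides described in the abstract can be sketched as follows. This is a minimal illustration in Python, not code from the thesis: the function name one_hot_encode, the A/C/G/T column order, and the choice to map ambiguous symbols such as N to an all-zero vector are assumptions made for the example.

```python
# Assumed column order for illustration: A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Assumption: symbols outside A/C/G/T (for example N) become all zeros.
UNKNOWN = (0, 0, 0, 0)


def one_hot_encode(sequence):
    """Return one 4-tuple per nucleotide of a DNA string (case-insensitive)."""
    return [ONE_HOT.get(base, UNKNOWN) for base in sequence.upper()]


if __name__ == "__main__":
    # Print each base next to its one-hot vector.
    for base, vector in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, vector)
```

Each sequence of length n becomes an n x 4 binary matrix, which is the usual input shape for a one-dimensional convolution over nucleotides. Sequences of varying length would still need padding or truncation before batching, a step this sketch leaves out.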

Files in This Item:

File	Description	Size	Format
Full Thesis.pdf		3.87 MB	Adobe PDF
