Rakesh Sharma1,3, Monika1, Vandana Nunia4, Shailesh Kumar2, S. L. Kothari2, Sumita Kachhwaha1,3*
1Bioinformatics Infrastructure Facility (DBT-BIF), University of Rajasthan, Jaipur- 302004, Rajasthan, India
2Amity Institute of Biotechnology, Amity University Rajasthan, Jaipur- 303002, Rajasthan, India
3Department of Botany, University of Rajasthan, Jaipur- 302004, Rajasthan, India
4Department of Zoology, University of Rajasthan, Jaipur- 302004, Rajasthan, India
*Address for correspondence
Dr. Sumita Kachhwaha*
Bioinformatics Infrastructure Facility (DBT-BIF),
University of Rajasthan, Jaipur- 302004, Rajasthan, India.
Objective: The objective of this work is to develop a Hidden Markov Model (HMM)-based approach for identifying gene family members from RNA-seq data in Glycine max.
Material and Methods: Publicly available RNA-seq data for Glycine max were obtained from the Sequence Read Archive (SRA) under accession numbers SRR3090710, SRR3090711, SRR3090712 and SRR3090713. Data quality was assessed with FastQC, and the reads were then filtered with Trimmomatic to remove adapter and vector contamination. Reads with a Phred quality score ≥ 20 were retained for further analysis, and reads below this threshold were removed. The filtered reads were aligned and assembled into transcripts using the Tuxedo protocol. The transcripts were processed with TransDecoder, which identifies putative open reading frames and translates them into protein sequences. Multiple sequence alignments were generated with ClustalW. The alignment of DREB (dehydration-responsive element-binding) candidate proteins was used to build an HMM profile, and this profile was used to search the translated transcripts for homologous sequences.
Results: The model was tested by HMM search with an E-value threshold of < 0.001, which identified 21 genes with putative DREB-like activity based on HMM profile analysis.
Conclusion: The method applied in this work is a novel approach for identifying homologous genes in RNA-seq datasets.
Keywords: Transcriptomics, Hidden Markov Model, alignment, big data, gene family, homologous
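The quality-filtering step in the abstract keeps reads with a Phred score ≥ 20. The abstract does not state which Trimmomatic steps were used, so the sketch below makes two assumptions: the threshold is applied to each read's mean quality (roughly what Trimmomatic's AVGQUAL step does), and qualities use Phred+33 encoding. The function names `parse_fastq`, `mean_phred` and `filter_reads` are hypothetical and serve only to illustrate the idea.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. a trailing newline
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 (Sanger / Illumina 1.8+) encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(lines: Iterable[str], min_q: float = 20.0) -> List[Record]:
    """Keep reads whose mean Phred score is at least min_q (Q20 by default)."""
    return [rec for rec in parse_fastq(lines) if rec[2] and mean_phred(rec[2]) >= min_q]


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
print([rec[0] for rec in filter_reads(reads)])  # ['@r1']
```

Filtering on the mean is only one option. Real pipelines usually also trim low-quality ends or apply a sliding window before any read is dropped.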
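TransDecoder's role in the pipeline is to find putative open reading frames in the assembled transcripts and translate them into proteins. The sketch below shows only the core idea and is not TransDecoder's actual algorithm, which scores candidate ORFs and considers both strands. It makes several simplifying assumptions: the forward strand only, ATG starts, the standard genetic code, and the longest ORF per transcript. The names `longest_orf`, `translate` and the `min_aa` cutoff are hypothetical.

```python
from itertools import product
from typing import Optional

# Standard genetic code, laid out in TCAG order (third codon base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon ('X' for unknown codons)."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def longest_orf(seq: str, min_aa: int = 100) -> Optional[str]:
    """Return the protein of the longest ATG...stop ORF in the three forward frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and CODON_TABLE.get(codon) == "*":
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    if not best:
        return None
    protein = translate(best).rstrip("*")
    return protein if len(protein) >= min_aa else None


print(longest_orf("CCATGAAATTTGGGTAACC", min_aa=1))  # MKFG
```

The 100-residue default for `min_aa` is meant to resemble the minimum protein length commonly used for ORF calling. Treat it as a tunable assumption rather than the value used in this study.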
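The Results paragraph keeps profile-search hits with an E-value below 0.001. The abstract does not name the HMM software. The sketch below assumes the profile was built and searched with HMMER (`hmmbuild`, then `hmmsearch --tblout`), and that the threshold is applied to the full-sequence E-value, the fifth whitespace-separated column of the tabular output. The function name `significant_hits` and the toy table lines are made up for illustration and are not data from this study.

```python
from typing import Iterable, List, Tuple


def significant_hits(tblout_lines: Iterable[str], max_evalue: float = 1e-3) -> List[Tuple[str, float]]:
    """Return (target, full-sequence E-value) pairs below max_evalue, best first."""
    hits = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip comment/header lines and blanks
        fields = line.split()
        target, evalue = fields[0], float(fields[4])  # column 5: full-sequence E-value
        if evalue < max_evalue:
            hits.append((target, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical tblout lines (made-up targets and scores).
toy = [
    "# target name  accession  query name  accession  E-value ...",
    "tx1.p1 - DREB - 2.1e-30 105.2 0.1 3.0e-30 104.7 0.1 1.0 1 0 0 1 1 1 1 -",
    "tx2.p1 - DREB - 0.5 5.0 0.1 0.9 4.0 0.1 1.0 1 0 0 1 1 1 1 -",
]
print(significant_hits(toy))  # [('tx1.p1', 2.1e-30)]
```

Filtering on the per-sequence E-value counts each translated transcript once. A stricter variant would also require a significant best-domain E-value (column 8).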