Skip to main content

Showing 1–1 of 1 results for author: Harshith, N

  1. arXiv:2204.11835  [pdf, other

    q-bio.QM cs.AI cs.LG

    A Novel Scalable Apache Spark Based Feature Extraction Approaches for Huge Protein Sequence and their Clustering Performance Analysis

    Authors: Preeti Jha, Aruna Tiwari, Neha Bharill, Milind Ratnaparkhe, Om Prakash Patel, Nilagiri Harshith, Mukkamalla Mounika, Neha Nagendra

    Abstract: Genome sequencing projects are rapidly increasing the number of high-dimensional protein sequence datasets. Clustering a high-dimensional protein sequence dataset using traditional machine learning approaches poses many challenges. Many different feature extraction methods exist and are widely used. However, extracting features from millions of protein sequences becomes impractical because they ar… ▽ More

    Submitted 21 April, 2022; originally announced April 2022.