Module control of network analysis in psychopathology

Authors: Chunyu Pan, Quan Zhang, Yue Zhu, Shengzhou Kong, Juan Liu, Changsheng Zhang, Fei Wang, Xizhe Zhang

Abstract: The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributes to a dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the control relationships between symptoms remain largely unclear. Here, we present a novel systematizing concept, module control, to analyze the control principle of the symptom network at a module level. We introduce Module Control Network (MCN) to identify key modules that regulate the network's behavior. By applying our approach to a multivariate psychological dataset, we discover that non-emotional modules, such as sleep-related and stress-related modules, are the primary controlling modules in the symptom network. Our findings indicate that module control can expose the central symptom clusters governing the psychopathology network, offering novel insights into the underlying mechanisms of mental disorders and an individualized approach to psychological interventions.

Submitted 30 May, 2024; originally announced July 2024.

arXiv:2405.00128 [pdf, other]

Target-Specific De Novo Peptide Binder Design with DiffPepBuilder

Authors: Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, Luhua Lai

Abstract: Despite the exciting progress in target-specific de novo protein binder design, peptide binder design remains challenging due to the flexibility of peptide structures and the scarcity of protein-peptide complex structure data. In this study, we curated a large synthetic dataset, referred to as PepPC-F, from the abundant protein-protein interface data and developed DiffPepBuilder, a de novo target-specific peptide binder generation method that utilizes an SE(3)-equivariant diffusion model trained on PepPC-F to co-design peptide sequences and structures. DiffPepBuilder also introduces disulfide bonds to stabilize the generated peptide structures. We tested DiffPepBuilder on 30 experimentally verified strong peptide binders with available protein-peptide complex structures. DiffPepBuilder was able to effectively recall the native structures and sequences of the peptide ligands and to generate novel peptide binders with improved binding free energy. We subsequently conducted de novo generation case studies on three targets. In both the regeneration test and case studies, DiffPepBuilder outperformed AfDesign and RFdiffusion coupled with ProteinMPNN, in terms of sequence and structure recall, interface quality, and structural diversity. Molecular dynamics simulations confirmed that the introduction of disulfide bonds enhanced the structural rigidity and binding performance of the generated peptides. As a general peptide binder de novo design tool, DiffPepBuilder can be used to design peptide binders for given protein targets with three dimensional and binding site information.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.02360 [pdf, other]

FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

Authors: Adamo Young, Fei Wang, David Wishart, Bo Wang, Hannes Röst, Russ Greiner

Abstract: The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models suffer from problems with prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error, and surpasses existing C2MS models as a tool for retrieval-based MS2C.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 21 pages, 4 figures, 9 tables

arXiv:2402.13714 [pdf, other]

An Evaluation of Large Language Models in Bioinformatics Research

Authors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun

Abstract: Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance of LLMs on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks. In addition, we provide a thorough analysis of their limitations in the context of complicated bioinformatics tasks. In conclusion, we believe that this work can provide new perspectives and motivate future research in the field of LLMs applications, AI for Science and bioinformatics.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2401.15047 [pdf]

Influence of Material Parameter Variability on the Predicted Coronary Artery Biomechanical Environment via Uncertainty Quantification

Authors: Caleb C. Berggren, David Jiang, Y. F. Jack Wang, Jake A. Bergquist, Lindsay C. Rupp, Zexin Liu, Rob S. MacLeod, Akil Narayan, Lucas H. Timmins

Abstract: Central to the clinical adoption of patient-specific modeling strategies is demonstrating that simulation results are reliable and safe. Simulation frameworks must be robust to uncertainty in model input(s), and levels of confidence should accompany results. In this study we applied a coupled uncertainty quantification-finite element (FE) framework to understand the impact of uncertainty in vascular material properties on variability in predicted stresses. Univariate probability distributions were fit to material parameters derived from layer-specific mechanical behavior testing of human coronary tissue. Parameters were assumed to be probabilistically independent, allowing for efficient parameter ensemble sampling. In an idealized coronary artery geometry, a forward FE model for each parameter ensemble was created to predict tissue stresses under physiologic loading. An emulator was constructed within the UncertainSCI software using polynomial chaos techniques, and statistics and sensitivities were directly computed. Results demonstrated that material parameter uncertainty propagates to variability in predicted stresses across the vessel wall, with the largest dispersions in stress within the adventitial layer. Variability in stress was most sensitive to uncertainties in the anisotropic component of the strain energy function. Unary and binary interactions within the adventitial layer were the main contributors to stress variance, and the leading factor in stress variability was uncertainty in the stress-like material parameter summarizing contribution of the embedded fibers to the overall artery stiffness. Results from a patient-specific coronary model confirmed many of these findings. Collectively, this highlights the impact of material property variation on predicted artery stresses and presents a pipeline to explore and characterize uncertainty in computational biomechanics.

Submitted 26 January, 2024; originally announced January 2024.

Comments: To appear: Biomechanics and Modeling in Mechanobiology

arXiv:2310.13913 [pdf, other]

Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

Authors: Lihang Liu, Shanzhuo Zhang, Donglong He, Xianbin Ye, Jingbo Zhou, Xiaonan Zhang, Yaoyao Jiang, Weiming Diao, Hang Yin, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang

Abstract: Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, which raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. In this work, we show that by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can obtain a protein-ligand structure prediction model with outstanding performance. Specifically, this process involved the generation of 100 million docking conformations for protein-ligand pairings, an endeavor consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been rigorously benchmarked against both physics-based and deep learning-based baselines, demonstrating its exceptional precision and robust transferability in predicting binding conformation. In addition, our investigation reveals the scaling laws governing pre-trained protein-ligand structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and the volume of pre-training data. Moreover, we applied HelixDock to several drug discovery-related tasks to validate its practical utility. HelixDock demonstrates outstanding capabilities on both cross-docking and structure-based virtual screening benchmarks.

Submitted 22 May, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

arXiv:2306.16684 [pdf, other]

Decomposing spiking neural networks with Graphical Neural Activity Threads

Authors: Bradley H. Theilman, Felix Wang, Fred Rothganger, James B. Aimone

Abstract: A satisfactory understanding of information processing in spiking neural networks requires appropriate computational abstractions of neural activity. Traditionally, the neural population state vector has been the most common abstraction applied to spiking neural networks, but this requires artificially partitioning time into bins that are not obviously relevant to the network itself. We introduce a distinct set of techniques for analyzing spiking neural networks that decomposes neural activity into multiple, disjoint, parallel threads of activity. We construct these threads by estimating the degree of causal relatedness between pairs of spikes, then use these estimates to construct a directed acyclic graph that traces how the network activity evolves through individual spikes. We find that this graph of spiking activity naturally decomposes into disjoint connected components that overlap in space and time, which we call Graphical Neural Activity Threads (GNATs). We provide an efficient algorithm for finding analogous threads that reoccur in large spiking datasets, revealing that seemingly distinct spike trains are composed of similar underlying threads of activity, a hallmark of compositionality. The picture of spiking neural networks provided by our GNAT analysis points to new abstractions for spiking neural computation that are naturally adapted to the spatiotemporally distributed dynamics of spiking neural networks.

Submitted 29 June, 2023; originally announced June 2023.

Report number: SAND2023-05685O

arXiv:2306.13957 [pdf, other]

DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins

Authors: Lei Huang, Zheng Yuan, Huihui Yan, Rong Sheng, Linjing Liu, Fuzhou Wang, Weidun Xie, Nanjun Chen, Fei Huang, Songfang Huang, Ka-Chun Wong, Yaoyun Zhang

Abstract: Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation targeted for dual protein targets still faces formidable challenges including protein 3D structure data requisition for model training, auto-regressive sampling, and model generalization for unseen targets. Here, we proposed DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual targets based molecule generation to address the above issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations and incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments to demonstrate that DiffDTM can generate drug-like, synthesis-accessible, novel, and high-binding affinity molecules targeting specific dual proteins, outperforming the state-of-the-art (SOTA) models in terms of multiple evaluation metrics. Furthermore, we utilized DiffDTM to generate molecules towards dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules, addressing the issues of requiring insufficient active molecule data for training as well as the need to retrain when encountering new targets.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2302.06089 [pdf, other]

Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading

Authors: Fei Kong, Xiyue Wang, Jinxi Xiang, Sen Yang, Xinran Wang, Meng Yue, Jun Zhang, Junhan Zhao, Xiao Han, Yuhan Dong, Biyue Zhu, Fang Wang, Yueping Liu

Abstract: Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data. This study introduces a federated attention-consistent learning (FACL) framework to address challenges associated with large-scale pathological images and data heterogeneity. FACL enhances model generalization by maximizing attention consistency between local clients and the server model. To ensure privacy and validate robustness, we incorporated differential privacy by introducing noise during parameter transfer. We assessed the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole-slide images of prostate cancer from multiple centers. In the diagnosis task, FACL achieved an area under the curve (AUC) of 0.9718, outperforming seven centers with an average AUC of 0.9499 when categories are relatively balanced. For the Gleason grading task, FACL attained a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training model for prostate cancer pathology while maintaining effective data safeguards.

Submitted 28 March, 2024; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: 14 pages

arXiv:2302.00104 [pdf]

NetMoST: A network-based machine learning approach for subtyping schizophrenia using polygenic SNP allele biomarkers

Authors: Xinru Wei, Shuai Dong, Zhao Su, Lili Tang, Pengfei Zhao, Chunyu Pan, Fei Wang, Yanqing Tang, Weixiong Zhang, Xizhe Zhang

Abstract: Subtyping neuropsychiatric disorders like schizophrenia is essential for improving the diagnosis and treatment of complex diseases. Subtyping schizophrenia is challenging because it is polygenic and genetically heterogeneous, rendering the standard symptom-based diagnosis often unreliable and unrepeatable. We developed a novel network-based machine-learning approach, netMoST, to subtyping psychiatric disorders. NetMoST identifies polygenic risk SNP-allele modules from genome-wide genotyping data as polygenic haplotype biomarkers (PHBs) for disease subtyping. We applied netMoST to subtype a cohort of schizophrenia subjects into three distinct biotypes with differentiable genetic, neuroimaging and functional characteristics. The PHBs of the first biotype (36.9% of all patients) were related to neurodevelopment and cognition, the PHBs of the second biotype (28.4%) were enriched for neuroimmune functions, and the PHBs of the third biotype (34.7%) were associated with the transport of calcium ions and neurotransmitters. Neuroimaging patterns provided additional support to the new biotypes, with unique regional homogeneity (ReHo) patterns observed in the brains of each biotype compared with healthy controls. Our findings demonstrated netMoST's capability for uncovering novel biotypes of complex diseases such as schizophrenia. The results also showed the power of exploring polygenic allelic patterns that transcend the conventional GWAS approaches.

Submitted 10 March, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: 21 pages, 4 figures

arXiv:2301.09245 [pdf, other]

Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks

Authors: Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang

Abstract: Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI.

Submitted 10 March, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

arXiv:2301.00815 [pdf, other]

NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants

Authors: Chenyu Xue, Fan Wang, Yuanzhuo Zhu, Hui Li, Deyu Meng, Dinggang Shen, Chunfeng Lian

Abstract: Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed as NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer led to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.

Submitted 25 May, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: 12 page 4 fig and 2 table

arXiv:2211.16473 [pdf]

doi 10.1177/0962280220909969

Semiparametric integrative interaction analysis for non-small-cell lung cancer

Authors: Yang Li, Fan Wang, Rong Li, Yifan Sun

Abstract: In the genomic analysis, it is significant while challenging to identify markers associated with cancer outcomes or phenotypes. Based on the biological mechanisms of cancers and the characteristics of datasets as well, this paper proposes a novel integrative interaction approach under the semiparametric model, in which the genetic factors and environmental factors are included as the parametric and nonparametric components, respectively. The goal of this approach is to identify the genetic factors and gene-gene interactions associated with cancer outcomes, and meanwhile, estimate the nonlinear effects of environmental factors. The proposed approach is based on the threshold gradient directed regularization (TGDR) technique. Simulation studies indicate that the proposed approach outperforms in the identification of main effects and interactions, and has favorable estimation and prediction accuracy compared with the alternative methods. The analysis of non-small-cell lung carcinomas (NSCLC) datasets from The Cancer Genome Atlas (TCGA) is conducted, showing that the proposed approach can identify markers with important implications and has favorable performance in prediction accuracy, identification stability, and computation cost.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 16 pages, 4 figures

Journal ref: Statistical Methods in Medical Research, 29: 2865-2880, 2020

arXiv:2210.05677 [pdf]

Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review

Authors: Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang

Abstract: Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during development of complex organisms and improved our understanding of disease states, such as cancer, diabetes, and COVID, among others. Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative, compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analysis tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep algorithms for scRNA-seq data analysis.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2209.13492 [pdf, other]

Unraveling Key Elements Underlying Molecular Property Prediction: A Systematic Study

Authors: Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Abstract: Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptors datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4,200 models on SMILES sequences and 8,400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore potential causes why representation learning models can fail and show that dataset size is essential for representation learning models to excel.

Submitted 2 September, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2208.05863 [pdf, other]

GEM-2: Next Generation Molecular Property Prediction Network by Modeling Full-range Many-body Interactions

Authors: Lihang Liu, Donglong He, Xiaomin Fang, Shanzhuo Zhang, Fan Wang, Jingzhou He, Hua Wu

Abstract: Molecular property prediction is a fundamental task in the drug and material industries. Physically, the properties of a molecule are determined by its own electronic structure, which is a quantum many-body system and can be exactly described by the Schrödinger equation. Full-range many-body interactions between electrons have been proven effective in obtaining an accurate solution of the Schrödinger equation by classical computational chemistry methods, although modeling such interactions consumes an expensive computational cost. Meanwhile, deep learning methods have also demonstrated their competence in molecular property prediction tasks. Inspired by the classical computational chemistry methods, we design a novel method, namely GEM-2, which comprehensively considers full-range many-body interactions in molecules. Multiple tracks are utilized to model the full-range interactions between the many-bodies with different orders, and a novel axial attention mechanism is designed to approximate the full-range interaction modeling with much lower computational cost. Extensive experiments demonstrate the overwhelming superiority of GEM-2 over multiple baseline methods in quantum chemistry and drug discovery tasks. The ablation studies also verify the effectiveness of the full-range many-body interactions.

Submitted 20 October, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

arXiv:2207.13921 [pdf, other]

doi 10.1038/s42256-023-00721-6

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Authors: Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Xiaonan Zhang, Hua Wu, Hui Li, Le Song

Abstract: AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) with thousands of millions of primary sequences utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. HelixFold-Single is validated in datasets CASP14 and CAMEO, achieving competitive accuracy with the MSA-based methods on the targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions.
The code of HelixFold-Single is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein-single/forecast.

Submitted 21 February, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Journal ref: Nature Machine Intelligence, 2023

arXiv:2207.05477 [pdf, other]

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle

Authors: Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma

Abstract: Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of the experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires lots of computational resources and time to implement the training and inference of AlphaFold2 from scratch. The cost of running the original AlphaFold2 is expensive for most individuals and institutions. Therefore, reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. The performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while the memory is optimized through Recompute, BFloat16, and memory read/write in-place. Compared with the original AlphaFold2 (implemented with Jax) and OpenFold (implemented with PyTorch), HelixFold needs only 7.5 days to complete the full end-to-end training and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days. HelixFold saves 1x training time. We verified that HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets.
HelixFold's code is available on GitHub for free download: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein/forecast.

Submitted 13 July, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2207.04457 [pdf, other]

TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response

Authors: Jie Gao, Jing Hu, Wanqing Sun, Yili Shen, Xiaonan Zhang, Xiaomin Fang, Fan Wang, Guodong Zhao

Abstract: Predicting clinical outcomes to anti-cancer drugs on a personalized basis is challenging in cancer treatment due to the heterogeneity of tumors. Traditional computational efforts have been made to model the effect of drug response on individual samples depicted by their molecular profile, yet overfitting occurs because of the high dimension for omics data, hindering models from clinical application. Recent research shows that deep learning is a promising approach to build drug response models by learning alignment patterns between drugs and samples. However, existing studies employed the simple feature fusion strategy and only considered the drug features as a whole representation while ignoring the substructure information that may play a vital role when aligning drugs and genes. Hereby in this paper, we propose TCR (Transformer based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to learn the interactions between drug atom/sub-structure and molecular signatures efficiently in our study. Furthermore, a dual loss function and cross sampling strategy were designed to improve the prediction power of TCR. We show that TCR outperformed all other methods under various data splitting strategies on all evaluation matrices (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data.
Our study highlights the prediction power of TCR and its potential value for cancer drug repurpose and precision oncology treatment.

Submitted 10 July, 2022; originally announced July 2022.

Comments: 11 pages, 7 figures

arXiv:2205.08055 [pdf]

doi 10.1093/bioinformatics/btac342

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Authors: Shanzhuo Zhang, Zhiyuan Yan, Yueyang Huang, Lihang Liu, Donglong He, Wei Wang, Xiaomin Fang, Xiaonan Zhang, Fan Wang, Hua Wu, Haifeng Wang

Abstract: Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform frail for the molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4%, compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting various demands of drug research and development requirements.

Submitted 16 May, 2022; originally announced May 2022.

Journal ref: Bioinformatics, 2022

arXiv:2204.04213 [pdf, other]

Structure-aware Protein Self-supervised Learning

Authors: Can Chen, Jingbo Zhou, Fan Wang, Xue Liu, Dejing Dou

Abstract: Protein representation learning methods have shown great potential to yield useful representation for many downstream tasks, especially on protein classification. Moreover, a few recent studies have shown great promise in addressing insufficient labels of proteins with self-supervised learning methods. However, existing protein language models are usually pretrained on protein sequences without considering the important protein structural information. To this end, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective, respectively. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme. Experiments on several supervised downstream tasks verify the effectiveness of our proposed method. The code of the proposed method is available at https://github.com/GGchen1997/STEPS_Bioinformatics.

Submitted 8 April, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted by Bioinformatics; 7 pages 4 figures

arXiv:2203.08820 [pdf, other]

DePS: An improved deep learning model for de novo peptide sequencing

Authors: Cheng Ge, Yi Lu, Jia Qu, Liangxu Xie, Feng Wang, Hong Zhang, Ren Kong, Shan Chang

Abstract: De novo peptide sequencing from mass spectrometry data is an important method for protein identification. Recently, various deep learning approaches were applied for de novo peptide sequencing and DeepNovoV2 is one of the representative models. In this study, we proposed an enhanced model, DePS, which can improve the accuracy of de novo peptide sequencing even with missing signal peaks or a large number of noisy peaks in tandem mass spectrometry data. It is shown that, for the same test set of DeepNovoV2, the DePS model achieved excellent results of 74.22%, 74.21% and 41.68% for amino acid recall, amino acid precision and peptide recall respectively. Furthermore, the results suggested that DePS outperforms DeepNovoV2 on the cross species dataset.

Submitted 16 March, 2022; originally announced March 2022.

Comments: 10 pages, 7 figures

arXiv:2112.04814 [pdf, other]

Multimodal Pre-Training Model for Sequence-based Prediction of Protein-Protein Interaction

Authors: Yang Xue, Zijing Liu, Xiaomin Fang, Fan Wang

Abstract: Protein-protein interactions (PPIs) are essentials for many biological processes where two or more proteins physically bind together to achieve their functions. Modeling PPIs is useful for many biomedical applications, such as vaccine design, antibody therapeutics, and peptide drug discovery. Pre-training a protein model to learn effective representation is critical for PPIs. Most pre-training models for PPIs are sequence-based, which naively adopt the language models used in natural language processing to amino acid sequences. More advanced works utilize the structure-aware pre-training technique, taking advantage of the contact maps of known protein structures. However, neither sequences nor contact maps can fully characterize structures and functions of the proteins, which are closely related to the PPI problem. Inspired by this insight, we propose a multimodal protein pre-training model with three modalities: sequence, structure, and function (S2F). Notably, instead of using contact maps to learn the amino acid-level rigid structures, we encode the structure feature with the topology complex of point clouds of heavy atoms. It allows our model to learn structural information about not only the backbones but also the side chains. Moreover, our model incorporates the knowledge from the functional description of proteins extracted from literature or manual annotations.
Our experiments show that the S2F learns protein embeddings that achieve good performances on a variety of PPIs tasks, including cross-species PPI, antibody-antigen affinity prediction, antibody neutralization prediction for SARS-CoV-2, and mutation-driven binding affinity change prediction.

Submitted 9 December, 2021; originally announced December 2021.

Comments: MLCB 2021 Spotlight

arXiv:2112.00905 [pdf, other]

doi 10.1109/BIBM55620.2022.9995561

HelixMO: Sample-Efficient Molecular Optimization in Scene-Sensitive Latent Space

Authors: Zhiyuan Chen, Xiaomin Fang, Zixu Hua, Yueyang Huang, Fan Wang, Hua Wu

Abstract: Efficient exploration of the chemical space to search the candidate drugs that satisfy various constraints is a fundamental task of drug discovery. Advanced deep generative methods attempt to optimize the molecules in the compact latent space instead of the discrete original space, but the mapping between the original and latent spaces is always kept unchanged during the entire optimization process. The unchanged mapping makes those methods challenging to fast adapt to various optimization scenes and leads to the great demand for assessed molecules (samples) to provide optimization direction, which is a considerable expense for drug discovery. To this end, we design a sample-efficient molecular generative method, HelixMO, which explores the scene-sensitive latent space to promote sample efficiency. The scene-sensitive latent space focuses more on modeling the promising molecules by dynamically adjusting the space mapping by leveraging the correlations between the general and scene-specific characteristics during the optimization process. Extensive experiments demonstrate that HelixMO can achieve competitive performance with only a few assessed samples on four molecular optimization scenes. Ablation studies verify the positive impact of the scene-specific latent space, which is capable of identifying the critical characteristics of the promising molecules. We also deployed HelixMO on the website PaddleHelix (https://paddlehelix.baidu.com/app/drug/drugdesign/forecast) to provide drug design service.

Submitted 16 November, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

Journal ref: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

arXiv:2111.09502 [pdf, other]

Docking-based Virtual Screening with Multi-Task Learning

Authors: Zijing Liu, Xianbin Ye, Xiaomin Fang, Fan Wang, Hua Wu, Haifeng Wang

Abstract: Machine learning shows great potential in virtual screening for drug discovery. Current efforts on accelerating docking-based virtual screening do not consider using existing data of other previously developed targets. To make use of the knowledge of the other targets and take advantage of the existing data, in this work, we apply multi-task learning to the problem of docking-based virtual screening. With two large docking datasets, the results of extensive experiments show that multi-task learning can achieve better performances on docking score prediction. By learning knowledge across multiple targets, the model trained by multi-task learning shows a better ability to adapt to a new target. Additional empirical study shows that other problems in drug discovery, such as the experimental drug-target affinity prediction, may also benefit from multi-task learning. Our results demonstrate that multi-task learning is a promising machine learning approach for docking-based virtual screening and accelerating the process of drug discovery.

Submitted 12 December, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: accepted by IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2021)

arXiv:2107.10670 [pdf, other]

Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity

Authors: Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Weili Huang, Dejing Dou, Hui Xiong

Abstract: Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, thus the biomolecular structural information is not fully utilized. The essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs the node-edge aggregation process to update embeddings of nodes and edges while preserving the distance and angle information among atoms. Then, PiPool is adopted to gather interactive edges with a subsequent reconstruction loss to reflect the global interactions. Exhaustive experimental study on two benchmarks verifies the superiority of SIGN.

Submitted 20 July, 2021; originally announced July 2021.

Comments: 11 pages, 8 figures, Accepted by KDD 2021 (Research Track)

arXiv:2106.06130 [pdf, other]

doi 10.1038/s42256-021-00438-4

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Authors: Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, Haifeng Wang

Abstract: Effective molecular representation learning is of great importance to facilitate molecular property prediction, which is a fundamental task for the drug and material industry. Recent advances in graph neural networks (GNNs) have shown great promise in applying GNNs for molecular representation learning. Moreover, a few recent studies have also demonstrated successful applications of self-supervised learning methods to pre-train the GNNs to overcome the problem of insufficient labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data without fully utilizing the molecular geometry information. Whereas, the three-dimensional (3D) spatial structure of a molecule, a.k.a molecular geometry, is one of the most critical factors for determining molecular physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). At first, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule. To be specific, we devised double graphs for a molecule: The first one encodes the atom-bond relations; The second one encodes bond-angle relations. Moreover, on top of the devised GNN architecture, we propose several novel geometry-level self-supervised learning strategies to learn spatial knowledge by utilizing the local and global molecular 3D structures.
We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and exhibit that ChemRL-GEM can significantly outperform all baselines in both regression and classification tasks. For example, the experimental results show an overall improvement of 8.8% on average compared to SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.

Submitted 22 February, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Nature Machine Intelligence, 2022

Journal ref: Nature Machine Intelligence, 2022

arXiv:2105.06049 [pdf, other]

TopoTxR: A Topological Biomarker for Predicting Treatment Response in Breast Cancer

Authors: Fan Wang, Saarthak Kapse, Steven Liu, Prateek Prasanna, Chao Chen

Abstract: Characterization of breast parenchyma on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Current quantitative approaches, including radiomics and deep learning models, do not explicitly capture the complex and subtle parenchymal structures, such as fibroglandular tissue. In this paper, we propose a novel method to direct a neural network's attention to a dedicated set of voxels surrounding biologically relevant tissue structures. By extracting multi-dimensional topological structures with high saliency, we build a topology-derived biomarker, TopoTxR. We demonstrate the efficacy of TopoTxR in predicting response to neoadjuvant chemotherapy in breast cancer. Our qualitative and quantitative results suggest differential topological behavior of breast tissue on treatment-naïve imaging, in patients who respond favorably to therapy versus those who do not.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 12 pages, 5 figures, 2 tables, accepted to International Conference on Information Processing in Medical Imaging (IPMI) 2021

arXiv:2012.09624 [pdf, other]

Distance-aware Molecule Graph Attention Network for Drug-Target Binding Affinity Prediction

Authors: Jingbo Zhou, Shuangli Li, Liang Huang, Haoyi Xiong, Fan Wang, Tong Xu, Hui Xiong, Dejing Dou

Abstract: Accurately predicting the binding affinity between drugs and proteins is an essential step for computational drug discovery. Since graph neural networks (GNNs) have demonstrated remarkable success in various graph-related tasks, GNNs have been considered as a promising tool to improve the binding affinity prediction in recent years. However, most of the existing GNN architectures can only encode the topological graph structure of drugs and proteins without considering the relative spatial information among their atoms. Whereas, different from other graph datasets such as social networks and commonsense knowledge graphs, the relative spatial position and chemical bonds among atoms have significant impacts on the binding affinity. To this end, in this paper, we propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction. As a dedicated solution, we first propose a position encoding mechanism to integrate the topological structure and spatial position information into the constructed pocket-ligand graph. Moreover, we propose a novel edge-node hierarchical attentive aggregation structure which has edge-level aggregation and node-level aggregation. The hierarchical attentive aggregation can capture spatial dependencies among atoms, as well as fuse the position-enhanced information with the capability of discriminating multiple spatial relations among atoms. Finally, we conduct extensive experiments on two standard datasets to demonstrate the effectiveness of S-MAN.

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2011.07511 [pdf]

Wide-field Decodable Orthogonal Fingerprints of Single Nanoparticles Unlock Multiplexed Digital Assays

Authors: Jiayan Liao, Jiajia Zhou, Yiliao Song, Baolei Liu, Yinghui Chen, Fan Wang, Chaohao Chen, Jun Lin, Xueyuan Chen, Jie Lu, Dayong Jin

Abstract: The control in optical uniformity of single nanoparticles and tuning their diversity in orthogonal dimensions, dot to dot, holds the key to unlock nanoscience and applications. Here we report that the time-domain emissive profile from single upconversion nanoparticle, including the rising, decay and peak moment of the excited state population (T2 profile), can be arbitrarily tuned by upconversion schemes, including interfacial energy migration, concentration dependency, energy transfer, and isolation of surface quenchers. This allows us to significantly increase the coding capacity at the nanoscale. We further implement both time-resolved wide-field imaging and deep-learning techniques to decode these fingerprints, showing high accuracies at high throughput. These high-dimensional optical fingerprints provide a new horizon for applications spanning from sub-diffraction-limit data storage, security inks, to high-throughput single-molecule digital assays and super-resolution imaging.

Submitted 15 November, 2020; originally announced November 2020.

arXiv:2004.04768 [pdf, other]

Towards Better Opioid Antagonists Using Deep Reinforcement Learning

Authors: Jianyuan Deng, Zhibo Yang, Yao Li, Dimitris Samaras, Fusheng Wang

Abstract: Naloxone, an opioid antagonist, has been widely used to save lives from opioid overdose, a leading cause of death in the opioid epidemic. However, naloxone has short brain retention ability, which limits its therapeutic efficacy. Developing better opioid antagonists is critical in combating the opioid epidemic. Instead of exhaustively searching in a huge chemical space for better opioid antagonists, we adopt reinforcement learning which allows efficient gradient-based search towards molecules with desired physicochemical and/or biological properties. Specifically, we implement a deep reinforcement learning framework to discover potential lead compounds as better opioid antagonists with enhanced brain retention ability. A customized multi-objective reward function is designed to bias the generation towards molecules with both sufficient opioid antagonistic effect and enhanced brain retention ability. Thorough evaluation demonstrates that with this framework, we are able to identify valid, novel and feasible molecules with multiple desired properties, which has high potential in drug discovery.

Submitted 26 March, 2020; originally announced April 2020.

Comments: 10 pages, 7 figures

arXiv:2004.00825 [pdf, other]

doi 10.1093/nsr/nwaa229

Efficient network immunization under limited knowledge

Authors: Yangyang Liu, Hillel Sanhedrai, GaoGao Dong, Louis M. Shekhtman, Fan Wang, Sergey V. Buldyrev, Shlomo Havlin

Abstract: Targeted immunization or attacks of large-scale networks have attracted significant attention from the scientific community. However, in real-world scenarios, knowledge and observations of the network may be limited, thereby precluding a full assessment of the optimal nodes to immunize (or remove) in order to avoid epidemic spreading such as that of the current COVID-19 epidemic. Here, we study a novel immunization strategy where only $n$ nodes are observed at a time and the most central among these $n$ nodes is immunized (or attacked). This process is continued repeatedly until a $1-p$ fraction of nodes is immunized (or attacked). We develop an analytical framework for this approach and determine the critical percolation threshold $p_c$ and the size of the giant component $P_{\infty}$ for networks with arbitrary degree distributions $P(k)$. In the limit of $n\to\infty$ we recover prior work on targeted attack, whereas for $n=1$ we recover the known case of random failure. Between these two extremes, we observe that as $n$ increases, $p_c$ increases quickly towards its optimal value under targeted immunization (attack) with complete information. In particular, we find a new scaling relationship between $|p_c(\infty)-p_c(n)|$ and $n$ as $|p_c(\infty)-p_c(n)|\sim n^{-1}\exp(-\alpha n)$. For scale-free (SF) networks, where $P(k)\sim k^{-\gamma}, 2<\gamma<3$, we find that $p_c$ has a transition from zero to non-zero when $n$ increases from $n=1$ to order of $\log N$ ($N$ is the size of the network). Thus, for SF networks, knowledge of order of $\log N$ nodes and immunizing them can dramatically reduce an epidemic.

Submitted 2 April, 2020; originally announced April 2020.
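
The strategy described in the abstract of arXiv:2004.00825 (observe $n$ nodes at a time, immunize the most central of them, repeat until a $1-p$ fraction is removed) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes degree as the centrality measure, assumes degrees are updated as nodes are removed, and uses hypothetical function names (`immunize_limited_knowledge`, `giant_component_size`) and an adjacency-dict graph representation chosen only for illustration.

```python
import random
from collections import deque


def immunize_limited_knowledge(adj, n, p, rng):
    """Remove nodes until a fraction p of the network remains.

    adj : dict mapping each node to the set of its neighbours.
    n   : number of nodes observed (sampled uniformly) per step.
    p   : fraction of nodes left un-immunized at the end.
    rng : random.Random instance, so runs are reproducible.

    Assumption: "most central" is taken to mean highest current degree.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    target_remaining = int(round(p * len(adj)))
    while len(alive) > target_remaining:
        # Observe at most n of the remaining nodes.
        observed = rng.sample(sorted(alive), min(n, len(alive)))
        # Immunize (remove) the observed node with the highest degree.
        chosen = max(observed, key=lambda v: degree[v])
        alive.remove(chosen)
        for u in adj[chosen]:
            if u in alive:
                degree[u] -= 1
    return alive


def giant_component_size(adj, alive):
    """Size of the largest connected component among the surviving nodes."""
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u in alive and u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best
```

With `n=1` the sampled node is always the one removed, which corresponds to the random-failure limit mentioned in the abstract; with `n` at least as large as the network every node is observed, which corresponds to the targeted-attack limit under the degree assumption above.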

arXiv:2003.05003 [pdf]

doi 10.1136/bmjopen-2020-043863

Impact of Temperature and Relative Humidity on the Transmission of COVID-19: A Modeling Study in China and the United States

Authors: Jingyuan Wang, Ke Tang, Kai Feng, Xin Li, Weifeng Lv, Kun Chen, Fei Wang

Abstract: Objectives: We aim to assess the impact of temperature and relative humidity on the transmission of COVID-19 across communities after accounting for community-level factors such as demographics, socioeconomic status, and human mobility status. Design: A retrospective cross-sectional regression analysis via the Fama-MacBeth procedure is adopted. Setting: We use the data for COVID-19 daily symptom-onset cases for 100 Chinese cities and COVID-19 daily confirmed cases for 1,005 U.S. counties. Participants: A total of 69,498 cases in China and 740,843 cases in the U.S. are used for calculating the effective reproductive numbers. Primary outcome measures: Regression analysis of the impact of temperature and relative humidity on the effective reproductive number (R value). Results: Statistically significant negative correlations are found between temperature/relative humidity and the effective reproductive number (R value) in both China and the U.S. Conclusions: Higher temperature and higher relative humidity potentially suppress the transmission of COVID-19. Specifically, an increase in temperature by 1 degree Celsius is associated with a reduction in the R value of COVID-19 by 0.026 (95% CI [-0.0395,-0.0125]) in China and by 0.020 (95% CI [-0.0311, -0.0096]) in the U.S.; an increase in relative humidity by 1% is associated with a reduction in the R value by 0.0076 (95% CI [-0.0108,-0.0045]) in China and by 0.0080 (95% CI [-0.0150,-0.0010]) in the U.S. Therefore, the potential impact of temperature/relative humidity on the effective reproductive number alone is not strong enough to stop the pandemic.

Submitted 30 May, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

Journal ref: BMJ Open 2021;11:e043863

arXiv:2003.00163 [pdf]

doi 10.1093/bioinformatics/btaa645

COVID-19 Docking Server: A meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19

Authors: Ren Kong, Guangbo Yang, Rui Xue, Ming Liu, Feng Wang, Jianping Hu, Xiaoqiang Guo, Shan Chang

Abstract: Motivation: The coronavirus disease 2019 (COVID-19) caused by a new type of coronavirus has been emerging from China and led to thousands of deaths globally since December 2019. Although many groups have engaged in studying the newly emerged virus and searching for the treatment of COVID-19, the understanding of the COVID-19 target-ligand interactions represents a key challenge. Herein, we introduce COVID-19 Docking Server, a web server that predicts the binding modes between COVID-19 targets and the ligands including small molecules, peptides and antibodies. Results: Structures of proteins involved in the virus life cycle were collected or constructed based on the homologs of coronavirus, and prepared ready for docking. The meta platform provides a free and interactive tool for the prediction of COVID-19 target-ligand interactions and following drug discovery for COVID-19.

Submitted 7 August, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

Journal ref: Bioinformatics, 2020, btaa645

arXiv:1912.02964 [pdf, other]

An Informatics-based Approach to Identify Key Pharmacological Components in Drug-Drug Interactions

Authors: Jianyuan Deng, Fusheng Wang

Abstract: Drug-drug interactions (DDI) can cause severe adverse drug reactions and pose a major challenge to medication therapy. Recently, informatics-based approaches are emerging for DDI studies. In this paper, we aim to identify key pharmacological components in DDI based on large-scale data from DrugBank, a comprehensive DDI database. With pharmacological components as features, logistic regression is used to perform DDI classification with a focus on searching for the most predictive features, a process of identifying key pharmacological components. Using univariate feature selection with the chi-squared statistic as the ranking criterion, our study reveals that the top 10% of features can achieve classification performance comparable to that using all features. The top 10% of features are identified to be key pharmacological components. Furthermore, their importance is quantified by feature coefficients in the classifier, which measure the DDI potential and provide a novel perspective to evaluate pharmacological components.

Submitted 5 December, 2019; originally announced December 2019.

Comments: Accepted to AMIA 2020 Informatics Summit
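
The feature-ranking step described in the abstract of arXiv:1912.02964 (univariate selection by chi-squared statistic, keeping the top 10% of features) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes binary features and a binary DDI label, uses the Pearson chi-squared statistic for a 2x2 contingency table, and the names `chi2_binary` and `top_fraction` are hypothetical. The logistic regression classifier that the abstract trains on the selected features is not shown.

```python
def chi2_binary(feature, labels):
    """Pearson chi-squared statistic for a binary feature vs. a binary label.

    Builds the 2x2 contingency table
        a = feature present, label positive
        b = feature present, label negative
        c = feature absent,  label positive
        d = feature absent,  label negative
    and returns N * (a*d - b*c)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)
    b = sum(1 for f, y in zip(feature, labels) if f and not y)
    c = sum(1 for f, y in zip(feature, labels) if not f and y)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature or constant label carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_fraction(columns, labels, fraction=0.10):
    """Rank features by chi-squared score and keep the top fraction.

    columns : dict mapping a feature name to its list of 0/1 values.
    Returns (selected feature names, dict of all scores).
    At least one feature is always kept.
    """
    scores = {name: chi2_binary(col, labels) for name, col in columns.items()}
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores
```

The selected feature names could then be passed to any classifier; comparing its performance against a model trained on all features is the comparison the abstract reports.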

arXiv:1805.05008 [pdf]

Integrating Hypertension Phenotype and Genotype with Hybrid Non-negative Matrix Factorization

Authors: Yuan Luo, Chengsheng Mao, Yiben Yang, Fei Wang, Faraz S. Ahmad, Donna Arnett, Marguerite R. Irvin, Sanjiv J. Shah

Abstract: Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements so that patients in different subtypes share similar pathophysiologic mechanisms and respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification by introducing Hybrid Non-negative Matrix Factorization (HNMF) on phenotype and genotype matrices. HNMF simultaneously approximates the phenotypic and genetic matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss, and the genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features to predict main cardiac mechanistic outcomes. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities.

Submitted 18 May, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

Comments: fixed some presentation errors

arXiv:1711.05779 [pdf]

Large-scale Analysis of Opioid Poisoning Related Hospital Visits in New York State

Authors: Xin Chen, Yu Wang, Xiaxia Yu, Elinor Schoenfeld, Mary Saltz, Joel Saltz, Fusheng Wang

Abstract: Opioid related deaths have increased dramatically in recent years, and the opioid epidemic is worsening in the United States. Combating the opioid epidemic has become a high priority for both the U.S. government and local governments such as New York State. Analyzing patient level opioid related hospital visits provides a data driven approach to discover both spatial and temporal patterns and identify potential causes of opioid related deaths, which provides essential knowledge for governments on decision making. In this paper, we analyzed opioid poisoning related hospital visits using New York State SPARCS data, which provides diagnoses of patients in hospital visits. We identified all patients with primary diagnosis as opioid poisoning from 2010-2014 for our main studies, and from 2003-2014 for temporal trend studies. We performed demographical based studies, and summarized the historical trends of opioid poisoning. We used frequent item mining to find co-occurrences of diagnoses for possible causes of poisoning or effects from poisoning. We provided zip code level spatial analysis to detect local spatial clusters, and studied potential correlations between opioid poisoning and demographic and social-economic factors.

Submitted 7 May, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

Journal ref: AMIA Annu Symp Proc. 2018;2017:545-554

arXiv:1707.05019 [pdf]

doi 10.1016/j.bpj.2018.03.027

All-atom simulations reveal how single point mutations promote serpin misfolding

Authors: Fang Wang, Simone Orioli, Alan Ianeselli, Giovanni Spagnolli, Silvio a Beccara, Anne Gershenson, Pietro Faccioli, Patrick L. Wintrode

Abstract: Protein misfolding is implicated in many diseases, including the serpinopathies. For the canonical inhibitory serpin α1-antitrypsin (A1AT), mutations can result in protein deficiencies leading to lung disease, and misfolded mutants can accumulate in hepatocytes leading to liver disease. Using all-atom simulations based on the recently developed Bias Functional algorithm we elucidate how wild-type A1AT folds and how the disease-associated S (Glu264Val) and Z (Glu342Lys) mutations lead to misfolding. The deleterious Z mutation disrupts folding at an early stage, while the relatively benign S mutant shows late stage minor misfolding. A number of suppressor mutations ameliorate the effects of the Z mutation and simulations on these mutants help to elucidate the relative roles of steric clashes and electrostatic interactions in Z misfolding. These results demonstrate a striking correlation between atomistic events and disease severity and shine light on the mechanisms driving chains away from their correct folding routes.

Submitted 18 June, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: Final version. Supplementary Information included

Journal ref: Biophys. J. 114, 2083 (2018)

arXiv:1704.00793 [pdf, other]

Seeds Cleansing CNMF for Spatiotemporal Neural Signals Extraction of Miniscope Imaging Data

Authors: Jinghao Lu, Chunyuan Li, Fan Wang

Abstract: Miniscope calcium imaging is increasingly being used to monitor large populations of neuronal activities in freely behaving animals. However, due to the high background and low signal-to-noise ratio of the single-photon based imaging used in this technique, extraction of neural signals from the large numbers of imaged cells automatically has remained challenging. Here we describe a highly accurate framework for automatically identifying activated neurons and extracting calcium signals from the miniscope imaging data, seeds cleansing Constrained Nonnegative Matrix Factorization (sc-CNMF). This sc-CNMF extends the conventional CNMF with two new modules: i) a neural enhancing module to overcome miniscope-specific limitations, and ii) a seeds cleansing module combining LSTM to rigorously select and cleanse the set of seeds for detecting regions-of-interest. Our sc-CNMF yields highly stable and superior performance in analyzing miniscope calcium imaging data compared to existing methods.

Submitted 3 April, 2017; originally announced April 2017.

Comments: 14 pages, 11 figures

arXiv:1605.01838 [pdf, other]

DeepPicker: a Deep Learning Approach for Fully Automated Particle Picking in Cryo-EM

Authors: Feng Wang, Huichao Gong, Gaochao liu, Meijing Li, Chuangye Yan, Tian Xia, Xueming Li, Jianyang Zeng

Abstract: Particle picking is a time-consuming step in single-particle analysis and often requires significant interventions from users, which has become a bottleneck for future automated electron cryo-microscopy (cryo-EM). Here we report a deep learning framework, called DeepPicker, to address this problem and fill the current gaps toward a fully automated cryo-EM pipeline. DeepPicker employs a novel cross-molecule training strategy to capture common features of particles from previously-analyzed micrographs, and thus does not require any human intervention during particle picking. Tests on the recently-published cryo-EM data of three complexes have demonstrated that our deep learning based scheme can successfully accomplish the human-level particle picking process and identify a sufficient number of particles that are comparable to those picked manually by human experts. These results indicate that DeepPicker can provide a practically useful tool to significantly reduce the time and manual effort spent in single-particle analysis and thus greatly facilitate high-resolution cryo-EM structure determination.

Submitted 6 May, 2016; originally announced May 2016.

arXiv:1503.00051 [pdf]

doi 10.1080/07391102.2015.1052849

Molecular Dynamics Studies on the Buffalo Prion Protein

Authors: Jiapu Zhang, Feng Wang, Subhojyoti Chatterjee

Abstract: It was reported that buffalo is a low susceptibility species resistant to prion diseases, which are invariably fatal and highly infectious neurodegenerative diseases that affect a wide variety of species. In molecular structures, TSE neurodegenerative diseases are caused by the conversion from a soluble normal cellular prion protein, predominantly with alpha-helices, into insoluble abnormally folded infectious prions, rich in beta-sheets. This paper studies the molecular structure and structural dynamics of buffalo prion protein, in order to reveal the reason why buffalo are resistant to prion diseases. We first did molecular modeling of a homology structure constructed by one mutation at residue 143 from the Nuclear Magnetic Resonance structure of bovine and cattle PrP(124-227); immediately we found that for buffalo PrPC(124-227) there are 5 hydrogen bonds at Asn143, but at this position bovine/cattle do not have such hydrogen bonds. As for rabbits, dogs and horses, our molecular dynamics studies also confirmed that there is a strong salt bridge ASP178-ARG164 (O-N) keeping the beta2-alpha2 loop linked in buffalo. We also found that there is a very strong hydrogen bond SER170-TYR218 linking this loop with the C-terminal end of alpha-helix H3. Other findings were also made: (i) there is a very strong salt bridge HIS187-ARG156 (N-O) linking alpha-helices H2 and H1 (if mutation H187R is made at position 187 then the hydrophobic core of PrPC will be exposed), (ii) at D178, there is a hydrogen bond Y169-D178 and a polar contact R164-D178 for BufPrPC instead of a polar contact Q168-D178 for bovine PrPC, (iii) BufPrPC owns 3-10 helices at 125-127, 152-156 and in the beta2-alpha2 loop respectively, and (iv) in the beta2-alpha2 loop there are strong pi-contacts, etc.

Submitted 20 June, 2015; v1 submitted 27 February, 2015; originally announced March 2015.

Journal ref: Journal of Biomolecular Structure & Dynamics 04/2016; 34(4):762-777

arXiv:1409.6104 [pdf, ps, other]

doi 10.2174/1389201015666141103020004

A survey and a molecular dynamics study on the (central) hydrophobic region of prion proteins

Authors: Jiapu Zhang, Feng Wang

Abstract: Prion diseases are invariably fatal neurodegenerative diseases that affect humans and animals. Unlike most other amyloid forming neurodegenerative diseases, these can be highly infectious. Prion diseases occur in a variety of species. They include the fatal human neurodegenerative diseases Creutzfeldt-Jakob Disease (CJD), Fatal Familial Insomnia (FFI), Gerstmann-Straussler-Scheinker syndrome (GSS), Kuru, the bovine spongiform encephalopathy (BSE or 'mad-cow' disease) in cattle, the chronic wasting disease (CWD) in deer and elk, and scrapie in sheep and goats, etc. Transmission across the species barrier to humans, especially in the case of BSE in Europe, CWD in North America, and variant CJDs (vCJDs) in young people of UK, is a major public health concern. Fortunately, scientists reported that the (central) hydrophobic region of prion proteins (PrP) controls the formation of diseased prions. This article gives a detailed survey on PrP hydrophobic region and does molecular dynamics studies of human PrP(110-136) to confirm some findings from the survey. The structural bioinformatics presented in this article can be helpful as a reference in three-dimensional images for laboratory experimental works to study PrP hydrophobic region.

Submitted 22 September, 2014; originally announced September 2014.

Journal ref: Curr Pharm Biotechnol 2014 Nov;15(11) pp:1026-1048

arXiv:1409.0305 [pdf]

doi 10.1063/1.4894752

A coarse-grained model with implicit salt for RNAs: predicting 3D structure, stability and salt effect

Authors: Ya-Zhou Shi, Feng-Hua Wang, Yuan-Yan Wu, Zhi-Jie Tan

Abstract: To bridge the gap between the sequences and 3-dimensional (3D) structures of RNAs, some computational models have been proposed for predicting RNA 3D structures. However, the existing models seldom consider the conditions departing from the room/body temperature and high salt (1M NaCl), and thus generally hardly predict the thermodynamics and salt effect. In this study, we propose a coarse-grained model with implicit salt for RNAs to predict 3D structures, stability and salt effect. Combined with a Monte Carlo simulated annealing algorithm and a coarse-grained force field, the model folds 46 tested RNAs (less than or equal to 45 nt) including pseudoknots into their native-like structures from their sequences, with an overall mean RMSD of 3.5 Å and an overall minimum RMSD of 1.9 Å from the experimental structures. For 30 RNA hairpins, the present model also gives reliable predictions for the stability and salt effect with a mean deviation of ~ 1.0 degrees Celsius in melting temperatures, as compared with the extensive experimental data. In addition, the model could provide the ensemble of possible 3D structures for a short RNA at a given temperature/salt condition.

Submitted 1 September, 2014; originally announced September 2014.

Comments: 47 pages, 8 figures. Journal of Chemical Physics, in press

Journal ref: J. Chem. Phys. 141, 105102 (2014)

arXiv:1408.6662 [pdf]

doi 10.1088/1674-1056/23/7/078701

RNA structure prediction: progress and perspective

Authors: Ya-Zhou Shi, Yuan-Yan Wu, Feng-Hua Wang, Zhi-Jie Tan

Abstract: Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions which are strongly coupled to RNA structures. To understand the functions of RNAs, some structure prediction models have been developed in recent years. In this review, the progress in computational models for RNA structure prediction is introduced and the distinguishing features of many outstanding algorithms are discussed, emphasizing three dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability and salt effect is also introduced briefly. Finally, we discuss the major challenges in the RNA 3D structure modeling.

Submitted 28 August, 2014; originally announced August 2014.

Comments: 23 pages

Journal ref: Chinese Physics B Vol. 23, No. 7 (2014) 078701

arXiv:1408.5269 [pdf]

doi 10.1080/07391102.2015.1064832

A Review on the Salt Bridge Between ASP177 and ARG163 of Wild-Type Rabbit Prion Protein

Authors: Jiapu Zhang, Feng Wang

Abstract: Prion diseases are invariably fatal and highly infectious neurodegenerative diseases that affect a wide variety of mammalian species such as sheep and goats, cattle, deer, elks, humans and mice etc., but rabbits have a low susceptibility to be infected by prion diseases with respect to other species. The stability of rabbit prion protein is due to its highly ordered β2-α2 loop (PLoS One 5(10) e13273 (2010); Journal of Biological Chemistry 285(41) 31682-31693 (2010)) and a hydrophobic staple helix-capping motif (PNAS 107(46) 19808-19813 (2010); PLoS One 8 (5) e63047 (2013)). The β2-α2 loop and the tail of Helix 3 it interacts with have been a focus in prion protein structure studies. For this loop we found a salt bridge linkage ASP177-ARG163 (O-N) (Journal of Theoretical Biology 342 (7 February 2014) 70-82 (2014)). Some scientists have said that, in the 2FJ3.pdb NMR file of the rabbit prion protein, the ASP177-ARG163 (O-N) distance is about 10 Å, which makes the salt bridge nearly null in terms of energy, and that such a salt bridge is not observed in their work. But, from the 3O79.pdb X-ray file of the rabbit prion protein, we can clearly observe this salt bridge. This article analyses the NMR and X-ray structures and gives an answer to the above question: the salt bridge present at pH 6.5 in the X-ray structure is gone at pH 4.5 in the NMR structure simply because of the different pH values, which impact electrostatics at the salt bridge and hence also impact the structures. Moreover, some molecular dynamics simulation results of the X-ray structure are reported in this article to reveal the secrets of the structural stability of rabbit prion protein.

Submitted 29 January, 2015; v1 submitted 22 August, 2014; originally announced August 2014.

Comments: arXiv admin note: text overlap with arXiv:1407.6221

Journal ref: J Biomol Struct Dyn 34(5) 1020-8 (2016)

arXiv:1407.6221 [pdf]

doi 10.1080/07391102.2014.947325

Molecular dynamics studies on the NMR structures of rabbit prion protein wild-type and mutants: surface electrostatic charge distributions

Authors: Jiapu Zhang, Feng Wang

Abstract: Prion is a misfolded protein found in mammals that causes infectious diseases of the nervous system in humans and animals. Prion diseases are invariably fatal and highly infectious neurodegenerative diseases that affect a wide variety of mammalian species such as sheep and goats, cattle, deer, elk and humans etc. Recent studies have shown that rabbits have a low susceptibility to be infected by prion diseases with respect to other animals including humans. The present study employs molecular dynamics (MD) means to unravel the mechanism of rabbit prion proteins (RaPrPC) based on the recently available rabbit NMR structures (of the wild-type and its two mutants of two surface residues). The electrostatic charge distributions on the protein surface are the focus when analysing the MD trajectories. We conclude that surface electrostatic charge distributions indeed contribute to the structural stability of wild-type RaPrPC; this may be useful for the medicinal treatment of prion diseases.

Submitted 23 July, 2014; originally announced July 2014.

Report number: PMID: 25105226

Journal ref: J Biomol Struct Dyn 33(6) 1326-35 (2015)

arXiv:1305.6666 [pdf]

In Silico Design, Extended Molecular Dynamic Simulations and Binding Energy Calculations for a New Series of Dually Acting Inhibitors against EGFR and HER2

Authors: Marawan Ahmed, Maiada M. Sadek, Khaled A. Abouzid, Feng Wang

Abstract: Starting from the lead structure identified in our previous works, we extend our understanding of its potential inhibitory effect against both EGFR and HER2 receptors. Herein, using extended molecular dynamics simulations and different scoring techniques, we provide plausible explanations for the observed inhibitory effect. We also compare the binding mechanism and the dynamics of binding with two other approved inhibitors against EGFR (Lapatinib) and HER2 (SYR). Based on this information, we design and screen in silico new potential inhibitors sharing the same scaffold as the lead structure. We have chosen the best-scoring inhibitor for additional in silico investigation against both the wild-type and T790M mutant strain of EGFR. It seems that a certain substitution pattern guarantees binding to the conserved water molecule commonly observed in kinase crystal structures. The new inhibitors also seem to form a stable interaction with the mutant strain as a direct consequence of their enhanced ability to form additional interactions with binding site residues.

Submitted 28 May, 2013; originally announced May 2013.

Comments: 37 pages, 5 tables and 6 figures

arXiv:1305.3691 [pdf]

In silico investigation of lactone and thiolactone inhibitors in bacterial quorum sensing using molecular modeling

Authors: Marawan Ahmed, Stefanie Bird, Feng Wang, Enzo A. Palombo

Abstract: In the present study, the origin of the anti-quorum sensing (QS) activities of several members of a recently synthesized and in vitro tested class of lactone- and thiolactone-based inhibitors was computationally investigated. Docking, molecular dynamics (MD) simulations and binding free energy calculations were carried out to reveal the exact binding and inhibitory profiles of these compounds. The higher in vitro activity of the lactone series relative to their thiolactone isosteres was verified by estimating the binding energies and the docking scores and by monitoring the stability of the complexes produced in the MD simulations. The strong electrostatic contribution to the binding energies may be responsible for the higher inhibitory activity of the lactone series with respect to the thiolactone series. The results of this study help to understand the anti-QS properties of lactone-based inhibitors and provide important information that may assist in the synthesis of novel QS inhibitors.

Submitted 16 May, 2013; originally announced May 2013.

arXiv:1303.2333 [pdf]

Warburg Effect due to Exposure to Different Types of Radiation

Authors: Zhitong Bing, Bin Ao, Yanan Zhang, Fengling Wang, Caiyong Ye, Jinpeng He, Jintu Sun, Jie Xiong, Nan Ding, Xiao-fei Gao, Ji Qi, Sheng Zhang, Guangming Zhou, Lei Yang

Abstract: Cancer cells maintain a high level of aerobic glycolysis (the Warburg effect), which is associated with their rapid proliferation. Many studies have reported that the suppression of glycolysis and activation of oxidative phosphorylation can repress the growth of cancer cells through regulation of key regulators. Could the Warburg effect of cancer cells be switched by some other environmental stimulus? Herein, we report an interesting phenomenon in which cells alternated between glycolysis and mitochondrial respiration depending on the type of radiation they were exposed to. We observed enhanced glycolysis and mitochondrial respiration in HeLa cells exposed to 2-Gy X-ray and 2-Gy carbon ion radiation, respectively. This discovery may provide novel insights for tumor therapy.

Submitted 10 March, 2013; originally announced March 2013.

arXiv:1211.5408 [pdf, other]

Salt Contribution to the Flexibility of Single-stranded Nucleic Acid of Finite Length

Authors: Feng-Hua Wang, Yuan-Yan Wu, Zhi-Jie Tan

Abstract: Nucleic acids are negatively charged macromolecules and their structural properties are strongly coupled to metal ions in solution. In this paper, the salt effects on the flexibility of single-stranded (ss) nucleic acid chains ranging from 12 to 120 nucleotides are investigated systematically by coarse-grained Monte Carlo simulations in which the salt ions are considered explicitly and the ss chain is modeled with the virtual bond structural model. Our calculations show that increasing ion concentration causes structural collapse of the ss chain, that multivalent ions are much more efficient in causing such collapse, and that trivalent and small divalent ions can both induce a more compact state than a random relaxation state. We found that monovalent, divalent and trivalent ions can all overcharge the ss chain, and that the dominating source of such overcharging changes from the ion exclusion volume effect to ion Coulomb correlations. In addition, the predicted Na- and Mg-dependent persistence length lp of ss nucleic acid is in accordance with the available experimental data, and through systematic calculations we obtained empirical formulas for lp as a function of Na, Mg and chain length.

Submitted 22 November, 2012; originally announced November 2012.

Comments: This is the pre-peer-reviewed version of the article, which has been accepted by Biopolymers; we have signed the copyright transfer agreement of Wiley-Blackwell. Biopolymers 2013, in press

Showing 1–50 of 51 results for author: Wang, F