Showing 1–50 of 60 results for author: Zitnik, M

  1. arXiv:2407.06483  [pdf, other]

    cs.LG cs.CL

    Composable Interventions for Language Models

    Authors: Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

    Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventi…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.00631  [pdf, other]

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2406.03403  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2406.02059  [pdf, other]

    cs.LG

    Graph Adversarial Diffusion Convolution

    Authors: Songtao Liu, Jinghui Chen, Tianfan Fu, Lu Lin, Marinka Zitnik, Dinghao Wu

    Abstract: This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  5. arXiv:2405.09594  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining

    Authors: Sameer Khanna, Daniel Michael, Marinka Zitnik, Pranav Rajpurkar

    Abstract: Medical image interpretation using deep learning has shown promise but often requires extensive expert-annotated datasets. To reduce this annotation burden, we develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes. Our approach uniquely encodes the disconnected graph components via a relati…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted into Machine Learning for Health (ML4H) 2023

  6. arXiv:2404.02831  [pdf, other]

    cs.AI

    Empowering Biomedical Discovery with AI Agents

    Authors: Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik

    Abstract: We envision 'AI scientists' as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate machine learning tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces,…

    Submitted 3 April, 2024; originally announced April 2024.

  7. arXiv:2403.01628  [pdf, ps, other]

    cs.LG

    Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium

    Authors: Hyewon Jeong, Sarah Jabbour, Yuzhe Yang, Rahul Thapta, Hussein Mozannar, William Jongwon Han, Nikita Mehandru, Michael Wornow, Vladislav Lialin, Xin Liu, Alejandro Lozano, Jiacheng Zhu, Rafal Dariusz Kocielnik, Keith Harrigian, Haoran Zhang, Edward Lee, Milos Vukadinovic, Aparna Balagopalan, Vincent Jeanselme, Katherine Matton, Ilker Demirel, Jason Fries, Parisa Rashidi, Brett Beaulieu-Jones, Xuhai Orson Xu, et al. (18 additional authors not shown)

    Abstract: The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four vir…

    Submitted 5 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: ML4H 2023, Research Roundtables

  8. arXiv:2403.00131  [pdf, other]

    cs.LG cs.AI

    UNITS: A Unified Multi-Task Time Series Model

    Authors: Shanghua Gao, Teddy Koker, Owen Queen, Thomas Hartvigsen, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: Advances in time series models are driving a shift from conventional deep learning methods to pre-trained foundational models. While pre-trained transformers and reprogrammed text-based LLMs report state-of-the-art results, the best-performing architectures vary significantly across tasks, and models often have limited scope, such as focusing only on time series forecasting. Models that unify pred…

    Submitted 29 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

  9. arXiv:2401.05561  [pdf, other]

    cs.CL

    TrustLLM: Trustworthiness in Large Language Models

    Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, et al. (45 additional authors not shown)

    Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in…

    Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: This work is still in progress and we welcome your contributions

  10. arXiv:2312.05690  [pdf, other]

    cs.HC

    Is Ignorance Bliss? The Role of Post Hoc Explanation Faithfulness and Alignment in Model Trust in Laypeople and Domain Experts

    Authors: Tessa Han, Yasha Ektefaie, Maha Farhat, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, the effect of explanation faithfulness and…

    Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  11. arXiv:2312.02439  [pdf, other]

    cs.AI cs.CL cs.CV

    Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

    Authors: Shanshan Zhong, Zhongzhan Huang, Shanghua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou

    Abstract: Chain-of-Thought (CoT) guides large language models (LLMs) to reason step-by-step, and can motivate their logical reasoning ability. While effective for logical tasks, CoT is not conducive to creative problem-solving which often requires out-of-box thoughts and is crucial for innovation advancements. In this paper, we explore the Leap-of-Thought (LoT) abilities within LLMs -- a non-sequential, cre…

    Submitted 21 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Technical report

  12. arXiv:2310.13767  [pdf, other]

    cs.LG

    Graph AI in Medicine

    Authors: Ruth Johnson, Michelle M. Li, Ayush Noori, Owen Queen, Marinka Zitnik

    Abstract: In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks (GNNs), stands out for its capability to capture intricate relationships within structured clinical datasets. With diverse data -- from patient records to imaging -- GNNs process data holistically by viewing modalities as nodes interconnected by their relationships. Graph AI facilitates mo…

    Submitted 11 December, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  13. arXiv:2310.02553  [pdf, other]

    q-bio.BM

    Full-Atom Protein Pocket Design via Iterative Refinement

    Authors: Zaixi Zhang, Zepu Lu, Zhongkai Hao, Marinka Zitnik, Qi Liu

    Abstract: The design of de novo functional proteins that bind specific ligand molecules is paramount in therapeutics and bio-engineering. A critical yet formidable task in this endeavor is the design of the protein pocket, which is the cavity region of the protein where the ligand binds. Current methods are plagued by inefficient generation, inadequate context modeling of the ligand molecule, and the…

    Submitted 19 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 Spotlight

  14. arXiv:2309.08478  [pdf, other]

    q-bio.MN

    Current and future directions in network biology

    Authors: Marinka Zitnik, Michelle M. Li, Aydin Wells, Kimberly Glass, Deisy Morselli Gysi, Arjun Krishnan, T. M. Murali, Predrag Radivojac, Sushmita Roy, Anaïs Baudot, Serdar Bozdag, Danny Z. Chen, Lenore Cowen, Kapil Devkota, Anthony Gitter, Sara Gosline, Pengfei Gu, Pietro H. Guzzi, Heng Huang, Meng Jiang, Ziynet Nesibe Kesimoglu, Mehmet Koyuturk, Jian Ma, Alexander R. Pico, Nataša Pržulj, et al. (12 additional authors not shown)

    Abstract: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These challenges stem from various fa…

    Submitted 11 June, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 52 pages, 6 figures, 1 table

  15. arXiv:2307.08423  [pdf, other]

    cs.LG physics.comp-ph

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

    Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Sc…

    Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  16. arXiv:2306.11768  [pdf, other]

    q-bio.QM cs.CE cs.LG

    A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design

    Authors: Zaixi Zhang, Jiaxian Yan, Qi Liu, Enhong Chen, Marinka Zitnik

    Abstract: Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates. Traditional methods, grounded in physicochemical modeling and informed by domain expertise, are resource-intensive. Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, coupled with the availability of accurate pro…

    Submitted 24 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 20 pages, under review

  17. arXiv:2306.02109  [pdf, other]

    cs.LG

    Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency

    Authors: Owen Queen, Thomas Hartvigsen, Teddy Koker, Huan He, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: Interpreting time series models is uniquely challenging because it requires identifying both the location of time series signals that drive model predictions and their matching to an interpretable temporal pattern. While explainers from other modalities can be applied to time series, their inductive biases do not transfer well to the inherently challenging interpretation of time series. We present…

    Submitted 24 October, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023 (spotlight)

  18. arXiv:2302.13406  [pdf, other]

    cs.LG cs.AI

    GNNDelete: A General Strategy for Unlearning in Graph Neural Networks

    Authors: Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik

    Abstract: Graph unlearning, which involves deleting graph elements such as nodes, node labels, and relationships from a trained graph neural network (GNN) model, is crucial for real-world applications where data elements may become irrelevant, inaccurate, or privacy-sensitive. However, existing methods for graph unlearning either deteriorate model weights shared across all nodes or fail to effectively delet…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: Accepted to ICLR 2023

  19. arXiv:2302.03133  [pdf, other]

    cs.LG cs.AI

    Domain Adaptation for Time Series Under Feature and Label Shifts

    Authors: Huan He, Owen Queen, Teddy Koker, Consuelo Cuevas, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: Unsupervised domain adaptation (UDA) enables the transfer of models trained on source domains to unlabeled target domains. However, transferring complex time series models presents challenges due to the dynamic temporal structure variations across domains. This leads to feature shifts in the time and frequency representations. Additionally, the label distributions of tasks in the source and target…

    Submitted 18 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted by ICML 2023; 29 pages (14 pages main paper + 15 pages supplementary materials). Code: see https://github.com/mims-harvard/Raincoat

  20. arXiv:2209.03299  [pdf, other]

    cs.LG cs.AI

    Multimodal learning with graphs

    Authors: Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, Marinka Zitnik

    Abstract: Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases: the set of assumptions that algorithms use to make predictions for inputs they have not enc…

    Submitted 23 January, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 27 pages, 5 figures, 2 boxes

  21. arXiv:2208.09339  [pdf, other]

    cs.LG cs.AI

    Evaluating Explainability for Graph Neural Networks

    Authors: Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

    Abstract: As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, S…

    Submitted 16 January, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

  22. arXiv:2206.11104  [pdf, other]

    cs.LG cs.AI

    OpenXAI: Towards a Transparent Evaluation of Model Explanations

    Authors: Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: While several types of post hoc explanation methods have been proposed in recent literature, there is very little work on systematically benchmarking these methods. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator a…

    Submitted 13 March, 2024; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Newer version with updated results and code

  23. arXiv:2206.08496  [pdf, other]

    cs.LG cs.AI

    Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency

    Authors: Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: Pre-training on time series poses a unique challenge due to the potential mismatch between pre-training and target domains, such as shifts in temporal dynamics, fast-evolving trends, and long-range and short-cyclic effects, which can lead to poor downstream performance. While domain adaptation methods can mitigate these shifts, most methods need examples directly from the target domain, making the…

    Submitted 15 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted by NeurIPS 2022; 32 pages (14 pages main paper + 18 pages supplementary materials). Code: https://github.com/mims-harvard/TFC-pretraining

    Journal ref: NeurIPS 2022

  24. arXiv:2203.08893  [pdf, other]

    cs.LG cs.AI cs.CL cs.SI

    Multimodal Learning on Graphs for Disease Relation Extraction

    Authors: Yucong Lin, Keming Lu, Sheng Yu, Tianxi Cai, Marinka Zitnik

    Abstract: Objective: Disease knowledge graphs are a way to connect, organize, and access disparate information about diseases with numerous benefits for artificial intelligence (AI). To create knowledge graphs, it is necessary to extract knowledge from multimodal datasets in the form of relationships between disease concepts and normalize both concepts and relationship types. Methods: We introduce REMAP,…

    Submitted 30 August, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

  25. arXiv:2203.06877  [pdf, other]

    cs.LG

    Rethinking Stability for Attribution-based Explanations

    Authors: Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input. However, previous works have shown that state-of-the-art explanation methods generate unstable explanations. Here, we introduce metrics to quantify the stabi…

    Submitted 14 March, 2022; originally announced March 2022.

  26. arXiv:2112.12582  [pdf]

    q-bio.OT cs.LG

    Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs

    Authors: Lauren M. Sanders, Jason H. Yang, Ryan T. Scott, Amina Ann Qutub, Hector Garcia Martin, Daniel C. Berrios, Jaden J. A. Hastings, Jon Rask, Graham Mackintosh, Adrienne L. Hoarfrost, Stuart Chalk, John Kalantari, Kia Khezeli, Erik L. Antonsen, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Guillermo M. Delgado-Aparicio, Benjamin S. Glicksberg, Casey S. Greene, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, et al. (31 additional authors not shown)

    Abstract: Space biology research aims to understand fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabilize the ecosystem of plants, crops, microbes, animals, and humans for sustained multi-planetary life. To advance these aims, the field leverages experiments, platforms, data, and mode…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 28 pages, 4 figures

  27. arXiv:2112.12554  [pdf]

    q-bio.OT cs.LG

    Beyond Low Earth Orbit: Biomonitoring, Artificial Intelligence, and Precision Space Health

    Authors: Ryan T. Scott, Erik L. Antonsen, Lauren M. Sanders, Jaden J. A. Hastings, Seung-min Park, Graham Mackintosh, Robert J. Reynolds, Adrienne L. Hoarfrost, Aenor Sawyer, Casey S. Greene, Benjamin S. Glicksberg, Corey A. Theriot, Daniel C. Berrios, Jack Miller, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Stuart Chalk, Guillermo M. Delgado-Aparicio, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, Katelyn J. Jarvis, et al. (31 additional authors not shown)

    Abstract: Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth-independence, rather than Earth-reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 31 pages, 4 figures

  28. Time-resolved quantum beats in the fluorescence of helium resonantly excited by XUV radiation

    Authors: AC LaForge, A Benediktovitch, V Sukharnikov, Š Krušič, M Žitnik, M Debatin, RW Falcone, JD Asmussen, M Mudrich, R Michiels, F Stienkemeier, L Badano, C Callegari, M Di Fraia, M Ferianis, L Giannessi, O Plekan, KC Prince, C Spezzani, N Rohringer, N Berrah

    Abstract: We report on the observation of time-resolved quantum beats in the helium fluorescence from the transition 1s3p -> 1s2s, where the initial state is excited by XUV free-electron laser radiation. The quantum beats originate from the Zeeman splitting of the magnetic substates due to an external magnetic field. We perform a systematic study of this effect and discuss the possibilities of studying this…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: 7 pages, 4 figures

    Journal ref: J. Phys. B: At. Mol. Opt. Phys. 53 244012 (2020)

  29. arXiv:2111.06247  [pdf, other]

    q-bio.QM q-bio.GN q-bio.MN

    Sparse dictionary learning recovers pleiotropy from human cell fitness screens

    Authors: Joshua Pan, Jason J. Kwon, Jessica A. Talamas, Ashir A. Borah, Francisca Vazquez, Jesse S. Boehm, Aviad Tsherniak, Marinka Zitnik, James M. McFarland, William C. Hahn

    Abstract: In high-throughput functional genomic screens, each gene product is commonly assumed to exhibit a singular biological function within a defined protein complex or pathway. In practice, a single gene perturbation may induce multiple cascading functional outcomes, a genetic principle known as pleiotropy. Here, we model pleiotropy in fitness screen collections by representing each gene perturbation a…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: Accepted to the 16th Machine Learning in Computational Biology (MLCB) meeting 2021, and the Learning Meaningful Representations of Life (LMRL) Workshop at NeurIPS 2021

  30. arXiv:2110.05357  [pdf, other]

    cs.LG cs.AI

    Graph-Guided Network for Irregularly Sampled Multivariate Time Series

    Authors: Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: In many domains, including healthcare, biology, and climate science, time series are irregularly sampled with varying time intervals between successive readouts and different subsets of variables (sensors) observed at different time points. Here, we introduce RAINDROP, a graph neural network that embeds irregularly sampled and multivariate time series while also learning the dynamics of sensors pu…

    Submitted 15 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted by ICLR 2022; https://github.com/mims-harvard/Raindrop

  31. arXiv:2106.09078  [pdf, other]

    cs.LG

    Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

    Authors: Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models. However, there has been little to no work on systematically analyzing the reliability of these methods. Here, we introduce the first-ever theoretical analysis of the reliability of state-of-the-art G…

    Submitted 22 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to AISTATS 2022

  32. arXiv:2106.02246  [pdf, other]

    cs.LG q-bio.MN q-bio.QM q-bio.TO

    Deep Contextual Learners for Protein Networks

    Authors: Michelle M. Li, Marinka Zitnik

    Abstract: Spatial context is central to understanding health and disease. Yet reference protein interaction networks lack such contextualization, thereby limiting the study of where protein interactions likely occur in the human body and how they may be altered in disease. Contextualized protein interactions could better characterize genes with disease-specific interactions and elucidate diseases' manifesta…

    Submitted 16 July, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Accepted to the 2021 International Conference on Machine Learning (ICML) Workshop on Computational Biology (WCB)

  33. arXiv:2104.04883  [pdf, other]

    cs.LG cs.SI q-bio.BM q-bio.GN q-bio.MN

    Graph Representation Learning in Biomedicine

    Authors: Michelle M. Li, Kexin Huang, Marinka Zitnik

    Abstract: Biomedical networks (or graphs) are universal descriptors for systems of interacting elements, from molecular interactions and disease co-morbidity to healthcare systems and scientific knowledge. Advances in artificial intelligence, specifically deep learning, have enabled us to model, analyze, and learn with such networked data. In this review, we put forward an observation that long-standing pri…

    Submitted 10 June, 2022; v1 submitted 10 April, 2021; originally announced April 2021.

  34. arXiv:2103.10334  [pdf, other]

    cs.LG

    Structure Inducing Pre-Training

    Authors: Matthew B. A. McDermott, Brendan Yap, Peter Szolovits, Marinka Zitnik

    Abstract: Language model pre-training and derived methods are incredibly impactful in machine learning. However, there remains considerable uncertainty on exactly why pre-training helps improve performance for fine-tuning tasks. This is especially true when attempting to adapt language-model pre-training to domains outside of natural language. Here, we analyze this problem by exploring how existing pre-trai…

    Submitted 4 August, 2022; v1 submitted 18 March, 2021; originally announced March 2021.

  35. arXiv:2102.13186  [pdf, other]

    cs.LG

    Towards a Unified Framework for Fair and Stable Graph Representation Learning

    Authors: Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

    Abstract: As the representations output by Graph Neural Networks (GNNs) are increasingly employed in real-world applications, it becomes important to ensure that these representations are fair and stable. In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any G…

    Submitted 16 June, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: Accepted to UAI'21

    Report number: Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI 2021)

    Journal ref: PMLR 161:2114-2124, 2021

  36. arXiv:2102.09548  [pdf, other]

    cs.LG cs.CY q-bio.BM q-bio.QM

    Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

    Authors: Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik

    Abstract: Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeuti…

    Submitted 28 August, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: Published at NeurIPS 2021 Datasets and Benchmarks

  37. arXiv:2101.04013  [pdf]

    cs.LG

    Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients

    Authors: Tingyi Wanyan, Hossein Honarvar, Suraj K. Jaladanki, Chengxi Zang, Nidhi Naik, Sulaiman Somani, Jessica K. De Freitas, Ishan Paranjpe, Akhil Vaid, Riccardo Miotto, Girish N. Nadkarni, Marinka Zitnik, Ariful Azad, Fei Wang, Ying Ding, Benjamin S. Glicksberg

    Abstract: Machine Learning (ML) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing ML models for the coronavirus-disease 2019 (COVID-19) pandemic where data is highly imbalanced, particularly within electronic health records (EHR) research. Conventional approaches in ML use cross-ent…

    Submitted 11 January, 2021; originally announced January 2021.

  38. arXiv:2010.03951  [pdf, other]

    q-bio.QM cs.HC cs.LG

    MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning

    Authors: Kexin Huang, Tianfan Fu, Dawood Khan, Ali Abid, Ali Abdalla, Abubakar Abid, Lucas M. Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

    Abstract: The efficacy of a drug depends on its binding affinity to the therapeutic target and pharmacokinetics. Deep learning (DL) has demonstrated remarkable progress in predicting drug efficacy. We develop MolDesigner, a human-in-the-loop web user-interface (UI), to help drug developers leverage DL predictions to design more effective drugs. A developer can draw a drug molecule in the interface. In the…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 Demonstration Track

  39. arXiv:2006.10538  [pdf, other

    cs.LG cs.SI stat.ML

    Subgraph Neural Networks

    Authors: Emily Alsentzer, Samuel G. Finlayson, Michelle M. Li, Marinka Zitnik

    Abstract: Deep learning methods for graphs achieve remarkable performance on many node-level and graph-level prediction tasks. However, despite the proliferation of the methods and their success, prevailing Graph Neural Networks (GNNs) neglect subgraphs, rendering subgraph prediction tasks challenging to tackle in many impactful applications. Further, subgraph prediction tasks present several unique challen… ▽ More

    Submitted 6 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: E.A. and S.G.F. contributed equally

  40. arXiv:2006.08149  [pdf, other

    cs.LG stat.ML

    GNNGuard: Defending Graph Neural Networks against Adversarial Attacks

    Authors: Xiang Zhang, Marinka Zitnik

    Abstract: Deep learning methods for graphs achieve remarkable performance across a variety of domains. However, recent findings indicate that small, unnoticeable perturbations of graph structure can catastrophically reduce performance of even the strongest and most popular Graph Neural Networks (GNNs). Here, we develop GNNGuard, a general algorithm to defend against a variety of training-time attacks that p… ▽ More

    Submitted 28 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted by NeurIPS 2020. More info about GNNGuard: https://zitniklab.hms.harvard.edu/projects/GNNGuard/

  41. arXiv:2006.07889  [pdf, other

    cs.LG stat.ML

    Graph Meta Learning via Local Subgraphs

    Authors: Kexin Huang, Marinka Zitnik

    Abstract: Prevailing methods for graphs require abundant label and edge information for learning. When data for a new task are scarce, meta-learning can learn from prior experiences and form much-needed inductive biases for fast adaption to new tasks. Here, we introduce G-Meta, a novel meta-learning algorithm for graphs. G-Meta uses local subgraphs to transfer subgraph-specific information and learn transfe… ▽ More

    Submitted 8 January, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  42. arXiv:2005.00687  [pdf, other

    cs.LG cs.SI stat.ML

    Open Graph Benchmark: Datasets for Machine Learning on Graphs

    Authors: Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec

    Abstract: We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source c… ▽ More

    Submitted 24 February, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: Fix dataset bug in ogbg-code

  43. arXiv:2004.14949  [pdf, other

    q-bio.MN cs.LG

    SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks

    Authors: Kexin Huang, Cao Xiao, Lucas Glass, Marinka Zitnik, Jimeng Sun

    Abstract: Molecular interaction networks are powerful resources for discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced the prediction prowess, current graph neural network (GNN) methods are optimized for prediction on the basis of direct similarity between interacting nodes. In biolo… ▽ More

    Submitted 9 December, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Published in Nature Scientific Reports: https://www.nature.com/articles/s41598-020-77766-9

  44. arXiv:2004.08919  [pdf, other

    cs.LG q-bio.QM stat.ML

    DeepPurpose: a Deep Learning Library for Drug-Target Interaction Prediction

    Authors: Kexin Huang, Tianfan Fu, Lucas Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

    Abstract: Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models show promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use deep learning lib… ▽ More

    Submitted 9 December, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

    Comments: Published in Bioinformatics (2020)

  45. arXiv:2004.07229  [pdf

    q-bio.MN cs.LG q-bio.QM stat.ML

    Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19

    Authors: Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, JJ Patten, Robert Davey, Joseph Loscalzo, Albert-László Barabási

    Abstract: The current pandemic has highlighted the need for methodologies that can quickly and reliably prioritize clinically approved compounds for their potential effectiveness for SARS-CoV-2 infections. In the past decade, network medicine has developed and validated multiple predictive algorithms for drug repurposing, exploiting the sub-cellular network-based relationship between a drug's targets and di… ▽ More

    Submitted 9 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  46. arXiv:2002.08596  [pdf

    cs.LG stat.ML

    Interpretability of machine learning based prediction models in healthcare

    Authors: Gregor Stiglic, Primoz Kocbek, Nino Fijacko, Marinka Zitnik, Katrien Verbert, Leona Cilar

    Abstract: There is a need to ensure that machine learning models are interpretable. Higher interpretability of the model means easier comprehension and explanation of future predictions for end-users. Further, interpretable machine learning models allow healthcare experts to make reasonable and data-driven decisions to provide personalized decisions that can ultimately lead to higher quality of service in… ▽ More

    Submitted 14 August, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 12 pages, 2 figures, published in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

    Journal ref: WIREs Data Mining Knowl Discov (2020)

  47. arXiv:1905.12265  [pdf, other

    cs.LG stat.ML

    Strategies for Pre-training Graph Neural Networks

    Authors: Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec

    Abstract: Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been… ▽ More

    Submitted 18 February, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted as a spotlight to ICLR 2020

  48. arXiv:1903.03894  [pdf, other

    cs.LG stat.ML

    GNNExplainer: Generating Explanations for Graph Neural Networks

    Authors: Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec

    Abstract: Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by recursively passing neural messages along edges of the input graph. However, incorporating both graph structure and feature information leads to complex models, and explaining predictions made by GNNs remains unsolved. Here we propose GNNExplainer, the f… ▽ More

    Submitted 13 November, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

  49. arXiv:1808.01743  [pdf, other

    cs.LG cs.AI q-bio.QM stat.ML

    NIMFA: A Python Library for Nonnegative Matrix Factorization

    Authors: Marinka Zitnik, Blaz Zupan

    Abstract: NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representation. NIMFA's component-based implementation and hierarchical design should help the users to employ already… ▽ More

    Submitted 6 August, 2018; originally announced August 2018.

    Journal ref: Journal of Machine Learning Research 13 (2012) 849-853

  50. arXiv:1807.00123  [pdf, other

    q-bio.QM cs.CE cs.LG stat.ML

    Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

    Authors: Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, Michael M. Hoffman

    Abstract: New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integ… ▽ More

    Submitted 10 October, 2018; v1 submitted 30 June, 2018; originally announced July 2018.

    Journal ref: Information Fusion 50 (2019) 71-91