Showing 1–50 of 68 results for author: Wolf, G

  1. arXiv:2407.09618  [pdf, other]

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: Homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at sitao.luan@mail.mcgill.ca!

  2. arXiv:2407.05385  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

    Authors: Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf

    Abstract: Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models, however it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters reduces these costs but doesn't work as well in practice. Indeed, neural net…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Proceedings of the Forty-first International Conference on Machine Learning (ICML 2024)

  3. arXiv:2406.04421  [pdf, other]

    cs.LG stat.ML

    Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

    Authors: Shuang Ni, Adrien Aumon, Guy Wolf, Kevin R. Moon, Jake S. Rhodes

    Abstract: The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction met…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures

  4. arXiv:2406.03396  [pdf, other]

    cs.LG math.FA stat.ML

    Noisy Data Visualization using Functional Data Analysis

    Authors: Haozhe Chen, Andres Felipe Duque Correa, Guy Wolf, Kevin R. Moon

    Abstract: Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating al…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2405.20543  [pdf, other]

    cs.LG cs.AI cs.DM

    Towards a General GNN Framework for Combinatorial Optimization

    Authors: Frederik Wenkel, Semih Cantürk, Michael Perlmutter, Guy Wolf

    Abstract: Graph neural networks (GNNs) have achieved great success for a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce a novel GNN architecture which leverages a complex filter bank and localized attention mechan…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 15 pages, 1 figure

    MSC Class: 68T07 (Primary) 68T20; 90C35; 05C62 (Secondary)

    ACM Class: F.2.2; I.2.6

  6. arXiv:2405.16397  [pdf, other]

    cs.LG math.OC

    AdaFisher: Adaptive Second Order Optimization via Fisher Information

    Authors: Damien Martins Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, Mahdi S. Hosseini

    Abstract: First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing the diagonal matrix preconditioning of the stochastic gradient during the training. Despite their widespread, second-order optimization algorithms exhibit superior convergence properties compared to their first-order coun…

    Submitted 25 May, 2024; originally announced May 2024.

  7. arXiv:2403.07786  [pdf, other]

    physics.optics cs.CV

    Generative deep learning-enabled ultra-large field-of-view lens-free imaging

    Authors: Ronald B. Liu, Zhe Liu, Max G. A. Wolf, Krishna P. Purohit, Gregor Fritz, Yi Feng, Carsten G. Hansen, Pierre O. Bagnaninchi, Xavier Casadevall i Solvas, Yunjie Yang

    Abstract: Advancements in high-throughput biomedical applications necessitate real-time, large field-of-view (FOV) imaging capabilities. Conventional lens-free imaging (LFI) systems, while addressing the limitations of physical lenses, have been constrained by dynamic, hard-to-model optical fields, resulting in a limited one-shot FOV of approximately 20 mm$^2$. This restriction has been a major bottleneck i…

    Submitted 22 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  8. arXiv:2402.09799  [pdf]

    cs.HC

    Co-Designing a wiki-based community knowledge management system for personal science

    Authors: Katharina Kloppenborg, Mad Price Ball, Steven Jonas, Gary Isaac Wolf, Bastian Greshake Tzovaras

    Abstract: Personal science is the practice of addressing personally relevant health questions through self-research. Implementing personal science can be challenging, due to the need to develop and adopt research protocols, tools, and methods. While online communities can provide valuable peer support, tools for systematically accessing community knowledge are lacking. The objective of this study is to appl…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: supplementary files are on Zenodo at https://zenodo.org/records/10659150

  9. arXiv:2402.04958  [pdf, other]

    cs.CV

    Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

    Authors: Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky

    Abstract: Deep neural networks have useful applications in many different tasks, however their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time a…

    Submitted 29 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2024

  10. arXiv:2402.03675  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.LG

    Effective Protein-Protein Interaction Exploration with PPIretrieval

    Authors: Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng

    Abstract: Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learn…

    Submitted 5 February, 2024; originally announced February 2024.

  11. arXiv:2312.04823  [pdf, other]

    cs.CV cs.AI cs.IT cs.LG

    Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

    Authors: Danqi Liao, Chen Liu, Benjamin W. Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy

    Abstract: Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying m…

    Submitted 3 December, 2023; originally announced December 2023.

    Journal ref: ICML 2023 Workshop on Topology, Algebra, and Geometry in Machine Learning

  12. arXiv:2312.00966  [pdf, other]

    cs.LG cs.AI

    Spectral Temporal Contrastive Learning

    Authors: Sacha Morin, Somjit Nath, Samira Ebrahimi Kahou, Guy Wolf

    Abstract: Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear p…

    Submitted 7 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to Self-Supervised Learning - Theory and Practice, NeurIPS Workshop, 2023

  13. arXiv:2310.04292  [pdf, other]

    cs.LG

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris, et al. (10 additional authors not shown)

    Abstract: Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by…

    Submitted 18 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  14. arXiv:2309.09924  [pdf, other]

    cs.LG eess.SP stat.ML

    Learning graph geometry and topology using dynamical systems based message-passing

    Authors: Dhananjay Bhaskar, Yanlei Zhang, Charles Xu, Xingzhi Sun, Oluwadamilola Fasina, Guy Wolf, Maximilian Nickel, Michael Perlmutter, Smita Krishnaswamy

    Abstract: In this paper we introduce DYMAG: a message passing paradigm for GNNs built on the expressive power of continuous, multiscale graph-dynamics. Standard discrete-time message passing algorithms implicitly make use of simplistic graph dynamics and aggregation schemes which limit their ability to capture fundamental graph topological properties. By contrast, DYMAG makes use of complex graph dynamics b…

    Submitted 7 July, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

  15. arXiv:2307.07107  [pdf, other]

    cs.LG

    Graph Positional and Structural Encoder

    Authors: Semih Cantürk, Renming Liu, Olivier Lapointe-Gagné, Vincent Létourneau, Guy Wolf, Dominique Beaini, Ladislav Rampášek

    Abstract: Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, rendering them essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for all graph prediction tasks is a challenging and unsolved problem. Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph en…

    Submitted 10 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2024; 34 pages, 6 figures

  16. arXiv:2307.03672  [pdf, other]

    cs.LG

    Simulation-free Schrödinger bridges via score and flow matching

    Authors: Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio

    Abstract: We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired samples drawn from arbitrary source and target distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]…

    Submitted 11 March, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: AISTATS 2024. Code: https://github.com/atong01/conditional-flow-matching

  17. arXiv:2306.07803  [pdf, other]

    cs.LG

    Inferring dynamic regulatory interaction graphs from time series data with perturbations

    Authors: Dhananjay Bhaskar, Sumner Magruder, Edward De Brouwer, Aarthi Venkat, Frederik Wenkel, Guy Wolf, Smita Krishnaswamy

    Abstract: Complex systems are characterized by intricate interactions between entities that evolve dynamically over time. Accurate inference of these dynamic relationships is crucial for understanding and predicting system behavior. In this paper, we propose Regulatory Temporal Interaction Network Inference (RiTINI) for inferring time-varying interaction graphs in complex systems using a novel combination o…

    Submitted 13 June, 2023; originally announced June 2023.

  18. arXiv:2306.06062  [pdf, other]

    cs.CV cs.LG

    Neural FIM for learning Fisher Information Metrics from point cloud data

    Authors: Oluwadamilola Fasina, Guillaume Huguet, Alexander Tong, Yanlei Zhang, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy

    Abstract: Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifol…

    Submitted 11 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 13 pages, 11 figures, 1 table

  19. arXiv:2306.02508  [pdf, other]

    cs.LG stat.ML

    Graph Fourier MMD for Signals on Graphs

    Authors: Samuel Leone, Aarthi Venkat, Guillaume Huguet, Alexander Tong, Guy Wolf, Smita Krishnaswamy

    Abstract: While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little attention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on a graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in…

    Submitted 4 June, 2023; originally announced June 2023.

  20. arXiv:2305.19043  [pdf, other]

    cs.LG q-bio.GN q-bio.QM stat.ML

    A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

    Authors: Guillaume Huguet, Alexander Tong, Edward De Brouwer, Yanlei Zhang, Guy Wolf, Ian Adelstein, Smita Krishnaswamy

    Abstract: Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensional, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoret…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 31 pages, 13 figures, 10 tables

  21. arXiv:2302.00482  [pdf, other]

    cs.LG

    Improving and generalizing flow-based generative models with minibatch optimal transport

    Authors: Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, Yoshua Bengio

    Abstract: Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow…

    Submitted 11 March, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: TMLR. Code: https://github.com/atong01/conditional-flow-matching

  22. arXiv:2211.00805  [pdf, other]

    cs.LG q-bio.QM

    Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds

    Authors: Guillaume Huguet, Alexander Tong, María Ramos Zapatero, Christopher J. Tape, Guy Wolf, Smita Krishnaswamy

    Abstract: Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods are currently the state-of-the-art for such computations, but require $O(n^2)$ computations. In addition, Sinkhorn-based methods commonly use a Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, i…

    Submitted 26 September, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: A shorter version without the appendix appeared in the IEEE International Workshop on Machine Learning for Signal Processing (2023)

  23. arXiv:2210.16156  [pdf, other]

    cs.LG cs.AI cs.CV

    Reliability of CKA as a Similarity Measure in Deep Learning

    Authors: MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

    Abstract: Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained…

    Submitted 16 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

  24. arXiv:2210.12774  [pdf, other]

    stat.ML cs.LG

    Manifold Alignment with Label Information

    Authors: Andres F. Duque, Myriam Lizotte, Guy Wolf, Kevin R. Moon

    Abstract: Multi-domain data is becoming increasingly common and presents both challenges and opportunities in the data science community. The integration of distinct data-views can be used for exploratory data analysis, and benefit downstream analysis including machine learning related tasks. With this in mind, we present a novel manifold alignment method called MALI (Manifold alignment with label informati…

    Submitted 30 October, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

  25. arXiv:2208.07458  [pdf, other]

    cs.LG

    Learnable Filters for Geometric Scattering Modules

    Authors: Alexander Tong, Frederik Wenkel, Dhananjay Bhaskar, Kincaid Macdonald, Jackson Grady, Michael Perlmutter, Smita Krishnaswamy, Guy Wolf

    Abstract: We propose a new graph neural network (GNN) module, based on relaxations of recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. The incorporation of our LEGS-module in GNNs enables the lear…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: 14 pages, 3 figures, 10 tables. arXiv admin note: substantial text overlap with arXiv:2010.02415

  26. arXiv:2206.14928  [pdf, other]

    cs.LG

    Manifold Interpolating Optimal-Transport Flows for Trajectory Inference

    Authors: Guillaume Huguet, D. S. Magruder, Alexander Tong, Oluwadamilola Fasina, Manik Kuchroo, Guy Wolf, Smita Krishnaswamy

    Abstract: We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow) that learns stochastic, continuous population dynamics from static snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models, manifold learning, and optimal transport by training neural ordinary differential equations (Neural ODE) to interpolate between static population snapshots as penalized b…

    Submitted 3 November, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Presented at NeurIPS 2022, 24 pages, 7 tables, 14 figures

  27. arXiv:2206.08164  [pdf, other]

    cs.LG

    Long Range Graph Benchmark

    Authors: Vijay Prakash Dwivedi, Ladislav Rampášek, Mikhail Galkin, Ali Parviz, Guy Wolf, Anh Tuan Luu, Dominique Beaini

    Abstract: Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of T…

    Submitted 28 November, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Added reference to Tönshoff et al., 2023 in Sec. 4.1; NeurIPS 2022 Track on D&B; Open-sourced at: https://github.com/vijaydwivedi75/lrgb

  28. arXiv:2206.07729  [pdf, other]

    cs.LG

    Taxonomy of Benchmarks in Graph Representation Learning

    Authors: Renming Liu, Semih Cantürk, Frederik Wenkel, Sarah McGuire, Xinyi Wang, Anna Little, Leslie O'Bray, Michael Perlmutter, Bastian Rieck, Matthew Hirn, Guy Wolf, Ladislav Rampášek

    Abstract: Graph Neural Networks (GNNs) extend the success of neural networks to graph-structured data by accounting for their intrinsic geometry. While extensive research has been done on developing GNN models with superior performance according to a collection of graph representation learning benchmarks, it is currently not well understood what aspects of a given model are probed by them. For example, to w…

    Submitted 30 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: In Proceedings of the First Learning on Graphs Conference (LoG 2022)

  29. arXiv:2206.07305  [pdf, other]

    stat.ML cs.LG

    Diffusion Transport Alignment

    Authors: Andres F. Duque, Guy Wolf, Kevin R. Moon

    Abstract: The integration of multimodal data presents a challenge in cases when the study of a given phenomenon by different instruments or conditions generates distinct but related domains. Many existing data integration methods assume a known one-to-one correspondence between domains of the entire dataset, which may be unrealistic. Furthermore, existing manifold alignment methods are not suited for cases w…

    Submitted 15 June, 2022; originally announced June 2022.

  30. arXiv:2206.01506  [pdf, other]

    cs.LG

    Can Hybrid Geometric Scattering Networks Help Solve the Maximum Clique Problem?

    Authors: Yimeng Min, Frederik Wenkel, Michael Perlmutter, Guy Wolf

    Abstract: We propose a geometric scattering-based graph neural network (GNN) for approximating solutions of the NP-hard maximum clique (MC) problem. We construct a loss function with two terms, one which encourages the network to find highly connected nodes and the other which acts as a surrogate for the constraint that the nodes form a clique. We then use this loss to train an efficient GNN architecture th…

    Submitted 28 November, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

  31. arXiv:2205.12454  [pdf, other]

    cs.LG

    Recipe for a General, Powerful, Scalable Graph Transformer

    Authors: Ladislav Rampášek, Mikhail Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, Dominique Beaini

    Abstract: We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encod…

    Submitted 15 January, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: In Proceedings of NeurIPS 2022

  32. arXiv:2203.14860  [pdf, other]

    cs.LG stat.ML

    Time-inhomogeneous diffusion geometry and topology

    Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

    Abstract: Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator t…

    Submitted 5 January, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

  33. arXiv:2201.08932  [pdf, other]

    stat.ML cs.LG

    Overcoming Oversmoothness in Graph Convolutional Networks via Hybrid Scattering Networks

    Authors: Frederik Wenkel, Yimeng Min, Matthew Hirn, Michael Perlmutter, Guy Wolf

    Abstract: Geometric deep learning has made great strides towards generalizing the design of structure-aware neural networks from traditional domains to non-Euclidean ones, giving rise to graph neural networks (GNN) that can be applied to graph-structured data arising in, e.g., social networks, biochemistry, and material science. Graph convolutional networks (GCNs) in particular, inspired by their Euclidean…

    Submitted 14 August, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    MSC Class: 68T07

  34. arXiv:2201.00622  [pdf, other]

    q-bio.NC cs.LG eess.SP

    Learning shared neural manifolds from multi-subject FMRI data

    Authors: Jessie Huang, Erica L. Busch, Tom Wallenstein, Michal Gerasimiuk, Andrew Benz, Guillaume Lajoie, Guy Wolf, Nicholas B. Turk-Browne, Smita Krishnaswamy

    Abstract: Functional magnetic resonance imaging (fMRI) is a notoriously noisy measurement of brain activity because of the large variations between individuals, signals marred by environmental differences during collection, and spatiotemporal averaging required by the measurement resolution. In addition, the data is extremely high dimensional, with the space of the activity typically having much lower intri…

    Submitted 22 December, 2021; originally announced January 2022.

  35. arXiv:2111.10452  [pdf, other]

    cs.LG cs.AI

    MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data

    Authors: Michal Gerasimiuk, Dennis Shung, Alexander Tong, Adrian Stanley, Michael Schultz, Jeffrey Ngu, Loren Laine, Guy Wolf, Smita Krishnaswamy

    Abstract: A major challenge in embedding or visualizing clinical patient data is the heterogeneity of variable types including continuous lab values, categorical diagnostic codes, as well as missing or incomplete data. In particular, in EHR data, some variables are missing not at random (MNAR) but deliberately not collected and thus are a source of information. For example, lab tests may be deemed nec…

    Submitted 19 November, 2021; originally announced November 2021.

  36. arXiv:2111.04033  [pdf, other]

    cs.LG stat.ME

    Positivity Validation Detection and Explainability via Zero Fraction Multi-Hypothesis Testing and Asymmetrically Pruned Decision Trees

    Authors: Guy Wolf, Gil Shabat, Hanan Shteingart

    Abstract: Positivity is one of the three conditions for causal inference from observational data. The standard way to validate positivity is to analyze the distribution of propensity. However, to democratize the ability to do causal inference by non-experts, it is required to design an algorithm to (i) test positivity and (ii) explain where in the covariate space positivity is lacking. The latter could be u…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: Talk accepted to Causal Data Science Meeting, 2021

  37. arXiv:2110.14809  [pdf, other]

    cs.LG

    Towards a Taxonomy of Graph Learning Datasets

    Authors: Renming Liu, Semih Cantürk, Frederik Wenkel, Dylan Sandfelder, Devin Kreuzer, Anna Little, Sarah McGuire, Leslie O'Bray, Michael Perlmutter, Bastian Rieck, Matthew Hirn, Guy Wolf, Ladislav Rampášek

    Abstract: Graph neural networks (GNNs) have attracted much attention due to their ability to leverage the intrinsic geometries of the underlying data. Although many different types of GNN models have been developed, with many benchmarking procedures to demonstrate the superiority of one GNN model over the others, there is a lack of systematic understanding of the underlying benchmarking datasets, and what a…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: in Data-Centric AI Workshop at NeurIPS 2021

  38. arXiv:2107.12334  [pdf, other]

    cs.LG eess.SP

    Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

    Authors: Alexander Tong, Guillaume Huguet, Dennis Shung, Amine Natik, Manik Kuchroo, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

    Abstract: In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying…

    Submitted 28 March, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 5 pages, 5 figures, ICASSP 2022

  39. arXiv:2107.09539  [pdf, other]

    cs.LG eess.SP

    Parametric Scattering Networks

    Authors: Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

    Abstract: The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering tra…

    Submitted 15 August, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    ACM Class: F.2.2; I.2.7

  40. arXiv:2107.07432  [pdf, other]

    cs.LG stat.ML

    Hierarchical graph neural nets can capture long-range interactions

    Authors: Ladislav Rampášek, Guy Wolf

    Abstract: Graph neural networks (GNNs) based on message passing between neighboring nodes are known to be insufficient for capturing long-range interactions in graphs. In this project we study hierarchical message passing models that leverage a multi-resolution representation of a given graph. This facilitates learning of features that span large receptive fields without loss of local information, an aspect…

    Submitted 15 August, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

  41. arXiv:2102.12833  [pdf, other]

    cs.LG

    Diffusion Earth Mover's Distance and Distribution Embeddings

    Authors: Alexander Tong, Guillaume Huguet, Amine Natik, Kincaid MacDonald, Manik Kuchroo, Ronald Coifman, Guy Wolf, Smita Krishnaswamy

    Abstract: We propose a new fast method of measuring distances between large numbers of related high dimensional datasets called the Diffusion Earth Mover's Distance (EMD). We model the datasets as distributions supported on a common data graph that is derived from the affinity matrix computed on the combined data. In such cases where the graph is a discretization of an underlying Riemannian closed manifold, w…

    Submitted 27 July, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: Presented at ICML 2021

  42. arXiv:2102.06757  [pdf, other]

    cs.LG cs.HC

    Multimodal Data Visualization and Denoising with Integrated Diffusion

    Authors: Manik Kuchroo, Abhinav Godavarthi, Alexander Tong, Guy Wolf, Smita Krishnaswamy

    Abstract: We propose a method called integrated diffusion for combining multimodal datasets, or data gathered via several different measurements on the same system, to create a joint data diffusion operator. As real world data suffers from both local and global noise, we introduce mechanisms to optimally calculate a diffusion operator that reflects the combined information from both modalities. We show the…

    Submitted 3 March, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

  43. arXiv:2102.00485  [pdf, other]

    cs.LG stat.ML

    Exploring the Geometry and Topology of Neural Network Loss Landscapes

    Authors: Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

    Abstract: Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have…

    Submitted 26 January, 2022; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted at the 20th Symposium on Intelligent Data Analysis (IDA) 2022

  44. Geometric Scattering Attention Networks

    Authors: Yimeng Min, Frederik Wenkel, Guy Wolf

    Abstract: Geometric scattering has recently gained recognition in graph representation learning, and recent work has shown that integrating scattering features in graph convolution networks (GCNs) can alleviate the typical oversmoothing of features in node representation learning. However, scattering often relies on handcrafted design, requiring careful selection of frequency bands via a cascade of wavelet…

    Submitted 19 January, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8518-8522, June 2021

  45. arXiv:2010.02415  [pdf, other]

    cs.LG stat.ML

    Data-Driven Learning of Geometric Scattering Networks

    Authors: Alexander Tong, Frederik Wenkel, Kincaid MacDonald, Smita Krishnaswamy, Guy Wolf

    Abstract: We propose a new graph neural network (GNN) module, based on relaxations of recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. The incorporation of our LEGS-module in GNNs enables the lear…

    Submitted 28 March, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: 6 pages, 2 figures, 3 tables, Presented at IEEE MLSP 2021

  46. Extendable and invertible manifold learning with geometry regularized autoencoders

    Authors: Andrés F. Duque, Sacha Morin, Guy Wolf, Kevin R. Moon

    Abstract: A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Au…

    Submitted 22 November, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: 10 pages, 6 figures

    Journal ref: IEEE International Conference on Big Data, pp. 5027-5036, Dec. 2020

  47. arXiv:2006.12253  [pdf, other]

    cs.LG cs.NE q-bio.NC stat.ML

    Advantages of biologically-inspired adaptive neural activation in RNNs during learning

    Authors: Victor Geadah, Giancarlo Kerg, Stefan Horoi, Guy Wolf, Guillaume Lajoie

    Abstract: Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks. Yet, most neural activation functions used in artificial networks are fixed and mostly considered as an inconsequential architecture choice. In this paper, we investigate nonlinear activation function adaptation over the large time scale of learning, and outline its impact on seque…

    Submitted 22 June, 2020; originally announced June 2020.

  48. arXiv:2006.08701  [pdf, other]

    stat.ML cs.HC cs.LG stat.AP

    Supervised Visualization for Data Exploration

    Authors: Jake S. Rhodes, Adele Cutler, Guy Wolf, Kevin R. Moon

    Abstract: Dimensionality reduction is often used as an initial step in data exploration, either as preprocessing for classification or regression or for visualization. Most dimensionality reduction techniques to date are unsupervised; they do not take class labels into account (e.g., PCA, MDS, t-SNE, Isomap). Such methods require large amounts of data and are often sensitive to noise that may obfuscate impo…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: 21 pages, 9 figures

  49. arXiv:2006.07882  [pdf, other]

    q-bio.NC cs.LG eess.IV math.AT stat.ML

    Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence

    Authors: Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy

    Abstract: Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challeng…

    Submitted 22 October, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2020; camera-ready version

  50. arXiv:2006.06885  [pdf, other]

    cs.LG stat.ML

    Uncovering the Folding Landscape of RNA Secondary Structure with Deep Graph Embeddings

    Authors: Egbert Castro, Andrew Benz, Alexander Tong, Guy Wolf, Smita Krishnaswamy

    Abstract: Biomolecular graph analysis has recently gained much attention in the emerging field of geometric deep learning. Here we focus on organizing biomolecular graphs in ways that expose meaningful relations and variations between them. We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings. Our embedding network first extracts rich graph features using the recen…

    Submitted 28 March, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 10 pages, 10 figures, 4 tables, Presented at IEEE Big Data 2020