
Showing 1–50 of 57 results for author: Mroueh, Y

  1. arXiv:2406.06425

    stat.ML cs.LG math.ST

    Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

    Authors: Gabriel Rioux, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Youssef Mroueh

    Abstract: Stochastic dominance is an important concept in probability theory, econometrics and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little has been done in the multivariate scenario, wherein an agent has to decide between different multivariate outcomes. By exploiting a characterization of multiva…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 27 pages, 2 figures

  2. arXiv:2406.05883

    cs.LG cs.IT stat.ML

    Information Theoretic Guarantees For Policy Alignment In Large Language Models

    Authors: Youssef Mroueh

    Abstract: Policy alignment of large language models refers to constrained policy optimization, where the policy is optimized to maximize a reward while staying close to a reference policy with respect to an $f$-divergence such as the $\mathsf{KL}$ divergence. The best of $n$ alignment policy selects a sample from the reference policy that has the maximum reward among $n$ independent samples. For both cases…

    Submitted 9 June, 2024; originally announced June 2024.
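    The abstract defines the best-of-$n$ policy precisely: draw $n$ independent samples from the reference policy and keep the one with maximum reward. A minimal Python sketch of that selection rule follows; the sampler, the reward function, and the toy vocabulary are hypothetical stand-ins, not anything taken from the paper.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(sample_ref: Callable[[], T], reward: Callable[[T], float], n: int) -> T:
    """Draw n independent samples from the reference policy and return the one
    with the highest reward (max() keeps the earliest sample on ties)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_ref() for _ in range(n)]
    return max(candidates, key=reward)


# Hypothetical toy reference "policy" over strings and a hypothetical reward
# that prefers longer outputs; a real setup would sample completions from an
# LLM and score them with a reward model.
rng = random.Random(0)
vocab = ["a", "ab", "abc", "abcd"]
print(best_of_n(lambda: rng.choice(vocab), reward=len, n=8))
```

    Increasing $n$ trades a higher expected reward for a larger departure of the selection policy from the reference policy.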

  3. arXiv:2406.05882

    cs.LG stat.ML

    Distributional Preference Alignment of LLMs via Optimal Transport

    Authors: Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

    Abstract: Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically…

    Submitted 9 June, 2024; originally announced June 2024.
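    The abstract states AOT's goal, making the reward distribution of positive samples stochastically dominate that of negative samples, but is truncated before the training objective. As a hedged illustration only, the Python sketch below measures how far two equal-size reward samples are from first-order dominance: it sorts both samples (for convex costs, sorting yields the optimal transport coupling in one dimension) and averages a hinge on the sorted differences. The hinge form, the equal-size requirement, and the function name are assumptions, not the paper's loss.

```python
from typing import Sequence


def dominance_violation(pos_rewards: Sequence[float], neg_rewards: Sequence[float]) -> float:
    """Mean hinge gap between sorted negative and sorted positive rewards.

    Returns 0.0 exactly when every order statistic of the positive rewards is
    at least the matching order statistic of the negative rewards, i.e. when
    the empirical positive distribution first-order dominates the negative one.
    """
    if len(pos_rewards) != len(neg_rewards) or not pos_rewards:
        raise ValueError("expected two non-empty samples of equal size")
    # Pair the i-th smallest positive reward with the i-th smallest negative one.
    pairs = zip(sorted(pos_rewards), sorted(neg_rewards))
    return sum(max(0.0, neg - pos) for pos, neg in pairs) / len(pos_rewards)


# Hypothetical reward-model scores for chosen (positive) and rejected (negative) answers.
print(dominance_violation([0.9, 0.5, 0.7], [0.1, 0.4, 0.6]))  # dominance holds: 0.0
print(dominance_violation([0.2, 0.8], [0.5, 0.6]))  # lowest quantiles are inverted
```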

  4. arXiv:2405.04912

    q-bio.BM cs.LG physics.chem-ph

    GP-MoLFormer: A Foundation Model For Molecular Generation

    Authors: Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das

    Abstract: Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFo…

    Submitted 4 April, 2024; originally announced May 2024.

  5. arXiv:2310.07132

    cs.LG math.ST q-fin.RM stat.ML

    Risk Aware Benchmarking of Large Language Models

    Authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

    Abstract: We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and math…

    Submitted 9 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICML 2024
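    For real-valued outcomes, first-order dominance of $X$ over $Y$ means $F_X(t) \le F_Y(t)$ for all $t$, and second-order dominance means $\int_{-\infty}^{t} F_X(s)\,ds \le \int_{-\infty}^{t} F_Y(s)\,ds$ for all $t$. The Python sketch below checks both conditions on two equal-size samples through their order statistics. It is a plug-in comparison only and does not implement the paper's statistical relative test or its significance levels; the equal-size restriction and the tolerance parameter are simplifying assumptions.

```python
from itertools import accumulate
from typing import Sequence


def _check_samples(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y) or not x:
        raise ValueError("expected two non-empty samples of equal size")


def first_order_dominates(x: Sequence[float], y: Sequence[float], tol: float = 1e-12) -> bool:
    """Empirical first-order dominance of X over Y: each sorted value of x is at
    least the matching sorted value of y (equivalently F_X <= F_Y everywhere)."""
    _check_samples(x, y)
    return all(a >= b - tol for a, b in zip(sorted(x), sorted(y)))


def second_order_dominates(x: Sequence[float], y: Sequence[float], tol: float = 1e-12) -> bool:
    """Empirical second-order dominance of X over Y: the running sums of sorted
    values (integrated quantile functions) of x never fall below those of y."""
    _check_samples(x, y)
    return all(a >= b - tol for a, b in zip(accumulate(sorted(x)), accumulate(sorted(y))))


# A mean-preserving spread: both samples average 5, but y is riskier.
x, y = [5.0, 5.0], [4.0, 6.0]
print(first_order_dominates(x, y), second_order_dominates(x, y))  # False True
```

    Here $x$ is preferred under every increasing concave utility even though neither sample dominates the other in the first-order sense, which is the risk-averse comparison the second-order condition captures.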

  6. arXiv:2304.10819

    cs.LG cs.AI stat.ML

    Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

    Authors: Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

    Abstract: Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew…

    Submitted 9 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: submitted

  7. arXiv:2212.04580

    cond-mat.dis-nn cs.LG stat.ML

    Effective Dynamics of Generative Adversarial Networks

    Authors: Steven Durr, Youssef Mroueh, Yuhai Tu, Shenshen Wang

    Abstract: Generative adversarial networks (GANs) are a class of machine-learning models that use adversarial training to generate new samples with the same (potentially very complex) statistics as the training samples. One major form of training failure, known as mode collapse, involves the generator failing to reproduce the full diversity of modes in the target probability distribution. Here, we present an…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: 19 pages, 21 figures

  8. arXiv:2208.06665

    cs.LG

    Cloud-Based Real-Time Molecular Screening Platform with MolFormer

    Authors: Brian Belgodere, Vijil Chenthamarakshan, Payel Das, Pierre Dognin, Toby Kurien, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young

    Abstract: With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid pace. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The pla…

    Submitted 13 August, 2022; originally announced August 2022.

    Comments: Paper accepted at ECML PKDD 2022 demo track

  9. arXiv:2205.13941

    cs.LG cs.CR cs.IT stat.ML

    Auditing Differential Privacy in High Dimensions with the Kernel Quantum Rényi Divergence

    Authors: Carles Domingo-Enrich, Youssef Mroueh

    Abstract: Differential privacy (DP) is the de facto standard for private data release and private machine learning. Auditing black-box DP algorithms and mechanisms to certify whether they satisfy a certain DP guarantee is challenging, especially in high dimension. We propose relaxations of differential privacy based on new divergences on probability distributions: the kernel Rényi divergence and its regular…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: Code at https://github.com/CDEnrich/kernel_renyi_dp

  10. arXiv:2205.13684

    stat.ML cs.LG math.PR math.ST

    Learning with Stochastic Orders

    Authors: Carles Domingo-Enrich, Yair Schiff, Youssef Mroueh

    Abstract: Learning high-dimensional distributions is often done with explicit likelihood modeling or implicit modeling via minimizing integral probability metrics (IPMs). In this paper, we expand this learning paradigm to stochastic orders, namely, the convex or Choquet order between probability measures. Towards this end, exploiting the relation between convex orders and optimal transport, we introduce the…

    Submitted 9 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/yair-schiff/stochastic-orders-ICMN

  11. arXiv:2111.11328

    cs.LG stat.ML

    Cycle Consistent Probability Divergences Across Different Spaces

    Authors: Zhengxin Zhang, Youssef Mroueh, Ziv Goldfeld, Bharath K. Sriperumbudur

    Abstract: Discrepancy measures between probability distributions are at the core of statistical inference and machine learning. In many applications, distributions of interest are supported on different spaces, and yet a meaningful correspondence between data points is desired. Motivated to explicitly encode consistent bidirectional maps into the discrepancy measure, this work proposes a novel unbalanced Mo…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: 35 pages

  12. arXiv:2111.05841

    cs.LG physics.app-ph

    Physics-enhanced deep surrogates for partial differential equations

    Authors: Raphaël Pestourie, Youssef Mroueh, Chris Rackauckas, Payel Das, Steven G. Johnson

    Abstract: Many physics and engineering applications demand Partial Differential Equations (PDE) property evaluations that are traditionally computed with resource-intensive high-fidelity numerical solvers. Data-driven surrogate models provide an efficient alternative but come with a significant cost of training. Emerging applications would benefit from surrogates with an improved accuracy-cost tradeoff, whi…

    Submitted 14 December, 2023; v1 submitted 10 November, 2021; originally announced November 2021.

  13. arXiv:2110.03673

    stat.ML cs.LG math.ST

    Tighter Sparse Approximation Bounds for ReLU Neural Networks

    Authors: Carles Domingo-Enrich, Youssef Mroueh

    Abstract: A well-known line of work (Barron, 1993; Breiman, 1993; Klusowski & Barron, 2018) provides bounds on the width $n$ of a ReLU two-layer neural network needed to approximate a function $f$ over the ball $\mathcal{B}_R(\mathbb{R}^d)$ up to error $ε$, when the Fourier based quantity $C_f = \frac{1}{(2π)^{d/2}} \int_{\mathbb{R}^d} \|ξ\|^2 |\hat{f}(ξ)| \ dξ$ is finite. More recently Ongie et al. (2019)…

    Submitted 25 November, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  14. arXiv:2106.09553

    cs.LG cs.CL q-bio.BM

    Large-Scale Chemical Language Representations Capture Molecular Structure and Properties

    Authors: Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das

    Abstract: Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based languag…

    Submitted 14 December, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: NMI 2022

  15. arXiv:2106.05739

    stat.ML cs.LG math.PR math.ST

    Separation Results between Fixed-Kernel and Feature-Learning Probability Metrics

    Authors: Carles Domingo-Enrich, Youssef Mroueh

    Abstract: Several works in implicit and explicit generative modeling empirically observed that feature-learning discriminators outperform fixed-kernel discriminators in terms of the sample quality of the models. We provide separation results between probability metrics with fixed-kernel and feature-learning discriminators using the function classes $\mathcal{F}_2$ and $\mathcal{F}_1$ respectively, which wer…

    Submitted 31 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  16. arXiv:2106.03314

    cs.LG stat.ML

    Measuring Generalization with Optimal Transport

    Authors: Ching-Yao Chuang, Youssef Mroueh, Kristjan Greenewald, Antonio Torralba, Stefanie Jegelka

    Abstract: Understanding the generalization of deep neural networks is one of the most important tasks in deep learning. Although much progress has been made, theoretical error bounds still often behave disparately from empirical observations. In this work, we develop margin-based generalization bounds, where the margins are normalized with optimal transport costs between independent random subsets sampled f…

    Submitted 7 November, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  17. arXiv:2106.00774

    stat.ML cs.LG math.NA

    Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks

    Authors: David Alvarez-Melis, Yair Schiff, Youssef Mroueh

    Abstract: Gradient flows are a powerful tool for optimizing functionals in general metric spaces, including the space of probabilities endowed with the Wasserstein metric. A typical approach to solving this optimization problem relies on its connection to the dynamic formulation of optimal transport and the celebrated Jordan-Kinderlehrer-Otto (JKO) scheme. However, this formulation involves optimization ove…

    Submitted 30 November, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  18. arXiv:2103.06503

    cs.LG cs.CY stat.ML

    Fair Mixup: Fairness via Interpolation

    Authors: Ching-Yao Chuang, Youssef Mroueh

    Abstract: Training classifiers under fairness constraints such as group fairness regularizes the disparities of predictions between the groups. Nevertheless, even though the constraints are satisfied during training, they might not generalize at evaluation time. To improve the generalizability of fair classifiers, we propose fair mixup, a new data augmentation strategy for imposing the fairness constraint.…

    Submitted 11 March, 2021; originally announced March 2021.

    Journal ref: ICLR 2021

  19. arXiv:2012.11696

    cs.CV cs.LG

    Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

    Abstract: Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on the…

    Submitted 18 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: In submission to JAIR. Copyright may be transferred without notice, after which this version may no longer be accessible

  20. arXiv:2012.11691

    cs.CV cs.LG

    Alleviating Noisy Data in Image Captioning with Cooperative Distillation

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

    Abstract: Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or VizWiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose…

    Submitted 21 December, 2020; originally announced December 2020.

    Comments: CVPR 2020 VizWiz Challenge

  21. arXiv:2011.02402

    cs.LG stat.ML

    On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient Flow

    Authors: Youssef Mroueh, Truyen Nguyen

    Abstract: We consider the maximum mean discrepancy ($\mathrm{MMD}$) GAN problem and propose a parametric kernelized gradient flow that mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN. We show that this flow provides a descent direction minimizing the $\mathrm{MMD}$ on a statistical manifold of probability distributions. We then derive an explicit condition which ensures that gradient desc…

    Submitted 4 November, 2020; originally announced November 2020.
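    The squared MMD between $P$ and $Q$ under a kernel $k$ is $\mathbb{E}[k(x,x')] + \mathbb{E}[k(y,y')] - 2\,\mathbb{E}[k(x,y)]$ with $x, x' \sim P$ and $y, y' \sim Q$ independent. A minimal Python sketch of the biased (V-statistic) sample estimate follows, assuming one-dimensional samples and a fixed Gaussian kernel with a hand-picked bandwidth; it does not model the adversarially trained, gradient-regularized kernel of the MMD GAN setting studied in the paper.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased estimate of MMD^2: mean k(x, x') + mean k(y, y') - 2 mean k(x, y),
    averaging over all pairs (including i == j)."""
    if not xs or not ys:
        raise ValueError("samples must be non-empty")

    def mean_kernel(us: Sequence[float], vs: Sequence[float]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


print(mmd2_biased([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # identical samples
print(mmd2_biased([0.0, 0.5, 1.0], [3.0, 3.5, 4.0]))  # shifted samples
```

    The biased estimate is never negative because it equals the squared RKHS distance between the two empirical mean embeddings; dropping the $i = j$ terms gives the unbiased U-statistic, which can be slightly negative.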

  22. arXiv:2011.01843

    cs.LG cs.AI

    Tabular Transformers for Modeling Multivariate Time Series

    Authors: Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh, Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman

    Abstract: Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learn…

    Submitted 11 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to ICASSP, 2021; https://github.com/IBM/TabFormer

  23. arXiv:2009.14148

    cs.LG stat.ML

    Unbalanced Sobolev Descent

    Authors: Youssef Mroueh, Mattia Rigotti

    Abstract: We introduce Unbalanced Sobolev Descent (USD), a particle descent algorithm for transporting a high dimensional source distribution to a target distribution that does not necessarily have the same mass. We define the Sobolev-Fisher discrepancy between distributions and show that it relates to advection-reaction transport equations and the Wasserstein-Fisher-Rao metric between distributions. USD tr…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2020

  24. arXiv:2008.12649

    cs.LG physics.app-ph

    Active learning of deep surrogates for PDEs: Application to metasurface design

    Authors: Raphaël Pestourie, Youssef Mroueh, Thanh V. Nguyen, Payel Das, Steven G. Johnson

    Abstract: Surrogate models for partial-differential equations are widely used in the design of meta-materials to rapidly evaluate the behavior of composable components. However, the training cost of accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: submitted to npj

    Journal ref: npj Computational Materials (2020)6:164

  25. arXiv:2007.03074

    stat.ML cs.CV cs.LG

    Kernel Stein Generative Modeling

    Authors: Wei-Cheng Chang, Chun-Liang Li, Youssef Mroueh, Yiming Yang

    Abstract: We are interested in gradient-based Explicit Generative Modeling where samples can be derived from iterative gradient updates based on an estimate of the score function of the data distribution. Recent advances in Stochastic Gradient Langevin Dynamics (SGLD) demonstrate impressive results with energy-based models on high-dimensional and complex data distributions. Stein Variational Gradient Desce…

    Submitted 6 July, 2020; originally announced July 2020.

  26. arXiv:2006.11166

    stat.ML cs.LG

    Fast Mixing of Multi-Scale Langevin Dynamics under the Manifold Hypothesis

    Authors: Adam Block, Youssef Mroueh, Alexander Rakhlin, Jerret Ross

    Abstract: Recently, the task of image generation has attracted much attention. In particular, the recent empirical successes of the Markov Chain Monte Carlo (MCMC) technique of Langevin Dynamics have prompted a number of theoretical advances; despite this, several outstanding problems remain. First, the Langevin Dynamics is run in very high dimension on a nonconvex landscape; in the worst case, due to the N…

    Submitted 22 June, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

  27. arXiv:2005.03588

    cs.CL cs.LG

    Learning Implicit Text Generation via Feature Matching

    Authors: Inkit Padhi, Pierre Dognin, Ke Bai, Cicero Nogueira dos Santos, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das

    Abstract: Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks. In this paper, we present new GFMN formulations that are effective for sequential data. Our experimental results show the effectiveness of the proposed method, SeqGFMN, for three distinct generation tasks in English…

    Submitted 8 May, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  28. arXiv:2002.01119

    cs.LG cs.DC stat.ML

    Improving Efficiency in Large-Scale Decentralized Distributed Training

    Authors: Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

    Abstract: Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Parallel SGD (AD-PSGD) are a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks. One drawback of (A)D-PSGD is that the spectral gap of the mixing matrix decreases when the number of learners in the system increases, which hampers convergence. In this p…

    Submitted 3 February, 2020; originally announced February 2020.

    Journal ref: 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP'2020) Oral
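    The abstract's point about the mixing matrix can be made concrete with a ring of learners. The Python sketch below assumes a ring topology in which each learner averages its value with its two neighbours using weights 1/3 (a common textbook choice, not necessarily the paper's setup) and omits the local SGD steps that D-PSGD interleaves with averaging. For this circulant mixing matrix the eigenvalues are $(1 + 2\cos(2\pi k/n))/3$, so the spectral gap has a closed form.

```python
import math
from typing import List


def ring_gossip_step(values: List[float]) -> List[float]:
    """One averaging round on a ring: every learner replaces its value with the
    mean of its own value and its two neighbours' values (weights 1/3 each)."""
    n = len(values)
    if n < 3:
        raise ValueError("a ring needs at least 3 learners")
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0 for i in range(n)]


def ring_spectral_gap(n: int) -> float:
    """1 - max_{k=1..n-1} |lambda_k| for the ring mixing matrix above, using its
    eigenvalues lambda_k = (1 + 2 cos(2 pi k / n)) / 3."""
    if n < 3:
        raise ValueError("a ring needs at least 3 learners")
    second = max(abs(1.0 + 2.0 * math.cos(2.0 * math.pi * k / n)) / 3.0 for k in range(1, n))
    return 1.0 - second


for n in (8, 16, 32, 64):
    print(n, round(ring_spectral_gap(n), 4))
```

    The gap shrinks roughly fourfold each time $n$ doubles (about 0.195, 0.051, 0.013 and 0.003 for 8, 16, 32 and 64 learners), so more averaging rounds are needed to reach consensus as the ring grows.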

  29. arXiv:2002.00107

    stat.ML cs.LG math.PR

    Generative Modeling with Denoising Auto-Encoders and Langevin Sampling

    Authors: Adam Block, Youssef Mroueh, Alexander Rakhlin

    Abstract: We study convergence of a generative modeling method that first estimates the score function of the distribution using Denoising Auto-Encoders (DAE) or Denoising Score Matching (DSM) and then employs Langevin diffusion for sampling. We show that both DAE and DSM provide estimates of the score of the Gaussian smoothed population density, allowing us to apply the machinery of Empirical Processes.…

    Submitted 11 October, 2022; v1 submitted 31 January, 2020; originally announced February 2020.
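    The sampling stage described in the abstract runs Langevin diffusion driven by an estimated score $\nabla \log p$. A minimal Python sketch of the standard unadjusted Langevin update $x_{k+1} = x_k + \eta\,\nabla \log p(x_k) + \sqrt{2\eta}\,\xi_k$, with $\xi_k \sim \mathcal{N}(0, 1)$, follows. As an assumption for illustration, the exact score of a one-dimensional Gaussian target stands in for the DAE/DSM score estimate, and the step size and chain length are hand-picked.

```python
import math
import random
from typing import Callable


def langevin_chain(score: Callable[[float], float], x0: float, step: float,
                   n_steps: int, rng: random.Random) -> float:
    """Run the unadjusted Langevin algorithm and return the final state."""
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x


# Stand-in for a learned score: the exact score of N(MU, SIGMA^2).
MU, SIGMA = 3.0, 1.0


def gaussian_score(x: float) -> float:
    return -(x - MU) / SIGMA ** 2


rng = random.Random(0)
samples = [langevin_chain(gaussian_score, 0.0, 0.05, 500, rng) for _ in range(500)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"sample mean {mean:.2f}, sample variance {var:.2f}")
```

    With a finite step the chain's stationary law is slightly off the target: for this Gaussian its variance is $\sigma^2/(1 - \eta/(2\sigma^2))$, about 1.026 instead of 1 here, a discretization bias that vanishes as $\eta \to 0$.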

  30. arXiv:1912.11940

    math.OC cs.LG

    Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

    Authors: Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

    Abstract: Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While the theory of adaptive gradient methods is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical an…

    Submitted 24 December, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

    Comments: Accepted by ICLR 2020

  31. arXiv:1911.02536

    cs.LG stat.ML

    Unsupervised Hierarchy Matching with Optimal Transport over Hyperbolic Spaces

    Authors: David Alvarez-Melis, Youssef Mroueh, Tommi S. Jaakkola

    Abstract: This paper focuses on the problem of unsupervised alignment of hierarchical data such as ontologies or lexical databases. This is a problem that appears across areas, from natural language processing to bioinformatics, and is typically solved by appeal to outside knowledge bases and label-textual similarity. In contrast, we approach the problem from a purely geometric perspective: given only a vec…

    Submitted 7 May, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: AISTATS 2020

  32. arXiv:1910.14212

    cs.LG stat.ML

    Sobolev Independence Criterion

    Authors: Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Dos Santos

    Abstract: We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y. SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient regularized Integral Probability Metric (IPM) between the joint distribution of the two random var…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019

  33. arXiv:1910.12999

    math.OC cs.LG

    A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

    Authors: Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das

    Abstract: Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice in large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs…

    Submitted 19 October, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

    Comments: Accepted by NeurIPS 2020

  34. arXiv:1905.12828

    cs.LG cs.CV stat.ML

    Wasserstein Style Transfer

    Authors: Youssef Mroueh

    Abstract: We propose Gaussian optimal transport for image style transfer in an encoder/decoder framework. Optimal transport for Gaussian measures has closed-form Monge mappings from source to target distributions. Moreover, interpolations between a content image and a style image can be seen as geodesics in the Wasserstein geometry. Using this insight, we show how to mix different target styles, using Wasserstein…

    Submitted 29 May, 2019; originally announced May 2019.
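    For Gaussian measures $\mathcal{N}(m_1, \Sigma_1)$ and $\mathcal{N}(m_2, \Sigma_2)$ with $\Sigma_1$ nonsingular, the $W_2$-optimal Monge map is the affine map $T(x) = m_2 + A(x - m_1)$ with $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$. To stay dependency-free, the Python sketch below restricts to the one-dimensional case, where $A$ reduces to $s_2/s_1$. It covers only the transport step and the geodesic, not the paper's encoder/decoder pipeline or its multi-style mixing, and the function names and example statistics are illustrative.

```python
import math
from typing import Callable, Tuple


def gaussian_monge_map_1d(m1: float, s1: float, m2: float, s2: float) -> Callable[[float], float]:
    """W2-optimal transport map from N(m1, s1^2) to N(m2, s2^2)."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    return lambda x: m2 + (s2 / s1) * (x - m1)


def gaussian_w2_1d(m1: float, s1: float, m2: float, s2: float) -> float:
    """W2 distance between 1D Gaussians: sqrt((m1 - m2)^2 + (s1 - s2)^2)."""
    return math.hypot(m1 - m2, s1 - s2)


def gaussian_geodesic_1d(m1: float, s1: float, m2: float, s2: float, t: float) -> Tuple[float, float]:
    """(mean, std) of the point at time t in [0, 1] on the W2 geodesic."""
    return (1.0 - t) * m1 + t * m2, (1.0 - t) * s1 + t * s2


# Hypothetical "content" and "style" statistics of a single feature channel.
content, style = (0.0, 1.0), (3.0, 2.0)
transport = gaussian_monge_map_1d(*content, *style)
print(transport(1.0))                               # 5.0
print(gaussian_geodesic_1d(*content, *style, 0.5))  # (1.5, 1.5)
```

    The geodesic midpoint lies at exactly half the $W_2$ distance from each endpoint, which is what makes these interpolations geodesics rather than arbitrary blends.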

  35. arXiv:1904.02762

    cs.CV cs.LG

    Learning Implicit Generative Models by Matching Perceptual Features

    Authors: Cicero Nogueira dos Santos, Youssef Mroueh, Inkit Padhi, Pierre Dognin

    Abstract: Perceptual features (PFs) have been used with great success in tasks such as transfer learning, style transfer, and super-resolution. However, the efficacy of PFs as a key source of information for learning generative models is not well studied. We investigate here the use of PFs in the context of learning implicit generative models through moment matching (MM). More specifically, we propose a new e…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: 16 pages

    Journal ref: ICCV 2019

  36. arXiv:1902.10214

    stat.ML cs.AI cs.LG

    Implicit Kernel Learning

    Authors: Chun-Liang Li, Wei-Cheng Chang, Youssef Mroueh, Yiming Yang, Barnabás Póczos

    Abstract: Kernels are powerful and versatile tools in machine learning and statistics. Although the notion of universal kernels and characteristic kernels has been studied, kernel selection still greatly influences the empirical performance. While learning the kernel in a data driven way has been investigated, in this paper we explore learning the spectral distribution of the kernel via implicit generative mode…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: In the Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

  37. arXiv:1902.04999

    cs.LG stat.ML

    Wasserstein Barycenter Model Ensembling

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu

    Abstract: In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement b…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: ICLR 2019

  38. arXiv:1805.12062

    cs.LG stat.ML

    Sobolev Descent

    Authors: Youssef Mroueh, Tom Sercu, Anant Raj

    Abstract: We study a simplification of GAN training: the problem of transporting particles from a source to a target distribution. Starting from the Sobolev GAN critic, part of the gradient regularized GAN family, we show a strong relation with Optimal Transport (OT). Specifically with the less popular dynamic formulation of OT that finds a path of distributions from source to target minimizing a "kinetic…

    Submitted 5 August, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: AISTATS 2019

  39. arXiv:1805.06441

    cs.LG stat.ML

    Regularized Finite Dimensional Kernel Sobolev Discrepancy

    Authors: Youssef Mroueh

    Abstract: We show in this note that the Sobolev Discrepancy introduced in Mroueh et al. in the context of generative adversarial networks is actually the weighted negative Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\nu_q)}$, which is known to linearize the Wasserstein $W_2$ distance and plays a fundamental role in the dynamic formulation of optimal transport of Benamou and Brenier. Given a kernel with finite dimensiona…

    Submitted 16 May, 2018; originally announced May 2018.

  40. arXiv:1805.00063

    cs.LG cs.CL cs.CV stat.ML

    Adversarial Semantic Alignment for Improved Image Captions

    Authors: Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

    Abstract: In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient…

    Submitted 6 June, 2019; v1 submitted 30 April, 2018; originally announced May 2018.

    Comments: Authors Equal Contribution, CVPR 2019

  41. arXiv:1712.02505

    cs.LG

    Semi-Supervised Learning with IPM-based GANs: an Empirical Study

    Authors: Tom Sercu, Youssef Mroueh

    Abstract: We present an empirical investigation of a recent class of Generative Adversarial Networks (GANs) using Integral Probability Metrics (IPM) and their performance for semi-supervised learning. IPM-based GANs like Wasserstein GAN, Fisher GAN and Sobolev GAN have desirable properties in terms of theoretical understanding, training stability, and a meaningful loss. In this work we investigate how the d…

    Submitted 7 December, 2017; originally announced December 2017.

    Comments: Appeared at NIPS 2017 Workshop: Deep Learning: Bridging Theory and Practice

  42. arXiv:1711.04894

    cs.LG stat.ML

    Sobolev GAN

    Authors: Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng

    Abstract: We propose a new Integral Probability Metric (IPM) between distributions: the Sobolev IPM. The Sobolev IPM compares the mean discrepancy of two distributions for functions (critic) restricted to a Sobolev ball defined with respect to a dominant measure $μ$. We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CD…

    Submitted 13 November, 2017; originally announced November 2017.

  43. arXiv:1705.09675

    cs.LG stat.ML

    Fisher GAN

    Authors: Youssef Mroueh, Tom Sercu

    Abstract: Generative Adversarial Networks (GANs) are powerful models for learning complex distributions. Stable training of GANs has been addressed in many recent works which explore different metrics between distributions. In this paper we introduce Fisher GAN which fits within the Integral Probability Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a data dependent constraint o…

    Submitted 3 November, 2017; v1 submitted 26 May, 2017; originally announced May 2017.

    Comments: Published at NIPS 2017. v2: added Inception Score table and updated plots, relation to f-GAN, illustration (Figure 1). v3: added strong SSL results for critic without batch normalization

  44. arXiv:1702.08398

    cs.LG stat.ML

    McGan: Mean and Covariance Feature Matching GAN

    Authors: Youssef Mroueh, Tom Sercu, Vaibhava Goel

    Abstract: We introduce new families of Integral Probability Metrics (IPM) for training Generative Adversarial Networks (GAN). Our IPMs are based on matching statistics of distributions embedded in a finite dimensional feature space. Mean and covariance feature matching IPMs allow for stable training of GANs, which we will call McGan. McGan minimizes a meaningful loss between distributions.

    Submitted 8 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: 15 pages; published at ICML 2017
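    The mean-matching half of the McGan IPM compares two distributions through the distance between their average feature embeddings. The sketch below assumes a fixed, hypothetical feature map `feature_map` (in McGan the embedding is learned by the critic) and the Euclidean norm; the function names are illustrative, and the covariance-matching variant is not shown.

```python
import math


def feature_map(x):
    # Hypothetical fixed embedding x -> (x, x^2); McGan learns this map instead.
    return [x, x * x]


def mean_embedding(samples, phi):
    # Average of phi over the samples: an empirical estimate of E[phi(X)].
    feats = [phi(s) for s in samples]
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]


def mean_matching_distance(p_samples, q_samples, phi=feature_map):
    # Euclidean distance between the two mean embeddings.
    mu_p = mean_embedding(p_samples, phi)
    mu_q = mean_embedding(q_samples, phi)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_p, mu_q)))
```

    For samples [0, 1, 2] and [1, 2, 3], the mean embeddings are (1, 5/3) and (2, 14/3), so the distance is sqrt(1 + 9) = sqrt(10).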

  45. arXiv:1612.01988

    cs.LG stat.ML

    Local Group Invariant Representations via Orbit Embeddings

    Authors: Anant Raj, Abhishek Kumar, Youssef Mroueh, P. Thomas Fletcher, Bernhard Schölkopf

    Abstract: Invariance to nuisance transformations is one of the desirable properties of effective representations. We consider transformations that form a group and propose an approach based on kernel methods to derive local group invariant representations. Locality is achieved by defining a suitable probability distribution over the group which in turn induces distributions in the input feature space…

    Submitted 24 May, 2017; v1 submitted 6 December, 2016; originally announced December 2016.

    Comments: AISTATS 2017 accepted version including appendix, 18 pages, 1 figure

  46. arXiv:1612.00563

    cs.LG cs.AI cs.CV

    Self-critical Sequence Training for Image Captioning

    Authors: Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel

    Abstract: Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, signifi…

    Submitted 15 November, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: CVPR 2017 + additional analysis + fixed baseline results, 16 pages

  47. arXiv:1610.07686

    cs.LG

    Co-Occuring Directions Sketching for Approximate Matrix Multiply

    Authors: Youssef Mroueh, Etienne Marcheret, Vaibhava Goel

    Abstract: We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error bound for AMM than other randomized and deterministic approaches for AMM. Co-occurring directions gives a $(1 + \epsilon)$-approximation of the optimal low-rank approximation of a matrix product. Empirically our…

    Submitted 24 October, 2016; originally announced October 2016.

  48. arXiv:1511.06267

    cs.LG

    Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Image & Text Retrieval

    Authors: Youssef Mroueh, Etienne Marcheret, Vaibhava Goel

    Abstract: Joint modeling of language and vision has been drawing increasing interest. A multimodal data representation allowing for bidirectional retrieval of images by sentences and vice versa is a key aspect. In this paper we present three contributions in canonical correlation analysis (CCA) based multimodal retrieval. Firstly, we show that an asymmetric weighting of the canonical weights, while achievin…

    Submitted 5 December, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review at CVPR 2017

  49. arXiv:1506.03705

    cs.LG stat.ML

    Random Maxout Features

    Authors: Youssef Mroueh, Steven Rennie, Vaibhava Goel

    Abstract: In this paper, we propose and study random maxout features, which are constructed by first projecting the input data onto sets of randomly generated vectors with Gaussian elements, and then outputting the maximum projection value for each set. We show that the resulting random feature map, when used in conjunction with linear models, allows for the locally linear estimation of the function of inter…

    Submitted 12 June, 2015; v1 submitted 11 June, 2015; originally announced June 2015.
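    A minimal sketch of the random maxout feature map as the abstract describes it: each output feature is the largest of several projections of the input onto vectors with i.i.d. standard Gaussian entries. The names `sample_maxout_weights` and `maxout_features` and the parameters `num_features` and `set_size` are hypothetical, not taken from the paper; a linear model would then be fit on the returned features.

```python
import random


def sample_maxout_weights(dim, num_features, set_size, seed=0):
    # Draw num_features sets, each holding set_size Gaussian vectors of length dim.
    # The weights are sampled once and reused, so the feature map is fixed.
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(set_size)]
        for _ in range(num_features)
    ]


def maxout_features(x, weights):
    # For each set, project x onto every vector and keep the maximum projection.
    return [
        max(sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in group)
        for group in weights
    ]
```

    Because each feature is a maximum of linear functions, the map is positively homogeneous: doubling the input doubles every feature.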

  50. arXiv:1506.02544

    cs.LG cs.CV stat.ML

    Learning with Group Invariant Features: A Kernel Perspective

    Authors: Youssef Mroueh, Stephen Voinea, Tomaso Poggio

    Abstract: In this paper we analyze a random feature map based on I-theory, a recently introduced theory of invariance. More specifically, a group invariant signal signature is obtained through cumulative distributions of group transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar integration kernel that i…

    Submitted 4 December, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: NIPS 2015
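    The signature described in this abstract (cumulative distributions of group-transformed random projections) can be sketched for a simple finite group. This sketch assumes the group is the set of cyclic shifts of a length-d vector and that each empirical CDF is evaluated at a fixed list of thresholds; `sample_templates`, `cyclic_orbit` and `invariant_signature` are hypothetical names, and the paper's exact construction may differ.

```python
import random


def sample_templates(dim, num_templates, seed=0):
    # Random Gaussian projection vectors ("templates"), fixed once and reused.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_templates)]


def cyclic_orbit(x):
    # All cyclic shifts of x: the orbit of x under the cyclic group Z_d.
    return [x[s:] + x[:s] for s in range(len(x))]


def invariant_signature(x, templates, thresholds):
    signature = []
    for t in templates:
        # Project every group-transformed copy of x onto the template.
        projections = [sum(a * b for a, b in zip(gx, t)) for gx in cyclic_orbit(x)]
        n = len(projections)
        # Empirical CDF of these projections at each threshold.
        signature.extend(sum(p <= c for p in projections) / n for c in thresholds)
    return signature
```

    Shifting the input only permutes its orbit, so the set of projections, and hence every CDF value, is unchanged. This is the invariance the signature is built to provide.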