Skip to main content

Showing 1–50 of 145 results for author: Cevher, V

  1. arXiv:2407.09111  [pdf, other

    cs.AI cs.LG

    Inference Optimization of Foundation Models on AI Accelerators

    Authors: Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis

    Abstract: Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications, based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Tutorial published at KDD 2024. Camera-ready version

  2. arXiv:2406.18781  [pdf, other

    math.OC cs.DM cs.LG

    Learning to Remove Cuts in Integer Linear Programming

    Authors: Pol Puigdemont, Stratis Skoulakis, Grigorios Chrysos, Volkan Cevher

    Abstract: Cutting plane methods are a fundamental approach for solving integer linear programs (ILPs). In each iteration of such methods, additional linear constraints (cuts) are introduced to the constraint set with the aim of excluding the previous fractional optimal solution while not affecting the optimal integer solution. In this work, we explore a novel approach within cutting plane methods: instead o… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning

    MSC Class: 68R01

  3. arXiv:2406.16906  [pdf, other

    eess.SP cs.AI cs.LG

    REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates

    Authors: Arshia Afzal, Grigorios Chrysos, Volkan Cevher, Mahsa Shoaran

    Abstract: EEG-based seizure detection models face challenges in terms of inference speed and memory efficiency, limiting their real-time implementation in clinical devices. This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis in applications such as epileptic seizure detection. By leveraging a combination of graph neural networks and recurrent st… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted paper at International Confrence on Machine Learning (ICML 2024). Visit our website: https://arshiaafzal.github.io/REST/

  4. arXiv:2406.04731  [pdf, other

    math.OC cs.LG

    Efficient Continual Finite-Sum Minimization

    Authors: Ioannis Mavrothalassitis, Stratis Skoulakis, Leello Tadesse Dadi, Volkan Cevher

    Abstract: Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\mapsto \mathbb{R}$, finite-sum minimization seeks a point ${x}^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization, that asks for a sequence of points ${x}_1^\star,\ldots,{x}_n^\star \in \mathcal{D}$ such that… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted in ICLR 2024, 35 pages

  5. arXiv:2406.03171  [pdf, other

    stat.ML cs.LG

    High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

    Authors: Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher

    Abstract: This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For bias, we analyze the regularization of… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  6. arXiv:2405.19201  [pdf, other

    cs.CV cs.AI cs.NE

    Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations

    Authors: Justin Deschenaux, Igor Krawczuk, Grigorios Chrysos, Volkan Cevher

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study DDPMs trained on strictly separate subsets of the data distribution with large gaps on the support of the latent factors. We show that such a model can… ▽ More

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.17050  [pdf, other

    cs.LG

    HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity

    Authors: Sonny Achten, Francesco Tonin, Volkan Cevher, Johan A. K. Suykens

    Abstract: Clustering nodes in heterophilous graphs presents unique challenges due to the asymmetric relationships often overlooked by traditional methods, which moreover assume that good clustering corresponds to high intra-cluster and low inter-cluster connectivity. To address these issues, we introduce HeNCler - a novel approach for Heterophilous Node Clustering. Our method begins by defining a weighted k… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.15509  [pdf, other

    math.OC cs.LG

    Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces

    Authors: Angeliki Kamoutsi, Peter Schmitt-Förster, Tobias Sutter, Volkan Cevher, John Lygeros

    Abstract: This work studies discrete-time discounted Markov decision processes with continuous state and action spaces and addresses the inverse problem of inferring a cost function from observed optimal behavior. We first consider the case in which we have access to the entire expert policy and characterize the set of solutions to the inverse problem by using occupation measures, linear duality, and comple… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 29 pages, 4 figures

  9. arXiv:2405.04346  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Revisiting character-level adversarial attacks

    Authors: Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

    Abstract: Adversarial attacks in Natural Language Processing apply perturbations in the character or token levels. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popula… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted in ICML 2024

  10. arXiv:2405.02181  [pdf, other

    cs.LG

    Imitation Learning in Discounted Linear MDPs without exploration assumptions

    Authors: Luca Viano, Stratis Skoulakis, Volkan Cevher

    Abstract: We present a new algorithm for imitation learning in infinite horizon linear MDPs dubbed ILARL which greatly improves the bound on the number of trajectories that the learner needs to sample from the environment. In particular, we remove exploration assumptions required in previous works and we improve the dependence on the desired accuracy $ε$ from $\mathcal{O}\br{ε^{-5}}$ to… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  11. arXiv:2404.18769  [pdf, other

    stat.ML cs.LG

    Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

    Authors: Fanghui Liu, Leello Dadi, Volkan Cevher

    Abstract: Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks as the curse of dimensionality (CoD) cannot be evaded when trying to approximate even a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron… ▽ More

    Submitted 25 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by JMLR, update on the dimension dependence in sample complexity, see Prop. 5

  12. arXiv:2403.13134  [pdf, other

    cs.LG cs.AI stat.ML

    Robust NAS under adversarial training: benchmark, theory, and beyond

    Authors: Yongtao Wu, Fanghui Liu, Carl-Johann Simon-Gabriel, Grigorios G Chrysos, Volkan Cevher

    Abstract: Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robust architectures, especially when adversarial training is considered. In this work, we aim to address these two challenges, making twofold contri… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  13. arXiv:2403.09889  [pdf, other

    cs.LG

    Generalization of Scaled Deep ResNets in the Mean-Field Regime

    Authors: Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios G. Chrysos, Volkan Cevher

    Abstract: Despite the widespread empirical success of ResNet, the generalization properties of deep ResNet are rarely explored beyond the lazy training regime. In this work, we investigate \emph{scaled} ResNet in the limit of infinitely deep and wide neural networks, of which the gradient flow is described by a partial differential equation in the large-neural network limit, i.e., the \emph{mean-field} regi… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 (Spotlight)

  14. arXiv:2402.17509  [pdf, other

    cs.CL

    Extreme Miscalibration and the Illusion of Adversarial Robustness

    Authors: Vyas Raina, Samson Tan, Volkan Cevher, Aditya Rawal, Sheng Zha, George Karypis

    Abstract: Deep learning-based Natural Language Processing (NLP) models are vulnerable to adversarial attacks, where small perturbations can cause a model to misclassify. Adversarial Training (AT) is often used to increase model robustness. However, we have discovered an intriguing phenomenon: deliberately or accidentally miscalibrating models masks gradients in a way that interferes with adversarial attack… ▽ More

    Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  15. arXiv:2402.15776  [pdf, other

    cs.LG stat.ML

    Truly No-Regret Learning in Constrained MDPs

    Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

    Abstract: Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in a… ▽ More

    Submitted 18 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  16. arXiv:2402.09177  [pdf, other

    cs.LG cs.AI cs.CL

    Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

    Authors: Yixin Cheng, Markos Georgopoulos, Volkan Cevher, Grigorios G. Chrysos

    Abstract: Large Language Models (LLMs) are susceptible to Jailbreaking attacks, which aim to extract harmful information by subtly modifying the attack query. As defense mechanisms evolve, directly obtaining harmful information becomes increasingly challenging for Jailbreaking attacks. In this work, inspired by human practices of indirect context to elicit harmful information, we focus on a new attack form… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 29 pages

  17. arXiv:2401.17992  [pdf, other

    cs.CV cs.LG

    Multilinear Operator Networks

    Authors: Yixin Cheng, Grigorios G. Chrysos, Markos Georgopoulos, Volkan Cevher

    Abstract: Despite the remarkable capabilities of deep neural networks in image recognition, the dependence on activation functions remains a largely unexplored area and has yet to be eliminated. On the other hand, Polynomial Networks is a class of models that does not require activation functions, but have yet to perform on par with modern architectures. In this work, we aim close this gap and propose MONet… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: International Conference on Learning Representations Poster(2024)

  18. arXiv:2401.11618  [pdf, other

    cs.LG cs.AI cs.CR stat.ML

    Efficient local linearity regularization to overcome catastrophic overfitting

    Authors: Elias Abad Rocamora, Fanghui Liu, Grigorios G. Chrysos, Pablo M. Olmos, Volkan Cevher

    Abstract: Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly with respect to the input, this is however lost in single-step AT. To address CO in single-step AT, several methods have been proposed to enforce… ▽ More

    Submitted 28 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted in ICLR 2024

  19. arXiv:2401.09628  [pdf, ps, other

    cs.GT

    Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games

    Authors: Leello Dadi, Ioannis Panageas, Stratis Skoulakis, Luca Viano, Volkan Cevher

    Abstract: We introduce an online learning algorithm in the bandit feedback model that, once adopted by all agents of a congestion game, results in game-dynamics that converge to an $ε$-approximate Nash Equilibrium in a polynomial number of rounds with respect to $1/ε$, the number of players and the number of available resources. The proposed algorithm also guarantees sublinear regret to any agent adopting i… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

  20. arXiv:2401.08893  [pdf, other

    cs.LG math.OC

    MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

    Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

    Abstract: Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during tra… ▽ More

    Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  21. arXiv:2401.03058  [pdf, other

    math.OC cs.LG stat.ML

    Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

    Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

    Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods.… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 27 pages, 2 figures

  22. arXiv:2311.01575  [pdf, other

    cs.LG

    On the Convergence of Encoder-only Shallow Transformers

    Authors: Yongtao Wu, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher

    Abstract: In this paper, we aim to build the global convergence theory of encoder-only shallow Transformers under a realistic setting from the perspective of architectures, initialization, and scaling under a finite width regime. The difficulty lies in how to tackle the softmax in self-attention mechanism, the core ingredient of Transformer. In particular, we diagnose the scaling scheme, carefully tackle th… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

  23. arXiv:2310.20579  [pdf, other

    stat.ML cs.CR cs.LG

    Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks

    Authors: Jiayuan Ye, Zhenyu Zhu, Fanghui Liu, Reza Shokri, Volkan Cevher

    Abstract: We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

  24. arXiv:2310.18672  [pdf, other

    cs.LG cs.DM

    Maximum Independent Set: Self-Training through Dynamic Programming

    Authors: Lorenzo Brusca, Lars C. P. M. Quaedvlieg, Stratis Skoulakis, Grigorios G Chrysos, Volkan Cevher

    Abstract: This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that firstly constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted in NeurIPS 2023

  25. arXiv:2310.18123  [pdf, ps, other

    cs.LG stat.ML

    Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling

    Authors: Zhenyu Zhu, Francesco Locatello, Volkan Cevher

    Abstract: This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery me… ▽ More

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted in NeurIPS 2023

  26. arXiv:2310.13459  [pdf, other

    cs.LG math.OC

    Stable Nonconvex-Nonconcave Training via Linear Interpolation

    Authors: Thomas Pethick, Wanyun Xie, Volkan Cevher

    Abstract: This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme cal… ▽ More

    Submitted 14 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  27. arXiv:2310.11867  [pdf, other

    cs.CV cs.CY cs.LG

    Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

    Authors: Junaid Ali, Matthaeus Kleindessner, Florian Wenzel, Kailash Budhathoki, Volkan Cevher, Chris Russell

    Abstract: We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, i… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at AIES'23

  28. arXiv:2310.02387  [pdf, other

    cs.GT

    Exponential Lower Bounds for Fictitious Play in Potential Games

    Authors: Ioannis Panageas, Nikolas Patris, Stratis Skoulakis, Volkan Cevher

    Abstract: Fictitious Play (FP) is a simple and natural dynamic for repeated play with many applications in game theory and multi-agent reinforcement learning. It was introduced by Brown (1949,1951) and its convergence properties for two-player zero-sum games was established later by Robinson (1951). Potential games Monderer and Shapley (1996b) is another class of games which exhibit the FP property (Mondere… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to appear in NeurIPS 2023

  29. arXiv:2308.09187  [pdf, other

    cs.LG cs.DC math.OC

    Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

    Authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher

    Abstract: We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors. This setting includes a broad range of important problems from distributed convex minimization to min-max and games. Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-e… ▽ More

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: International Conference on Learning Representations (ICLR 2023)

  30. arXiv:2306.15543  [pdf, other

    cs.GT

    Semi Bandit Dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees

    Authors: Ioannis Panageas, Stratis Skoulakis, Luca Viano, Xiao Wang, Volkan Cevher

    Abstract: In this work, we introduce a new variant of online gradient descent, which provably converges to Nash Equilibria and simultaneously attains sublinear regret for the class of congestion games in the semi-bandit feedback setting. Our proposed method admits convergence rates depending only polynomially on the number of players and the number of facilities, but not on the size of the action set, which… ▽ More

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  31. arXiv:2306.11035  [pdf, other

    cs.LG math.OC stat.ML

    Adversarial Training Should Be Cast as a Non-Zero-Sum Game

    Authors: Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

    Abstract: One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior… ▽ More

    Submitted 18 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  32. arXiv:2306.05325  [pdf, other

    cs.LG

    Federated Learning under Covariate Shifts with Generalization Guarantees

    Authors: Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios Chrysos, Volkan Cevher

    Abstract: This paper addresses intra-client and inter-client covariate shifts in federated learning (FL) with a focus on the overall generalization performance. To handle covariate shifts, we formulate a new global model training paradigm and propose Federated Importance-Weighted Empirical Risk Minimization (FTW-ERM) along with improving density ratio matching methods without requiring perfect knowledge of… ▽ More

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  33. arXiv:2305.19377  [pdf, other

    cs.LG

    Benign Overfitting in Deep Neural Networks under Lazy Training

    Authors: Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Francesco Locatello, Volkan Cevher

    Abstract: This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero-training error under the lazy training regime. For this purpose, we unify three interrelated concepts of overparameterization, benign overfitting,… ▽ More

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted in ICML 2023

  34. arXiv:2304.12886  [pdf, ps, other

    stat.ML cs.LG

    What can online reinforcement learning with function approximation benefit from general coverage conditions?

    Authors: Fanghui Liu, Luca Viano, Volkan Cevher

    Abstract: In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (original from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we focus on this new direction by digging more possible and general coverage conditions, and study the potential and the utility… ▽ More

    Submitted 31 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by ICML 2023

  35. arXiv:2303.13896  [pdf, other

    cs.CV cs.LG

    Regularization of polynomial networks for image recognition

    Authors: Grigorios G Chrysos, Bohan Wang, Jiankang Deng, Volkan Cevher

    Abstract: Deep Neural Networks (DNNs) have obtained impressive performance across tasks, however they still remain as black boxes, e.g., hard to theoretically analyze. At the same time, Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close th… ▽ More

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR'23

  36. arXiv:2302.09831  [pdf, other

    math.OC cs.LG

    Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

    Authors: Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq, Volkan Cevher

    Abstract: This paper introduces a new extragradient-type algorithm for a class of nonconvex-nonconcave minimax problems. It is well-known that finding a local solution for general minimax problems is computationally intractable. This observation has recently motivated the study of structures sufficient for convergence of first order methods in the more general setting of variational inequalities when the so… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Code accessible at: https://github.com/LIONS-EPFL/weak-minty-code/

  37. arXiv:2302.09029  [pdf, other

    math.OC cs.LG

    Solving stochastic weak Minty variational inequalities without increasing batch size

    Authors: Thomas Pethick, Olivier Fercoq, Puya Latafat, Panagiotis Patrinos, Volkan Cevher

    Abstract: This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing batch sizes per ite… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Code accessible at: https://github.com/LIONS-EPFL/stochastic-weak-minty-code

  38. arXiv:2302.08872  [pdf, other

    cs.LG

    Revisiting adversarial training for the worst-performing class

    Authors: Thomas Pethick, Grigorios G. Chrysos, Volkan Cevher

    Abstract: Despite progress in adversarial training (AT), there is a substantial gap between the top-performing and worst-performing classes in many datasets. For example, on CIFAR10, the accuracies for the best and worst classes are 74% and 23%, respectively. We argue that this gap can be reduced by explicitly optimizing for the worst-performing class, resulting in a min-max-max optimization formulation. Ou… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Code accessible at: https://github.com/LIONS-EPFL/class-focused-online-learning-code

  39. arXiv:2301.03931  [pdf, ps, other

    cs.GT cs.LG math.OC

    Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps

    Authors: Volkan Cevher, Georgios Piliouras, Ryann Sim, Stratis Skoulakis

    Abstract: In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accur… ▽ More

    Submitted 16 January, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: To appear in SOSA23

  40. arXiv:2211.01851  [pdf, other

    math.OC cs.LG

    Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization

    Authors: Ali Kavis, Stratis Skoulakis, Kimon Antonakopoulos, Leello Tadesse Dadi, Volkan Cevher

    Abstract: We propose an adaptive variance-reduction method, called AdaSpider, for minimization of $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011, McMahan & Streeter, 2010], but a fairly distinct, adaptive step-size schedule with the recursive stochastic path integrated estimator proposed in [Fang et al., 2018]. To our know… ▽ More

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: 23 pages, 2 figures, accepted at NeurIPS 2022

  41. arXiv:2211.01832  [pdf, other

    math.OC cs.LG stat.ML

    Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods

    Authors: Kimon Antonakopoulos, Ali Kavis, Volkan Cevher

    Abstract: This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(σ/ \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $σ^2$, and improves its convergence to $O( 1 / T^3)$ with deterministic oracles, where $T$ is the number of iterations. Our method also interpolates these rates without knowing… ▽ More

    Submitted 12 December, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 32 pages, 4 figures, accepted at NeurIPS 2022

  42. arXiv:2209.14734  [pdf, other

    cs.LG

    DiGress: Discrete Denoising diffusion for graph generation

    Authors: Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, Pascal Frossard

    Abstract: This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model utilizes a discrete diffusion process that progressively edits graphs with noise, through the process of adding or removing edges and changing the categories. A graph transformer network is trained to revert this process, simplifying the problem of distribut… ▽ More

    Submitted 23 May, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 22 pages. Published as a conference paper at ICLR 2023

    Journal ref: International Conference on Learning Representations (ICLR 2023)

  43. arXiv:2209.10974  [pdf, other

    cs.LG

    Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning

    Authors: Paul Rolland, Luca Viano, Norman Schuerhoff, Boris Nikolov, Volkan Cevher

    Abstract: While Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment, Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert's behavior. It is well known that, in general, various reward functions can lead to the same optimal policy, and hence, IRL is ill-defined. However, (Cao et al., 2021) showed that, if we observe tw… ▽ More

    Submitted 13 October, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

  44. arXiv:2209.10968  [pdf, other

    cs.LG

    Proximal Point Imitation Learning

    Authors: Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher

    Abstract: This work develops new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning (IL) with linear function approximation without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing, for online and offl… ▽ More

    Submitted 30 May, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

  45. arXiv:2209.07736  [pdf, other

    cs.LG cs.AI

    Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study

    Authors: Yongtao Wu, Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher

    Abstract: Neural tangent kernel (NTK) is a powerful tool to analyze training dynamics of neural networks and their generalization bounds. The study on NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp), e.g., StyleGAN and polynomial neural networks (PNNs). In this work, we derive the finite-width NTK formulation for a specia… ▽ More

    Submitted 16 October, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

  46. arXiv:2209.07376  [pdf, ps, other

    cs.LG

    Understanding Deep Neural Function Approximation in Reinforcement Learning via $ε$-Greedy Exploration

    Authors: Fanghui Liu, Luca Viano, Volkan Cevher

    Abstract: This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) with the $ε$-greedy exploration under the online setting. This problem setting is motivated by the successful deep Q-networks (DQN) framework that falls in this regime. In this work, we provide an initial attempt on theoretical understanding deep RL from the perspective of function class an… ▽ More

    Submitted 16 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted by NeurIPS22

  47. arXiv:2209.07263  [pdf, other

    cs.LG cs.AI

    Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)

    Authors: Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher

    Abstract: We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, wh… ▽ More

    Submitted 9 February, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted in NeurIPS 2022

  48. arXiv:2209.07238  [pdf, other

    cs.LG cs.AI

    Generalization Properties of NAS under Activation and Skip Connection Search

    Authors: Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher

    Abstract: Neural Architecture Search (NAS) has fostered the automatic discovery of state-of-the-art neural architectures. Despite the progress achieved with NAS, so far there is little attention to theoretical guarantees on NAS. In this work, we study the generalization properties of NAS under a unifying framework enabling (deep) layer skip connection search and activation function search. To this end, we d… ▽ More

    Submitted 1 November, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted in NeurIPS 2022

  49. arXiv:2209.07235  [pdf, ps, other

    cs.LG cs.AI cs.CR

    Sound and Complete Verification of Polynomial Networks

    Authors: Elias Abad Rocamora, Mehmet Fatih Sahin, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher

    Abstract: Polynomial Networks (PNs) have demonstrated promising performance on face and image recognition recently. However, robustness of PNs is unclear and thus obtaining certificates becomes imperative for enabling their adoption in real-world applications. Existing verification algorithms on ReLU neural networks (NNs) based on classical branch and bound (BaB) techniques cannot be trivially applied to PN… ▽ More

    Submitted 22 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted in NeurIPS 2022

  50. arXiv:2206.06811  [pdf, other

    eess.AS cs.LG

    Adversarial Audio Synthesis with Complex-valued Polynomial Networks

    Authors: Yongtao Wu, Grigorios G Chrysos, Volkan Cevher

    Abstract: Time-frequency (TF) representations in audio synthesis have been increasingly modeled with real-valued networks. However, overlooking the complex-valued nature of TF representations can result in suboptimal performance and require additional modules (e.g., for modeling the phase). To this end, we introduce complex-valued polynomial networks, called APOLLO, that integrate such complex-valued repres… ▽ More

    Submitted 21 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted as oral presentation in Workshop on Machine Learning for Audio Synthesis at ICML 2022