Showing 1–15 of 15 results for author: Ruan, F

  1. arXiv:2406.06903  [pdf, ps, other]

    stat.ML cs.LG math.ST

    On the Limitation of Kernel Dependence Maximization for Feature Selection

    Authors: Keli Liu, Feng Ruan

    Abstract: A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important feature…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2404.09151  [pdf, other]

    cs.SE cs.LG

    Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development

    Authors: Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen

    Abstract: Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden their applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leavi…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  3. arXiv:2404.03900  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Nonparametric Modern Hopfield Models

    Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

    Abstract: We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 59 pages; Code available at https://github.com/MAGICS-LAB/NonparametricHopfield

  4. arXiv:2404.01245  [pdf, other]

    math.ST cs.CL cs.CR cs.LG stat.ML

    A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

    Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

    Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical effi…

    Submitted 1 April, 2024; originally announced April 2024.

  5. arXiv:2302.00845  [pdf, other]

    cs.LG cs.DC math.OC

    Coordinating Distributed Example Orders for Provably Accelerated Training

    Authors: A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

    Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: whil…

    Submitted 21 December, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  6. arXiv:2208.09793  [pdf, other]

    stat.ML cs.AI stat.AP

    FastCPH: Efficient Survival Analysis for Neural Networks

    Authors: Xuelin Yang, Louis Abraham, Sejin Kim, Petr Smirnov, Feng Ruan, Benjamin Haibe-Kains, Robert Tibshirani

    Abstract: The Cox proportional hazards model is a canonical method in survival analysis for prediction of the life expectancy of a patient given clinical or genetic covariates -- it is a linear model in its original form. In recent years, several methods have been proposed to generalize the Cox model to neural networks, but none of these are both numerically correct and computationally efficient. We propose…

    Submitted 20 August, 2022; originally announced August 2022.

  7. arXiv:2110.05852  [pdf, other]

    stat.ML cs.LG math.ST

    On the Self-Penalization Phenomenon in Feature Selection

    Authors: Michael I. Jordan, Keli Liu, Feng Ruan

    Abstract: We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: \begin{equation*} \min_{β, f}~\widehat{\mathbb{E}}[L(Y, f(β^{1/q} \odot X))] + λ_n \|f\|_{\mathcal{H}_q}^2~~\text{subject to}~~β\ge 0, \end{equation*} where $L$ is the loss, $\odot$ is coordinate-wise multiplication and $\mathcal{H}_q$ is the reproducing kernel Hilbert space based on the kernel…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 54 pages

  8. arXiv:2012.07348  [pdf, other]

    cs.LG cs.GT cs.MA stat.ML

    Bandit Learning in Decentralized Matching Markets

    Authors: Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan

    Abstract: We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player s…

    Submitted 21 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: 34 pages

  9. arXiv:2011.12215  [pdf, other]

    stat.ME cs.LG math.ST

    A Self-Penalizing Objective Function for Scalable Interaction Detection

    Authors: Keli Liu, Feng Ruan

    Abstract: We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions with only linear computation cost, $O(p)$. The trick is to maximize a class of parametrized nonpa…

    Submitted 12 December, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: 34 pages; the Appendix can be found on the authors' personal websites (the url is in the pdf)

  10. arXiv:2003.07337  [pdf, other]

    stat.ML cs.LG math.OC

    Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

    Authors: Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

    Abstract: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations s…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 38 pages, 3 figures

  11. arXiv:1907.12207  [pdf, other]

    stat.ML cs.LG

    LassoNet: A Neural Network with Feature Sparsity

    Authors: Ismael Lemhadri, Feng Ruan, Louis Abraham, Robert Tibshirani

    Abstract: Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However the Lasso only applies to linear models. Here we…

    Submitted 16 June, 2021; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: 18 pages, 10 figures. arXiv admin note: text overlap with arXiv:1901.09346 by other authors

    Journal ref: Journal of Machine Learning Research 22 (2021) 1-29

  12. arXiv:1810.02954  [pdf, other]

    math.ST cs.IT stat.ME

    Adapting to Unknown Noise Distribution in Matrix Denoising

    Authors: Andrea Montanari, Feng Ruan, Jun Yan

    Abstract: We consider the problem of estimating an unknown matrix $\boldsymbol{X}\in {\mathbb R}^{m\times n}$, from observations $\boldsymbol{Y} = \boldsymbol{X}+\boldsymbol{W}$ where $\boldsymbol{W}$ is a noise matrix with independent and identically distributed entries, as to minimize estimation error measured in operator norm. Assuming that the underlying signal $\boldsymbol{X}$ is low-rank and incoheren…

    Submitted 4 November, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

  13. arXiv:1806.05756  [pdf, other]

    math.ST cs.IT

    The Right Complexity Measure in Locally Private Estimation: It is not the Fisher Information

    Authors: John C. Duchi, Feng Ruan

    Abstract: We identify fundamental tradeoffs between statistical utility and privacy under local models of privacy in which data is kept private even from the statistician, providing instance-specific bounds for private estimation and learning problems by developing the \emph{local minimax risk}. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to evaluate…

    Submitted 29 September, 2020; v1 submitted 14 June, 2018; originally announced June 2018.

  14. arXiv:1705.02356  [pdf, other]

    math.ST cs.IT math.OC

    Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval

    Authors: John C. Duchi, Feng Ruan

    Abstract: We develop procedures, based on minimization of the composition $f(x) = h(c(x))$ of a convex function $h$ and smooth function $c$, for solving random collections of quadratic equalities, applying our methodology to phase retrieval problems. We show that the prox-linear algorithm we develop can solve phase retrieval problems---even with adversarially faulty measurements---with high probability as s…

    Submitted 22 April, 2018; v1 submitted 5 May, 2017; originally announced May 2017.

    Comments: 55 pages, 9 figures

  15. arXiv:1603.00126  [pdf, ps, other]

    math.ST cs.IT

    Multiclass Classification, Information, Divergence, and Surrogate Risk

    Authors: John C. Duchi, Khashayar Khosravi, Feng Ruan

    Abstract: We provide a unifying view of statistical information measures, multi-way Bayesian hypothesis testing, loss functions for multi-class classification problems, and multi-distribution $f$-divergences, elaborating equivalence results between all of these objects, and extending existing results for binary outcome spaces to more general ones. We consider a generalization of $f$-divergences to multiple…

    Submitted 10 September, 2017; v1 submitted 29 February, 2016; originally announced March 2016.