Showing 1–35 of 35 results for author: Jitkrittum, W

  1. arXiv:2406.00060  [pdf, other]

    cs.CL cs.LG

    Cascade-Aware Training of Language Models

    Authors: Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

    Abstract: Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 22 pages, 13 figures

  2. arXiv:2405.19261  [pdf, other]

    cs.CL cs.AI cs.LG

    Faster Cascades via Speculative Decoding

    Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in p…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2404.10136  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model Cascades: Token-level uncertainty and beyond

    Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning c…

    Submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2310.09250  [pdf, other]

    cs.LG cs.AI stat.ML

    It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models

    Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

    Abstract: Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a trade-off. However, in this paper, we show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level, where squared bias is approximately equal to variance for correctly classif…

    Submitted 13 October, 2023; originally announced October 2023.

  5. arXiv:2307.02764  [pdf, other]

    cs.LG stat.ML

    When Does Confidence-Based Cascade Deferral Suffice?

    Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite…

    Submitted 23 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  6. arXiv:2301.12386  [pdf, other]

    cs.LG

    Plugin estimators for selective classification with out-of-distribution detection

    Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

    Abstract: Real-world classifiers can benefit from the option of abstaining from predicting on samples where they have low confidence. Such abstention is particularly useful on samples which are close to the learned decision boundary, or which are outliers with respect to the training sample. These settings have been the subject of extensive but disjoint study in the selective classification (SC) and out-of-…

    Submitted 24 July, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  7. arXiv:2301.12005  [pdf, other]

    cs.LG

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Authors: Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

    Abstract: Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages…

    Submitted 3 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  8. arXiv:2208.03354  [pdf, other]

    cs.CV cs.LG

    A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

    Authors: Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

    Abstract: We address the problem of retrieving images with both a sketch and a text query. We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval using a text description and a sketch as input. We argue that both input modalities complement each other in a manner that cannot be achieved easily by either one alone. TASK-former follows the late-fusion dual-enco…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: ECCV 2022

  9. arXiv:2206.11142  [pdf, other]

    stat.ME cs.LG stat.AP stat.CO stat.ML

    Discussion of `Multiscale Fisher's Independence Test for Multivariate Dependence'

    Authors: Antonin Schrab, Wittawat Jitkrittum, Zoltán Szabó, Dino Sejdinovic, Arthur Gretton

    Abstract: We discuss how MultiFIT, the Multiscale Fisher's Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC). We highlight the fact that the levels of the kernel tests at any finite sample size can be controlled exactly, as it is the case with the level of MultiFIT. In our e…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 8 pages

  10. arXiv:2204.13208  [pdf, other]

    cs.LG stat.ML

    ELM: Embedding and Logit Margins for Long-Tail Learning

    Authors: Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural m…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 24 pages

  11. arXiv:2110.15440  [pdf, other]

    cs.CR cs.LG

    HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation

    Authors: Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang

    Abstract: Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a well designed protocol to securely compute a function. With the non-colluding party assumption, MPC has a cryptographic guarantee that the parties will not learn sensitive information from the computation process, making it an appealing framework for applications that involve privacy-sensitive…

    Submitted 28 October, 2021; originally announced October 2021.

  12. arXiv:2105.05736  [pdf, other]

    cs.LG stat.ML

    Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

    Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off pe…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: To appear in ICML 2021

  13. arXiv:2102.05573  [pdf, other]

    cs.LG stat.ML

    A Witness Two-Sample Test

    Authors: Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet

    Abstract: The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a weighted sum of kernel evaluations on a set of basis points. Typically the kernel is optimized on a training set, and hypothesis testing is performed on a separa…

    Submitted 11 February, 2022; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: AISTATS2022

  14. arXiv:2006.06981  [pdf, other]

    math.OC cs.LG stat.ML

    Kernel Distributionally Robust Optimization

    Authors: Jia-Jie Zhu, Wittawat Jitkrittum, Moritz Diehl, Bernhard Schölkopf

    Abstract: We propose kernel distributionally robust optimization (Kernel DRO) using insights from the robust optimization theory and functional analysis. Our method uses reproducing kernel Hilbert spaces (RKHS) to construct a wide range of convex ambiguity sets, which can be generalized to sets based on integral probability metrics and finite-order moment bounds. This perspective unifies multiple existing r…

    Submitted 27 February, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of Machine Learning Research, PMLR 130:280-288, 2021

  15. arXiv:2006.02286  [pdf, other]

    cs.LG stat.ML

    Learning Kernel Tests Without Data Splitting

    Authors: Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet

    Abstract: Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to smaller test sample size. Inspired by the selective inf…

    Submitted 19 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 24 (11+13) pages, 10 figures. Camera-Ready version. Accepted to the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

  16. arXiv:2004.00166  [pdf, other]

    math.OC cs.LG eess.SY

    Worst-Case Risk Quantification under Distributional Ambiguity using Kernel Mean Embedding in Moment Problem

    Authors: Jia-Jie Zhu, Wittawat Jitkrittum, Moritz Diehl, Bernhard Schölkopf

    Abstract: In order to anticipate rare and impactful events, we propose to quantify the worst-case risk under distributional ambiguity using a recent development in kernel methods -- the kernel mean embedding. Specifically, we formulate the generalized moment problem whose ambiguity set (i.e., the moment constraint) is described by constraints in the associated reproducing kernel Hilbert space in a nonparame…

    Submitted 6 September, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

  17. arXiv:2002.10271  [pdf, other]

    stat.ML cs.LG math.ST

    Testing Goodness of Fit of Conditional Density Models with Kernels

    Authors: Wittawat Jitkrittum, Heishiro Kanagawa, Bernhard Schölkopf

    Abstract: We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y|x)$ and a joint sample, decide whether the sample is drawn from $p(y|x)r_x(x)$ for some density $r_x$. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the norma…

    Submitted 30 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: In UAI 2020. http://auai.org/uai2020/accepted.php

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  18. arXiv:2002.09225  [pdf, ps, other]

    math.ST cs.LG econ.EM stat.ML

    Kernel Conditional Moment Test via Maximum Moment Restriction

    Authors: Krikamol Muandet, Wittawat Jitkrittum, Jonas Kübler

    Abstract: We propose a new family of specification tests called kernel conditional moment (KCM) tests. Our tests are built on a novel representation of conditional moment restrictions in a reproducing kernel Hilbert space (RKHS) called conditional moment embedding (CMME). After transforming the conditional moment restrictions into a continuum of unconditional counterparts, the test statistic is defined as t…

    Submitted 19 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI2020)

  19. arXiv:1910.14428   

    stat.ML cs.LG math.DS

    Kernel-Guided Training of Implicit Generative Models with Stability Guarantees

    Authors: Arash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet, Bernhard Schölkopf

    Abstract: Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this wo…

    Submitted 3 November, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: There was a misunderstanding in how an article should be updated on arXiv. We have withdrawn this article from this link. The same article can be found at arXiv:1901.09206

  20. arXiv:1910.12252  [pdf, other]

    cs.LG stat.ML

    Kernel Stein Tests for Multiple Model Comparison

    Authors: Jen Ning Lim, Makoto Yamada, Bernhard Schölkopf, Wittawat Jitkrittum

    Abstract: We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it. We propose two statistical tests, each controlling a different notion of decision errors. The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declare…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2019

  21. arXiv:1910.06134  [pdf, other]

    cs.LG stat.ML

    More Powerful Selective Kernel Tests for Feature Selection

    Authors: Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira

    Abstract: Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and prevent information to be used again after selection. Many…

    Submitted 29 February, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted to AISTATS 2020

  22. arXiv:1910.05103  [pdf, other]

    stat.ML cs.LG

    ABCDP: Approximate Bayesian Computation with Differential Privacy

    Authors: Mijung Park, Margarita Vinaroz, Wittawat Jitkrittum

    Abstract: We develop a novel approximate Bayesian computation (ABC) framework, ABCDP, that produces differentially private (DP) and approximate posterior samples. Our framework takes advantage of the Sparse Vector Technique (SVT), widely studied in the differential privacy literature. SVT incurs the privacy cost only when a condition (whether a quantity of interest is above/below a threshold) is met. If the…

    Submitted 8 February, 2021; v1 submitted 11 October, 2019; originally announced October 2019.

  23. A Kernel Stein Test for Comparing Latent Variable Models

    Authors: Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

    Abstract: We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable. The proposed test generalizes the recently proposed kernel Stein discrepancy (KSD) tests (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018) t…

    Submitted 9 May, 2023; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: This is a pre-copyedited, author-produced version of an article accepted for publication in The Journal of the Royal Statistical Society Series: B following peer review

  24. arXiv:1905.05882  [pdf, other]

    cs.LG cs.CV stat.ML

    Kernel Mean Matching for Content Addressability of GANs

    Authors: Wittawat Jitkrittum, Patsorn Sangkloy, Muhammad Waleed Gondal, Amit Raj, James Hays, Bernhard Schölkopf

    Abstract: We propose a novel procedure which adds "content-addressability" to any given unconditional implicit model e.g., a generative adversarial network (GAN). The procedure allows users to control the generative process by specifying a set (arbitrary size) of desired examples based on which similar samples are generated from the model. The proposed approach, based on kernel mean matching, is applicable…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: Wittawat Jitkrittum and Patsorn Sangkloy contributed equally to this work

  25. arXiv:1901.09206  [pdf, other]

    cs.LG stat.ML

    Kernel-Guided Training of Implicit Generative Models with Stability Guarantees

    Authors: Arash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet, Bernhard Schölkopf

    Abstract: Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this wo…

    Submitted 6 November, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

    Comments: This article supersedes arXiv:1901.09206 version 1. The paper is restructured, its writing is improved, and new experiments are added. The main result on stability is unchanged

  26. arXiv:1810.11630  [pdf, other]

    stat.ML cs.LG

    Informative Features for Model Comparison

    Authors: Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton

    Abstract: Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating…

    Submitted 27 October, 2018; originally announced October 2018.

    Comments: Accepted to NIPS 2018

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  27. arXiv:1805.07454  [pdf, other]

    stat.ML cs.LG

    Fisher Efficient Inference of Intractable Models

    Authors: Song Liu, Takafumi Kanamori, Wittawat Jitkrittum, Yu Chen

    Abstract: Maximum Likelihood Estimators (MLE) has many good properties. For example, the asymptotic variance of MLE solution attains equality of the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such MLE solution requires calculating the likelihood function which may not be tractable due to the normalization term…

    Submitted 1 November, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: Fixed typos in the text. To appear in Neural Information Process 2019

  28. arXiv:1705.07673  [pdf, other]

    stat.ML cs.LG

    A Linear-Time Kernel Goodness-of-Fit Test

    Authors: Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton

    Abstract: We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We anal…

    Submitted 24 October, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Accepted to NIPS 2017

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  29. arXiv:1610.04782  [pdf, other]

    stat.ML cs.LG

    An Adaptive Test of Independence with Analytic Kernel Embeddings

    Authors: Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton

    Abstract: A new computationally efficient dependence measure, and an adaptive statistical test of independence, are proposed. The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features). These features are chosen so as to maximize a lower bound on the test power, resulting in a test that is…

    Submitted 15 October, 2016; originally announced October 2016.

    Comments: 8 pages of main text

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  30. arXiv:1605.06796  [pdf, other]

    stat.ML cs.LG

    Interpretable Distribution Features with Maximum Testing Power

    Authors: Wittawat Jitkrittum, Zoltan Szabo, Kacper Chwialkowski, Arthur Gretton

    Abstract: Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and int…

    Submitted 28 October, 2016; v1 submitted 22 May, 2016; originally announced May 2016.

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  31. arXiv:1503.02551  [pdf, other]

    stat.ML cs.LG

    Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages

    Authors: Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó

    Abstract: We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained o…

    Submitted 9 June, 2015; v1 submitted 9 March, 2015; originally announced March 2015.

    Comments: accepted to UAI 2015. Correct typos. Add more content to the appendix. Main results unchanged

    MSC Class: 62F15; 46e22; 62-09; 62F30 ACM Class: G.3; I.2.6

  32. arXiv:1502.02558  [pdf, other]

    stat.ML cs.LG

    K2-ABC: Approximate Bayesian Computation with Kernel Embeddings

    Authors: Mijung Park, Wittawat Jitkrittum, Dino Sejdinovic

    Abstract: Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in term…

    Submitted 26 December, 2015; v1 submitted 9 February, 2015; originally announced February 2015.

  33. arXiv:1501.00375  [pdf, ps, other]

    stat.ML cs.LG

    Passing Expectation Propagation Messages with Kernel Methods

    Authors: Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess

    Abstract: We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message. In ordinary EP, computing an outgoing message involves estimating a multivariate integral which may not have an analytic expression. Learning such an operator allows one to bypass the expensive computation of the integral dur…

    Submitted 2 January, 2015; originally announced January 2015.

    Comments: Accepted to Advances in Variational Inference, NIPS 2014 Workshop

  34. Feature Selection via L1-Penalized Squared-Loss Mutual Information

    Authors: Wittawat Jitkrittum, Hirotaka Hachiya, Masashi Sugiyama

    Abstract: Feature selection is a technique to screen out less important features. Many existing supervised feature selection algorithms use redundancy and relevancy as the main criteria to select features. However, feature interaction, potentially a key characteristic in real-world problems, has not received much attention. As an attempt to take feature interaction into account, we propose L1-LSMI, an L1-re…

    Submitted 6 October, 2012; originally announced October 2012.

    Comments: 25 pages

  35. arXiv:1202.0515  [pdf, ps, other]

    stat.ML cs.AI stat.ME

    High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

    Authors: Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama

    Abstract: The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-…

    Submitted 3 January, 2019; v1 submitted 2 February, 2012; originally announced February 2012.

    Comments: 18 pages

    Journal ref: Neural Computation 2014