Showing 1–15 of 15 results for author: Hron, J

  1. arXiv:2312.06585

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  2. The Progression of Disparities within the Criminal Justice System: Differential Enforcement and Risk Assessment Instruments

    Authors: Miri Zilka, Riccardo Fogliato, Jiri Hron, Bradley Butcher, Carolyn Ashurst, Adrian Weller

    Abstract: Algorithmic risk assessment instruments (RAIs) increasingly inform decision-making in criminal justice. RAIs largely rely on arrest records as a proxy for underlying crime. Problematically, the extent to which arrests reflect overall offending can vary with the person's characteristics. We examine how the disconnect between crime and arrest rates impacts RAIs and their evaluation. Our main contrib…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to FAccT '23

  3. arXiv:2302.09324

    cs.CL cs.HC cs.IR cs.LG

    Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

    Authors: Bradley Butcher, Miri Zilka, Darren Cook, Jiri Hron, Adrian Weller

    Abstract: While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly-immediate results, but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, an…

    Submitted 18 February, 2023; originally announced February 2023.

  4. arXiv:2206.13102

    cs.GT cs.CY cs.IR cs.LG stat.ML

    Modeling Content Creator Incentives on Algorithm-Curated Platforms

    Authors: Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus, Sarah Dean

    Abstract: Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a…

    Submitted 6 July, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Presented at ICLR 2023 (top 5%)

  5. arXiv:2206.07673

    stat.ML cs.LG

    Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

    Authors: Jiri Hron, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploit…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  6. arXiv:2106.14979

    cs.IR cs.LG stat.ML

    On component interactions in two-stage recommender systems

    Authors: Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus

    Abstract: Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators, tuned for low prediction latency, preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated…

    Submitted 12 January, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: Appears in the proceedings of the NeurIPS 2021 conference

  7. arXiv:2009.08956

    cs.IR cs.LG stat.ML

    Exploration in two-stage recommender systems

    Authors: Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus

    Abstract: Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability. These systems produce recommendations in two steps: (i) multiple nominators preselect a small number of items from a large pool using cheap-to-compute item embeddings; (ii) with a richer set of features, a ranker rearranges the nominated items and serves them to the user. A key challenge of t…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: Published at the REVEAL 2020 workshop (RecSys 2020)

  8. arXiv:2006.10541

    stat.ML cs.LG

    Exact posterior distributions of wide Bayesian neural networks

    Authors: Jiri Hron, Yasaman Bahri, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: Recent work has shown that the prior over functions induced by a deep Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width of all layers becomes large. However, many BNN applications are concerned with the BNN function space posterior. While some empirical evidence of the posterior convergence was provided in the original works of Neal (1996) and Matthews et al. (2018), it…

    Submitted 26 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  9. arXiv:2006.10540

    stat.ML cs.LG

    Infinite attention: NNGP and NTK for deep attention networks

    Authors: Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

    Abstract: There is a growing amount of literature on the relationship between wide neural networks (NNs) and Gaussian processes (GPs), identifying an equivalence between the two for a variety of NN architectures. This equivalence enables, for instance, accurate approximation of the behaviour of wide Bayesian NNs without MCMC or variational approximations, or characterisation of the distribution of randomly…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  10. arXiv:1912.02803

    stat.ML cs.LG

    Neural Tangents: Fast and Easy Infinite Neural Networks in Python

    Authors: Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: Neural Tangents is a library designed to enable research into infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual or in their infinite-width limit. Infinite-width networks can be trained analytically using exact Bayesian inference or using…

    Submitted 5 December, 2019; originally announced December 2019.

  11. arXiv:1903.03784

    stat.ML cs.LG

    Orthogonal Estimation of Wasserstein Distances

    Authors: Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller

    Abstract: Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and dr…

    Submitted 5 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: Published at AISTATS 2019

  12. arXiv:1810.06530

    cs.LG stat.ML

    Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

    Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

    Abstract: Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, a…

    Submitted 3 December, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Camera ready version, NeurIPS 2019

  13. arXiv:1810.05148

    stat.ML cs.AI cs.LG cs.NE

    Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

    Authors: Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous…

    Submitted 21 August, 2020; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  14. arXiv:1807.01969

    stat.ML cs.LG

    Variational Bayesian dropout: pitfalls and fixes

    Authors: Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

    Abstract: Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; fro…

    Submitted 5 July, 2018; originally announced July 2018.

    Comments: Extended version of the paper accepted to ICML 2018: more details in the proofs, few minor modifications

  15. arXiv:1804.11271

    stat.ML cs.LG

    Gaussian Process Behaviour in Wide Deep Neural Networks

    Authors: Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani

    Abstract: Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architectur…

    Submitted 16 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: This work substantially extends the work of Matthews et al. (2018) published at the International Conference on Learning Representations (ICLR) 2018