Showing 1–22 of 22 results for author: Petersen, C

  1. arXiv:2404.14875  [pdf, other]

    cs.LG math.OC

    Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks

    Authors: Adeyemi D. Adeoye, Philipp Christian Petersen, Alberto Bemporad

    Abstract: The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent ke…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 27 pages, 9 figures, 2 tables

  2. arXiv:2404.04549  [pdf, other]

    cs.NE cs.LG math.FA stat.ML

    Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders

    Authors: A. Martina Neuman, Philipp Christian Petersen

    Abstract: We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including rate-optimal approximation of smo…

    Submitted 6 April, 2024; originally announced April 2024.

  3. arXiv:2301.13867  [pdf, other]

    cs.LG cs.AI cs.CL

    Mathematical Capabilities of ChatGPT

    Authors: Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Julius Berner

    Abstract: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-languag…

    Submitted 20 July, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Added further evaluations on another ChatGPT version and on GPT-4. The GHOSTS and miniGHOSTS datasets are available at https://github.com/xyfrieder/science-GHOSTS

    Journal ref: NeurIPS 2023 Datasets and Benchmarks

  4. arXiv:2212.09507  [pdf, ps, other]

    cs.LG math.FA stat.ML

    VC dimensions of group convolutional neural networks

    Authors: Philipp Christian Petersen, Anna Sepliarskaia

    Abstract: We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant t…

    Submitted 19 December, 2022; originally announced December 2022.

    MSC Class: 68T07; 68Q32; 68T05

  5. arXiv:2212.01354  [pdf, other]

    cs.AI cs.MA nlin.AO

    Designing Ecosystems of Intelligence from First Principles

    Authors: Karl J Friston, Maxwell J D Ramstead, Alex B Kiefer, Alexander Tschantz, Christopher L Buckley, Mahault Albarracin, Riddhi J Pitliya, Conor Heins, Brennan Klein, Beren Millidge, Dalton A R Sakthivadivel, Toby St Clere Smithe, Magnus Koudahl, Safae Essafi Tremblay, Capm Petersen, Kaiser Fung, Jason G Fox, Steven Swanson, Dan Mapes, Gabriel René

    Abstract: This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants -- what we call "shared intelligence". This vision is premised on active inference, a formulation of adaptive behavior that can be read…

    Submitted 11 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 23+18 pages, one figure, one six page appendix

    Journal ref: Collective Intelligence, 3(1), 2024

  6. arXiv:2210.12194  [pdf, other]

    astro-ph.IM cs.LG hep-ex physics.data-an

    GraphNeT: Graph neural networks for neutrino telescope event reconstruction

    Authors: Andreas Søgaard, Rasmus F. Ørsøe, Leon Bozianu, Morten Holm, Kaare Endrup Iversen, Tim Guggenmos, Martin Ha Minh, Philipp Eller, Troels C. Petersen

    Abstract: GraphNeT is an open-source Python framework aimed at providing high quality, user friendly, end-to-end functionality to perform reconstruction tasks at neutrino telescopes using graph neural networks (GNNs). GraphNeT makes it fast and easy to train complex models that can provide event reconstruction with state-of-the-art performance, for arbitrary detector configurations, with inference times tha…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 6 pages, 1 figure. Code can be found at https://github.com/graphnet-team/graphnet . Submitted to the Journal of Open Source Software (JOSS)

  7. arXiv:2210.00805  [pdf, other]

    cs.LG math.FA stat.ML

    Limitations of neural network training due to numerical instability of backpropagation

    Authors: Clemens Karner, Vladimir Kazeev, Philipp Christian Petersen

    Abstract: We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtua…

    Submitted 15 November, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    MSC Class: 65G50; 68T07; 41A25; 68T09

  8. arXiv:2209.03042  [pdf, other]

    hep-ex astro-ph.IM cs.LG physics.data-an physics.ins-det

    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

    Authors: R. Abbasi, M. Ackermann, J. Adams, N. Aggarwal, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, K. -H. Becker, et al. (359 additional authors not shown)

    Abstract: IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challen…

    Submitted 11 October, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: Prepared for submission to JINST

  9. TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory

    Authors: Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, Pallab Bhattacharya, Chris Petersen, Mosharaf Chowdhury, Shobhit Kanaujia, Prakash Chauhan

    Abstract: The increasing demand for memory in hyperscale applications has led to memory becoming a large portion of the overall datacenter spend. The emergence of coherent interfaces like CXL enables main memory expansion and offers an efficient solution to this problem. In such systems, the main memory can constitute different memory technologies with varied characteristics. In this paper, we characterize…

    Submitted 28 May, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

  10. arXiv:2107.04140  [pdf, other]

    cs.AR

    First-Generation Inference Accelerator Deployment at Facebook

    Authors: Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, Satish Nadathur, Sam Naghshineh, Avinash Nayak, Jongsoo Park, Chris Petersen, Martin Schatz, Narayanan Sundaram, Bangsheng Tang, Peter Tang, Amy Yang, Jiecao Yu, et al. (90 additional authors not shown)

    Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in…

    Submitted 4 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  11. arXiv:2104.06819  [pdf, other]

    cs.LG stat.ML

    Short-term bus travel time prediction for transfer synchronization with intelligent uncertainty handling

    Authors: Niklas Christoffer Petersen, Anders Parslov, Filipe Rodrigues

    Abstract: This paper presents two novel approaches for uncertainty estimation adapted and extended for the multi-link bus travel time problem. The uncertainty is modeled directly as part of recurrent artificial neural networks, but using two fundamentally different approaches: one based on Deep Quantile Regression (DQR) and the other on Bayesian Recurrent Neural Networks (BRNN). Both models predict multiple…

    Submitted 14 April, 2021; originally announced April 2021.

  12. Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities

    Authors: Carlo Marcati, Joost A. A. Opschoor, Philipp C. Petersen, Christoph Schwab

    Abstract: We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Found Comput Math (2022)

    MSC Class: 35Q40; 41A25; 41A46; 65N30

    Journal ref: Found. Comput. Math. 23 (2023), no. 3, 1043-1127

  13. arXiv:1904.00289  [pdf]

    cs.IR

    On the Estimation and Use of Statistical Modelling in Information Retrieval

    Authors: Casper Petersen

    Abstract: Several tasks in information retrieval (IR) rely on assumptions regarding the distribution of some property (such as term frequency) in the data being processed. This thesis argues that such distributional assumptions can lead to incorrect conclusions and proposes a statistically principled method for determining the "true" distribution. This thesis further applies this method to derive a new fami…

    Submitted 30 March, 2019; originally announced April 2019.

    Comments: PhD thesis

    MSC Class: 68P20

  14. Multi-output Bus Travel Time Prediction with Convolutional LSTM Neural Network

    Authors: Niklas Christoffer Petersen, Filipe Rodrigues, Francisco Camara Pereira

    Abstract: Accurate and reliable travel time predictions in public transport networks are essential for delivering an attractive service that is able to compete with other modes of transport in urban areas. The traditional application of this information, where arrival and departure predictions are displayed on digital boards, is highly visible in the city landscape of most modern metropolises. More recently…

    Submitted 7 March, 2019; originally announced March 2019.

    Journal ref: Expert Systems with Applications, Volume 120, 15 April 2019, Pages 426-435

  15. arXiv:1901.05744  [pdf, ps, other]

    cs.LG stat.ML

    The Oracle of DLphi

    Authors: Dominik Alfke, Weston Baines, Jan Blechschmidt, Mauricio J. del Razo Sarmina, Amnon Drory, Dennis Elbrächter, Nando Farchmin, Matteo Gambara, Silke Glas, Philipp Grohs, Peter Hinz, Danijel Kivaranovic, Christian Kümmerle, Gitta Kutyniok, Sebastian Lunz, Jan Macdonald, Ryan Malthaner, Gregory Naisat, Ariel Neufeld, Philipp Christian Petersen, Rafael Reisenhofer, Jun-Da Sheng, Laura Thesing, Philipp Trunschke, Johannes von Lindheim, et al. (2 additional authors not shown)

    Abstract: We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting t…

    Submitted 27 January, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

    MSC Class: 68T05; 82C32

  16. arXiv:1805.02790  [pdf, other]

    cs.DC

    Live Recovery of Bit Corruptions in Datacenter Storage Systems

    Authors: Amy Tai, Andrew Kryczka, Shobhit Kanaujia, Chris Petersen, Mikhail Antonov, Muhammad Waliji, Kyle Jamieson, Michael J. Freedman, Asaf Cidon

    Abstract: Due to its high performance and decreasing cost per bit, flash is becoming the main storage medium in datacenters for hot data. However, flash endurance is a perpetual problem, and due to technology trends, subsequent generations of flash devices exhibit progressively shorter lifetimes before they experience uncorrectable bit errors. In this paper we propose extending flash lifetime by allowing…

    Submitted 8 May, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

  17. arXiv:1609.00969  [pdf, other]

    cs.IR

    Adaptive Distributional Extensions to DFR Ranking

    Authors: Casper Petersen, Jakob Grue Simonsen, Kalervo Jarvelin, Christina Lioma

    Abstract: Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution…

    Submitted 4 September, 2016; originally announced September 2016.

  18. arXiv:1608.00758  [pdf, other]

    cs.IR

    Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

    Authors: Christina Lioma, Fabien Tarissan, Jakob Grue Simonsen, Casper Petersen, Birger Larsen

    Abstract: Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where…

    Submitted 2 August, 2016; originally announced August 2016.

  19. arXiv:1606.07660  [pdf, other]

    cs.IR

    Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)

    Authors: Christina Lioma, Birger Larsen, Casper Petersen, Jakob Grue Simonsen

    Abstract: What if Information Retrieval (IR) systems did not just retrieve relevant information that is stored in their indices, but could also "understand" it and synthesise it into a single document? We present a preliminary study that makes a first step towards answering this question. Given a query, we train a Recurrent Neural Network (RNN) on existing relevant information to that query. We then use the…

    Submitted 27 June, 2016; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21, 2016, Pisa, Italy

  20. arXiv:1507.08234  [pdf, other]

    cs.IR

    Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

    Authors: Casper Petersen, Christina Lioma, Jakob Grue Simonsen, Birger Larsen

    Abstract: We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which ne…

    Submitted 29 July, 2015; originally announced July 2015.

  21. arXiv:1502.03971  [pdf, other]

    cs.DC cs.DS

    Near-optimal adjacency labeling scheme for power-law graphs

    Authors: Casper Petersen, Noy Rotbart, Jakob Grue Simonsen, Christian Wulff-Nilsen

    Abstract: An adjacency labeling scheme is a method that assigns labels to the vertices of a graph such that adjacency between vertices can be inferred directly from the assigned label, without using a centralized data structure. We devise adjacency labeling schemes for the family of power-law graphs. This family has been used to model many types of networks, e.g. the Internet AS-level graph. Furthermor…

    Submitted 13 February, 2015; originally announced February 2015.

    ACM Class: E.1; G.2.2

  22. arXiv:1407.4490  [pdf, other]

    cs.AI q-bio.QM

    Virus Detection in Multiplexed Nanowire Arrays using Hidden Semi-Markov models

    Authors: Shalini Ghosh, Patrick Lincoln, Christian Petersen, Alfonso Valdes

    Abstract: In this paper, we address the problem of real-time detection of viruses docking to nanowires, especially when multiple viruses dock to the same nanowire. The task becomes more complicated when there is an array of nanowires coated with different antibodies, where different viruses can dock to each coated nanowire at different binding strengths. We model the array response to a viral agent as a pa…

    Submitted 16 July, 2014; originally announced July 2014.