-
Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks
Authors:
Adeyemi D. Adeoye,
Philipp Christian Petersen,
Alberto Bemporad
Abstract:
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.
Submitted 23 April, 2024;
originally announced April 2024.
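The GGN abstract above describes a curvature-aware step with explicit regularization. The sketch below illustrates only the basic mechanics: one Gauss-Newton step for a squared loss with a plain L2 (Tikhonov) penalty. It is not the paper's method, which uses generalized self-concordant regularizers and its own adaptive step-size rule. The function name `regularized_gn_step` is hypothetical. For a squared loss, the GGN curvature matrix reduces to $J^\top J$, so the step solves $(J^\top J + \lambda I)\,d = -(J^\top r + \lambda\theta)$.

```python
import numpy as np


def regularized_gn_step(theta, residual_fn, jacobian_fn, lam):
    """One Gauss-Newton step for f(theta) = 0.5*||r(theta)||^2 + 0.5*lam*||theta||^2."""
    r = residual_fn(theta)
    J = jacobian_fn(theta)
    # For a squared loss the GGN curvature is J^T J; the L2 penalty adds lam * I.
    curvature = J.T @ J + lam * np.eye(theta.size)
    gradient = J.T @ r + lam * theta
    return theta - np.linalg.solve(curvature, gradient)


# Usage: a linear residual r(theta) = A theta - b (hypothetical data).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)
theta = regularized_gn_step(np.zeros(3), lambda t: A @ t - b, lambda t: A, lam=0.1)
```

The residual in the usage example is linear, so the Gauss-Newton model is exact and one step from zero reaches the regularized least-squares solution. For a neural network, the Jacobian would be recomputed at every iterate.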
-
Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
Authors:
A. Martina Neuman,
Philipp Christian Petersen
Abstract:
We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters, which facilitates classical covering number-based generalization statements. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.
Submitted 6 April, 2024;
originally announced April 2024.
-
Mathematical Capabilities of ChatGPT
Authors:
Simon Frieder,
Luca Pinchetti,
Alexis Chevalier,
Ryan-Rhys Griffiths,
Tommaso Salvatori,
Thomas Lukasiewicz,
Philipp Christian Petersen,
Julius Berner
Abstract:
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!
Submitted 20 July, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
VC dimensions of group convolutional neural networks
Authors:
Philipp Christian Petersen,
Anna Sepliarskaia
Abstract:
We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant to the action of an infinite group.
Submitted 19 December, 2022;
originally announced December 2022.
-
Designing Ecosystems of Intelligence from First Principles
Authors:
Karl J Friston,
Maxwell J D Ramstead,
Alex B Kiefer,
Alexander Tschantz,
Christopher L Buckley,
Mahault Albarracin,
Riddhi J Pitliya,
Conor Heins,
Brennan Klein,
Beren Millidge,
Dalton A R Sakthivadivel,
Toby St Clere Smithe,
Magnus Koudahl,
Safae Essafi Tremblay,
Capm Petersen,
Kaiser Fung,
Jason G Fox,
Steven Swanson,
Dan Mapes,
Gabriel René
Abstract:
This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants (what we call "shared intelligence"). This vision is premised on active inference, a formulation of adaptive behavior that can be read as a physics of intelligence, and which inherits from the physics of self-organization. In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world, also known as self-evidencing. Formally, this corresponds to maximizing (Bayesian) model evidence, via belief updating over several scales: i.e., inference, learning, and model selection. Operationally, this self-evidencing can be realized via (variational) message passing or belief propagation on a factor graph. Crucially, active inference foregrounds an existential imperative of intelligent systems; namely, curiosity or the resolution of uncertainty. This same imperative underwrites belief sharing in ensembles of agents, in which certain aspects (i.e., factors) of each agent's generative world model provide a common ground or frame of reference. Active inference plays a foundational role in this ecology of belief sharing, leading to a formal account of collective intelligence that rests on shared narratives and goals. We also consider the kinds of communication protocols that must be developed to enable such an ecosystem of intelligences and motivate the development of a shared hyper-spatial modeling language and transaction protocol, as a first, and key, step towards such an ecology.
Submitted 11 January, 2024; v1 submitted 2 December, 2022;
originally announced December 2022.
-
GraphNeT: Graph neural networks for neutrino telescope event reconstruction
Authors:
Andreas Søgaard,
Rasmus F. Ørsøe,
Leon Bozianu,
Morten Holm,
Kaare Endrup Iversen,
Tim Guggenmos,
Martin Ha Minh,
Philipp Eller,
Troels C. Petersen
Abstract:
GraphNeT is an open-source Python framework aimed at providing high-quality, user-friendly, end-to-end functionality to perform reconstruction tasks at neutrino telescopes using graph neural networks (GNNs). GraphNeT makes it fast and easy to train complex models that can provide event reconstruction with state-of-the-art performance, for arbitrary detector configurations, with inference times that are orders of magnitude faster than traditional reconstruction techniques. GNNs from GraphNeT are flexible enough to be applied to data from all neutrino telescopes, including future projects such as IceCube extensions or P-ONE. This means that GNN-based reconstruction can be used to provide state-of-the-art performance on most reconstruction tasks in neutrino telescopes, at real-time event rates, across experiments and physics analyses, with vast potential impact for neutrino and astro-particle physics.
Submitted 21 October, 2022;
originally announced October 2022.
-
Limitations of neural network training due to numerical instability of backpropagation
Authors:
Clemens Karner,
Vladimir Kazeev,
Philipp Christian Petersen
Abstract:
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments that yield high-order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results.
Submitted 15 November, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
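The abstract contrasts trained networks with the constructions used in approximation theory, where the number of affine pieces grows exponentially with depth. The sketch below checks a standard example of such a construction numerically: the repeated "hat" (sawtooth) function. This example is not taken from this paper. The helper names are hypothetical. The piece count relies on a grid aligned with the known dyadic breakpoints, not on a general-purpose linear-region counter.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def hat(x):
    # One hidden layer with two ReLU neurons: 2x on [0, 1/2], 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    # Composing the hat function `depth` times gives a ReLU network of that depth.
    for _ in range(depth):
        x = hat(x)
    return x


def count_affine_pieces(depth, samples_per_piece=4):
    # Grid aligned with the dyadic breakpoints k / 2**depth, so each grid cell lies
    # inside a single affine piece and slopes can be compared cell by cell.
    n = (2 ** depth) * samples_per_piece
    x = np.linspace(0.0, 1.0, n + 1)
    slopes = np.diff(sawtooth(x, depth)) / np.diff(x)
    changes = ~np.isclose(slopes[1:], slopes[:-1])
    return 1 + int(np.count_nonzero(changes))


for depth in range(1, 7):
    print(depth, count_affine_pieces(depth))
```

Each layer adds only two ReLU neurons, yet the number of pieces doubles with every layer: depths 1 through 6 give 2, 4, ..., 64 pieces. The abstract's claim is that networks with this many pieces relative to their depth are unlikely to be maintained during gradient-descent training in floating-point arithmetic.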
-
Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
N. Aggarwal,
J. A. Aguilar,
M. Ahlers,
M. Ahrens,
J. M. Alameddine,
A. A. Alves Jr.,
N. M. Amin,
K. Andeen,
T. Anderson,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker
, et al. (359 additional authors not shown)
Abstract:
IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the current state-of-the-art maximum likelihood techniques used in IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
Submitted 11 October, 2022; v1 submitted 7 September, 2022;
originally announced September 2022.
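The IceCube abstract represents events as point cloud graphs. A common way to build such a graph is to connect each detector pulse to its k nearest neighbours, and the sketch below does this with NumPy. The feature layout (x, y, z, t), the value of k, and the function name are illustrative assumptions. Neither the paper nor GraphNeT is claimed to use exactly this construction. Mixing positions and times in one Euclidean distance without rescaling is also a simplification.

```python
import numpy as np


def knn_edge_index(points, k):
    """Connect each point (row of `points`) to its k nearest neighbours.

    Returns a (2, n * k) integer array: row 0 holds the node, row 1 its neighbour.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    neighbours = np.argsort(dist, axis=1)[:, :k]
    nodes = np.repeat(np.arange(points.shape[0]), k)
    return np.stack([nodes, neighbours.reshape(-1)])


# Usage: four hypothetical pulses described by (x, y, z, t) features.
pulses = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0, 0.0],
])
edges = knn_edge_index(pulses, k=1)
```

The brute-force distance matrix needs O(n^2) memory. That is acceptable for the modest number of pulses in a low-energy event, but larger inputs would call for a spatial index.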
-
TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory
Authors:
Hasan Al Maruf,
Hao Wang,
Abhishek Dhanotia,
Johannes Weiner,
Niket Agarwal,
Pallab Bhattacharya,
Chris Petersen,
Mosharaf Chowdhury,
Shobhit Kanaujia,
Prakash Chauhan
Abstract:
The increasing demand for memory in hyperscale applications has led to memory becoming a large portion of the overall datacenter spend. The emergence of coherent interfaces like CXL enables main memory expansion and offers an efficient solution to this problem. In such systems, the main memory can consist of different memory technologies with varied characteristics. In this paper, we characterize memory usage patterns of a wide range of datacenter applications across the server fleet of Meta. We therefore demonstrate the opportunities to offload colder pages to slower memory tiers for these applications. Without efficient memory management, however, such systems can significantly degrade performance.
We propose a novel OS-level application-transparent page placement mechanism (TPP) for CXL-enabled memory. TPP employs a lightweight mechanism to identify and place hot/cold pages to appropriate memory tiers. It enables proactive page demotion from local memory to CXL-Memory. This technique ensures memory headroom for new page allocations, which are often related to request processing and tend to be short-lived and hot. At the same time, TPP can promptly promote performance-critical hot pages trapped in the slow CXL-Memory to the fast local memory, while minimizing both sampling overhead and unnecessary migrations. TPP works transparently without any application-specific knowledge and can be deployed globally as a kernel release.
We evaluate TPP in the production server fleet with early samples of new x86 CPUs with CXL 1.1 support. With TPP, a tiered memory system performs within 1% of an ideal baseline that has all the memory in the local tier. It is 18% better than today's Linux, and 5-17% better than existing solutions including NUMA Balancing and AutoTiering. Most of the TPP patches have been merged in the Linux v5.18 release.
Submitted 28 May, 2023; v1 submitted 6 June, 2022;
originally announced June 2022.
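The TPP abstract describes two mechanisms: proactive demotion of cold pages to keep headroom in local memory, and promotion of hot pages trapped in CXL memory. The toy model below sketches that policy in Python, with several assumptions. The class, its thresholds, and the raw access counters are hypothetical. The real TPP lives in the Linux kernel's memory-management code and looks nothing like this class. Access counts here also never decay, unlike the sampling a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy two-tier model: `local` is fast DRAM, `cxl` is the slower CXL tier.

    Each dict maps a page id to its access count. All thresholds are hypothetical.
    """
    local_capacity: int
    headroom: int        # free local pages to keep for new allocations
    hot_threshold: int   # accesses after which a CXL page counts as hot
    local: dict = field(default_factory=dict)
    cxl: dict = field(default_factory=dict)

    def _demote_cold(self):
        # Proactively move the least-accessed local pages to CXL until headroom is free.
        while self.local and self.local_capacity - len(self.local) < self.headroom:
            coldest = min(self.local, key=self.local.get)
            self.cxl[coldest] = self.local.pop(coldest)

    def allocate(self, page):
        # New pages are often short-lived and hot, so place them locally when possible.
        self._demote_cold()
        tier = self.local if len(self.local) < self.local_capacity else self.cxl
        tier[page] = 0

    def access(self, page):
        tier = self.local if page in self.local else self.cxl
        tier[page] += 1
        # Promote a hot page trapped in the CXL tier back to local memory.
        if tier is self.cxl and tier[page] >= self.hot_threshold:
            self._demote_cold()
            if len(self.local) < self.local_capacity:
                self.local[page] = self.cxl.pop(page)


# Usage with hypothetical sizes: 4 local pages, keep 1 free, hot after 3 accesses.
mem = TieredMemory(local_capacity=4, headroom=1, hot_threshold=3)
for page in range(4):
    mem.allocate(page)
for page in (0, 0, 1, 2):
    mem.access(page)
mem.allocate(4)
for _ in range(3):
    mem.access(3)
```

In the usage example, allocating page 4 first demotes the coldest local page (page 3) to keep one local slot free. After page 3 is accessed three times, it is promoted back. Page 4, now the coldest local page, is demoted to make room.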
-
First-Generation Inference Accelerator Deployment at Facebook
Authors:
Michael Anderson,
Benny Chen,
Stephen Chen,
Summer Deng,
Jordan Fix,
Michael Gschwind,
Aravind Kalaiah,
Changkyu Kim,
Jaewon Lee,
Jason Liang,
Haixin Liu,
Yinghai Lu,
Jack Montgomery,
Arun Moorthy,
Satish Nadathur,
Sam Naghshineh,
Avinash Nayak,
Jongsoo Park,
Chris Petersen,
Martin Schatz,
Narayanan Sundaram,
Bangsheng Tang,
Peter Tang,
Amy Yang,
Jiecao Yu
, et al. (90 additional authors not shown)
Abstract:
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses and large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through PyTorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enable this platform to serve production traffic at Facebook. We also share deployment challenges and lessons learned during performance optimization, and provide guidance for future inference hardware co-design.
Submitted 4 August, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Short-term bus travel time prediction for transfer synchronization with intelligent uncertainty handling
Authors:
Niklas Christoffer Petersen,
Anders Parslov,
Filipe Rodrigues
Abstract:
This paper presents two novel approaches for uncertainty estimation adapted and extended for the multi-link bus travel time problem. The uncertainty is modeled directly as part of recurrent artificial neural networks, but using two fundamentally different approaches: one based on Deep Quantile Regression (DQR) and the other on Bayesian Recurrent Neural Networks (BRNN). Both models predict multiple time steps into the future, but handle the time-dependent uncertainty estimation differently. We present a sampling technique in order to aggregate quantile estimates for link level travel time to yield the multi-link travel time distribution needed for a vehicle to travel from its current position to a specific downstream stop point or transfer site.
To motivate the relevance of uncertainty-aware models in the domain, we focus on the connection assurance application as a case study: an expert system that determines whether a bus driver should hold and wait for a connecting service, or break the connection and reduce its own delay. Our results show that the DQR model performs best overall for the 80%, 90% and 95% prediction intervals, both for a 15-minute time horizon (t + 1) and for the 30- and 45-minute time horizons (t + 2 and t + 3), with a constant but very small underestimation of the uncertainty interval (1-4 pp.). However, we also show that the BRNN model can still outperform the DQR model in specific cases. Lastly, we demonstrate how a simple decision support system can take advantage of our uncertainty-aware travel time models to prioritize the difference in travel time uncertainty for bus holding at strategic points, thus reducing the introduced delay for the connection assurance application.
Submitted 14 April, 2021;
originally announced April 2021.
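The bus travel-time abstract aggregates link-level quantile estimates into a multi-link travel-time distribution by sampling. The sketch below shows one way to do this:

1. Draw samples from each link's quantiles by interpolating its inverse CDF.
2. Add the samples across links.
3. Read off the interval of the sum.

The sketch assumes independent links and linear interpolation between quantiles, which the paper may not do. The function names and numbers are hypothetical.

```python
import numpy as np


def sample_from_quantiles(levels, values, n, rng):
    """Draw n samples from a distribution known only through quantile estimates
    by linearly interpolating its inverse CDF. np.interp clamps u outside
    [levels[0], levels[-1]] to the outermost estimated quantiles."""
    u = rng.uniform(0.0, 1.0, size=n)
    return np.interp(u, levels, values)


def multi_link_interval(link_quantiles, levels, coverage, n=10_000, seed=0):
    """Prediction interval for the sum of link travel times, assuming the links
    are independent (a simplification made for this sketch)."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n)
    for values in link_quantiles:
        total += sample_from_quantiles(levels, values, n, rng)
    alpha = (1.0 - coverage) / 2.0
    return np.quantile(total, [alpha, 1.0 - alpha])


# Usage: two links with hypothetical 5%/50%/95% travel-time quantiles in seconds.
levels = np.array([0.05, 0.5, 0.95])
links = [np.array([50.0, 60.0, 70.0]), np.array([80.0, 95.0, 130.0])]
low, high = multi_link_interval(links, levels, coverage=0.9)
```

Treating links as independent tends to make the interval too narrow when delays on consecutive links are positively correlated. Sampling jointly from a model that captures that correlation would avoid this.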
-
Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities
Authors:
Carlo Marcati,
Joost A. A. Opschoor,
Philipp C. Petersen,
Christoph Schwab
Abstract:
We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(\Omega)$ for weighted analytic function classes in certain polytopal domains $\Omega$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset \Omega$, but may exhibit isolated point singularities in the interior of $\Omega$ or corner and edge singularities at the boundary $\partial \Omega$. The exponential expression rate bounds proved here imply uniform exponential expressivity by ReLU NNs of solution families for several elliptic boundary and eigenvalue problems with analytic data. The exponential approximation rates are shown to hold in space dimension $d = 2$ on Lipschitz polygons with straight sides, and in space dimension $d=3$ on Fichera-type polyhedral domains with plane faces. The constructive proofs indicate in particular that NN depth and size increase poly-logarithmically with respect to the target NN approximation accuracy $\varepsilon>0$ in $H^1(\Omega)$. The results cover in particular solution sets of linear, second-order elliptic PDEs with analytic data and certain nonlinear elliptic eigenvalue problems with analytic nonlinearities and singular, weighted analytic potentials as they arise in electron structure models. In the latter case, the functions correspond to electron densities that exhibit isolated point singularities at the positions of the nuclei. Our findings provide in particular a mathematical foundation for recently reported, successful uses of deep neural networks in variational electron structure algorithms.
Submitted 23 October, 2020;
originally announced October 2020.
-
On the Estimation and Use of Statistical Modelling in Information Retrieval
Authors:
Casper Petersen
Abstract:
Several tasks in information retrieval (IR) rely on assumptions regarding the distribution of some property (such as term frequency) in the data being processed. This thesis argues that such distributional assumptions can lead to incorrect conclusions and proposes a statistically principled method for determining the "true" distribution. This thesis further applies this method to derive a new family of ranking models that adapt their computations to the statistics of the data being processed. Experimental evaluation shows results on par or better than multiple strong baselines on several TREC collections. Overall, this thesis concludes that distributional assumptions can be replaced with an effective, efficient and principled method for determining the "true" distribution and that using the "true" distribution can lead to improved retrieval performance.
Submitted 30 March, 2019;
originally announced April 2019.
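The thesis proposes determining the "true" distribution of a property such as term frequency, rather than assuming one. The sketch below is a much simpler illustration of the idea and is not the thesis's method. It fits two one-parameter candidates, Poisson and geometric, by maximum likelihood and keeps the one with the higher log-likelihood. The function names and sample data are hypothetical.

```python
import math


def poisson_loglik(xs):
    lam = sum(xs) / len(xs)  # maximum-likelihood rate
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


def geometric_loglik(xs):
    # Geometric on {0, 1, 2, ...}: P(x) = p * (1 - p)**x, with MLE p = 1 / (1 + mean).
    mean = sum(xs) / len(xs)
    p = 1.0 / (1.0 + mean)
    return sum(math.log(p) + x * math.log(1.0 - p) for x in xs)


def best_fitting_model(xs):
    """Pick the candidate with the higher maximized log-likelihood. Both candidates
    have one free parameter, so this matches choosing the lower AIC."""
    if not xs or sum(xs) == 0:
        raise ValueError("need at least one positive count")
    scores = {"poisson": poisson_loglik(xs), "geometric": geometric_loglik(xs)}
    return max(scores, key=scores.get)


# Usage with hypothetical term-frequency samples.
even_counts = [2, 3] * 10
bursty_counts = [0] * 60 + [1] * 20 + [2] * 10 + [4] * 5 + [8] * 3 + [15] * 2
print(best_fitting_model(even_counts), best_fitting_model(bursty_counts))  # poisson geometric
```

The evenly spread counts favour the Poisson model. The bursty counts, with many zeros and a long tail, favour the geometric one. A fuller comparison would include more candidate families and a goodness-of-fit test.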
-
Multi-output Bus Travel Time Prediction with Convolutional LSTM Neural Network
Authors:
Niklas Christoffer Petersen,
Filipe Rodrigues,
Francisco Camara Pereira
Abstract:
Accurate and reliable travel time predictions in public transport networks are essential for delivering an attractive service that is able to compete with other modes of transport in urban areas. The traditional application of this information, where arrival and departure predictions are displayed on digital boards, is highly visible in the city landscape of most modern metropolises. More recently, the same information has become critical as input for smart-phone trip planners in order to alert passengers about unreachable connections, alternative route choices and prolonged travel times. More sophisticated Intelligent Transport Systems (ITS) include connection assurance, i.e., holding back services when a connecting service is delayed. In order to operate such systems, and to ensure the confidence of passengers in the systems, the information provided must be accurate and reliable. Traditional methods have trouble with this as congestion, and thus travel time variability, increases in cities, consequently making travel time predictions in urban areas a non-trivial task. This paper presents a system for bus travel time prediction that leverages the non-static spatio-temporal correlations present in urban bus networks, allowing the discovery of complex patterns not captured by traditional methods. The underlying model is a multi-output, multi-time-step, deep neural network that uses a combination of convolutional and long short-term memory (LSTM) layers. The method is empirically evaluated and compared to other popular approaches for link travel time prediction and currently available services, including the currently deployed model in Copenhagen, Denmark. We find that the proposed model significantly outperforms all the other methods we compare with, and is able to detect small irregular peaks in bus travel times very quickly.
Submitted 7 March, 2019;
originally announced March 2019.
-
The Oracle of DLphi
Authors:
Dominik Alfke,
Weston Baines,
Jan Blechschmidt,
Mauricio J. del Razo Sarmina,
Amnon Drory,
Dennis Elbrächter,
Nando Farchmin,
Matteo Gambara,
Silke Glas,
Philipp Grohs,
Peter Hinz,
Danijel Kivaranovic,
Christian Kümmerle,
Gitta Kutyniok,
Sebastian Lunz,
Jan Macdonald,
Ryan Malthaner,
Gregory Naisat,
Ariel Neufeld,
Philipp Christian Petersen,
Rafael Reisenhofer,
Jun-Da Sheng,
Laura Thesing,
Philipp Trunschke,
Johannes von Lindheim
, et al. (2 additional authors not shown)
Abstract:
We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant.
Submitted 27 January, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Live Recovery of Bit Corruptions in Datacenter Storage Systems
Authors:
Amy Tai,
Andrew Kryczka,
Shobhit Kanaujia,
Chris Petersen,
Mikhail Antonov,
Muhammad Waliji,
Kyle Jamieson,
Michael J. Freedman,
Asaf Cidon
Abstract:
Due to its high performance and decreasing cost per bit, flash is becoming the main storage medium in datacenters for hot data. However, flash endurance is a perpetual problem, and due to technology trends, subsequent generations of flash devices exhibit progressively shorter lifetimes before they experience uncorrectable bit errors.
In this paper we propose extending flash lifetime by allowing devices to expose higher bit error rates. To do so, we present DIRECT, a novel set of policies that leverages latent redundancy in distributed storage systems to recover from bit corruption errors with minimal performance and recovery overhead. In doing so, DIRECT can significantly extend the lifetime of flash devices by effectively utilizing these devices even after they begin exposing bit errors.
We implemented DIRECT on two real-world storage systems: ZippyDB, a distributed key-value store backed by RocksDB, and HDFS, a distributed file system. When tested on production traces at Facebook, DIRECT reduces application-visible error rates in ZippyDB by a factor of more than 10^2 and recovery time by a factor of more than 10^4. DIRECT also allows HDFS to tolerate bit error rates 10^4 to 10^5 times higher without experiencing application-visible errors.
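The recovery idea in the DIRECT abstract (using latent redundancy, i.e. other copies of the same data, to repair a corrupted read) can be illustrated with a minimal Python sketch. It assumes a generic replicated block store in which each block has a known CRC32 checksum; the `Replica` class and the `flip_bit` and `read_with_recovery` helpers are hypothetical and do not reflect DIRECT's actual policies, recovery granularity, or its integration with ZippyDB, RocksDB, or HDFS.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one storage node holding block copies."""
    blocks: Dict[str, bytes] = field(default_factory=dict)


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulates a flash bit error)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def read_with_recovery(
    block_id: str, replicas: List[Replica], expected_crc: int
) -> Tuple[bytes, int]:
    """Return a verified copy of the block and the number of replicas repaired.

    Every replica is checked against the stored CRC32; missing or corrupted
    copies are overwritten with the first copy that verifies.
    """
    good = None
    bad: List[Replica] = []
    for replica in replicas:
        data = replica.blocks.get(block_id)
        if data is not None and zlib.crc32(data) == expected_crc:
            if good is None:
                good = data  # keep the first intact copy
        else:
            bad.append(replica)  # missing or checksum mismatch
    if good is None:
        raise IOError(f"no intact copy of block {block_id!r}")
    for replica in bad:
        replica.blocks[block_id] = good  # repair from the verified copy
    return good, len(bad)
```

A single flipped bit in one replica is caught by the checksum mismatch and repaired from an intact copy; the read fails only when no copy verifies.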
Submitted 8 May, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Adaptive Distributional Extensions to DFR Ranking
Authors:
Casper Petersen,
Jakob Grue Simonsen,
Kalervo Jarvelin,
Christina Lioma
Abstract:
Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently from non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. In practice, this risks a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show that ADR outperforms DFR models (and their extensions) and performs comparably to a query likelihood language model (LM).
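The first step of ADR, detecting the best-fitting distribution of non-informative terms, can be sketched as a maximum-likelihood comparison. This sketch assumes that the data for a term are its per-document frequency counts and considers only the two distributions named in the abstract (Poisson, and a geometric distribution on 0, 1, 2, ...). Both have one free parameter, so comparing maximized log-likelihoods is equivalent to comparing AIC. The function names are hypothetical, and the candidate set and fitting criterion used in the paper may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def poisson_loglik(counts: Sequence[int]) -> float:
    """Log-likelihood of counts under a Poisson fitted by MLE (lambda = mean)."""
    lam = sum(counts) / len(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def geometric_loglik(counts: Sequence[int]) -> float:
    """Log-likelihood under a geometric P(k) = (1 - p)^k p, MLE p = 1 / (1 + mean)."""
    p = 1.0 / (1.0 + sum(counts) / len(counts))
    return sum(k * math.log(1.0 - p) + math.log(p) for k in counts)


CANDIDATES: Dict[str, Callable[[Sequence[int]], float]] = {
    "poisson": poisson_loglik,
    "geometric": geometric_loglik,
}


def best_fit(counts: List[int]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the best-fitting candidate and all log-likelihoods."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one nonzero count")
    scores = {name: fit(counts) for name, fit in CANDIDATES.items()}
    return max(scores, key=scores.get), scores
```

Overdispersed counts (many zeros and a few large values, as for bursty terms) favour the geometric model, while counts clustered around their mean favour the Poisson model.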
Submitted 4 September, 2016;
originally announced September 2016.
-
Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval
Authors:
Christina Lioma,
Fabien Tarissan,
Jakob Grue Simonsen,
Casper Petersen,
Birger Larsen
Abstract:
Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where adjusting the ranking of documents according to both their relevance and their coherence has been shown to increase retrieval effectiveness [34,37].
The state of the art in unsupervised coherence modelling represents documents as bipartite graphs of sentences and discourse entities, and then projects these bipartite graphs into one-mode undirected graphs. However, one-mode projections may incur significant loss of the information present in the original bipartite structure. To address this we present three novel graph metrics that compute document coherence on the original bipartite graph of sentences and entities. Evaluation on standard settings shows that: (i) one of our coherence metrics beats the state of the art in terms of coherence accuracy; and (ii) all three of our coherence metrics improve retrieval effectiveness because, as closer analysis reveals, they capture aspects of document quality that go undetected by both keyword-based standard ranking and by spam filtering. This work contributes document coherence metrics that are theoretically principled, parameter-free, and useful to IR.
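The claim that one-mode projections may lose information present in the bipartite graph of sentences and entities can be checked on a toy example. The sketch builds a projection onto sentences, weighting an edge by the number of shared entities (an assumption; projections in the literature also use unweighted or distance-weighted edges), and shows two documents with different bipartite graphs but identical projections. The helper names are hypothetical, and neither function is one of the three metrics proposed in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def project(bipartite: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """One-mode projection onto sentences.

    bipartite maps each sentence id to the set of entities it mentions;
    two sentences are linked if they share an entity, weighted by how many.
    """
    edges: Dict[FrozenSet[str], int] = {}
    for a, b in combinations(sorted(bipartite), 2):
        shared = len(bipartite[a] & bipartite[b])
        if shared:
            edges[frozenset((a, b))] = shared
    return edges


def entity_degrees(bipartite: Dict[str, Set[str]]) -> List[int]:
    """Sorted number of sentences each entity appears in (a bipartite-side statistic)."""
    counts: Dict[str, int] = {}
    for entities in bipartite.values():
        for e in entities:
            counts[e] = counts.get(e, 0) + 1
    return sorted(counts.values())


# Document A: a single entity mentioned in all three sentences.
doc_a = {"s1": {"e"}, "s2": {"e"}, "s3": {"e"}}
# Document B: three entities, each mentioned in exactly two sentences.
doc_b = {"s1": {"e1", "e3"}, "s2": {"e1", "e2"}, "s3": {"e2", "e3"}}

print(project(doc_a) == project(doc_b))              # True: identical projections
print(entity_degrees(doc_a), entity_degrees(doc_b))  # [3] [2, 2, 2]
```

Entity degrees, read directly off the bipartite graph, tell the two documents apart even though their projections coincide.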
Submitted 2 August, 2016;
originally announced August 2016.
-
Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)
Authors:
Christina Lioma,
Birger Larsen,
Casper Petersen,
Jakob Grue Simonsen
Abstract:
What if Information Retrieval (IR) systems did not just retrieve relevant information that is stored in their indices, but could also "understand" it and synthesise it into a single document? We present a preliminary study that takes a first step towards answering this question. Given a query, we train a Recurrent Neural Network (RNN) on existing information relevant to that query. We then use the RNN to "deep learn" a single, synthetic and, we assume, relevant document for that query. We design a crowdsourcing experiment to assess how relevant the "deep learned" document is compared to existing relevant documents. Users are shown a query and four word clouds (of three existing relevant documents and our deep learned synthetic document). On average, the synthetic document is ranked as the most relevant of all.
Submitted 27 June, 2016; v1 submitted 24 June, 2016;
originally announced June 2016.
-
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
Authors:
Casper Petersen,
Christina Lioma,
Jakob Grue Simonsen,
Birger Larsen
Abstract:
We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28], which represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, the application of which to IR complements recent work on integrating document cohesiveness or comprehensibility into ranking [5, 56].
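The first model approximates the rate at which new information appears by the entropy of entity n-grams. A minimal sketch, assuming a document has already been reduced to its sequence of discourse entities and using unsmoothed maximum-likelihood n-gram probabilities with base-2 logarithms; the entity extraction, choice of n, and any normalization used by the authors are not specified in the abstract, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def ngram_entropy(entities: Sequence[Hashable], n: int = 2) -> float:
    """Shannon entropy (bits) of the empirical entity n-gram distribution."""
    grams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not grams:
        return 0.0  # sequence shorter than n: no n-grams to measure
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in Counter(grams).values())


repetitive = ["court", "judge"] * 4  # the same two entities keep recurring
drifting = list("ABCDEFGH")          # every position introduces a new entity
print(round(ngram_entropy(repetitive), 3), round(ngram_entropy(drifting), 3))
```

Under the abstract's reasoning, the higher entropy of the sequence that keeps introducing new entities would indicate lower coherence.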
Submitted 29 July, 2015;
originally announced July 2015.
-
Near-optimal adjacency labeling scheme for power-law graphs
Authors:
Casper Petersen,
Noy Rotbart,
Jakob Grue Simonsen,
Christian Wulff-Nilsen
Abstract:
An adjacency labeling scheme is a method that assigns labels to the vertices of a graph such that adjacency between two vertices can be inferred directly from their assigned labels, without using a centralized data structure. We devise adjacency labeling schemes for the family of power-law graphs. This family has been used to model many types of networks, e.g. the Internet AS-level graph. Furthermore, we prove an almost matching lower bound for this family. We also provide an asymptotically near-optimal labeling scheme for sparse graphs. Finally, we validate the efficiency of our labeling scheme by an experimental evaluation using both synthetic data and real-world networks of up to hundreds of thousands of vertices.
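To make the definition concrete, here is a simplified threshold-based labeling scheme in Python, offered as an illustration rather than the construction from the paper. It assumes an undirected graph given as a symmetric adjacency dictionary and a hypothetical threshold parameter `k`: the `k` highest-degree ("fat") vertices are ranked first, every vertex records its fat neighbours in a `k`-bit mask, and only low-degree ("thin") vertices list their thin neighbours. Adjacency is then decided from two labels alone, which is the property the definition requires; label sizes are not optimized and no bound from the paper is claimed.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Set


@dataclass(frozen=True)
class Label:
    ident: int                 # rank of the vertex by descending degree
    fat_mask: int              # bit i set <=> adjacent to the fat vertex with ident i
    thin_nbrs: FrozenSet[int]  # idents of thin neighbours (stored only by thin vertices)


def assign_labels(adj: Dict[Hashable, Set[Hashable]], k: int) -> Dict[Hashable, Label]:
    """Label every vertex; the k highest-degree vertices are treated as fat."""
    order = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))  # deterministic ranking
    ident = {v: i for i, v in enumerate(order)}
    labels: Dict[Hashable, Label] = {}
    for v in adj:
        mask, thin = 0, set()
        for w in adj[v]:
            if ident[w] < k:
                mask |= 1 << ident[w]    # every vertex records its fat neighbours
            elif ident[v] >= k:
                thin.add(ident[w])       # only thin vertices list thin neighbours
        labels[v] = Label(ident[v], mask, frozenset(thin))
    return labels


def adjacent(a: Label, b: Label, k: int) -> bool:
    """Decide adjacency using only the two labels."""
    if b.ident < k:
        return bool((a.fat_mask >> b.ident) & 1)
    if a.ident < k:
        return bool((b.fat_mask >> a.ident) & 1)
    return b.ident in a.thin_nbrs
```

Splitting on degree is meant to exploit the shape of power-law graphs: relatively few vertices have large degree, so a short bit mask covers them, while the many remaining vertices have short neighbour lists.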
Submitted 13 February, 2015;
originally announced February 2015.
-
Virus Detection in Multiplexed Nanowire Arrays using Hidden Semi-Markov models
Authors:
Shalini Ghosh,
Patrick Lincoln,
Christian Petersen,
Alfonso Valdes
Abstract:
In this paper, we address the problem of real-time detection of viruses docking to nanowires, especially when multiple viruses dock to the same nanowire. The task becomes more complicated when there is an array of nanowires coated with different antibodies, where different viruses can dock to each coated nanowire at different binding strengths. We model the array response to a viral agent as a pattern of conductance change over nanowires with known modifiers. This representation permits analysis of the output of such an array via belief network (Bayes) methods, as well as via novel generative models such as the Hidden Semi-Markov Model (HSMM).
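An HSMM is a generative model in which each hidden state persists for an explicitly sampled duration before the chain moves on. A minimal sampler in Python, assuming a single nanowire, discrete hidden states (for example, the number of docked viruses), categorical duration distributions, and Gaussian noise on the conductance change; the function, its parameters, and the emission model are hypothetical and not taken from the paper, which models a whole array.

```python
import random
from typing import List, Sequence, Tuple


def sample_hsmm(
    T: int,
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    durations: Sequence[Sequence[float]],
    means: Sequence[float],
    noise_sd: float,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Sample hidden states and noisy conductance changes for T time steps.

    init[i]: probability of starting in state i
    trans[i][j]: probability of entering state j when a segment in state i ends
    durations[i]: weights over segment lengths 1..len(durations[i])
    means[i]: mean conductance change emitted while in state i
    """
    rng = random.Random(seed)
    n = len(init)
    state = rng.choices(range(n), weights=init)[0]
    states: List[int] = []
    obs: List[float] = []
    while len(states) < T:
        # Draw how long the current state lasts, then emit one reading per step.
        lengths = range(1, len(durations[state]) + 1)
        d = rng.choices(lengths, weights=durations[state])[0]
        for _ in range(min(d, T - len(states))):
            states.append(state)
            obs.append(rng.gauss(means[state], noise_sd))
        # At the end of the segment, jump to the next state.
        state = rng.choices(range(n), weights=trans[state])[0]
    return states, obs
```

Self-transitions are normally disallowed (`trans[i][i]` set to 0) because the time spent in a state is governed by its duration distribution; this is what distinguishes an HSMM from an ordinary HMM, whose state durations are implicitly geometric.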
Submitted 16 July, 2014;
originally announced July 2014.