Showing 1–6 of 6 results for author: Burg, G J J v d

  1. arXiv:2211.00192

    cs.DB

    AI Assistants: A Framework for Semi-Automated Data Wrangling

    Authors: Tomas Petricek, Gerrit J. J. van den Burg, Alfredo Nazábal, Taha Ceritli, Ernesto Jiménez-Ruiz, Christopher K. I. Williams

    Abstract: Data wrangling tasks, such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools to streamline…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering

  2. arXiv:2106.03216

    cs.LG stat.ML

    On Memorization in Probabilistic Deep Generative Models

    Authors: Gerrit J. J. van den Burg, Christopher K. I. Williams

    Abstract: Recent advances in deep generative models have led to impressive results in a variety of application domains. Motivated by the possibility that deep learning models might memorize part of the input data, there have been increased efforts to understand how memorization arises. In this work, we extend a recently proposed measure of memorization for supervised learning (Feldman, 2019) to the unsuperv…

    Submitted 29 December, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at NeurIPS 2021

    MSC Class: 68T07

  3. arXiv:2003.06222

    stat.ML cs.LG stat.ME

    An Evaluation of Change Point Detection Algorithms

    Authors: Gerrit J. J. van den Burg, Christopher K. I. Williams

    Abstract: Change point detection is an important part of time series analysis, as the presence of a change point indicates an abrupt and significant change in the data generating process. While many algorithms for change point detection have been proposed, comparatively little attention has been paid to evaluating their performance on real-world time series. Algorithms are typically evaluated on simulated d…

    Submitted 12 February, 2022; v1 submitted 13 March, 2020; originally announced March 2020.

    Comments: For code and data, see https://github.com/alan-turing-institute/TCPDBench; changelog in PDF.

    MSC Class: 62M10

    ACM Class: G.3

  4. arXiv:1910.03906

    stat.ML cs.LG stat.CO

    Probabilistic sequential matrix factorization

    Authors: Ömer Deniz Akyildiz, Gerrit J. J. van den Burg, Theodoros Damoulas, Mark F. J. Steel

    Abstract: We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with potentially nonline…

    Submitted 18 March, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at AISTATS 2021

  5. Wrangling Messy CSV Files by Detecting Row and Type Patterns

    Authors: Gerrit J. J. van den Burg, Alfredo Nazabal, Charles Sutton

    Abstract: It is well known that data scientists spend the majority of their time on preparing data for analysis. One of the first steps in this preparation phase is to load the data from the raw storage format. Comma-separated value (CSV) files are a popular format for tabular data due to their simplicity and ostensible ease of use. However, formatting standards for CSV files are not followed consistently,…

    Submitted 27 November, 2018; originally announced November 2018.

    ACM Class: E.5; H.2.8

    Journal ref: Data Mining and Knowledge Discovery (July, 2019)

  6. arXiv:1711.03512

    cs.LG cs.AI cs.IT stat.ML

    Fast Meta-Learning for Adaptive Hierarchical Classifier Design

    Authors: Gerrit J. J. van den Burg, Alfred O. Hero

    Abstract: We propose a new splitting criterion for a meta-learning approach to multiclass classifier design that adaptively merges the classes into a tree-structured hierarchy of increasingly difficult binary classification problems. The classification tree is constructed from empirical estimates of the Henze-Penrose bounds on the pairwise Bayes misclassification rates that rank the binary subproblems in te…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: Code available at: https://github.com/HeroResearchGroup/SmartSVM

    MSC Class: 68T05; 62H30; 62C10

    ACM Class: I.2.6