Computer Science > Machine Learning

arXiv:2407.03080 (cs)

[Submitted on 3 Jul 2024]

Title:Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Authors:Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

Abstract:While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, the effectiveness of DGMs relies on substantial training data, which is often unavailable in real-world applications. This paper addresses this challenge by proposing a novel methodology for generating realistic and reliable synthetic tabular data with DGMs in limited real-data environments. Our approach introduces several ways to generate an artificial inductive bias in a DGM through transfer learning and meta-learning techniques. We explore and compare four different methods within this framework, demonstrating that transfer learning strategies, such as pre-training and model averaging, outperform meta-learning approaches, such as Model-Agnostic Meta-Learning and Domain Randomized Search. We validate our approach using two state-of-the-art DGMs, namely a Variational Autoencoder and a Generative Adversarial Network, and show that our artificial inductive bias yields superior synthetic data quality, as measured by Jensen-Shannon divergence, with relative gains of up to 50%. This methodology has broad applicability across DGMs and machine learning tasks, particularly in areas such as healthcare and finance, where data scarcity is often a critical issue.
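
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions, together with one plausible reading of "relative gain". The abstract does not say how the divergence is estimated on tabular data or how the gain is defined, so the function names, the base-2 logarithm, and the gain formula here are illustrative assumptions, not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def relative_gain(js_baseline, js_proposed):
    """Assumed definition: fractional reduction in divergence versus a baseline."""
    return (js_baseline - js_proposed) / js_baseline


# Hypothetical histograms of one feature: real data vs. two synthetic samples.
real = [0.5, 0.3, 0.2]
synthetic_baseline = [0.2, 0.3, 0.5]
synthetic_with_bias = [0.4, 0.3, 0.3]

js_base = js_divergence(real, synthetic_baseline)
js_bias = js_divergence(real, synthetic_with_bias)
print(f"baseline JS: {js_base:.4f}, with bias JS: {js_bias:.4f}")
print(f"relative gain: {relative_gain(js_base, js_bias):.2%}")
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint support), so a lower value indicates synthetic data closer to the real data.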

Comments:	19 pages, 6 Figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
MSC classes:	I.2.0
Cite as:	arXiv:2407.03080 [cs.LG]
	(or arXiv:2407.03080v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.03080

Submission history

From: Patricia Alonso De Apellániz
[v1] Wed, 3 Jul 2024 12:53:42 UTC (1,485 KB)
