Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Abstract
In this paper, we empirically study the optimization dynamics of multi-task learning, focusing on the dynamics that govern a collection of tasks with significant data imbalance. We present a simple yet effective method: pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze the data regimes in which this method is applicable and demonstrate its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
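The two-phase idea in the abstract can be pictured as a task-sampling schedule. The sketch below is illustrative only and is not taken from the paper: the function names (`task_weights`, `sample_task`), the `pretrain_steps` switch point, the `low_weight` mixing parameter, the uniform split within each group, and the language-pair task names are all assumptions made for the example.

```python
import random


def task_weights(step, pretrain_steps, high_tasks, low_tasks, low_weight=0.5):
    """Return per-task sampling probabilities for a two-phase schedule.

    All parameters are hypothetical: the abstract does not specify a
    switch point or mixing weights.
    """
    if step < pretrain_steps:
        # Phase 1 (pre-training): sample only high-resource tasks, uniformly.
        return {t: 1.0 / len(high_tasks) for t in high_tasks}
    # Phase 2 (fine-tuning): mix high- and low-resource tasks.
    # `low_weight` is the total probability mass given to low-resource tasks.
    weights = {t: (1.0 - low_weight) / len(high_tasks) for t in high_tasks}
    weights.update({t: low_weight / len(low_tasks) for t in low_tasks})
    return weights


def sample_task(step, pretrain_steps, high_tasks, low_tasks,
                low_weight=0.5, rng=random):
    """Draw the task whose batch is used at this training step."""
    w = task_weights(step, pretrain_steps, high_tasks, low_tasks, low_weight)
    tasks = list(w)
    return rng.choices(tasks, weights=[w[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    high, low = ["en-fr"], ["en-ro"]  # hypothetical task names
    for step in (0, 99, 100, 150):
        print(step, task_weights(step, 100, high, low, low_weight=0.25))
```

By contrast, static weighting corresponds to using the phase 2 weights from step 0 onward; the abstract's claim is that ordering the phases improves on that trade-off profile.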
- Publication: arXiv e-prints
- Pub Date: December 2023
- DOI: 10.48550/arXiv.2312.06134
- arXiv: arXiv:2312.06134
- Bibcode: 2023arXiv231206134C
- Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning