-
DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation
Authors:
Jie Xu,
Karthikeyan Saravanan,
Rogier van Dalen,
Haaris Mehmood,
David Tuckey,
Mete Ozay
Abstract:
Federated learning (FL) allows clients in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern IoT systems. In this work, we empirically evaluate the practicality of fine-tuning large-scale on-device transformer-based models with differential privacy in a federated learning system. We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV) and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation, which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. An even more promising approach, DyLoRA, which makes the low rank variable, would straightforwardly break differential privacy when naively combined with FL. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7%, respectively, with 1 million clients and a stringent privacy budget of ε=2.
Submitted 28 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Consistency Based Unsupervised Self-training For ASR Personalisation
Authors:
Jisi Zhang,
Vandana Rajan,
Haaris Mehmood,
David Tuckey,
Pablo Peso Parada,
Md Asif Jalal,
Karthikeyan Saravanan,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, which differ in the user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model robustness. The majority of ASR personalisation methods assume labelled user data for supervision. Personalisation without any labelled data is challenging due to limited data size and poor quality of recorded audio samples. This work addresses unsupervised personalisation by developing a novel consistency-based training method via pseudo-labelling. Our method achieves a relative Word Error Rate Reduction (WERR) of 17.3% on unlabelled training data and 8.1% on held-out data compared to a pre-trained model, and outperforms the current state-of-the-art methods.
Submitted 22 January, 2024;
originally announced January 2024.
-
Tunable Quantum Neural Networks in the QPAC-Learning Framework
Authors:
Viet Pham Ngoc,
David Tuckey,
Herbert Wiklicky
Abstract:
In this paper, we investigate the performance of tunable quantum neural networks in the Quantum Probably Approximately Correct (QPAC) learning framework. Tunable neural networks are quantum circuits made of multi-controlled X gates. By tuning the set of controls, these circuits are able to approximate any Boolean function. This architecture is particularly suited to the QPAC-learning framework as it can handle the superposition produced by the oracle. In order to tune the network so that it can approximate a target concept, we have devised and implemented an algorithm based on amplitude amplification. The numerical results show that this approach can efficiently learn concepts from a simple class.
Submitted 15 November, 2023; v1 submitted 3 May, 2022;
originally announced May 2022.
-
PASOCS: A Parallel Approximate Solver for Probabilistic Logic Programs under the Credal Semantics
Authors:
David Tuckey,
Alessandra Russo,
Krysia Broda
Abstract:
The Credal semantics is a probabilistic extension of the answer set semantics which can be applied to programs that may or may not be stratified. It assigns to atoms a set of acceptable probability distributions characterised by its lower and upper bounds. Performing exact probabilistic inference in the Credal semantics is computationally intractable. This paper presents a first solver, based on sampling, for probabilistic inference under the Credal semantics called PASOCS (Parallel Approximate SOlver for the Credal Semantics). PASOCS performs both exact and approximate inference for queries given evidence. Approximate solutions can be generated using any of the following sampling methods: naive sampling, Metropolis-Hastings and Gibbs Markov Chain Monte-Carlo. We evaluate the fidelity and performance of our system when applied to both stratified and non-stratified programs. We perform a sanity check by comparing PASOCS to available systems for stratified programs, where the semantics agree, and show that our system is competitive on unstratified programs.
Submitted 23 May, 2021;
originally announced May 2021.
-
A general framework for scientifically inspired explanations in AI
Authors:
David Tuckey,
Alessandra Russo,
Krysia Broda
Abstract:
Explainability in AI is gaining attention in the computer science community in response to the increasing success of deep learning and the important need to justify how such systems make predictions in life-critical applications. The focus of explainability in AI has predominantly been on trying to gain insights into how machine learning systems function by exploring relationships between input data and predicted outcomes or by extracting simpler interpretable models. Through literature surveys of philosophy and social science, authors have highlighted the sharp difference between these generated explanations and human-made explanations and claimed that current explanations in AI do not take into account the complexity of human interaction to allow for effective information passing to non-expert users. In this paper we instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented. This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations. We illustrate how we can utilize this framework through two very different examples, an artificial neural network and a Prolog solver, and we provide a possible implementation for both examples.
Submitted 2 March, 2020;
originally announced March 2020.
-
Saliency Maps Generation for Automatic Text Summarization
Authors:
David Tuckey,
Krysia Broda,
Alessandra Russo
Abstract:
Saliency map generation techniques are at the forefront of explainable AI literature for a broad range of machine learning applications. Our goal is to question the limits of these approaches on more complex tasks. In this paper we apply Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained on a text summarization dataset. We obtain unexpected saliency maps and discuss the rightfulness of these "explanations". We argue that we need a quantitative way of testing the counterfactual case to judge the truthfulness of the saliency maps. We suggest a protocol to check the validity of the importance attributed to the input and show that the saliency maps obtained sometimes capture the real use of the input features by the network, and sometimes do not. We use this example to discuss how careful we need to be when accepting them as explanations.
Submitted 12 July, 2019;
originally announced July 2019.