Exploring compressibility of transformer based text-to-music (TTM) models

Authors: Vasileios Moschopoulos, Thanasis Kotsiopoulos, Pablo Peso Parada, Konstantinos Nikiforidis, Alexandros Stergiadis, Gerasimos Papakostas, Md Asif Jalal, Jisi Zhang, Anastasios Drosou, Karthikeyan Saravanan

Abstract: State-of-the-art Text-To-Music (TTM) generative AI models are large and require desktop or server class compute, making them infeasible for deployment on mobile phones. This paper presents an analysis of trade-offs between model compression and generation performance of TTM models. We study compression through knowledge distillation and specific modifications that enable applicability over the various components of the TTM model (encoder, generative model and the decoder). Leveraging these methods we create TinyTTM (89.2M params) that achieves a FAD of 3.66 and KL of 1.32 on MusicBench dataset, better than MusicGen-Small (557.6M params) but not lower than MusicGen-small fine-tuned on MusicBench.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Proceedings of INTERSPEECH 2024

arXiv:2405.06368 [pdf, other]

DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

Authors: Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay

Abstract: Federated learning (FL) allows clients in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern IoT systems. In this work, we empirically evaluate the practicality of fine-tuning large scale on-device transformer-based models with differential privacy in a federated learning system. We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV) and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. An even more promising approach, DyLoRA, which makes the low rank variable, when naively combined with FL would straightforwardly break differential privacy. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7% respectively with 1 million clients and a stringent privacy budget of ε=2.

Submitted 28 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: 16 pages, 10 figures, 5 tables

arXiv:2401.13146 [pdf, other]

Locality enhanced dynamic biasing and sampling strategies for contextual ASR

Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Abstract: Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the training of CB for ASR with correlation plots between the bias embeddings among various training stages. Secondly, we introduce a neighbourhood attention (NA) that localizes self attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that this proposed approach provides on average a 25.84% relative WER improvement on LibriSpeech sets and rare-word evaluation compared to the baseline.

Submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted for IEEE ASRU 2023

arXiv:2401.12085 [pdf, other]

Consistency Based Unsupervised Self-training For ASR Personalisation

Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, differed by user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model robustness. The majority of ASR personalisation methods assume labelled user data for supervision. Personalisation without any labelled data is challenging due to limited data size and poor quality of recorded audio samples. This work addresses unsupervised personalisation by developing a novel consistency based training method via pseudo-labelling. Our method achieves a relative Word Error Rate Reduction (WERR) of 17.3% on unlabelled training data and 8.1% on held-out data compared to a pre-trained model, and outperforms the current state-of-the-art methods.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted for IEEE ASRU 2023

arXiv:2308.01385 [pdf, other]

doi 10.1145/3570361.3592498

BEAVIS: Balloon Enabled Aerial Vehicle for IoT and Sensing

Authors: Suryansh Sharma, Ashutosh Simha, R. Venkatesha Prasad, Shubham Deshmukh, Kavin B. Saravanan, Ravi Ramesh, Luca Mottola

Abstract: UAVs are becoming versatile and valuable platforms for various applications. However, the main limitation is their flying time. We present BEAVIS, a novel aerial robotic platform striking an unparalleled trade-off between the manoeuvrability of drones and the long lasting capacity of blimps. BEAVIS scores highly in applications where drones enjoy unconstrained mobility yet suffer from limited lifetime. A nonlinear flight controller exploiting novel, unexplored, aerodynamic phenomena to regulate the ambient pressure and enable all translational and yaw degrees of freedom is proposed without direct actuation in the vertical direction. BEAVIS has built-in rotor fault detection and tolerance. We explain the design and the necessary background in detail. We verify the dynamics of BEAVIS and demonstrate its distinct advantages, such as agility, over existing platforms including the degrees of freedom akin to a drone with 11.36x increased lifetime. We exemplify the potential of BEAVIS to become an invaluable platform for many applications.

Submitted 2 August, 2023; originally announced August 2023.

Comments: To be published in the 29th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 23), October 2-6, 2023, Madrid, Spain. ACM, New York, NY, USA, 15 pages

arXiv:2307.13343 [pdf, other]

On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

Authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

Abstract: Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task, Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient reversal based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by execution of initial layers of the ASR model, and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our method efficiently reduces speaker recognition relative accuracy by 33%, and improves ASR performance by achieving 6.2% relative Word Error Rate (WER) reduction.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Proceedings of INTERSPEECH 2023

arXiv:2207.04949 [pdf, ps, other]

pMCT: Patched Multi-Condition Training for Robust Speech Recognition

Authors: Pablo Peso Parada, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

Abstract: We propose a novel Patched Multi-Condition Training (pMCT) method for robust Automatic Speech Recognition (ASR). pMCT employs Multi-condition Audio Modification and Patching (MAMP) via mixing patches of the same utterance extracted from clean and distorted speech. Training using patch-modified signals improves robustness of models in noisy reverberant scenarios. Our proposed pMCT is evaluated on the LibriSpeech dataset showing improvement over using vanilla Multi-Condition Training (MCT). For analyses on robust ASR, we employed pMCT on the VOiCES dataset which is a noisy reverberant dataset created using utterances from LibriSpeech. In the analyses, pMCT achieves 23.1% relative WER reduction compared to the MCT.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted at Interspeech 2022

arXiv:2206.02797 [pdf, ps, other]

FedNST: Federated Noisy Student Training for Automatic Speech Recognition

Authors: Haaris Mehmood, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

Abstract: Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems, hence preventing transmission of raw user data to a central server. A key challenge facing practical adoption of FL for ASR is obtaining ground-truth labels on the clients. Existing approaches rely on clients to manually transcribe their speech, which is impractical for obtaining large training corpora. A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data. To this end, we propose FedNST, a novel method for training distributed ASR models using private and unlabelled user data. We explore various facets of FedNST, such as training models with different proportions of labelled and unlabelled data, and evaluate the proposed approach on 1173 simulated clients. Evaluating FedNST on LibriSpeech, where 960 hours of speech data is split equally into server (labelled) and client (unlabelled) data, showed a 22.5% relative word error rate reduction (WERR) over a supervised baseline trained only on server data.

Submitted 12 July, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted at Interspeech 2022

ACM Class: I.2.11

arXiv:1403.0068 [pdf]

doi 10.14445/22315381/IJETT-V8P252

Semantic Annotation and Search for Educational Resources Supporting Distance Learning

Authors: C. Nithya, K. Saravanan

Abstract: Multimedia educational resources play an important role in education, particularly for distance learning environments. With the rapid growth of the multimedia web, large numbers of education articles video resources are increasingly being created by several different organizations. It is crucial to explore, share, reuse, and link these educational resources for better e-learning experiences. Most of the video resources are currently annotated in an isolated way, which means that they lack semantic connections. Thus, providing the facilities for annotating these video resources is highly demanded. These facilities create the semantic connections among video resources and allow their metadata to be understood globally. Adopting Linked Data technology, this paper introduces a video annotation and browser platform with two online tools: Notitia and Sansu-Wolke. Notitia enables users to semantically annotate video resources using vocabularies defined in the Linked Data cloud. Sansu-Wolke allows users to browse semantically linked educational video resources with enhanced web information from different online resources. In the prototype development, the platform uses existing video resources for education articles. The result of the initial development demonstrates the benefits of applying Linked Data technology in the aspects of reusability, scalability, and extensibility.

Submitted 1 March, 2014; originally announced March 2014.

Comments: Linked Data, Semantic search, Cloud Applications, Web services, Semantic annotation, Ontology

Journal ref: IJETT V8(6),277-285 February 2014. ISSN:2231-5381

arXiv:1402.2509 [pdf]

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Authors: M. Subha, K. Saravanan

Abstract: Building high quality cloud applications becomes an urgently required research problem. Nonfunctional performance of cloud services is usually described by quality-of-service (QoS). In cloud applications, cloud services are invoked remotely by internet connections. The QoS Ranking of cloud services for a user cannot be transferred directly to another user, since the locations of the cloud applications are quite different. Personalized QoS Ranking is required to evaluate all candidate services at the user side but it is impractical in reality. To get QoS values, the service candidates are usually required and it is very expensive. To avoid time consuming and expensive real-world service invocations, this paper proposes a CloudRank framework which predicts the QoS ranking directly without predicting the corresponding QoS values. This framework provides an accurate ranking but the QoS values are same in both algorithms so, an optimal VM allocation policy is used to improve the QoS performance of cloud services and it also provides better ranking accuracy than CloudRank2 algorithm.

Submitted 11 February, 2014; originally announced February 2014.

Comments: 6 pages, 10 figures, Published with International Journal of Engineering Trends and Technology (IJETT)

Journal ref: International Journal of Engineering Trends and Technology (IJETT) 6(6):307-312, December 2013

arXiv:1402.2491 [pdf]

Optimizing the Cost for Resource Subscription Policy in IaaS Cloud

Authors: M. Uthaya Banu, K. Saravanan

Abstract: Cloud computing allows users to efficiently and dynamically provision computing resources to meet their IT needs. Cloud providers offer two subscription plans to the customer, namely reservation and on-demand. The reservation plan is typically cheaper than the on-demand plan. If the actual computing demand is known in advance, reserving the resource would be straightforward. The challenge is how to provision resources properly and how customers can efficiently purchase the provisioning options under reservation and on-demand. To address this issue, a two-phase algorithm is proposed to minimize service provision cost in both the reservation and on-demand plans. To reserve the correct and optimal amount of resources during reservation, a mathematical formula is proposed in the first phase. To predict resource demand, a Kalman filter is used in the second phase. The evaluation result shows that the two-phase algorithm can significantly reduce the provision cost and the prediction is of reasonable accuracy.

Submitted 11 February, 2014; originally announced February 2014.

Comments: 6 pages,8 figures,"Published with International Journal of Engineering Trends and Technology (IJETT)". M.Uthaya Banu, K.Saravanan. Article:Optimizing the Cost for Resource Subscription Policy in IaaS Cloud

Journal ref: International Journal of Engineering Trends and Technology (IJETT) 6(6):296-301, December 2013

arXiv:1210.2977 [pdf]

An Effective Fusion Technique of Cloud Computing and Networking Series

Authors: K. Saravanan, S. Akshaya, R. Pavithra, K. Pushpavalli

Abstract: Cloud computing is making it possible to separate the process of building an infrastructure for service provisioning from the business of providing end user services. Today, such infrastructures are normally provided in large data centres and the applications are executed remotely from the users. One reason for this is that cloud computing requires a reasonably stable infrastructure and networking environment, largely due to management reasons. Networking of Information (NetInf) is an information centric networking paradigm that can support cloud computing by providing new possibilities for network transport and storage. It offers direct access to information objects through a simple API, independent of their location in the network. This abstraction can hide much of the complexity of storage and network transport systems that cloud computing today has to deal with. In this paper we analyze how cloud computing and NetInf can be combined to make cloud computing infrastructures easier to manage, and potentially enable deployment in smaller and more dynamic networking environments. NetInf should thus be understood as an enhancement to the infrastructure for cloud computing rather than a change to cloud computing technology as such. To illustrate the approach taken by NetInf, we also describe how it can be implemented by introducing a specific name resolution and routing mechanism.

Submitted 10 October, 2012; originally announced October 2012.

arXiv:1210.2971 [pdf]

A new application of Multi modal Biometrics in home and office security system

Authors: K. Saravanan, C. Saranya, M. Saranya

Abstract: Biometric door lock security systems are used at those places where you have important information and stuffs. In that kind of places multibiometric electronic door lock security systems that are based on fingerprint and iris recognition. Multibiometric door lock security systems are used to prevent the door related burglaries such as break-ins occurred in different forms so this is the best method to prevent this type of happenings. Unlike keyed locks, there is no need to take the keys with you when you go out, and no need to worry about losing keys. This paper proposes a multimodal biometrics door lock system with iris and fingerprint as a computer application for automatically identifying or verifying a person from fingerprint iris recognition system. The first stage is identification and second one is verifying that whether he is a genuine user or imposter. During second stage system compares the input set with all available stored set in database. This comparison gives a ranked list of matches. Based on the rank retrieved an alarm is activated automatically when any unauthorized person tries to open the door. So this kind of multimodal biometrics will provide a highly secured and authenticated access.

Submitted 10 October, 2012; originally announced October 2012.

arXiv:1203.4649 [pdf]

A Novel Bluetooth Man-In-The-Middle Attack Based On SSP using OOB Association model

Authors: K. Saravanan, L. Vijayanand, R. K. Negesh

Abstract: As an interconnection technology, Bluetooth has to address all traditional security problems, well known from the distributed networks. Moreover, as Bluetooth networks are formed by the radio links, there are also additional security aspects whose impact is yet not well understood. In this paper, we propose a novel Man-In-The-Middle (MITM) attack against Bluetooth enabled mobile phones that support Simple Secure Pairing (SSP). From the literature it was proved that the SSP association models such as Numeric comparison, Just works and passkey Entry are not more secure. Here we propose the Out Of Band (OOB) channeling with enhanced security than the previous methods.

Submitted 21 March, 2012; originally announced March 2012.

Report number: EMICS12

arXiv:1202.2024 [pdf]

Packet Score based network security and Traffic Optimization

Authors: K. Saravanan, S. Karthik

Abstract: One of the critical threats to internet security is Distributed Denial of Service (DDoS). This paper by the introduction of automated online attack classification and attack packet discarding helps to resolve the network security issue by certain level. The incoming packets are assigned scores based on the priority associated with the attributes and on comparison with probability distribution of arriving packets on per packet basis.

Submitted 9 February, 2012; originally announced February 2012.

Showing 1–15 of 15 results for author: Saravanan, K