Showing 1–50 of 141 results for author: Kumar, S

  1. arXiv:2407.10837  [pdf, other]

    eess.SY cs.RO math.DS

    Trajectory Tracking for Unmanned Aerial Vehicles in 3D Spaces under Motion Constraints

    Authors: Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha

    Abstract: This article presents a three-dimensional nonlinear trajectory tracking control strategy for unmanned aerial vehicles (UAVs) in the presence of spatial constraints. As opposed to many existing control strategies, which do not consider spatial constraints, the proposed strategy considers spatial constraints on each degree of freedom movement of the UAV. Such consideration makes the design appealing…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.08655  [pdf, other]

    eess.IV cs.AI cs.LG physics.med-ph

    SPOCKMIP: Segmentation of Vessels in MRAs with Enhanced Continuity using Maximum Intensity Projection as Loss

    Authors: Chethan Radhakrishna, Karthikesh Varma Chintalapati, Sri Chandana Hudukula Ram Kumar, Raviteja Sutrave, Hendrik Mattern, Oliver Speck, Andreas Nürnberger, Soumick Chatterjee

    Abstract: Identification of vessel structures of different sizes in biomedical images is crucial in the diagnosis of many neurodegenerative diseases. However, the sparsity of good-quality annotations of such images makes the task of vessel segmentation challenging. Deep learning offers an efficient way to segment vessels of different sizes by learning their high-level feature representations and the spatial…

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.06868  [pdf, other]

    cs.IT cs.LG eess.SP

    Energy Efficient Fair STAR-RIS for Mobile Users

    Authors: Ashok S. Kumar, Nancy Nayak, Sheetal Kalyani, Himal A. Suraweera

    Abstract: In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS e…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.04444  [pdf, other]

    cs.CL cs.SD eess.AS

    TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column

  5. arXiv:2407.04439  [pdf, other]

    eess.AS

    XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Nigmatulina, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju

    Abstract: Self-supervised pretrained models exhibit competitive performance in automatic speech recognition on finetuning, even with limited in-domain supervised data for training. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, where the XLSR-53 model is used as encoder in transducer set…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column

  6. arXiv:2406.12405  [pdf]

    cs.IT cs.ET eess.SP

    On The Effective Rate and Error Rate Analysis over Fluctuating Nakagami-m Fading Channel

    Authors: Manpreet Kaur, Puspraj Singh Chauhan, Sandeep Kumar, Pappu Kumar Verma

    Abstract: This paper provides a detailed analysis of important performance metrics, such as effective capacity and symbol error rate, over the fluctuating Nakagami-m fading channel. This distribution is obtained from the ratio of two random variables, following the Nakagami-m distribution and the uniform distribution. Our study derives exact analytical expressions for the EC and SER under different modulation sc…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages

  7. arXiv:2406.11768  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

    Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

    Abstract: Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including feat…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Project Website: https://sreyan88.github.io/gamaaudio/

  8. arXiv:2406.09443  [pdf, other]

    eess.AS cs.HC cs.LG

    Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

    Authors: Satyam Kumar, Sai Srujana Buddi, Utkarsh Oggy Sarawgi, Vineet Garg, Shivesh Ranjan, Ognjen Rudovic, Ahmed Hussen Abdelaziz, Saurabh Adya

    Abstract: Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.09167  [pdf, other]

    cs.SD eess.AS

    Vision Transformer Segmentation for Visual Bird Sound Denoising

    Authors: Sahil Kumar, Jialu Li, Youshan Zhang

    Abstract: Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  10. arXiv:2406.07910  [pdf, other]

    cs.ET cs.NI eess.SP

    Demonstration of Safe Electromagnetic Radiation Emitted by 5G Active Antenna Systems

    Authors: Sumit Kumar, Chandan Kumar Sheemar, Abdelrahman Astro, Jorge Querol, Symeon Chatzinotas

    Abstract: The careful planning and safe deployment of 5G technologies will bring enormous benefits to society and the economy. Higher frequency, beamforming, and small-cells are key technologies that will provide unmatched throughput and seamless connectivity to 5G users. Superficial knowledge of these technologies has raised concerns among the general public about the harmful effects of radiation. Several…

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.04432  [pdf, other]

    eess.AS cs.AI cs.CL

    LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

    Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha

    Abstract: Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues for noise-robust ASR. Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of vis…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: InterSpeech 2024. Code and Data: https://github.com/Sreyan88/LipGER

  12. arXiv:2405.18297  [pdf, other]

    eess.SP

    Artificial Intelligence Satellite Telecommunication Testbed using Commercial Off-The-Shelf Chipsets

    Authors: Luis M. Garces, Amirhossein Nik, Flor Ortiz, Juan A. Vásquez-Peralvo, Jorge L. Gonzalez, Mouhamad Chehailty, Marcele Kuhfuss, Eva Lagunas, Jan Thoemel, Sumit Kumar, Vishal Singh, Juan C. Duncan, Sahar Malmir, Swetha Varadajulu, Jorge Querol, Symeon Chatzinotas

    Abstract: The Artificial Intelligence Satellite Telecommunications Testbed (AISTT), part of the ESA project SPAICE, is focused on the transformation of the satellite payload by using artificial intelligence (AI) and machine learning (ML) methodologies over available commercial off-the-shelf (COTS) AI chips for on-board processing. The objectives include validating artificial intelligence-driven SATCOM scena…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Submitted to SPAICE Conference 2024: AI in and for Space, 5 pages, 3 figures

    Journal ref: SPAICE Conference 2024

  13. arXiv:2405.06726  [pdf, other]

    eess.SY

    Region of Attraction Estimation for Free-Floating Systems under Time-Varying LQR Control

    Authors: Lasse Shala, Shubham Vyas, Mohamed Khalil Ben-Larbi, Shivesh Kumar, Enrico Stoll

    Abstract: Future Active Debris Removal (ADR) and On Orbit Servicing (OOS) missions demand elaborate closed-loop controllers. Feasible control architectures should take into consideration the inherent coupling of the free-floating dynamics and the kinematics of the system. Recently, Time-Varying Linear Quadratic Regulators (TVLQR) have been used to stabilize underactuated systems that exhibit a similar k…

    Submitted 10 May, 2024; originally announced May 2024.

  14. arXiv:2405.05937  [pdf, other]

    eess.SP eess.SY

    Dynamics of a Towed Cable with Sensor-Array for Underwater Target Motion Analysis

    Authors: Rohit Kumar Singh, Subrata Kumar, Shovan Bhaumik

    Abstract: In wartime, underwater target motion analysis (TMA) is often performed using bearing-only measurements obtained from a sensor array, which is towed by an own-ship via a connected cable. It is well known that the own-ship is required to perform a manoeuvre in order to make the system observable and localise the target successfully. During the manoeuvre, it is import…

    Submitted 9 May, 2024; originally announced May 2024.

  15. arXiv:2404.09493  [pdf, ps, other]

    eess.SP cs.HC cs.NE

    Novel entropy difference-based EEG channel selection technique for automated detection of ADHD

    Authors: Shishir Maheshwari, Kandala N V P S Rajesh, Vivek Kanhangad, U Rajendra Acharya, T Sunil Kumar

    Abstract: Attention deficit hyperactivity disorder (ADHD) is one of the common neurodevelopmental disorders in children. This paper presents an automated approach for ADHD detection using the proposed entropy difference (EnD)-based electroencephalogram (EEG) channel selection approach. In the proposed approach, we selected the most significant EEG channels for the accurate identification of ADHD using an EnD-base…

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.00122  [pdf, other]

    cs.CV eess.IV

    AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation

    Authors: Peijie Qiu, Jin Yang, Sayantan Kumar, Soumyendu Sekhar Ghosh, Aristeidis Sotiras

    Abstract: In the past decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Recently, the introduction of the vision transformer (ViT) has significantly altered the landscape of deep segmentation models. There has been a growing focus on ViTs, driven by their excellent performance and scalabilit…

    Submitted 29 March, 2024; originally announced April 2024.

  17. Intelligent fault diagnosis of worm gearbox based on adaptive CNN using amended gorilla troop optimization with quantum gate mutation strategy

    Authors: Govind Vashishtha, Sumika Chauhan, Surinder Kumar, Rajesh Kumar, Radoslaw Zimroz, Anil Kumar

    Abstract: The worm gearbox is a high-speed transmission system that plays a vital role in various industries. It is therefore necessary to develop a robust fault diagnosis scheme for the worm gearbox. Due to advancements in sensor technology, researchers from academia and industry prefer deep learning models for fault diagnosis purposes. The optimal selection of hyperparameters (HPs) of deep learning mod…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Knowledge-Based Systems Volume 280, 25 November 2023, 110984

  18. arXiv:2403.10966  [pdf, other]

    cs.RO eess.SY

    Robust Co-Design of Canonical Underactuated Systems for Increased Certifiable Stability

    Authors: Federico Girlanda, Lasse Shala, Shivesh Kumar, Frank Kirchner

    Abstract: Optimal behaviours of a system to perform a specific task can be achieved by leveraging the coupling between trajectory optimization, stabilization, and design optimization. This approach is particularly advantageous for underactuated systems, which are systems that have fewer actuators than degrees of freedom and thus require more elaborate control systems. This paper proposes a novel co-desi…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Copr. 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. PREPRINT

  19. Asynchronous Distributed Coordinated Hybrid Precoding in Multi-cell mmWave Wireless Networks

    Authors: Meesam Jafri, Suraj Srivastava, Sunil Kumar, Aditya K. Jagannatham, Lajos Hanzo

    Abstract: Asynchronous distributed hybrid beamformers (ADBF) are conceived for minimizing the total transmit power subject to signal-to-interference-plus-noise ratio (SINR) constraints at the users. Our design requires only limited information exchange between the base stations (BSs) of the mmWave multi-cell coordinated (MCC) networks considered. To begin with, a semidefinite relaxation (SDR)-based fully-di…

    Submitted 13 February, 2024; originally announced February 2024.

    Journal ref: IEEE Open Journal of Vehicular Technology, vol. 5, pp. 200-218, 2024

  20. arXiv:2402.06176  [pdf, other]

    eess.SY cs.MA cs.RO math.DS math.OC

    Cooperative Nonlinear Guidance Strategies for Guaranteed Pursuit-Evasion

    Authors: Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha

    Abstract: This paper addresses the pursuit-evasion problem involving three agents: a pursuer, an evader, and a defender. We develop cooperative guidance laws for the evader-defender team that guarantee that the defender intercepts the pursuer before it reaches the vicinity of the evader. Unlike heuristic methods, optimal control, differential game formulation, and recently proposed time-constrained guidanc…

    Submitted 8 February, 2024; originally announced February 2024.

  21. arXiv:2402.05918  [pdf, other]

    eess.SY cs.MA math.DS math.OC nlin.AO

    Consensus-driven Deviated Pursuit for Guaranteed Simultaneous Interception of Moving Targets

    Authors: Abhinav Sinha, Dwaipayan Mukherjee, Shashi Ranjan Kumar

    Abstract: This work proposes a cooperative strategy that employs deviated pursuit guidance to simultaneously intercept a moving (but not manoeuvring) target. As opposed to many existing cooperative guidance strategies which use estimates of time-to-go, based on proportional-navigation guidance, the proposed strategy uses an exact expression for time-to-go to ensure simultaneous interception. The guidance de…

    Submitted 8 February, 2024; originally announced February 2024.

  22. arXiv:2402.05273  [pdf, other]

    eess.SY

    ASCENT: A Context-Aware Spectrum Coexistence Design and Implementation Toolset for Policymakers in Satellite Bands

    Authors: Ta-seen Reaz Niloy, Saurav Kumar, Aniruddha Hore, Zoheb Hassan, Carl Dietrich, Eric W. Burger, Jeffrey H. Reed, Vijay K. Shah

    Abstract: This paper introduces the ASCENT (context Aware Spectrum Coexistence Design and Implementation) toolset, an advanced context-aware terrestrial satellite spectrum sharing toolset designed for researchers, policymakers, and regulators. It serves two essential purposes: (a) evaluating the potential for harmful interference to primary users in satellite bands and (b) facilitating the analysis, design, and…

    Submitted 15 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  23. arXiv:2402.01933  [pdf, other]

    eess.AS cs.SD

    ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

    Authors: Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

    Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the most accurate gold standard for dental disease detection. Unfortunately, a regular X-ray exam is still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?"…

    Submitted 2 February, 2024; originally announced February 2024.

    ACM Class: J.3; C.3; H.5.2

  24. arXiv:2401.08468  [pdf, other]

    math.ST cs.LG eess.SP

    Keep or toss? A nonparametric score to evaluate solutions for noisy ICA

    Authors: Syamantak Kumar, Purnamrita Sarkar, Peter Bickel, Derek Bean

    Abstract: Independent Component Analysis (ICA) was introduced in the 1980s as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of signals, with little knowledge about the source signals or the mixing process. While there are many sophisticated algorithms for estimation, different methods have different shortcomings. In this paper, we deve…

    Submitted 9 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  25. arXiv:2401.05376  [pdf, other]

    eess.SP cs.HC

    Eating Speed Measurement Using Wrist-Worn IMU Sensors in Free-Living Environments

    Authors: Chunzhuo Wang, T. Sunil Kumar, Walter De Raedt, Guido Camps, Hans Hallez, Bart Vanrumste

    Abstract: Eating speed is an important indicator that has been widely scrutinized in nutritional studies. The relationship between eating speed and several intake-related problems such as obesity, diabetes, and oral health has received increased attention from researchers. However, existing studies mainly use self-reported questionnaires to obtain participants' eating speed, where they choose options from s…

    Submitted 15 December, 2023; originally announced January 2024.

  26. arXiv:2312.09842  [pdf, ps, other]

    cs.SD eess.AS

    On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

    Authors: Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Recently, the cascaded two-pass architecture has emerged as a strong contender for on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal encoders coupled with a shared decoder enables operation in both streaming and look-ahead modes. In this paper, we propose a shallow cascaded model by combining various model compression techniques such as knowledge distillation,…

    Submitted 15 December, 2023; originally announced December 2023.

  27. arXiv:2311.16409  [pdf, other]

    cs.NI eess.SP

    A Deep Q-Learning based, Base-Station Connectivity-Aware, Decentralized Pheromone Mobility Model for Autonomous UAV Networks

    Authors: Shreyas Devaraju, Alexander Ihler, Sunil Kumar

    Abstract: UAV networks consisting of low SWaP (size, weight, and power), fixed-wing UAVs are used in many applications, including area monitoring, search and rescue, surveillance, and tracking. Performing these operations efficiently requires a scalable, decentralized, autonomous UAV network architecture with high network connectivity. Whereas fast area coverage is needed for quickly sensing the area, stron…

    Submitted 27 November, 2023; originally announced November 2023.

  28. arXiv:2311.11521  [pdf]

    cs.IT eess.SP

    On the Effective throughput of Shadowed Beaulieu-Xie fading channel

    Authors: Manpreet Kaur, Sandeep Kumar, Poonam Yadav, Puspraj Singh Chauhan

    Abstract: Given the imperative for advanced wireless networks in the next generation and the rise of real-time applications within wireless communication, there is a notable focus on investigating data rate performance across various fading scenarios. This research delved into analyzing the effective throughput of the shadowed Beaulieu-Xie (SBX) composite fading channel using the PDF-based approach. To get…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 18 pages

  29. arXiv:2310.08846  [pdf, other]

    eess.AS

    Speaking rate attention-based duration prediction for speed control TTS

    Authors: Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Deekshitha G, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh

    Abstract: With the advent of high-quality speech synthesis, there is a lot of interest in controlling various prosodic attributes of speech. Speaking rate is an essential attribute towards modelling the expressivity of speech. In this work, we propose a novel approach to control the speaking rate for non-autoregressive TTS. We achieve this by conditioning the speaking rate inside the duration predictor, all…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  30. arXiv:2310.08753  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

    Authors: Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, Ramaneswaran S, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

    Abstract: A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perfo…

    Submitted 18 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  31. Analysis of system capacity and spectral efficiency of fixed-grid network

    Authors: Adarsha M, S. Malathi, Santosh Kumar

    Abstract: In this article, the performance of a fixed grid network is examined for various modulation formats to estimate the system's capacity and spectral efficiency. The optical In-phase Quadrature Modulator structure is used to build a fixed grid network modulation, and the homodyne detection approach is used for the receiver. Data multiplexing is accomplished using the Polarization Division Multiplexed…

    Submitted 30 September, 2023; originally announced October 2023.

    Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.15, No.5, September 2023

  32. arXiv:2309.14923  [pdf, other]

    eess.SP

    ML-based PBCH symbol detection and equalization for 5G Non-Terrestrial Networks

    Authors: Inés Larráyoz-Arrigote, Marcele O. K. Mendonca, Alejandro Gonzalez-Garrido, Jevgenij Krivochiza, Sumit Kumar, Jorge Querol, Joel Grotz, Stefano Andrenacci, Symeon Chatzinotas

    Abstract: This paper delves into the application of Machine Learning (ML) techniques in the realm of 5G Non-Terrestrial Networks (5G-NTN), particularly focusing on symbol detection and equalization for the Physical Broadcast Channel (PBCH). As 5G-NTN gains prominence within the 3GPP ecosystem, ML offers significant potential to enhance wireless communication performance. To investigate these possibilities,…

    Submitted 26 September, 2023; originally announced September 2023.

  33. arXiv:2309.13716  [pdf, other]

    cs.CV eess.IV

    MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP

    Authors: Prajwal Ganugula, Y S S S Santosh Kumar, N K Sagar Reddy, Prabhath Chellingi, Avinash Thakur, Neeraj Kasera, C Shyam Anand

    Abstract: Style transfer driven by text prompts has paved a new path for creatively stylizing images without collecting an actual style image. Despite promising results, text-driven stylization gives the user no control over the stylization. If a user wants to create an artistic image, the user requires fine control over the stylization of various entities individually in the content image, which…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Camera ready, New Ideas in Vision Transformers workshop, ICCV 2023

  34. arXiv:2309.09836  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    RECAP: Retrieval-Augmented Audio Captioning

    Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

    Abstract: We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. Additionally, our proposed method can transfer to any domain without the need for any additional fine-tuning. To generate a caption for an audio sample, we leverage an audio-t…

    Submitted 6 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024. Code and data: https://github.com/Sreyan88/RECAP

  35. arXiv:2308.11673  [pdf, other]

    eess.SP cs.LG

    WEARS: Wearable Emotion AI with Real-time Sensor data

    Authors: Dhruv Limbani, Daketi Yatin, Nitish Chaturvedi, Vaishnavi Moorthy, Pushpalatha M, Harichandana BSS, Sumit Kumar

    Abstract: Emotion prediction is the field of study to understand human emotions. Existing methods focus on modalities like text, audio, facial expressions, etc., which could be private to the user. Emotion can be derived from the subject's psychological data as well. Various approaches that employ combinations of physiological sensors for emotion recognition have been proposed. Yet, not all sensors are simp…

    Submitted 22 August, 2023; originally announced August 2023.

  36. arXiv:2308.09809  [pdf, other]

    cs.NI eess.SP

    Adaptive Timers and Buffer Optimization for Layer-2 Protocols in 5G Non-Terrestrial Networks

    Authors: Chandan Kumar Sheemar, Sumit Kumar, Jorge Querol, Symeon Chatzinotas

    Abstract: Interest in the integration of Terrestrial Networks (TN) and Non-Terrestrial Networks (NTN), primarily satellites, has been rekindled due to the potential of NTN to provide ubiquitous coverage. Especially with the peculiar and flexible physical layer properties of 5G-NR, direct access to 5G services through satellites could now become possible. However, the large Round-Trip Delays (RTD) in NTNs re…

    Submitted 18 August, 2023; originally announced August 2023.

  37. arXiv:2308.08579  [pdf, other]

    eess.SP

    IRS Assisted MIMO Full Duplex: Rate Analysis and Beamforming Under Imperfect CSI

    Authors: Chandan Kumar Sheemar, Sourabh Solanki, Jorge Querol, Sumit Kumar, Symeon Chatzinotas

    Abstract: Intelligent reflecting surfaces (IRS) have emerged as a promising technology to enhance the performance of wireless communication systems. By actively manipulating the wireless propagation environment, IRS enables efficient signal transmission and reception. In recent years, the integration of IRS with full-duplex (FD) communication has garnered significant attention due to its potential to furthe…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2308.08016

  38. arXiv:2308.08016  [pdf, other]

    eess.SP

    Robust Beamforming for IRS Aided MIMO Full Duplex Systems

    Authors: Chandan Kumar Sheemar, Jorge Querol, Sourabh Solanki, Sumit Kumar, Symeon Chatzinotas

    Abstract: In this paper, a novel robust beamforming for an intelligent reflecting surface (IRS) assisted FD system is presented. Since perfect channel state information (CSI) is often challenging to acquire in practice, we consider the case of imperfect CSI and adopt a statistically robust beamforming approach to maximize the ergodic weighted sum rate (WSR). We also analyze the achievable WSR of an IRS-assi…

    Submitted 15 August, 2023; originally announced August 2023.

  39. arXiv:2308.06300  [pdf]

    eess.IV cs.CV cs.LG

    Automatic Classification of Blood Cell Images Using Convolutional Neural Network

    Authors: Rabia Asghar, Sanjay Kumar, Paul Hynds, Abeera Mahfooz

    Abstract: Human blood primarily comprises plasma, red blood cells, white blood cells, and platelets. It plays a vital role in transporting nutrients to different organs, where it stores essential health-related data about the human body. Blood cells are utilized to defend the body against diverse infections, including fungi, viruses, and bacteria. Hence, blood analysis can help physicians assess an individu…

    Submitted 21 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: 15

  40. arXiv:2308.06296  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Classification of White Blood Cells Using Machine and Deep Learning Models: A Systematic Review

    Authors: Rabia Asghar, Sanjay Kumar, Paul Hynds, Arslan Shaukat

    Abstract: Machine learning (ML) and deep learning (DL) models have been employed to significantly improve analyses of medical imagery, with these approaches used to enhance the accuracy of prediction and classification. Model predictions and classifications assist diagnoses of various cancers and tumors. This review presents an in-depth analysis of modern techniques applied within the domain of medical imag…

    Submitted 21 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

  41. End-to-End Reinforcement Learning for Torque Based Variable Height Hopping

    Authors: Raghav Soni, Daniel Harnack, Hauke Isermann, Sotaro Fushimi, Shivesh Kumar, Frank Kirchner

    Abstract: Legged locomotion is arguably the most suited and versatile mode to deal with natural or unstructured terrains. Intensive research into dynamic walking and running controllers has recently yielded great advances, both in the optimal control and reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability…

    Submitted 18 December, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: Update publication info. Cite as: R. Soni, D. Harnack, H. Isermann, S. Fushimi, S. Kumar and F. Kirchner, "End-to-End Reinforcement Learning for Torque Based Variable Height Hopping," 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 7531-7538, doi: 10.1109/IROS55552.2023.10342187

    Journal ref: End-to-End Reinforcement Learning for Torque Based Variable Height Hopping, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 7531-7538

  42. arXiv:2307.16033  [pdf, other]

    eess.IV cs.CV

    CoVid-19 Detection leveraging Vision Transformers and Explainable AI

    Authors: Pangoth Santhosh Kumar, Kundrapu Supriya, Mallikharjuna Rao K, Taraka Satya Krishna Teja Malisetti

    Abstract: Lung disease is a common health problem in many parts of the world. It is a significant risk to people's health and quality of life all across the globe since it is responsible for five of the top thirty leading causes of death. Among them are COVID 19, pneumonia, and tuberculosis, to name just a few. It is critical to diagnose lung diseases in their early stages. Several different models including…

    Submitted 6 May, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

  43. arXiv:2307.14164  [pdf, ps, other]

    cs.RO eess.SY math.OC

    Towards Continuous Time Finite Horizon LQR Control in SE(3)

    Authors: Shivesh Kumar, Andreas Mueller, Patrick Wensing, Frank Kirchner

    Abstract: The control of free-floating robots requires dealing with several challenges. The motion of such robots evolves on a continuous manifold described by the Special Euclidean Group of dimension 3, known as SE(3). Methods from finite horizon Linear Quadratic Regulators (LQR) control have gained recent traction in the robotics community. However, such approaches are inherently solving an unconstrained…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023 Workshop on Geometric Representations: The Roles of Modern Screw Theory, Lie algebra, and Geometric Algebra in Robotics

  44. arXiv:2307.09425  [pdf, other]

    cs.SD eess.AS physics.pop-ph

    Musical Excellence of Mridangam: an introductory review

    Authors: Arvind Shankar Kumar

    Abstract: This is an introductory review of Musical Excellence of Mridangam by Dr. Umayalpuram K Sivaraman, Dr. T Ramasami and Dr. Naresh, which is a scientific treatise exploring the unique tonal properties of the ancient Indian classical percussive instrument -- the Mridangam. This review aims to bridge the gap between the primary intended audience of Musical Excellence of Mridangam - listeners, artistes…

    Submitted 11 July, 2023; originally announced July 2023.

  45. arXiv:2307.07948  [pdf, ps, other]

    eess.AS cs.CL

    Model Adaptation for ASR in low-resource Indian Languages

    Authors: Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Rohan Saxena, Sai Praneeth Reddy Mora, Srinivasa Raghavan

    Abstract: Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: ASRU Special session overview paper

  46. arXiv:2307.07414  [pdf, other]

    eess.SY

    An Embedded Auto-Calibrated Offset Current Compensation Technique for PPG/fNIRS System

    Authors: Sadan Saquib Khan, Sumit Kumar, Benish Jan, Laxmeesha Somappa, Shahid Malik

    Abstract: Usually, the current generated by the photodiode proportional to the oxygenated blood in photoplethysmography (PPG) and functional near-infrared spectroscopy (fNIRS) based recording systems is small compared to the offset current. The offset current is the combination of the dark current of the photodiode, the current due to ambient light, and the current due to the reflected light from fat and…

    Submitted 14 July, 2023; originally announced July 2023.

  47. arXiv:2307.07395  [pdf, other]

    eess.SP

    Flexible Beamforming in B5G for Improving Tethered UAV Coverage over Smart Environments

    Authors: Abdu Saif, Nor Shahida Mohd Shah, Soreen Ameen Fattah, Saeed Hamood Alsamhi, Santosh Kumar, Ali Saad Al khuraib

    Abstract: Unmanned Aerial Vehicles (UAVs) are being used for wireless communications in smart environments. However, the need for mobility, scalability of data transmission over wide areas, and the required coverage area make UAV beamforming essential for better coverage and user experience. To this end, we propose a flexible beamforming approach to improve tethered UAV coverage quality and maximize the num…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 6 pages, 7 figures

  48. arXiv:2305.18419  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

    Authors: W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

    Abstract: We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text; but unfortunately, spoken…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023. First 3 authors contributed equally

  49. LEAN: Light and Efficient Audio Classification Network

    Authors: Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

    Abstract: Over the past few years, audio classification on large-scale datasets such as AudioSet has been an important research area. Several deeper convolution-based neural networks have shown compelling performance, notably VGGish, YAMNet, and Pretrained Audio Neural Network (PANN). These models are available as pretrained architectures for transfer learning as well as for adoption in specific audio tasks. In t…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at INDICON 2022

  50. AcroMonk: A Minimalist Underactuated Brachiating Robot

    Authors: Mahdi Javadi, Daniel Harnack, Paula Stocco, Shivesh Kumar, Shubham Vyas, Daniel Pizzutilo, Frank Kirchner

    Abstract: Brachiation is a dynamic, coordinated swinging maneuver of body and arms used by monkeys and apes to move between branches. As a unique underactuated mode of locomotion, it is interesting to study from a robotics perspective since it can broaden the deployment scenarios for humanoids and animaloids. While several brachiating robots of varying complexity have been proposed in the past, this paper p…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: The open-source implementation is available at https://github.com/dfki-ric-underactuated-lab/acromonk and a video demonstration of the experiments can be accessed at https://youtu.be/FIcDNtJo9Jc

    Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3637-3644, 2023