Showing 1–50 of 421 results for author: Shah, M

  1. arXiv:2407.09073  [pdf, other]

    cs.CV

    Open Vocabulary Multi-Label Video Classification

    Authors: Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

    Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2407.08855  [pdf, other]

    eess.IV cs.CV

    BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Anna Zapaishchykova, Julija Pavaine, Lubdha M. Shah, Blaise V. Jones, Nakul Sheth, Sanjay P. Prabhu, Aaron S. McAllister, Wenxin Tu, Khanak K. Nandolia, Andres F. Rodriguez, Ibraheem Salman Shaikh, Mariana Sanchez Montano, Hollie Anne Lai, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Hannah Anderson, Syed Muhammed Anwar, Alejandro Aristizabal, Sina Bagheri, et al. (54 additional authors not shown)

    Abstract: Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 cha…

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.07562  [pdf, other]

    quant-ph

    Transforming qubits via quasi-geometric approaches

    Authors: Nyirahafashimana Valentine, Nurisya Mohd Shah, Umair Abdul Halim, Sharifah Kartini Said Husain, Ahmed Jellal

    Abstract: We develop a theory based on a quasi-geometric (QG) approach to transform a small number of qubits into a larger number of error-correcting qubits by considering four different cases. More precisely, we use the 2-dimensional quasi-orthogonal complete complementary codes (2D-QOCCCSs) and quasi-cyclic asymmetric quantum error-correcting codes (AQECCs) via quasigroup and group theory properties. We int…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 24 pages, 20 figures, 10 tables

  4. arXiv:2407.04370  [pdf, other]

    cs.LG cs.AI

    Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density

    Authors: Peiyu Yang, Naveed Akhtar, Mubarak Shah, Ajmal Mian

    Abstract: Trustworthy machine learning necessitates meticulous regulation of model reliance on non-robust features. We propose a framework to delineate and regulate such features by attributing model predictions to the input. Within our approach, robust feature attributions exhibit a certain consistency, while non-robust feature attributions are susceptible to fluctuations. This behavior allows identificati…

    Submitted 8 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  5. arXiv:2407.03200  [pdf, other]

    cs.CV

    SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

    Authors: Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan

    Abstract: Different from Object Detection, Visual Grounding deals with detecting a bounding box for each text-image pair. This one box for each text-image data provides sparse supervision signals. Although previous works achieve impressive results, their passive utilization of annotation, i.e. the sole use of the box annotation as regression ground truth, results in a suboptimal performance. In this paper,…

    Submitted 6 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  6. arXiv:2407.02625  [pdf, other]

    eess.IV cs.CV cs.LG

    Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images

    Authors: Furqan Shaukat, Syed Muhammad Anwar, Abhijeet Parida, Van Khanh Lam, Marius George Linguraru, Mubarak Shah

    Abstract: Lung cancer has been one of the major threats to human life for decades. Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization. Large Visual Language models (VLMs) have been found effective for multiple downstream medical tasks that rely on both imaging and text data. However, lesion level detection and subsequent diagnosis using VLMs h…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2407.00269  [pdf, other]

    physics.optics physics.app-ph

    High-power and narrow-linewidth laser on thin-film lithium niobate enabled by photonic wire bonding

    Authors: Cornelis A. A. Franken, Rebecca Cheng, Keith Powell, Georgios Kyriazidis, Victoria Rosborough, Juergen Musolf, Maximilian Shah, David R. Barton III, Gage Hills, Leif Johansson, Klaus-J. Boller, Marko Lončar

    Abstract: Thin-film lithium niobate (TFLN) has emerged as a promising platform for the realization of high performance chip-scale optical systems, spanning a range of applications from optical communications to microwave photonics. Such applications rely on the integration of multiple components onto a single platform. However, while many of these components have already been demonstrated on the TFLN platfo…

    Submitted 5 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures; updated long-term stability measurements with new and improved data

  8. arXiv:2406.16932  [pdf, other]

    eess.SP cs.LG

    Xi-Net: Transformer Based Seismic Waveform Reconstructor

    Authors: Anshuman Gaharwar, Parth Parag Kulkarni, Joshua Dickey, Mubarak Shah

    Abstract: Missing/erroneous data is a major problem in today's world. Collected seismic data sometimes contain gaps due to a multitude of reasons like interference and sensor malfunction. Gaps in seismic waveforms hamper further signal processing to gain valuable information. A plethora of techniques are used for data reconstruction in other domains like image, video, audio, but translation of those methods to…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Oral Presentation at IEEE International Conference on Image Processing (ICIP) 2023 (Multidimensional Signal Processing Track)

  9. arXiv:2406.13210  [pdf, other]

    cs.CV cs.AI

    Surgical Triplet Recognition via Diffusion Model

    Authors: Daochang Liu, Axel Hu, Mubarak Shah, Chang Xu

    Abstract: Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. The goal is to identify the combinations of instruments, verbs, and targets presented in surgical video frames. In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iter…

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.10565  [pdf, ps, other]

    physics.atm-clus physics.optics quant-ph

    Gain assistant control of photonic spin Hall effect

    Authors: Muhammad Waseem, Muzamil Shah, Gao Xianlong

    Abstract: In the photonic spin Hall effect (SHE), also known as transverse shift, incident light photons with opposite spins are spatially separated in the transverse direction due to the spin-orbit interaction of light. Here, we propose a gain-assisted model to control the SHE in the reflected probe light. In this model, a probe light is incident on a cavity containing a three-level dilute gaseous atomic m…

    Submitted 15 June, 2024; originally announced June 2024.

  11. arXiv:2406.06565  [pdf, other]

    cs.CL cs.AI cs.LG

    MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

    Authors: Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

    Abstract: Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and s…

    Submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2405.18295  [pdf, other]

    cs.CV

    Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

    Authors: Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

    Abstract: In real-life scenarios, humans seek out objects in the 3D world to fulfill their daily needs or intentions. This inspires us to introduce 3D intention grounding, a new task in 3D object detection employing RGB-D, based on human intention, such as "I want something to support my back". Closely related, 3D visual grounding focuses on understanding human reference. To achieve detection based on human…

    Submitted 6 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  13. arXiv:2405.16005  [pdf, other]

    cs.CV

    PTQ4DiT: Post-training Quantization for Diffusion Transformers

    Authors: Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

    Abstract: The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computa…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  14. arXiv:2405.15439  [pdf, other]

    cs.CV cs.AI

    Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer

    Authors: Zichen Geng, Caren Han, Zeeshan Hayder, Jian Liu, Mubarak Shah, Ajmal Mian

    Abstract: Text-driven human motion generation is an emerging task in animation and humanoid robot design. Existing algorithms directly generate the full sequence which is computationally expensive and prone to errors as it does not pay special attention to key poses, a process that has been the cornerstone of animation for decades. We propose KeyMotion, which generates plausible human motion sequences corres…

    Submitted 24 May, 2024; originally announced May 2024.

  15. arXiv:2405.14645  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Lagrangian Neural Networks for Reversible Dissipative Evolution

    Authors: Veera Sundararaghavan, Megna N. Shah, Jeff P. Simmons

    Abstract: Growing attention is being given to utilizing Lagrangian and Hamiltonian mechanics with network training in order to incorporate physics into the network. Most commonly, conservative systems are modeled, in which there are no frictional losses, so the system may be run forward and backward in time without requiring regularization. This work addresses systems in which the reverse direction is ill…

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.13637  [pdf, other]

    cs.CV cs.AI cs.LG

    Curriculum Direct Preference Optimization for Diffusion and Consistency Models

    Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah

    Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employ…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  17. arXiv:2405.12716  [pdf, other]

    cs.AI cs.LG cs.MA

    Reinforcement Learning Enabled Peer-to-Peer Energy Trading for Dairy Farms

    Authors: Mian Ibad Ali Shah, Enda Barrett, Karl Mason

    Abstract: Farm businesses are increasingly adopting renewables to enhance energy efficiency and reduce reliance on fossil fuels and the grid. This shift aims to decrease dairy farms' dependence on traditional electricity grids by enabling the sale of surplus renewable energy in Peer-to-Peer markets. However, the dynamic nature of farm communities poses challenges, requiring specialized algorithms for P2P en…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Proc. of the Main Track of 22nd International Conference on Practical Applications of Agents and Multi-Agent Systems, 26th-28th June, 2024, https://www.paams.net/. Includes 6 figures, 1 table and 32 references

  18. arXiv:2405.11574  [pdf, other]

    cs.CV cs.AI cs.LG

    Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

    Authors: Manan Shah, Yash Bhalgat

    Abstract: This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible, well commented and open-sourced code implementation for the entire method specified in the original paper. (2) We try to verify the effectiveness of the novel a…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Reproducibility study

  19. arXiv:2405.07518  [pdf, other]

    cs.AR cs.AI

    SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Expert…

    Submitted 13 May, 2024; originally announced May 2024.

  20. arXiv:2405.07354  [pdf, other]

    cs.SD cs.IR cs.LG cs.MM eess.AS

    SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

    Authors: Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

    Abstract: The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet…

    Submitted 12 May, 2024; originally announced May 2024.

    ACM Class: I.2.7; I.7

  21. arXiv:2405.07338  [pdf, other]

    eess.IV cs.CV

    Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images

    Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah

    Abstract: Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our…

    Submitted 12 May, 2024; originally announced May 2024.

  22. arXiv:2405.02937  [pdf, other]

    cs.CL

    Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study

    Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Asif Iftekher Fahim, Pronay Debnath, Faisal Muhammad Shah

    Abstract: Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two s…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

  23. arXiv:2405.02296  [pdf, other]

    cs.CV

    Möbius Transform for Mitigating Perspective Distortions in Representation Learning

    Authors: Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah

    Abstract: Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. Non-availability of dedicated training data poses a critical barrier to developing robust computer vision me…

    Submitted 15 July, 2024; v1 submitted 7 March, 2024; originally announced May 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV 2024). Project page: https://prakashchhipa.github.io/projects/mpd

  24. arXiv:2404.18021  [pdf, other]

    cs.AI cs.CL cs.HC q-bio.QM

    CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

    Authors: Kaixuan Huang, Yuanhao Qu, Henry Cousins, William A. Johnson, Di Yin, Mihir Shah, Denny Zhou, Russ Altman, Mengdi Wang, Le Cong

    Abstract: The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of CRISPR technology, and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often la…

    Submitted 27 April, 2024; originally announced April 2024.

  25. arXiv:2404.06715  [pdf, other]

    cs.CV

    Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data

    Authors: Aakash Kumar, Chen Chen, Ajmal Mian, Neils Lobo, Mubarak Shah

    Abstract: 3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics and augmented reality. Monocular 3D detection is attractive as it requires only a single camera; however, it lacks the accuracy and robustness required for real world applications. High resolu…

    Submitted 9 April, 2024; originally announced April 2024.

  26. arXiv:2404.02840  [pdf, ps, other]

    cs.DC

    A Survey on Error-Bounded Lossy Compression for Scientific Datasets

    Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

    Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: submitted to ACM Computing journal, required to be 35 pages including references

  27. arXiv:2404.02618  [pdf, other]

    cs.CV cs.AI

    Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models

    Authors: Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato

    Abstract: We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability. DiffExplainer employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs and hidden features of a classifier, thus providing a visual tool for explaining decisions. Moreover, the analysis of generated visual descriptions…

    Submitted 3 April, 2024; originally announced April 2024.

  28. arXiv:2403.19407  [pdf, other]

    cs.CV

    Towards Temporally Consistent Referring Video Object Segmentation

    Authors: Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian

    Abstract: Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates i…

    Submitted 28 March, 2024; originally announced March 2024.

  29. arXiv:2403.17632  [pdf, other]

    cs.AI cs.CY cs.LG

    Data-driven Energy Consumption Modelling for Electric Micromobility using an Open Dataset

    Authors: Yue Ding, Sen Yan, Maqsood Hussain Shah, Hongyuan Fang, Ji Li, Mingming Liu

    Abstract: The escalating challenges of traffic congestion and environmental degradation underscore the critical importance of embracing E-Mobility solutions in urban spaces. In particular, micro E-Mobility tools such as E-scooters and E-bikes, play a pivotal role in this transition, offering sustainable alternatives for urban commuters. However, the energy consumption patterns for these tools are a critical…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures, 4 tables. This manuscript has been accepted by the IEEE ITEC 2024

  30. arXiv:2403.16997  [pdf, other]

    cs.CV

    Composed Video Retrieval via Enriched Context and Discriminative Embeddings

    Authors: Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases. Existing works predominantly rely on visual queries combined with modification text to distinguish relevant videos. However, such a strategy struggles to fully preserve the rich qu…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR-2024

  31. arXiv:2403.14870  [pdf, other]

    cs.CV cs.CL cs.LG

    VidLA: Video-Language Alignment at Scale

    Authors: Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

    Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  32. arXiv:2403.14614  [pdf, other]

    cs.CV

    AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

    Authors: Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: In the image acquisition process, various forms of degradation, including noise, haze, and rain, are frequently introduced. These degradations typically arise from the inherent limitations of cameras or unfavorable ambient conditions. To recover clean images from degraded versions, numerous specialized restoration methods have been developed, each targeting a specific type of degradation. Recently…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 28 pages, 15 figures

  33. arXiv:2403.07964  [pdf, other]

    cs.AI

    Optimal Design and Implementation of an Open-source Emulation Platform for User-Centric Shared E-mobility Services

    Authors: Maqsood Hussain Shah, Yue Ding, Shaoshu Zhu, Yingqi Gu, Mingming Liu

    Abstract: With the rising concern over transportation emissions and pollution on a global scale, shared electric mobility services like E-cars, E-bikes, and E-scooters have emerged as promising solutions to mitigate these pressing challenges. However, existing shared E-mobility services exhibit critical design deficiencies, including insufficient service integration, imprecise energy consumption forecasting…

    Submitted 1 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 15 pages, 5 figures

  34. arXiv:2403.07937  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speech Robust Bench: A Robustness Benchmark For Speech Recognition

    Authors: Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkila, Nicolas Kourtellis

    Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate…

    Submitted 8 March, 2024; originally announced March 2024.

  35. arXiv:2403.06394  [pdf, other]

    cs.CV

    FSViewFusion: Few-Shots View Generation of Novel Objects

    Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim

    Abstract: Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, NeRF models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion mode…

    Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  36. Privacy-Respecting Type Error Telemetry at Scale

    Authors: Ben Greenman, Alan Jeffrey, Shriram Krishnamurthi, Mitesh Shah

    Abstract: Context: Roblox Studio lets millions of creators build interactive experiences by programming in a variant of Lua called Luau. The creators form a broad group, ranging from novices writing their first script to professional developers; thus, Luau must support a wide audience. As part of its efforts to support all kinds of programmers, Luau includes an optional, gradual type system and goes to grea…

    Submitted 4 March, 2024; originally announced March 2024.

    Journal ref: The Art, Science, and Engineering of Programming, 2024, Vol. 8, Issue 3, Article 12

  37. arXiv:2402.10478  [pdf, other]

    cs.CV cs.LG

    CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes

    Authors: Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah

    Abstract: Malaria is a major health issue worldwide, and its diagnosis requires scalable solutions that can work effectively with low-cost microscopes (LCM). Deep learning-based methods have shown success in computer-aided diagnosis from microscopic images. However, these methods need annotated images that show cells affected by malaria parasites and their life stages. Annotating images from LCM significant…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Under Review. Project Page: https://daveishan.github.io/codamal-webpage/

  38. arXiv:2402.01980  [pdf, other]

    cs.CL

    SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks

    Authors: Gourab Dey, Adithya V Ganesan, Yash Kumar Lal, Manal Shah, Shreyashee Sinha, Matthew Matero, Salvatore Giorgi, Vivek Kulkarni, H. Andrew Schwartz

    Abstract: Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about…

    Submitted 14 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Short paper accepted to EACL 2024. 4 pgs, 2 tables

  39. arXiv:2402.01021  [pdf, other]

    cs.SE

    Towards Understanding the Challenges of Bug Localization in Deep Learning Systems

    Authors: Sigma Jahan, Mehil B. Shah, Mohammad Masudur Rahman

    Abstract: Software bugs cost the global economy billions of dollars annually and claim ~50% of the programming time from software developers. Locating these bugs is crucial for their resolution but challenging. It is even more challenging in deep-learning systems due to their black-box nature. Bugs in these systems are also hidden not only in the code but also in the models and training data, which might m…

    Submitted 1 February, 2024; originally announced February 2024.

  40. arXiv:2401.09549  [pdf, other]

    cond-mat.mes-hall

    Interferometric Single-Shot Parity Measurement in an InAs-Al Hybrid Device

    Authors: Morteza Aghaee, Alejandro Alcaraz Ramirez, Zulfi Alam, Rizwan Ali, Mariusz Andrzejczuk, Andrey Antipov, Mikhail Astafev, Amin Barzegar, Bela Bauer, Jonathan Becker, Umesh Kumar Bhaskar, Alex Bocharov, Srini Boddapati, David Bohn, Jouri Bommer, Leo Bourdet, Arnaud Bousquet, Samuel Boutin, Lucas Casparis, Benjamin James Chapman, Sohail Chatoor, Anna Wulff Christensen, Cassandra Chua, Patrick Codd, William Cole, et al. (137 additional authors not shown)

    Abstract: The fusion of non-Abelian anyons or topological defects is a fundamental operation in measurement-only topological quantum computation. In topological superconductors, this operation amounts to a determination of the shared fermion parity of Majorana zero modes. As a step towards this, we implement a single-shot interferometric measurement of fermion parity in indium arsenide-aluminum heterostruct…

    Submitted 2 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Added data on a second measurement of device A and a measurement of device B, expanded discussion of a trivial scenario. Refs added, author list updated

  41. arXiv:2401.09446  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Explainable Multimodal Sentiment Analysis on Bengali Memes

    Authors: Kazi Toufique Elahi, Tasnuva Binte Rahman, Shakil Shahriar, Samir Sarker, Sajib Kumar Saha Joy, Faisal Muhammad Shah

    Abstract: Memes have become a distinctive and effective form of communication in the digital era, attracting online communities and cutting across cultural barriers. Even though memes are frequently linked with humor, they have an amazing capacity to convey a wide range of emotions, including happiness, sarcasm, frustration, and more. Understanding and interpreting the sentiment underlying memes has become…

    Submitted 20 December, 2023; originally announced January 2024.

  42. arXiv:2401.07310  [pdf, other]

    cs.CL

    Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study

    Authors: Ahmadul Karim Chowdhury, Md. Saidur Rahman Sujon, Md. Shirajus Salekin Shafi, Tasin Ahmmad, Sifat Ahmed, Khan Md Hasib, Faisal Muhammad Shah

    Abstract: In an era where the silent struggle of underdiagnosed depression pervades globally, our research delves into the crucial link between mental health and social media. This work focuses on early detection of depression, particularly in extroverted social media users, using LLMs such as GPT 3.5, GPT 4 and our proposed GPT 3.5 fine-tuned model DepGPT, as well as advanced deep learning models (LSTM, Bi-…

    Submitted 14 January, 2024; originally announced January 2024.

  43. arXiv:2401.04983  [pdf, ps, other]

    math.DG

    The Funk-Finsler structure on the unit disc in the hyperbolic plane

    Authors: Ashok Kumar, Hemangi Madhusudan Shah, Bankteshwar Tiwari

    Abstract: In this paper, we construct the Funk-Finsler structure in various models of the hyperbolic plane. In particular, in the unit disc of the Klein model, it turns out to be a Randers metric, which is a non-Berwald Douglas metric. Further, using Finsler isometries we obtain the Funk-Finsler structures in other models of the hyperbolic plane. Finally, we also investigate the geometry of this Funk-Finsle…

    Submitted 10 January, 2024; originally announced January 2024.

  44. arXiv:2401.03069  [pdf, other]

    cs.SE cs.LG

    Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

    Authors: Mehil B. Shah, Mohammad Masudur Rahman, Foutse Khomh

    Abstract: Context: Deep learning has achieved remarkable progress in various domains. However, like any software system, deep learning systems contain bugs, some of which can have severe impacts, as evidenced by crashes involving autonomous vehicles. Despite substantial advancements in deep learning techniques, little research has focused on reproducing deep learning bugs, which is an essential step for the…

    Submitted 18 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Under Major Revision at the EMSE (Empirical Software Engineering) Journal

  45. arXiv:2312.13179  [pdf, other]

    cs.CL

    Contextual Code Switching for Machine Translation using Language Models

    Authors: Arshad Kaji, Manan Shah

    Abstract: Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years. Their demonstrated state-of-the-art performance is achieved through methodologies such as zero-shot or few-shot prompting. These models undergo training on extensive datasets that encompass segments of the Internet and subsequently undergo fine-tuning tailored to specific tasks. Notab…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 4 pages, 1 figure, 2 tables

  46. arXiv:2312.13008  [pdf, other]

    cs.CV cs.AI cs.LG

    No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

    Authors: Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

    Abstract: Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g., contrastive learning) that do not explicitly promote the learning of temporal features. We identify two factors that limit existing temporal self-su…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 (Main Technical Track)

  47. arXiv:2312.11868  [pdf, other]

    cs.RO eess.SY

    Dynamic Loco-manipulation on HECTOR: Humanoid for Enhanced ConTrol and Open-source Research

    Authors: Junheng Li, Junchao Ma, Omar Kolt, Manas Shah, Quan Nguyen

    Abstract: Despite their remarkable advancement in locomotion and manipulation, humanoid robots remain challenged by a lack of synchronized loco-manipulation control, hindering their full dynamic potential. In this work, we introduce a versatile and effective approach to controlling and generalizing dynamic locomotion and loco-manipulation on humanoid robots via a Force-and-moment-based Model Predictive Cont…

    Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: 14 pages, 13 figures

  48. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  49. arXiv:2312.09327  [pdf, other]

    quant-ph

    Employing an operator form of the Rodrigues formula to calculate wavefunctions without differential equations

    Authors: Joseph R. Noonan, Maaz ur Rehman Shah, Luogen Xu, James K. Freericks

    Abstract: The factorization method of Schrödinger shows us how to determine the energy eigenstates without needing to determine the wavefunctions in position or momentum space. A strategy to convert the energy eigenstates to wavefunctions is well known for the one-dimensional simple harmonic oscillator by employing the Rodrigues formula for the Hermite polynomials in position or momentum space. In this work…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: (10 pages, 1 figure, plus supplemental material)

  50. arXiv:2312.07013  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Photonic spin Hall effect in Haldane materials

    Authors: Muzamil Shah, Muhammad Sabieh Anwar, Reza Asgari, Gao Xianlong

    Abstract: The photonic spin Hall effect of light beams reflected from the surfaces of various two-dimensional hexagonal crystalline structures, considering their associated time-reversal $\mathcal{T}$ and inversion $\mathcal{I}$ symmetries, is investigated. Employing the Haldane model with tunable parameters as a generic model, we examine the longitudinal and transverse spin-separations of the reflected bea…

    Submitted 12 December, 2023; originally announced December 2023.