Showing 51–100 of 421 results for author: Shah, M

  1. arXiv:2312.05719  [pdf, other]

    cs.CV

    DVANet: Disentangling View and Action Features for Multi-View Action Recognition

    Authors: Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah

    Abstract: In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video. When trying to classify action instances captured from multiple viewpoints, there is a higher degree of difficulty due to the difference in background, occlusion, and visibility of the captured action from different came…

    Submitted 9 December, 2023; originally announced December 2023.

  2. arXiv:2312.05623  [pdf, other]

    cs.IT eess.SP

    Impact of Urban Street Geometry on the Detection Probability of Automotive Radars

    Authors: Mohammad Taha Shah, Ankit Kumar, Gourab Ghatak, Shobha Sundar Ram

    Abstract: Prior works have analyzed the performance of millimeter wave automotive radars in the presence of diverse clutter and interference scenarios using stochastic geometry tools instead of more time-consuming measurement studies or system-level simulations. In these works, the distributions of radars or discrete clutter scatterers were modeled as Poisson point processes in the Euclidean space. However,…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Radar Conference 2024 (RadarConf24)

  3. arXiv:2312.04850  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Impact of the Fizeau drag effect on Goos-Hänchen shifts in graphene

    Authors: Rafi Ud Din, Muzamil Shah, Reza Asgari, Gao Xianlong

    Abstract: We investigate the Goos-Hänchen shifts in reflection for a light beam within a graphene structure, utilizing the Fizeau drag effect induced by its massless Dirac electrons in incident light. The magnitudes of spatial and angular shifts for a light beam propagating against the direction of drifting electrons are significantly enhanced, while shifts for a beam co-propagating with the drifting electr…

    Submitted 6 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 11 pages, 7 figures

  4. arXiv:2312.04548  [pdf, other]

    cs.CV cs.AI cs.LG

    Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

    Authors: Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah

    Abstract: Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, solar-zenith angle, and population density of different geographies influence the data diversity. These two factors…

    Submitted 7 December, 2023; originally announced December 2023.

    ACM Class: I.4.0; I.4.8; I.5.1; I.5.4; I.2.10

  5. arXiv:2311.13435  [pdf, other]

    cs.CV cs.AI

    PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

    Authors: Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan

    Abstract: Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data. The recent approaches extending image-based LMMs to videos either lack the grounding capabilities (e.g., VideoChat, Video-ChatGPT, Video-LLaMA) or do not utilize the audio-signals for better video understanding (e.g., Video-ChatGPT). Addressing these gaps, we propose PG-Video…

    Submitted 13 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Technical Report

  6. Not Just Training, Also Testing: High School Youths' Perspective-Taking through Peer Testing Machine Learning-Powered Applications

    Authors: L. Morales-Navarro, M. Shah, Y. B. Kafai

    Abstract: Most attention in K-12 artificial intelligence and machine learning (AI/ML) education has been given to having youths train models, with much less attention to the equally important testing of models when creating machine learning applications. Testing ML applications allows for the evaluation of models against predictions and can help creators of applications identify and address failure and edge…

    Submitted 14 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    ACM Class: K.3.2

  7. arXiv:2311.05548  [pdf, other]

    cs.CV eess.IV

    L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks

    Authors: Mirat Shah, Vansh Jain, Anmol Chokshi, Guruprasad Parasnis, Pramod Bide

    Abstract: Generative Adversarial Networks (GANs) have risen to prominence in the field of deep learning, facilitating the generation of realistic data from random noise. The effectiveness of GANs often depends on the quality of feature extraction, a critical aspect of their architecture. This paper introduces L-WaveBlock, a novel and robust feature extractor that leverages the capabilities of the Discrete W…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 12 figures, 8 pages

  8. arXiv:2310.15324  [pdf, other]

    cs.CV

    Videoprompter: an ensemble of foundational models for zero-shot video understanding

    Authors: Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: Vision-language models (VLMs) classify the query video by calculating a similarity score between the visual features and text-based class label representations. Recently, large language models (LLMs) have been used to enrich the text-based class labels by enhancing the descriptiveness of the class names. However, these improvements are restricted to the text-based classifier only, and the query vi…

    Submitted 23 October, 2023; originally announced October 2023.

  9. arXiv:2310.05932  [pdf, other]

    cs.MA cs.AI eess.SY

    A Multi-Agent Systems Approach for Peer-to-Peer Energy Trading in Dairy Farming

    Authors: Mian Ibad Ali Shah, Abdul Wahid, Enda Barrett, Karl Mason

    Abstract: To achieve desired carbon emission reductions, integrating renewable generation and accelerating the adoption of peer-to-peer energy trading is crucial. This is especially important for energy-intensive farming, like dairy farming. However, integrating renewables and peer-to-peer trading presents challenges. To address this, we propose the Multi-Agent Peer-to-Peer Dairy Farm Energy Simulator (MAPD…

    Submitted 21 August, 2023; originally announced October 2023.

    Comments: Proc. of the Artificial Intelligence for Sustainability, ECAI 2023, Eunika et al. (eds.), Sep 30- Oct 1, 2023, https://sites.google.com/view/ai4s. 2023

  10. arXiv:2310.04445  [pdf, other]

    cs.CL cs.AI cs.LG

    LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

    Authors: Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh

    Abstract: It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes with harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private targe…

    Submitted 21 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  11. arXiv:2309.16020  [pdf, other]

    cs.CV cs.LG

    GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization

    Authors: Vicente Vivanco Cepeda, Gaurav Kumar Nayak, Mubarak Shah

    Abstract: Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. This task has considerable challenges due to immense variation in geographic landscapes. The image-to-image retrieval-based approaches fail to solve this problem on a global scale as it is not feasible to construct a large gallery of images covering the entire world. Instead, existing approaches div…

    Submitted 21 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted at NeurIPS 2023

  12. A Review on AI Algorithms for Energy Management in E-Mobility Services

    Authors: Sen Yan, Maqsood Hussain Shah, Ji Li, Noel O'Connor, Mingming Liu

    Abstract: E-mobility, or electric mobility, has emerged as a pivotal solution to address pressing environmental and sustainability concerns in the transportation sector. The depletion of fossil fuels, escalating greenhouse gas emissions, and the imperative to combat climate change underscore the significance of transitioning to electric vehicles (EVs). This paper seeks to explore the potential of artificial…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 4 tables, 1 figure

  13. arXiv:2309.13962  [pdf, other]

    cs.CV eess.IV

    Egocentric RGB+Depth Action Recognition in Industry-Like Settings

    Authors: Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

    Abstract: Action recognition from an egocentric viewpoint is a crucial perception task in robotics and enables a wide range of human-robot interactions. While most computer vision approaches prioritize the RGB camera, the Depth modality - which can further amplify the subtleties of actions from an egocentric perspective - remains underexplored. Our work focuses on recognizing actions from egocentric RGB and…

    Submitted 25 September, 2023; originally announced September 2023.

  14. arXiv:2309.10058  [pdf, other]

    cs.LG cs.CR

    Dual Student Networks for Data-Free Model Stealing

    Authors: James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah

    Abstract: Existing data-free model stealing methods use a generator to produce samples in order to train a student model to match the target model outputs. To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space. We propose a Dual Student method where two stud…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Published in the ICLR 2023 - The Eleventh International Conference on Learning Representations

  15. arXiv:2309.03989  [pdf, other]

    cs.CV

    CDFSL-V: Cross-Domain Few-Shot Learning for Videos

    Authors: Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

    Abstract: Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples, thereby reducing the challenges associated with collecting and annotating large-scale video datasets. Existing methods in video action recognition rely on large labeled datasets from the same domain. However, this setup is not realistic as novel categories may come from differ…

    Submitted 15 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  16. arXiv:2309.02626  [pdf, other]

    math.OC stat.ML

    Adaptive Consensus: A network pruning approach for decentralized optimization

    Authors: Suhail M. Shah, Albert S. Berahas, Raghu Bollapragada

    Abstract: We consider network-based decentralized optimization problems, where each node in the network possesses a local function and the objective is to collectively attain a consensus solution that minimizes the sum of all the local functions. A major challenge in decentralized optimization is the reliance on communication which remains a considerable bottleneck in many applications. To address this chal…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 35 pages, 3 figures

  17. arXiv:2309.02226  [pdf, ps, other]

    math.DG

    Asymptotically harmonic manifolds of dimension 3 with minimal horospheres

    Authors: Jihun Kim, JeongHyeong Park, Hemangi Madhusudan Shah

    Abstract: In [14], it was shown that, if M is a 3-dimensional asymptotically harmonic with minimal horospheres, then M is flat. However, there is a gap in the proof of this paper. In this paper, we provide the correct proof of the result. Thus we complete the classification of asymptotically harmonic manifolds of dimension 3: An asymptotically harmonic manifold of dimension 3 is either a flat or real hyperb…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 7 pages

    MSC Class: Primary 53C35; Secondary 53C25

  18. arXiv:2308.13711  [pdf, other]

    cs.CV cs.RO

    EventTransAct: A video transformer-based framework for Event-camera based action recognition

    Authors: Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah

    Abstract: Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing. Event cameras, with their ability to capture fast-moving objects at a high temporal resolution, offer new opportunities compared to standard action recognition in RGB videos…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: IROS 2023; The first two authors contributed equally

  19. arXiv:2308.13077  [pdf, other]

    cs.CV

    Preserving Modality Structure Improves Multi-Modal Learning

    Authors: Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

    Abstract: Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations. These joint embeddings enable zero-shot cross-modal tasks like retrieval and classification. However, these methods often struggle to generalize well on out-of-domain data as they ignore the semantic struct…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  20. Software Startups -- A Research Agenda

    Authors: Michael Unterkalmsteiner, Pekka Abrahamsson, Xiaofeng Wang, Anh Nguyen-Duc, Syed M. Ali Shah, Sohaib Shahid Bajwa, Guido H. Baltes, Kieran Conboy, Eoin Cullina, Denis Dennehy, Henry Edison, Carlos Fernández-Sánchez, Juan Garbajosa, Tony Gorschek, Eriks Klotins, Laura Hokkanen, Fabio Kon, Ilaria Lunesu, Michele Marchesi, Lorraine Morgan, Markku Oivo, Christoph Selig, Pertti Seppänen, Roger Sweetman, Pasi Tyrväinen, et al. (2 additional authors not shown)

    Abstract: Software startup companies develop innovative, software-intensive products within limited time frames and with few resources, searching for sustainable and scalable business models. Software startups are quite distinct from traditional mature software companies, but also from micro-, small-, and medium-sized enterprises, introducing new challenges relevant for software engineering research. This p…

    Submitted 24 August, 2023; originally announced August 2023.

    Journal ref: e-Informatica Softw. Eng. J. 10(1): 89-124 (2016)

  21. arXiv:2308.11405  [pdf, other]

    cs.IT eess.SP

    Achievable Sum-rate of variants of QAM over Gaussian Multiple Access Channel with and without security

    Authors: Shifa Showkat, Zahid Bashir Dar, Shahid Mehraj Shah

    Abstract: The performance of next generation wireless systems (5G/6G and beyond) at the physical layer is primarily driven by the choice of digital modulation techniques that are bandwidth and power efficient, while maintaining high data rates. Achievable rates for Gaussian input and some finite constellations (BPSK/QPSK/QAM) are well studied in the literature. However, new variants of Quadrature Amplitude…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 11 Figures, two tables. Accepted for publication in IEEE International Conference on Signal Processing and Computer Vision (SPCV-2023)

  22. arXiv:2308.11072  [pdf, other]

    cs.CV cs.CR

    TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection

    Authors: Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah

    Abstract: Video anomaly detection (VAD) without human monitoring is a complex computer vision task that can have a positive impact on society if implemented successfully. While recent advances have made significant progress in solving this task, most existing approaches overlook a critical real-world concern: privacy. With the increasing popularity of artificial intelligence technologies, it becomes crucial…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  23. arXiv:2308.09693  [pdf, other]

    cs.CV cs.LG eess.IV

    A Lightweight Transformer for Faster and Robust EBSD Data Collection

    Authors: Harry Dong, Sean Donegan, Megna Shah, Yuejie Chi

    Abstract: Three dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial-sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures tha…

    Submitted 18 August, 2023; originally announced August 2023.

  24. arXiv:2308.07705  [pdf, other]

    cs.IT cs.LG

    Parametric entropy based Cluster Centriod Initialization for k-means clustering of various Image datasets

    Authors: Faheem Hussayn, Shahid M Shah

    Abstract: One of the most employed yet simple algorithm for cluster analysis is the k-means algorithm. k-means has successfully witnessed its use in artificial intelligence, market segmentation, fraud detection, data mining, psychology, etc., only to name a few. The k-means algorithm, however, does not always yield the best quality results. Its performance heavily depends upon the number of clusters supplie…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 6 Pages, 2 tables, one algorithm. Accepted for publication in IEEE International Conference on Signal Processing and Computer Vision (SPCV-2023)

  25. arXiv:2308.05430  [pdf, other]

    cs.CV

    Ensemble Modeling for Multimodal Visual Action Recognition

    Authors: Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

    Abstract: In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we prop…

    Submitted 25 September, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: 22nd International Conference on Image Analysis and Processing Workshops - Multimodal Action Recognition on the MECCANO Dataset, 2023

  26. arXiv:2308.03956  [pdf, other]

    cs.LG cs.NE

    Fixed Inter-Neuron Covariability Induces Adversarial Robustness

    Authors: Muhammad Ahmed Shah, Bhiksha Raj

    Abstract: The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs) that raises question about their reliability when in real-world scenarios. On the other hand, human perception, which DNNs are supposed to emulate, is highly robust to such perturbations, indicating that there may be certain features of the human perception that make it robust but are not represented in t…

    Submitted 7 August, 2023; originally announced August 2023.

  27. arXiv:2308.03866  [pdf, other]

    cs.CL cs.AI

    Trusting Language Models in Education

    Authors: Jogi Suda Neto, Li Deng, Thejaswi Raya, Reza Shahbazi, Nick Liu, Adhitya Venkatesh, Miral Shah, Neeru Khosla, Rodrigo Capobianco Guido

    Abstract: Language Models are being widely used in Education. Even though modern deep learning models achieve very good performance on question-answering tasks, sometimes they make errors. To avoid misleading students by showing wrong answers, it is important to calibrate the confidence - that is, the prediction probability - of these models. In our work, we propose to use an XGBoost on top of BERT to outpu…

    Submitted 7 August, 2023; originally announced August 2023.

  28. arXiv:2308.01987  [pdf, other]

    cs.CL

    Bengali Fake Reviews: A Benchmark Dataset and Detection System

    Authors: G. M. Shahariar, Md. Tanvir Rouf Shawon, Faisal Muhammad Shah, Mohammad Shafiul Alam, Md. Shahriar Mahbub

    Abstract: The proliferation of fake reviews on various online platforms has created a major concern for both consumers and businesses. Such reviews can deceive customers and cause damage to the reputation of products or services, making it crucial to identify them. Although the detection of fake reviews has been extensively studied in English language, detecting fake reviews in non-English languages such as…

    Submitted 4 May, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  29. arXiv:2308.01472  [pdf, other]

    cs.CV cs.CL cs.LG

    Reverse Stable Diffusion: What prompt was used to generate this image?

    Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

    Abstract: Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative…

    Submitted 2 August, 2023; originally announced August 2023.

  30. arXiv:2308.00854  [pdf, other]

    cs.CV cs.AI

    Training on Foveated Images Improves Robustness to Adversarial Attacks

    Authors: Muhammad A. Shah, Bhiksha Raj

    Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to adversarial attacks -- subtle, perceptually indistinguishable perturbations of inputs that change the response of the model. In the context of vision, we hypothesize that an important contributor to the robustness of human visual perception is constant exposure to low-fidelity visual stimuli in our peripheral vision. To investigate th…

    Submitted 1 August, 2023; originally announced August 2023.

  31. arXiv:2307.14942  [pdf, other]

    math.OC

    A Stochastic Gradient Tracking Algorithm for Decentralized Optimization With Inexact Communication

    Authors: Suhail M. Shah, Raghu Bollapragada

    Abstract: Decentralized optimization is typically studied under the assumption of noise-free transmission. However, real-world scenarios often involve the presence of noise due to factors such as additive white Gaussian noise channels or probabilistic quantization of transmitted data. These sources of noise have the potential to degrade the performance of decentralized optimization algorithms if not effecti…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 34 pages, 4 figures

  32. arXiv:2307.13721  [pdf, other]

    cs.CV cs.AI

    Foundational Models Defining a New Era in Vision: A Survey and Outlook

    Authors: Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

    Abstract: Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The complex relations between objects and their locations, ambiguities, and variations in the real-world environment can be better described in human language, naturally governed by grammatical rules and other modalities such as audio and depth. The models learned to bridge…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Project page: https://github.com/awaisrauf/Awesome-CV-Foundational-Models

  33. arXiv:2307.11671  [pdf, other]

    physics.optics

    Adapted poling to break the nonlinear efficiency limit in nanophotonic lithium niobate waveguides

    Authors: Pao-Kang Chen, Ian Briggs, Chaohan Cui, Liang Zhang, Manav Shah, Linran Fan

    Abstract: Nonlinear frequency mixing is of critical importance in extending the wavelength range of optical sources. It is also indispensable for emerging applications such as quantum information and photonic signal processing. Conventional lithium niobate with periodic poling is the most widely used device for frequency mixing due to the strong second-order nonlinearity. The recent development of nanophoto…

    Submitted 21 July, 2023; originally announced July 2023.

  34. arXiv:2307.07395  [pdf, other]

    eess.SP

    Flexible Beamforming in B5G for Improving Tethered UAV Coverage over Smart Environments

    Authors: Abdu Saif, Nor Shahida Mohd Shah, Soreen Ameen Fattah, Saeed Hamood Alsamhi, Santosh Kumar, Ali Saad Al khuraib

    Abstract: Unmanned Aerial Vehicles (UAVs) are being used for wireless communications in smart environments. However, the need for mobility, scalability of data transmission over wide areas, and the required coverage area make UAV beamforming essential for better coverage and user experience. To this end, we propose a flexible beamforming approach to improve tethered UAV coverage quality and maximize the num…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 6 pages, 7 figures

  35. arXiv:2307.07269  [pdf, other]

    eess.IV cs.CV cs.LG

    Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

    Authors: Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: It is imperative to ensure the robustness of deep learning models in critical applications such as, healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial at…

    Submitted 20 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted in MICCAI 2023 conference

  36. arXiv:2307.06947  [pdf, other]

    cs.CV cs.AI

    Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

    Authors: Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison, convolutional designs for videos offer an efficient alternative but lack long-range dependency modeling. Towards achieving the best of both designs, this work prop…

    Submitted 27 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV-2023. Camera-Ready version. Project page: https://TalalWasim.github.io/Video-FocalNets/

  37. arXiv:2306.12041  [pdf, other]

    cs.CV cs.LG

    Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

    Authors: Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student de…

    Submitted 9 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Accepted at CVPR 2024

  38. arXiv:2306.09239  [pdf, ps, other]

    q-bio.NC cs.LG eess.IV

    Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects

    Authors: Soumyabrata Dey, Ravishankar Rao, Mubarak Shah

    Abstract: Attention Deficit Hyperactive Disorder (ADHD) is a common behavioral problem affecting children. In this work, we investigate the automatic classification of ADHD subjects using the resting state Functional Magnetic Resonance Imaging (fMRI) sequences of the brain. We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from cont…

    Submitted 15 June, 2023; originally announced June 2023.

  39. arXiv:2306.09209  [pdf, ps, other]

    cs.IT cs.GT eess.SP

    Bayesian Game Formulation of Power Allocation in Multiple Access Wiretap Channel with Incomplete CSI

    Authors: Basharat Rashid, Majed Haddad, Shahid Mehraj Shah

    Abstract: In this paper, we address the problem of distributed power allocation in a $K$ user fading multiple access wiretap channel, where global channel state information is limited, i.e., each user has knowledge of their own channel state with respect to Bob and Eve but only knows the distribution of other users' channel states. We model this problem as a Bayesian game, where each user is assumed to self…

    Submitted 4 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 7 Pages, 2 Figures, submitted for possible publication

  40. arXiv:2306.06453  [pdf, other]

    math.DG

    Isometric models of the Funk disc and the Busemann function

    Authors: Ashok Kumar, Hemangi Madhusudan Shah, Bankteshwar Tiwari

    Abstract: In this article, we find three isometric models of the Funk disc: Finsler upper half of the hyperboloid of two sheets model, the Finsler band model and the Finsler upper hemi sphere model; and we also find two new models of the Finsler-Poincaré disc. We explicitly describe the geodesics in each model. Moreover, we compute the Busemann function and consequently describe the horocycles in the Funk a…

    Submitted 10 June, 2023; originally announced June 2023.

  41. arXiv:2306.06325  [pdf, other]

    cs.LG

    Explaining a machine learning decision to physicians via counterfactuals

    Authors: Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, Mehul A. Shah, Alexei Wagner

    Abstract: Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. How can the decision of an ML model be explained to a physician? The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulte…

    Submitted 9 June, 2023; originally announced June 2023.

  42. arXiv:2305.17033  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas, et al. (48 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. The MICCA…

    Submitted 23 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

  43. arXiv:2305.06597  [pdf, ps, other]

    math.DG

    Nature of Some Solitons on Almost coKähler Manifolds and Asymptotically Harmonic Manifolds

    Authors: Paritosh Ghosh, Hemangi Madhusudan Shah, Arindam Bhattacharyya

    Abstract: In this research, we study the nature of $η$-Einstein and gradient $η$-Einstein solitons in the framework of almost coKähler manifolds and $(κ, μ)$-almost coKähler manifolds. We find some expressions for the scalar curvature of an almost coKähler manifold admitting an $η$-Einstein soliton in various cases. We also prove that if a $(κ, μ)$-almost coKähler manifold admits a gradient $η$-Einstein soliton, t…

    Submitted 11 May, 2023; originally announced May 2023.

    MSC Class: 53B40; 53B20; 53C25; 53D15

  44. arXiv:2304.08682  [pdf, other]

    cs.CV

    Learning Situation Hyper-Graphs for Video Question Answering

    Authors: Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

    Abstract: Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation that describes situations as scene sub-graphs for video frames and hyper-edges for connected sub-graphs and has been proposed to capture all such information in a…

    Submitted 6 May, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

  45. arXiv:2304.03410  [pdf, other]

    cs.CV

    $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

    Authors: Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

    Abstract: Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database. Conventional methods generally adopt aggregated CNN features for global retrieval and RANSAC-based geometric verification for reranking. However, RANSAC only employs geometric information but ignores other possible information that could be useful for reranking, e.g. local fe…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR

  46. arXiv:2304.03307  [pdf, other]

    cs.CV eess.IV

    Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

    Authors: Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: Adopting contrastive image-text pretrained models like CLIP towards video classification has gained attention due to its cost-effectiveness and competitive performance. However, recent works in this area face a trade-off. Finetuning the pretrained model to achieve strong supervised performance results in low zero-shot generalization. Similarly, freezing the backbone to retain zero-shot capability…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted at CVPR-2023. Codes/models available at https://github.com/TalalWasim/Vita-CLIP

  47. Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks

    Authors: Md. Tanvir Rouf Shawon, G. M. Shahariar, Faisal Muhammad Shah, Mohammad Shafiul Alam, Md. Shahriar Mahbub

    Abstract: This paper investigates the potential of semi-supervised Generative Adversarial Networks (GANs) to fine-tune pretrained language models in order to distinguish Bengali fake reviews from real reviews using only a small amount of annotated data. With the rise of social media and e-commerce, the ability to detect fake or deceptive reviews is becoming increasingly important in order to protect consumers from being misled…

    Submitted 5 April, 2023; originally announced April 2023.

  48. arXiv:2304.01200  [pdf, other]

    cs.CV

    Video Instance Segmentation in an Open-World

    Authors: Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels an unknown object as 'unknown' and then (b) it i…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: 9 pages, 5 figures

  49. arXiv:2303.17959  [pdf, other]

    cs.CV eess.IV

    Diffusion Action Segmentation

    Authors: Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu

    Abstract: Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. We propose a novel framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are iteratively generated from random no…

    Submitted 11 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  50. arXiv:2303.16268  [pdf, other]

    cs.CV cs.LG

    TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

    Authors: Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah

    Abstract: Semi-Supervised Learning can be more beneficial for the video domain compared to images because of its higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both spatial and temporal dimensions. In order to learn both the static and motion related features for the semi-supervised action recognition task, existing methods rely on hard input inducti…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR-2023