Showing 1–13 of 13 results for author: Rizve, M N

  1. arXiv:2407.09073  [pdf, other]

    cs.CV

    Open Vocabulary Multi-Label Video Classification

    Authors: Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

    Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2403.14870  [pdf, other]

    cs.CV cs.CL cs.LG

    VidLA: Video-Language Alignment at Scale

    Authors: Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

    Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  3. arXiv:2310.08584  [pdf, other]

    cs.CV

    Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

    Authors: Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis

    Abstract: Self-supervised learning has unlocked the potential of scaling up pretraining to billions of images, since annotation is unnecessary. But are we making the best use of data? How more economical can we be? In this work, we attempt to answer this question by making two contributions. First, we investigate first-person videos and introduce a "Walking Tours" dataset. These videos are high-resolution,…

    Submitted 23 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024 (Best paper honorable mention). Project Page: https://shashankvkt.github.io/dora

  4. arXiv:2309.03989  [pdf, other]

    cs.CV

    CDFSL-V: Cross-Domain Few-Shot Learning for Videos

    Authors: Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

    Abstract: Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples, thereby reducing the challenges associated with collecting and annotating large-scale video datasets. Existing methods in video action recognition rely on large labeled datasets from the same domain. However, this setup is not realistic as novel categories may come from differ…

    Submitted 15 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  5. arXiv:2308.13077  [pdf, other]

    cs.CV

    Preserving Modality Structure Improves Multi-Modal Learning

    Authors: Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

    Abstract: Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations. These joint embeddings enable zero-shot cross-modal tasks like retrieval and classification. However, these methods often struggle to generalize well on out-of-domain data as they ignore the semantic struct…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  6. arXiv:2303.16268  [pdf, other]

    cs.CV cs.LG

    TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

    Authors: Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah

    Abstract: Semi-Supervised Learning can be more beneficial for the video domain compared to images because of its higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both spatial and temporal dimensions. In order to learn both the static and motion related features for the semi-supervised action recognition task, existing methods rely on hard input inducti…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  7. arXiv:2207.02269  [pdf, other]

    cs.CV cs.LG

    Towards Realistic Semi-Supervised Learning

    Authors: Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

    Abstract: Deep learning is pushing the state-of-the-art in many computer vision applications. However, it relies on large annotated data repositories, and capturing the unconstrained nature of the real-world data is yet to be solved. Semi-supervised learning (SSL) complements the annotated training data with a large corpus of unlabeled data to reduce annotation cost. The standard SSL approach assumes unlabe…

    Submitted 28 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022 (Oral)

  8. arXiv:2207.02261  [pdf, other]

    cs.CV cs.LG

    OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

    Authors: Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data. One common assumption in most SSL methods is that the labeled and unlabeled data are from the same data distribution. Howeve…

    Submitted 28 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  9. arXiv:2203.14542  [pdf, other]

    cs.CV cs.LG

    UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning

    Authors: Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah

    Abstract: Supervised deep learning methods require a large repository of annotated data; hence, label noise is inevitable. Training with such noisy data negatively impacts the generalization performance of deep neural networks. To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data. Next, an off-the-shelf semi-supervise…

    Submitted 26 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  10. arXiv:2103.01315  [pdf, other]

    cs.CV cs.LG

    Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning

    Authors: Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: In many real-world problems, collecting a large number of labeled samples is infeasible. Few-shot learning (FSL) is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples. FSL tasks have been predominantly solved by leveraging the ideas from gradient-based meta-learning and metric learning approaches. Howe…

    Submitted 19 April, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  11. TCLR: Temporal Contrastive Learning for Video Representation

    Authors: Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

    Abstract: Contrastive learning has nearly closed the gap between supervised and self-supervised learning of image representations, and has also been explored for videos. However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension. We develop a new temporal contrastive learning framework consisting…

    Submitted 30 March, 2022; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: Accepted to Computer Vision and Image Understanding (CVIU) Journal

  12. arXiv:2101.06329  [pdf, other]

    cs.LG cs.CV

    In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

    Authors: Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah

    Abstract: The recent research in semi-supervised learning (SSL) is mostly dominated by consistency regularization based methods which achieve strong performance. However, they heavily rely on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its origin…

    Submitted 19 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: ICLR 2021

  13. arXiv:2004.11475  [pdf, other]

    cs.CV eess.IV

    Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

    Authors: Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

    Abstract: Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security…

    Submitted 19 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: 9 pages