subscribe to arXiv mailings

Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

Authors: Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang

Abstract: Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision metho… ▽ More Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize the Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 22 pages, 9 figures, Code is available at: https://github.com/YoungSean/NIDS-Net

arXiv:2307.03073 [pdf, other]

Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning

Authors: Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang

Abstract: We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by unimodal prototypical networks for few-shot learning, we introduce Proto-CLIP which utilizes image prototypes and text prototypes for few-shot learning. Specifically, Proto-CLIP adapts the image and text encoder embeddings from CLIP in a joint fashion using few-shot exampl… ▽ More We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by unimodal prototypical networks for few-shot learning, we introduce Proto-CLIP which utilizes image prototypes and text prototypes for few-shot learning. Specifically, Proto-CLIP adapts the image and text encoder embeddings from CLIP in a joint fashion using few-shot examples. The embeddings from the two encoders are used to compute the respective prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of the corresponding classes. Such alignment is beneficial for few-shot classification due to the reinforced contributions from both types of prototypes. Proto-CLIP has both training-free and fine-tuned variants. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning, as well as in the real world for robot perception. The project page is available at https://irvlutd.github.io/Proto-CLIP △ Less

Submitted 14 July, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2306.15620 [pdf, other]

SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes

Authors: Ninad Khargonkar, Sai Haneesh Allu, Yangxiao Lu, Jishnu Jaykumar P, Balakrishnan Prabhakaran, Yu Xiang

Abstract: We present a new reproducible benchmark for evaluating robot manipulation in the real world, specifically focusing on pick-and-place. Our benchmark uses the YCB objects, a commonly used dataset in the robotics community, to ensure that our results are comparable to other studies. Additionally, the benchmark is designed to be easily reproducible in the real world, making it accessible to researcher… ▽ More We present a new reproducible benchmark for evaluating robot manipulation in the real world, specifically focusing on pick-and-place. Our benchmark uses the YCB objects, a commonly used dataset in the robotics community, to ensure that our results are comparable to other studies. Additionally, the benchmark is designed to be easily reproducible in the real world, making it accessible to researchers and practitioners. We also provide our experimental results and analyzes for model-based and model-free 6D robotic grasping on the benchmark, where representative algorithms are evaluated for object perception, grasping planning, and motion planning. We believe that our benchmark will be a valuable tool for advancing the field of robot manipulation. By providing a standardized evaluation framework, researchers can more easily compare different techniques and algorithms, leading to faster progress in developing robot manipulation methods. △ Less

Submitted 11 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted to ICRA 2024. Project page is available at https://irvlutd.github.io/SceneReplica

arXiv:2207.03333 [pdf, other]

FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments

Authors: Jishnu Jaykumar P, Yu-Wei Chao, Yu Xiang

Abstract: We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object. We captured 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. We investigated (i) few-… ▽ More We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object. We captured 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. We investigated (i) few-shot object classification and (ii) joint object segmentation and few-shot classification with the state-of-the-art methods for few-shot learning and meta-learning using our dataset. The evaluation results show that there is still a large margin to be improved for few-shot object classification in robotic environments. Our dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences and attribute recognition. The dataset and code are available at https://irvlutd.github.io/FewSOL. △ Less

Submitted 5 March, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:1604.01833 [pdf, other]

System for Filtering Messages on Social Media Content

Authors: Jinju Joby P, Jyothi Korra

Abstract: The social networking era has left us with little privacy. The details of the social network users are published on Social Networking sites. Vulnerability has reached new heights due to the overpowering effects of social networking. The sites like Facebook, Twitter are having a huge set of users who publish their files, comments, messages in other users walls. These messages and comments could be… ▽ More The social networking era has left us with little privacy. The details of the social network users are published on Social Networking sites. Vulnerability has reached new heights due to the overpowering effects of social networking. The sites like Facebook, Twitter are having a huge set of users who publish their files, comments, messages in other users walls. These messages and comments could be of any nature. Even friends could post a comment that would harm a persons integrity. Thus there has to be a system which will monitor the messages and comments that are posted on the walls. If the messages are found to be neutral (does not have any harmful content), then it can be published. If the messages are found to have non-neutral content in them, then these messages would be blocked by the social network manager. The messages that are non-neutral would be of sexual, offensive, hatred, pun intended nature. Thus the social network manager can classify content as neutral and non-neutral and notify the user if there seems to be messages of non-neutral behavior. △ Less

Submitted 6 April, 2016; originally announced April 2016.

Showing 1–5 of 5 results for author: P, J J