
Showing 1–50 of 56 results for author: Sclaroff, S

  1. arXiv:2310.18946

    cs.CV cs.MM

    Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

    Authors: Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

    Abstract: In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. Given a frame pair, we estimate multiple bidirectional flows to directly forward warp the pixels to the desired time step before fusing overlapping pixels. In doing so, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: T-PAMI. arXiv admin note: substantial text overlap with arXiv:2204.03513

  2. arXiv:2308.01890

    cs.CV cs.LG

    DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

    Authors: Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

    Abstract: Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this research, we leverage the powerful alignment between te…

    Submitted 13 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: TPAMI. arXiv admin note: substantial text overlap with arXiv:2206.09541

  3. arXiv:2306.17848

    cs.CV

    Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

    Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

    Abstract: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting propert…

    Submitted 30 June, 2023; originally announced June 2023.

  4. arXiv:2304.07500

    cs.CV

    The 7th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

    Abstract: The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries. Track 1 was a brand new track that…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: Summary of the 7th AI City Challenge Workshop in conjunction with CVPR 2023

  5. arXiv:2211.16499

    cs.CV cs.AI cs.LG

    Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

    Authors: Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

    Abstract: Modern deep neural networks tend to be evaluated on static test sets. One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations. For example, it is hard to study the robustness of these networks to variations of object scale, object pose, scene lighting and 3D occlusions. The main reason is that co…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Published at the Conference on Neural Information Processing Systems (NeurIPS) 2022

  6. arXiv:2204.11929

    cs.CV

    Temporal Relevance Analysis for Video Action Models

    Authors: Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal

    Abstract: In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature. We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models based on layer-wise relevance propagation. We then conduct comprehensive experiments and in-depth analysis to provide a better unders…

    Submitted 25 April, 2022; originally announced April 2022.

  7. arXiv:2204.10380

    cs.CV

    The 6th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Archana Venkatachalapathy, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Alice Li, Shangru Li, Rama Chellappa

    Abstract: The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries.…

    Submitted 9 June, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: Summary of the 6th AI City Challenge Workshop in conjunction with CVPR 2022. arXiv admin note: text overlap with arXiv:2104.12233

  8. arXiv:2204.03513

    cs.CV cs.AI cs.MM

    Many-to-many Splatting for Efficient Video Frame Interpolation

    Authors: Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko

    Abstract: Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant. Yet due to the inherent challenges of motion estimation (e.g. occlusions and discontinuities), most state-of-the-art interpolation approaches require subsequent refinement of the warped result to generate satisfying outputs, which drastically decreases the eff…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR2022, Project: https://github.com/feinanshan/M2M_VFI

  9. arXiv:2204.00172

    cs.CV cs.LG

    A Unified Framework for Domain Adaptive Pose Estimation

    Authors: Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff

    Abstract: While pose estimation is an important computer vision task, it requires expensive annotation and suffers from domain shift. In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision. While several domain adaptive pose estimation models have been proposed recently, they are not…

    Submitted 5 August, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

  10. arXiv:2203.11819

    cs.CV

    A Broad Study of Pre-training for Domain Generalization and Adaptation

    Authors: Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko

    Abstract: Deep models must learn robust and transferable representations in order to perform well on new domains. While domain transfer methods (e.g., domain adaptation, domain generalization) have been proposed to learn transferable representations across domains, they are typically applied to ResNet backbones pre-trained on ImageNet. Thus, existing works pay little attention to the effects of pre-training…

    Submitted 20 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  11. arXiv:2108.11974

    cs.CV

    Learning Cross-modal Contrastive Features for Video Domain Adaptation

    Authors: Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker

    Abstract: Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted in ICCV'21

  12. arXiv:2108.10860

    cs.CV

    Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density

    Authors: Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko

    Abstract: Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains. However, optimal hyper-parameter selection is critical to achieving high accuracy and avoiding negative transfer. Supervised hyper-parameter validation is not possible without labeled target data, which raises the question: How can we validate unsupervised adaptation techniques in a re…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: ICCV2021

  13. arXiv:2106.04569

    cs.CV cs.AI cs.CY cs.LG

    Simulated Adversarial Testing of Face Recognition Models

    Authors: Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

    Abstract: Most machine learning models are validated and tested on fixed datasets. This can give an incomplete picture of the capabilities and weaknesses of the model. Such weaknesses can be revealed at test time in the real world. The risks involved in such failures can be loss of profits, loss of time or even loss of life in certain critical applications. In order to alleviate this issue, simulators can b…

    Submitted 31 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  14. arXiv:2104.12233

    cs.CV cs.AI

    The 5th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Xiaodong Yang, Yue Yao, Liang Zheng, Pranamesh Chakraborty, Christian E. Lopez, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff

    Abstract: The AI City Challenge was created with two goals in mind: (1) pushing the boundaries of research and development in intelligent video analysis for smarter cities use cases, and (2) assessing tasks where the level of performance is enough to cause real-world adoption. Transportation is a segment ripe for such adoption. The fifth AI City Challenge attracted 305 participating teams across 38 countrie…

    Submitted 24 May, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

    Comments: Summary of the 5th AI City Challenge Workshop in conjunction with CVPR 2021

  15. arXiv:2101.04741

    cs.CV

    CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions

    Authors: Qi Feng, Vitaly Ablavsky, Stan Sclaroff

    Abstract: Natural Language (NL) descriptions can be one of the most convenient or the only way to interact with systems built to understand and detect city scale traffic patterns and vehicle-related events. In this paper, we extend the widely adopted CityFlow Benchmark with NL descriptions for vehicle targets and introduce the CityFlow-NL Benchmark. The CityFlow-NL contains more than 5,000 unique and precis…

    Submitted 5 April, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: The code and data we use in this paper are available at: https://github.com/fredfung007/cityflow-nl

  16. arXiv:2008.00348

    cs.CV

    Self-supervised Visual Attribute Learning for Fashion Compatibility

    Authors: Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A Plummer

    Abstract: Many self-supervised learning (SSL) methods have been successful in learning semantically meaningful visual representations by solving pretext tasks. However, prior work in SSL focuses on tasks like object recognition or detection, which aim to learn object shapes and assume that the features should be invariant to concepts like colors and textures. Thus, these SSL methods perform poorly on downst…

    Submitted 11 August, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

    Comments: Accepted to VIPriors Workshop ICCV 2021

  17. arXiv:2007.03815

    cs.CV cs.MM cs.RO

    Real-time Semantic Segmentation with Fast Attention

    Authors: Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

    Abstract: In deep CNN based models for semantic segmentation, high accuracy relies on rich spatial context (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in re…

    Submitted 9 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: project page: https://cs-people.bu.edu/pinghu/FANet.html

  18. arXiv:2006.06493

    cs.CV cs.CR cs.LG cs.NE

    Protecting Against Image Translation Deepfakes by Leaking Universal Perturbations from Black-Box Neural Networks

    Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

    Abstract: In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption by presenting image translation formulations of attacks initially proposed for classification models. Nevertheless, a naive adaptation of classification black-box attacks results in a prohibitive number of queries for im…

    Submitted 11 June, 2020; originally announced June 2020.

  19. arXiv:2004.01800

    cs.CV cs.LG cs.MM eess.IV

    Temporally Distributed Networks for Fast Video Semantic Segmentation

    Authors: Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi

    Abstract: We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefo…

    Submitted 6 April, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: [CVPR2020] Project: https://github.com/feinanshan/TDNet

  20. arXiv:2004.00180

    cs.CV

    Spatio-Temporal Action Detection with Multi-Object Interaction

    Authors: Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

    Abstract: Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube". Nowadays, most spatio-temporal action detection datasets (e.g. UCF101-24, AVA, DALY) are annotated with action tubes that contain a single person performing the action, thus the predominant action detection models simply employ a person detection and tracking pip…

    Submitted 31 March, 2020; originally announced April 2020.

  21. arXiv:2003.08264

    cs.CV

    Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

    Authors: Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

    Abstract: Existing unsupervised domain adaptation methods aim to transfer knowledge from a label-rich source domain to an unlabeled target domain. However, obtaining labels for some source domains may be very expensive, making complete labeling as used in prior work impractical. In this work, we investigate a new domain adaptation scenario with sparsely labeled source data, where only a few examples in the…

    Submitted 18 March, 2020; originally announced March 2020.

  22. arXiv:2003.06498

    cs.CV

    Explainable Deep Classification Models for Domain Generalization

    Authors: Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

    Abstract: Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in…

    Submitted 13 March, 2020; originally announced March 2020.

  23. arXiv:2003.01279

    cs.CV cs.CR cs.CY cs.LG

    Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems

    Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

    Abstract: Face modification systems using deep learning have become increasingly powerful and accessible. Given images of a person's face, such systems can generate new images of that same person under different expressions and poses. Some systems can also modify targeted attributes such as hair color or age. Manipulated images and video of this type have been dubbed Deepfakes. In order to prevent a malicio…

    Submitted 27 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Accepted at CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision

  24. arXiv:2002.07953

    cs.CV

    Universal Domain Adaptation through Self Supervision

    Authors: Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko

    Abstract: Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, little may be known about the category overlap between the two domains. While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori. We propose a more universally applicable domain…

    Submitted 5 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: Accepted to NeurIPS2020

  25. arXiv:2002.07362

    cs.CV

    MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

    Authors: Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

    Abstract: Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, w…

    Submitted 10 October, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: Accepted in ICCV 2021 MTL Workshop

  26. arXiv:2002.05242

    cs.CV cs.HC cs.LG

    Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System

    Authors: Nataniel Ruiz, Hao Yu, Danielle A. Allessio, Mona Jalal, Ajjen Joshi, Thomas Murray, John J. Magee, Jacob R. Whitehill, Vitaly Ablavsky, Ivon Arroyo, Beverly P. Woolf, Stan Sclaroff, Margrit Betke

    Abstract: In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring sys…

    Submitted 8 April, 2022; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: Published at IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021 - Best Poster Award (4% award rate)

  27. arXiv:1912.10982

    cs.CV

    DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition

    Authors: Nuno C. Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff

    Abstract: In this work, we address the problem of learning an ensemble of specialist networks using multimodal data, while considering the realistic and challenging scenario of possible missing modalities at test time. Our goal is to leverage the complementary information of multiple modalities to the benefit of the ensemble and each individual network. We introduce a novel Distillation Multiple Choice Lear…

    Submitted 23 December, 2019; originally announced December 2019.

  28. arXiv:1912.02048

    cs.CV

    Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers

    Authors: Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff

    Abstract: We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) descriptions task. The proposed SNLT is applicable to a wide range of Siamese trackers, providing a new class of baselines for the tracking by NL task and promising future improvements from the advancements of Siamese trackers. The carefully designed…

    Submitted 5 April, 2021; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: CVPR 2021

  29. arXiv:1909.03493

    cs.CV cs.CL

    MULE: Multimodal Universal Language Embedding

    Authors: Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

    Abstract: Existing vision-language methods typically support two languages at a time at most. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. The…

    Submitted 28 December, 2019; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: Accepted as an oral at AAAI 2020

  30. arXiv:1908.06327

    cs.CV cs.CL

    Language Features Matter: Effective Language Representations for Vision-Language Tasks

    Authors: Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

    Abstract: Shouldn't language and vision features be treated equally in vision-language (VL) tasks? Many VL approaches treat the language component as an afterthought, using simple language models that are either built upon fixed word embeddings trained on text-only data or are learned from scratch. We believe that language features deserve more attention, and conduct experiments which compare different word…

    Submitted 17 August, 2019; originally announced August 2019.

    Comments: ICCV 2019 accepted paper

  31. arXiv:1907.11751

    cs.CV

    Real-time Visual Object Tracking with Natural Language Description

    Authors: Qi Feng, Vitaly Ablavsky, Qinxun Bai, Guorong Li, Stan Sclaroff

    Abstract: In recent years, deep-learning-based visual object trackers have been studied thoroughly, but handling occlusions and/or rapid motion of the target remains challenging. In this work, we argue that conditioning on the natural language (NL) description of a target provides information for longer-term invariance, and thus helps cope with typical tracking challenges. However, deriving a formulation to…

    Submitted 3 December, 2019; v1 submitted 26 July, 2019; originally announced July 2019.

  32. arXiv:1906.04833

    cs.CV cs.LG

    Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition

    Authors: Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff

    Abstract: Learning from a few examples is a challenging task for machine learning. While recent progress has been made for this problem, most of the existing methods ignore the compositionality in visual concept representation (e.g. objects are built from parts or composed of semantic attributes), which is key to the human ability to easily learn from a small number of examples. To enhance the few-shot lear…

    Submitted 11 June, 2019; originally announced June 2019.

  33. arXiv:1906.02033

    cs.CV

    Multi-way Encoding for Robustness

    Authors: Donghyun Kim, Sarah Adel Bargal, Jianming Zhang, Stan Sclaroff

    Abstract: Deep models are state-of-the-art for many computer vision tasks including image classification and object detection. However, it has been shown that deep models are vulnerable to adversarial examples. We highlight how one-hot encoding directly contributes to this vulnerability and propose breaking away from this widely-used, but highly-vulnerable mapping. We demonstrate that by leveraging a differ…

    Submitted 15 January, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted at WACV 2020

  34. arXiv:1904.06487

    cs.CV

    Semi-supervised Domain Adaptation via Minimax Entropy

    Authors: Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

    Abstract: Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversaria…

    Submitted 14 September, 2019; v1 submitted 13 April, 2019; originally announced April 2019.

    Comments: accepted to ICCV2019. ICCV paper version

  35. arXiv:1812.02626

    cs.CV

    Guided Zoom: Questioning Network Evidence for Fine-grained Classification

    Authors: Sarah Adel Bargal, Andrea Zunino, Vitali Petsiuk, Jianming Zhang, Kate Saenko, Vittorio Murino, Stan Sclaroff

    Abstract: We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions. It does so by making sure the model has "the right reasons" for a prediction, defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep convolutional neural network makes a prediction is def…

    Submitted 23 March, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

    Comments: BMVC 2019 Camera Ready Version

  36. Revisiting Image-Language Networks for Open-ended Phrase Detection

    Authors: Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

    Abstract: Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image. In this paper we address a more realistic version of the natural language grounding task where we must both identify whether the phrase is relevant to an image and localize the phrase. This can also be viewed as a generalization of object detection to…

    Submitted 12 October, 2020; v1 submitted 17 November, 2018; originally announced November 2018.

    Comments: Accepted to TPAMI

  37. arXiv:1811.06868

    cs.CV

    Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations

    Authors: Hanxiao Wang, Venkatesh Saligrama, Stan Sclaroff, Vitaly Ablavsky

    Abstract: We consider the problem of fine-grained classification on an edge camera device that has limited power. The edge device must sparingly interact with the cloud to minimize communication bits to conserve power, and the cloud upon receiving the edge inputs returns a classification label. To deal with fine-grained classification, we adopt the perspective of sequential fixation with a foveated field-of…

    Submitted 8 August, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

  38. arXiv:1808.01990

    cs.LG cs.CV stat.ML

    Hashing with Binary Matrix Pursuit

    Authors: Fatih Cakir, Kun He, Stan Sclaroff

    Abstract: We propose theoretical and empirical improvements for two-stage hashing methods. We first provide a theoretical analysis on the quality of the binary codes and show that, under mild assumptions, a residual learning scheme can construct binary codes that fit any neighborhood structure with arbitrary accuracy. Secondly, we show that with high-capacity hash functions such as CNNs, binary code inferen…

    Submitted 6 August, 2018; originally announced August 2018.

    Comments: 23 pages, 4 figures. In Proceedings of European Conference on Computer Vision (ECCV), 2018

  39. arXiv:1805.09092

    cs.CV

    Excitation Dropout: Encouraging Plasticity in Deep Neural Networks

    Authors: Andrea Zunino, Sarah Adel Bargal, Pietro Morerio, Jianming Zhang, Stan Sclaroff, Vittorio Murino

    Abstract: We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction defined as the firing of neurons in specific paths. In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout. In essence, we drop out with higher probability those neurons which contri…

    Submitted 21 January, 2021; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: This work is published in the International Journal of Computer Vision (IJCV) in 2021

  40. arXiv:1804.05312

    cs.CV

    Local Descriptors Optimized for Average Precision

    Authors: Kun He, Yan Lu, Stan Sclaroff

    Abstract: Extraction of local feature descriptors is a vital stage in the solution pipelines for numerous computer vision tasks. Learning-based approaches improve performance in certain tasks, but still cannot replace handcrafted features in general. In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, which is a common stage that follows…

    Submitted 17 April, 2018; v1 submitted 15 April, 2018; originally announced April 2018.

    Comments: 13 pages, 8 figures. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

  41. arXiv:1804.05113

    cs.CV

    Multilevel Language and Vision Integration for Text-to-Clip Retrieval

    Authors: Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

    Abstract: We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work. First, we inject text features early on whe…

    Submitted 25 December, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

    Comments: AAAI 2019

  42. arXiv:1803.00974  [pdf, other]

    cs.CV

    Hashing with Mutual Information

    Authors: Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff

    Abstract: Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an information-theor…

    Submitted 24 June, 2018; v1 submitted 2 March, 2018; originally announced March 2018.

  43. arXiv:1711.06778  [pdf, other]

    cs.CV

    Excitation Backprop for RNNs

    Authors: Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, Stan Sclaroff

    Abstract: Deep models are state-of-the-art for many vision tasks including video action recognition and video captioning. Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions. Grounding decisions made by deep networks has been studied in spatial visual content, giving more insight into model predictions for images. However, such stu…

    Submitted 8 March, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 Camera Ready Version

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

  44. arXiv:1705.08562  [pdf, other]

    stat.ML cs.CV cs.LG

    Hashing as Tie-Aware Learning to Rank

    Authors: Kun He, Fatih Cakir, Sarah Adel Bargal, Stan Sclaroff

    Abstract: Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval. In this paper, we develop learning-to-rank formulations for hashing, aimed at directly optimizing ranking-based evaluation metrics such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We first observe that the integer-valued Hamming distance often leads to tied rankings, an…

    Submitted 9 October, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: 15 pages, 3 figures. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
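    The abstract above observes that the integer-valued Hamming distance often produces tied rankings. The minimal Python sketch below only illustrates that observation; it does not reproduce the paper's tie-aware ranking formulation. The 4-bit codes are hypothetical values chosen to show a tie, and codes are packed into Python ints for brevity.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes packed into Python ints."""
    return bin(a ^ b).count("1")


# Hypothetical 4-bit codes, chosen only to illustrate ties.
query = 0b1011
database = [0b1010, 0b0011, 0b1111, 0b0000, 0b1001]

dists = [hamming(query, code) for code in database]
ties = Counter(dists)  # distance value -> number of database items at that distance

# With b-bit codes the distance can only take the b + 1 values 0..b,
# so large databases inevitably contain many equal distances.
for d, count in sorted(ties.items()):
    print(f"distance {d}: {count} item(s)")
# prints:
# distance 1: 4 item(s)
# distance 3: 1 item(s)
```

    Any ordering of the four items at distance 1 is equally valid, so a ranking metric such as AP computed on the sorted list can change depending on how that tie is broken.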

  45. arXiv:1705.00366  [pdf, other]

    cs.CV

    Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

    Authors: Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman

    Abstract: We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight wid…

    Submitted 30 April, 2017; originally announced May 2017.

  46. arXiv:1703.08919  [pdf, other]

    cs.CV

    MIHash: Online Hashing with Mutual Information

    Authors: Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff

    Abstract: Learning-based hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performance-complexity trade-offs by learning hash functions from streaming data. In this paper, we first address a key challenge for online hashing: the binary codes for indexed data must be recomputed to keep pace with updates to the hash functions. We propos…

    Submitted 29 July, 2017; v1 submitted 26 March, 2017; originally announced March 2017.

    Comments: International Conference on Computer Vision (ICCV), 2017

  47. arXiv:1702.00583  [pdf, other]

    cs.CV

    Automating Image Analysis by Annotating Landmarks with Deep Neural Networks

    Authors: Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke

    Abstract: Image and video analysis is often a crucial step in the study of animal behavior and kinematics. Often these analyses require that the positions of one or more animal landmarks be annotated (marked) in numerous images. The process of annotating landmarks can require a significant amount of time and tedious labor, which motivates the need for algorithms that can automatically annotate landmarks. In…

    Submitted 2 February, 2017; originally announced February 2017.

    Comments: 30 pages

  48. arXiv:1608.00507  [pdf, other]

    cs.CV

    Top-down Neural Attention by Excitation Backprop

    Authors: Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff

    Abstract: We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, to pass along top-down signals downwards in the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the co…

    Submitted 1 August, 2016; originally announced August 2016.

    Comments: A shorter version of this paper is accepted at ECCV, 2016 (oral)

  49. arXiv:1607.07525  [pdf, other]

    cs.CV

    Salient Object Subitizing

    Authors: Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

    Abstract: We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1-4). To this end, we present a salient object subitizing image dataset of about 14K everyday images which are annotated…

    Submitted 25 July, 2016; originally announced July 2016.

  50. arXiv:1605.00707  [pdf, other]

    cs.CV

    Discovering Useful Parts for Pose Estimation in Sparsely Annotated Datasets

    Authors: Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke

    Abstract: Our work introduces a novel way to increase pose estimation accuracy by discovering parts from unannotated regions of training images. Discovered parts are used to generate more accurate appearance likelihoods for traditional part-based models like Pictorial Structures [13] and its derivatives. Our experiments on images of a hawkmoth in flight show that our proposed approach significantly improves…

    Submitted 2 May, 2016; originally announced May 2016.

    Comments: Accepted at WACV 2016