Showing 1–50 of 52 results for author: Satoh, S

  1. arXiv:2405.17188  [pdf, other]

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subsets consists of 19,993 RGB video sequences, and the testing subsets cons…

    Submitted 27 May, 2024; originally announced May 2024.

  2. TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

    Authors: Avinash Anand, Raj Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md. Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh

    Abstract: The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various systems such as search engines and Knowledge Graphs. Addressing the two main problems, namely table detection (TD) and table structure recognition…

    Submitted 19 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 8 pages, 2 figures, Workshop of 1st MMIR Deep Multimodal Learning for Information Retrieval

  3. RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

    Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

    Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these…

    Submitted 19 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures, MMAsia 2023 Proceedings of the 5th ACM International Conference on Multimedia in Asia

    Journal ref: In Proceedings of the 5th ACM International Conference on Multimedia in Asia 2023. Association for Computing Machinery, NY, USA, Article 74, pp. 1-6

  4. arXiv:2403.18158  [pdf, other]

    cs.CV

    The Effects of Short Video-Sharing Services on Video Copy Detection

    Authors: Rintaro Yanagi, Yamato Okamoto, Shuhei Yokoo, Shin'ichi Satoh

    Abstract: The short video-sharing services that allow users to post 10-30 second videos (e.g., YouTube Shorts and TikTok) have attracted a lot of attention in recent years. However, conventional video copy detection (VCD) methods mainly focus on general video-sharing services (e.g., YouTube and Bilibili), and the effects of short video-sharing services on video copy detection are still unclear. Considering…

    Submitted 26 March, 2024; originally announced March 2024.

  5. arXiv:2401.16193  [pdf, other]

    cs.LG cs.DB

    Contributing Dimension Structure of Deep Feature for Coreset Selection

    Authors: Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh

    Abstract: Coreset selection seeks to choose a subset of crucial training samples for efficient learning. It has gained traction in deep learning, particularly with the surge in training dataset sizes. Sample selection hinges on two main aspects: a sample's representation in enhancing performance and the role of sample diversity in averting overfitting. Existing methods typically measure both the representat…

    Submitted 2 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 13 pages,11 figures, to be published in AAAI2024

  6. arXiv:2309.08372  [pdf, other]

    cs.CV cs.MM

    Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval

    Authors: Kejun Lin, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Shin'ichi Satoh

    Abstract: Person re-identification (re-ID) requires densely distributed cameras. In practice, the person of interest may not be captured by cameras and, therefore, needs to be retrieved using subjective information (e.g., sketches from witnesses). Previous research defines this case using the sketch as sketch re-identification (Sketch re-ID) and focuses on eliminating the domain gap. Actually, subjectivity…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: ACM Multimedia 2023

  7. arXiv:2306.11528  [pdf, other]

    cs.CV

    TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting

    Authors: Liang Liao, Taorong Liu, Delin Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh

    Abstract: Image inpainting for completing complicated semantic environments and diverse hole patterns of corrupted images is challenging even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image capturing the same scene of a corrupted image offers informative guidance for completing the corrupted image as it shares similar texture and structure priors to that…

    Submitted 20 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Under review

  8. arXiv:2305.07290  [pdf, other]

    cs.CV

    The 3rd Anti-UAV Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Jianan Li, Lei Jin, Jiaming Chu, Zhihao Zhang, Jun Wang, Jiangqiang Xia, Kai Wang, Yang Liu, Sadaf Gulshad, Jiaojiao Zhao, Tianyang Xu, Xuefeng Zhu, Shihan Liu, Zheng Zhu, Guibo Zhu, Zechao Li, Zheng Wang, Baigui Sun, Yandong Guo, Shin ichi Satoh, Junliang Xing, Jane Shen Shengmei

    Abstract: The 3rd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking. The Anti-UAV dataset used for the Anti-UAV Challenge has been publicly released. There are two main differences between this year's competition and the previous two. First, we have expanded the existing dataset, and for the first time, released a training set s…

    Submitted 15 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Technical report for 3rd Anti-UAV Workshop and Challenge. arXiv admin note: text overlap with arXiv:2108.09909

  9. arXiv:2304.06430  [pdf, other]

    cs.CV cs.AI

    Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

    Authors: Astha Verma, A V Subramanyam, Siddhesh Bangar, Naman Lal, Rajiv Ratn Shah, Shin'ichi Satoh

    Abstract: Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO…

    Submitted 6 July, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

  10. arXiv:2304.01816  [pdf, other]

    cs.CV

    Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

    Authors: Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh

    Abstract: Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images. However, our survey of 37 recent papers reveals that many works rely solely on automatic measures (e.g., FID) or perform poorly described human evaluations that are not reliable or repeatable. This paper proposes a standard…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  11. arXiv:2302.12253  [pdf, other]

    cs.CV

    DisCO: Portrait Distortion Correction with Perspective-Aware 3D GANs

    Authors: Zhixiang Wang, Yu-Lun Liu, Jia-Bin Huang, Shin'ichi Satoh, Sizhuo Ma, Gurunandan Krishnan, Jian Wang

    Abstract: Close-up facial images captured at short distances often suffer from perspective distortion, resulting in exaggerated facial features and unnatural/unattractive appearances. We propose a simple yet effective method for correcting perspective distortions in a single close-up face. We first perform GAN inversion using a perspective-distorted input facial image by jointly optimizing the camera intrin…

    Submitted 8 December, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: Project website: https://portrait-disco.github.io/

  12. arXiv:2212.05709  [pdf, other]

    cs.CV

    HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design

    Authors: Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang

    Abstract: Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing. To address the need for a physicall…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023

  13. arXiv:2211.07566  [pdf, other]

    cs.CV

    Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning

    Authors: Zelong Zeng, Fan Yang, Hong Liu, Shin'ichi Satoh

    Abstract: Recent deep metric learning (DML) methods typically leverage solely class labels to keep positive samples far away from negative ones. However, this type of method normally ignores the crucial knowledge hidden in the data (e.g., intra-class information variation), which is harmful to the generalization of the trained model. To alleviate this problem, in this paper we propose Online Batch Diffusion…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 14 pages

  14. arXiv:2210.03355  [pdf, other]

    cs.CV

    Multiple Object Tracking from appearance by hierarchically clustering tracklets

    Authors: Andreu Girbau, Ferran Marqués, Shin'ichi Satoh

    Abstract: Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting factors. We form initial tracklets by leveraging on t…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: To be published in BMVC 2022

  15. arXiv:2209.15179  [pdf, other]

    cs.CV

    Physical Adversarial Attack meets Computer Vision: A Decade Survey

    Authors: Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc Van Gool, Zheng Wang

    Abstract: Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input images can lead to a catastrophic degradation in DNNs' performance. This perplexing phenomenon not only exists in the digital space but also in the…

    Submitted 1 October, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 19 pages. Under Review

  16. arXiv:2207.14498  [pdf, other]

    cs.CV

    Reference-Guided Texture and Structure Inference for Image Inpainting

    Authors: Taorong Liu, Liang Liao, Zheng Wang, Shin'ichi Satoh

    Abstract: Existing learning-based image inpainting methods are still in challenge when facing complex semantic environments and diverse hole patterns. The prior information learned from the large scale training data is still insufficient for these situations. Reference images captured covering the same scenes share similar texture and structure priors with the corrupted images, which offers new prospects fo…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: IEEE International Conference on Image Processing(ICIP 2022)

  17. arXiv:2206.08880  [pdf, other]

    cs.CV cs.LG

    Improving Generalization of Metric Learning via Listwise Self-distillation

    Authors: Zelong Zeng, Fan Yang, Zheng Wang, Shin'ichi Satoh

    Abstract: Most deep metric learning (DML) methods employ a strategy that forces all positive samples to be close in the embedding space while keeping them away from negative ones. However, such a strategy ignores the internal relationships of positive (negative) samples and often leads to overfitting, especially in the presence of hard samples and mislabeled samples. In this work, we propose a simple yet ef…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: 11 pages, 7 figures

  18. Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion

    Authors: Liang Liao, Wenyi Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh

    Abstract: Understanding foggy image sequence in the driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty in collecting and annotating real-world images of adverse weather. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation, which iteratively adapts the model from the source domain to the target…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: IEEE Transactions on Image Processing 2022

  19. Geo-Localization via Ground-to-Satellite Cross-View Image Retrieval

    Authors: Zelong Zeng, Zheng Wang, Fan Yang, Shin'ichi Satoh

    Abstract: The large variation of viewpoint and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks. In this paper, we investigate an extremely challenging task: given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by searching out its corresponding satellite-view images. Specifically, the challenge comes from the gap between…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: 13 pages, 10 figures

    Journal ref: IEEE Transactions on Multimedia (2022)

  20. arXiv:2204.00974  [pdf, other]

    cs.CV

    Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature

    Authors: Zhixiang Wang, Xiang Ji, Jia-Bin Huang, Shin'ichi Satoh, Xiao Zhou, Yinqiang Zheng

    Abstract: Most computer vision systems assume distortion-free images as inputs. The widely used rolling-shutter (RS) image sensors, however, suffer from geometric distortion when the camera and object undergo motion during capture. Extensive researches have been conducted on correcting RS distortions. However, most of the existing work relies heavily on the prior assumptions of scenes or motions. Besides, t…

    Submitted 25 June, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

    Comments: CVPR2022, https://github.com/lightChaserX/neural-global-shutter

  21. arXiv:2203.14438  [pdf, other]

    cs.CV

    Optimal Correction Cost for Object Detection Evaluation

    Authors: Mayu Otani, Riku Togashi, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh

    Abstract: Mean Average Precision (mAP) is the primary evaluation measure for object detection. Although object detection has a broad range of applications, mAP evaluates detectors in terms of the performance of ranked instance retrieval. Such the assumption for the evaluation task does not suit some downstream tasks. To alleviate the gap between downstream tasks and the evaluation scenario, we propose Optim…

    Submitted 27 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  22. Improving Camouflaged Object Detection with the Uncertainty of Pseudo-edge Labels

    Authors: Nobukatsu Kajiura, Hong Liu, Shin'ichi Satoh

    Abstract: This paper focuses on camouflaged object detection (COD), which is a task to detect objects hidden in the background. Most of the current COD models aim to highlight the target object directly while outputting ambiguous camouflaged boundaries. On the other hand, the performance of the models considering edge information is not yet satisfactory. To this end, we propose a new framework that makes fu…

    Submitted 29 October, 2021; originally announced October 2021.

    Comments: Accepted to ACM Multimedia Asia 2021

  23. Scalable Personalised Item Ranking through Parametric Density Estimation

    Authors: Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh

    Abstract: Learning from implicit feedback is challenging because of the difficult nature of the one-class problem: we can observe only positive examples. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. However, such methods have two main drawbacks particularly in large-scale applications; (1) the pairwise approach is severely inefficient du…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted by SIGIR'21

  24. arXiv:2101.07481  [pdf, other]

    cs.IR

    Density-Ratio Based Personalised Ranking from Implicit Feedback

    Authors: Riku Togashi, Masahiro Kato, Mayu Otani, Shin'ichi Satoh

    Abstract: Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones. Most conventional methods cope with this issue by adopting a pairwise ranking approach with negative sampling. However, the pairwise ranking approach has a severe disadvantage in the convergence time owing to the quadratically increasing computational cost with respect to the…

    Submitted 19 January, 2021; originally announced January 2021.

    Comments: Accepted by WWW 2021

  25. arXiv:2012.08054  [pdf, other]

    cs.CV

    Image Inpainting Guided by Coherence Priors of Semantics and Textures

    Authors: Liang Liao, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh

    Abstract: Existing inpainting methods have achieved promising performance in recovering defected images of specific scenes. However, filling holes involving multiple semantic categories remains challenging due to the obscure semantic boundaries and the mixture of different semantic textures. In this paper, we introduce coherence priors between the semantics and textures which make it possible to concentrate…

    Submitted 14 December, 2020; originally announced December 2020.

  26. arXiv:2011.05061  [pdf, other]

    cs.IR

    Alleviating Cold-Start Problems in Recommendation through Pseudo-Labelling over Knowledge Graph

    Authors: Riku Togashi, Mayu Otani, Shin'ichi Satoh

    Abstract: Solving cold-start problems is indispensable to provide meaningful recommendation results for new users and items. Under sparsely observed data, unobserved user-item pairs are also a vital source for distilling latent users' information needs. Most present works leverage unobserved samples for extracting negative signals. However, such an optimisation strategy can lead to biased results toward alr…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: WSDM 2021

  27. arXiv:2008.05383  [pdf, other]

    cs.CV

    Towards Unsupervised Crowd Counting via Regression-Detection Bi-knowledge Transfer

    Authors: Yuting Liu, Zheng Wang, Miaojing Shi, Shin'ichi Satoh, Qijun Zhao, Hongyu Yang

    Abstract: Unsupervised crowd counting is a challenging yet not largely explored task. In this paper, we explore it in a transfer learning setting where we learn to detect and count persons in an unlabeled target set by transferring bi-knowledge learnt from regression- and detection-based models in a labeled source set. The dual source knowledge of the two models is heterogeneous and complementary as they ca…

    Submitted 27 September, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: This paper has been accepted by ACM MM 2020(Oral)

  28. arXiv:2007.13559  [pdf, other]

    cs.CV cs.LG eess.IV

    MADGAN: unsupervised Medical Anomaly Detection GAN using multiple adjacent brain MRI slice reconstruction

    Authors: Changhee Han, Leonardo Rundo, Kohei Murao, Tomoyuki Noguchi, Yuki Shimahara, Zoltan Adam Milacski, Saori Koshino, Evis Sala, Hideki Nakayama, Shinichi Satoh

    Abstract: Unsupervised learning can discover various unseen abnormalities, relying on large-scale unannotated medical images of healthy subjects. Towards this, unsupervised methods reconstruct a 2D/3D single medical image to detect outliers either in the learned feature space or from high reconstruction loss. However, without considering continuity between multiple adjacent slices, they cannot directly disc…

    Submitted 12 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 23 pages, 11 figures, submitted to BMC Bioinformatics. Extended version of arXiv:1906.06114

  29. An Entropy Clustering Approach for Assessing Visual Question Difficulty

    Authors: Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shun'ichi Satoh

    Abstract: We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotations to the difficulty. Prior works have considered the diversity of ground-truth answers of human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the…

    Submitted 2 September, 2022; v1 submitted 12 April, 2020; originally announced April 2020.

    Journal ref: IEEE Access, Vol.8, pp. 180633-180645, Sep 2020

  30. Rephrasing visual questions by specifying the entropy of the answer distribution

    Authors: Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shun'ichi Satoh

    Abstract: Visual question answering (VQA) is a task of answering a visual question that is a pair of question and image. Some visual questions are ambiguous and some are clear, and it may be appropriate to change the ambiguity of questions from situation to situation. However, this issue has not been addressed by any prior work. We propose a novel task, rephrasing the questions by controlling the ambiguity…

    Submitted 10 April, 2020; originally announced April 2020.

    Comments: 10 pages

  31. arXiv:2003.06877  [pdf, other]

    cs.CV

    Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes

    Authors: Liang Liao, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh

    Abstract: Completing a corrupted image with correct structures and reasonable textures for a mixed scene remains an elusive challenge. Since the missing hole in a mixed scene of a corrupted image often contains various semantic information, conventional two-stage approaches utilizing structural information often lead to the problem of unreliable structural prediction and ambiguous image texture generation.…

    Submitted 10 July, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

  32. arXiv:1906.06114  [pdf, other]

    eess.IV cs.CV

    GAN-based Multiple Adjacent Brain MRI Slice Reconstruction for Unsupervised Alzheimer's Disease Diagnosis

    Authors: Changhee Han, Leonardo Rundo, Kohei Murao, Zoltán Ádám Milacski, Kazuki Umemoto, Evis Sala, Hideki Nakayama, Shin'ichi Satoh

    Abstract: Unsupervised learning can discover various unseen diseases, relying on large-scale unannotated medical images of healthy subjects. Towards this, unsupervised methods reconstruct a single medical image to detect outliers either in the learned feature space or from high reconstruction loss. However, without considering continuity between multiple adjacent slices, they cannot directly discriminate di…

    Submitted 16 March, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: 10 pages, 4 figures, Accepted to Lecture Notes in Bioinformatics (LNBI) as a volume in the Springer series

  33. arXiv:1905.10048  [pdf, other]

    cs.CV

    Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification

    Authors: Zheng Wang, Zhixiang Wang, Yinqiang Zheng, Yang Wu, Wenjun Zeng, Shin'ichi Satoh

    Abstract: An efficient and effective person re-identification (ReID) system relieves the users from painful and boring video watching and accelerates the process of video analysis. Recently, with the explosive demands of practical applications, a lot of research efforts have been dedicated to heterogeneous person re-identification (Hetero-ReID). In this paper, we provide a comprehensive review of state-of-t…

    Submitted 27 April, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted by IJCAI 2020. Project url: https://github.com/lightChaserX/Awesome-Hetero-reID

    Journal ref: IJCAI 2020

  34. arXiv:1905.04854  [pdf, other]

    cs.CV cs.MM

    DotSCN: Group Re-identification via Domain-Transferred Single and Couple Representation Learning

    Authors: Ziling Huang, Zheng Wang, Chung-Chi Tsai, Shin'ichi Satoh, Chia-Wen Lin

    Abstract: Group re-identification (G-ReID) is an important yet less-studied task. Its challenges not only lie in appearance changes of individuals which have been well-investigated in general person re-identification (ReID), but also derive from group layout and membership changes. So the key task of G-ReID is to learn representations robust to such changes. To address this issue, we propose a Transferred S…

    Submitted 13 October, 2020; v1 submitted 13 May, 2019; originally announced May 2019.

    Comments: Accepted in IEEE Transactions on Circuits and Systems for Video Technology

  35. Illumination-Adaptive Person Re-identification

    Authors: Zelong Zeng, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Yung-Yu Chuang, Shin'ichi Satoh

    Abstract: Most person re-identification (ReID) approaches assume that person images are captured under relatively similar illumination conditions. In reality, long-term person retrieval is common, and person images are often captured under different illumination conditions at different times across a day. In this situation, the performances of existing ReID models often degrade dramatically. This paper addr…

    Submitted 23 April, 2020; v1 submitted 11 May, 2019; originally announced May 2019.

    Comments: Accepted by TMM

  36. arXiv:1904.00838  [pdf]

    cs.CV cs.AI

    Learning More with Less: GAN-based Medical Image Augmentation

    Authors: Changhee Han, Kohei Murao, Shin'ichi Satoh, Hideki Nakayama

    Abstract: Convolutional Neural Network (CNN)-based accurate prediction typically requires large-scale annotated training data. In Medical Imaging, however, both obtaining medical data and annotating them by expert physicians are challenging; to overcome this lack of data, Data Augmentation (DA) using Generative Adversarial Networks (GANs) is essential, since they can synthesize additional annotated training…

    Submitted 29 May, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

    Comments: 6 pages, 2 figures, to appear in MEDICAL IMAGING TECHNOLOGY Special Issue

  37. Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images

    Authors: Changhee Han, Kohei Murao, Tomoyuki Noguchi, Yusuke Kawata, Fumiya Uchiyama, Leonardo Rundo, Hideki Nakayama, Shin'ichi Satoh

    Abstract: Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realis…

    Submitted 22 August, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: 9 pages, 7 figures, accepted to CIKM 2019 (acceptance rate: 19%)

  38. arXiv:1811.10907  [pdf, other]

    cs.CV cs.IR

    Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing

    Authors: Fan Yang, Ryota Hinami, Yusuke Matsui, Steven Ly, Shin'ichi Satoh

    Abstract: Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and has attracted lots of attention in recent years. A downside to diffusion is that it performs slowly in comparison to the naive k-NN search, which causes a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion techniq…

    Submitted 4 January, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted by AAAI 2019

  39. arXiv:1808.03969  [pdf, other]

    cs.CV cs.IR cs.MM

    Reconfigurable Inverted Index

    Authors: Yusuke Matsui, Ryota Hinami, Shin'ichi Satoh

    Abstract: Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well for the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion concerning the performance decremen…

    Submitted 12 August, 2018; originally announced August 2018.

    Comments: ACMMM 2018 (oral). Code: https://github.com/matsui528/rii

  40. Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Authors: Yaman Kumar, Mayank Aggarwal, Pratham Nawal, Shin'ichi Satoh, Rajiv Ratn Shah, Roger Zimmerman

    Abstract: Speechreading or lipreading is the technique of understanding and getting phonetic features from a speaker's visual features such as movement of lips, face, teeth and tongue. It has a wide range of multimedia applications such as in surveillance, Internet telephony, and as an aid to a person with hearing impairments. However, most of the work in speechreading has been limited to text generation fr…

    Submitted 12 August, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: 2018 ACM Multimedia Conference (MM '18), October 22–26, 2018, Seoul, Republic of Korea

  41. arXiv:1804.06057  [pdf, other]

    cs.MM

    Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video

    Authors: Ryota Hinami, Junwei Liang, Shin'ichi Satoh, Alexander Hauptmann

    Abstract: We tackle the problem of learning concept classifiers from videos on the web without using manually labeled data. Although metadata attached to videos (e.g., video titles, descriptions) can be of help collecting training data for the target concept, the collected data is often very noisy. The main challenge is therefore how to select good examples from noisy training data. Previous approaches firs…

    Submitted 17 April, 2018; originally announced April 2018.

  42. arXiv:1803.00479  [pdf, other]

    cs.IR

    Tracked Instance Search

    Authors: Andreu Girbau, Ryota Hinami, Shin'ichi Satoh

    Abstract: In this work we propose tracking as a generic addition to the instance search task. From video data perspective, much information that can be used is not taken into account in the traditional instance search approach. This work aims to provide insights on exploiting such existing information by means of tracking and the proper combination of the results, independently of the instance search system…

    Submitted 1 March, 2018; originally announced March 2018.

    Comments: Accepted at ICASSP 2018

  43. Digital Watermarking for Deep Neural Networks

    Authors: Yuki Nagai, Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

    Abstract: Although deep neural networks have made tremendous progress in the area of multimedia representation, training neural models requires a large amount of data and time. It is well-known that utilizing trained models as initial weights often achieves lower training error than neural networks that are not pre-trained. A fine-tuning step helps to reduce both the computational cost and improve performan…

    Submitted 6 February, 2018; originally announced February 2018.

    Comments: This is a pre-print of an article published in International Journal of Multimedia Information Retrieval. The final authenticated version is available online at: https://doi.org/10.1007/s13735-018-0147-1 . arXiv admin note: substantial text overlap with arXiv:1701.04082

  44. arXiv:1712.09532  [pdf, other]

    cs.CV

    Consensus-based Sequence Training for Video Captioning

    Authors: Sang Phan, Gustav Eje Henter, Yusuke Miyao, Shin'ichi Satoh

    Abstract: Captioning models are typically trained using the cross-entropy loss. However, their performance is evaluated on other metrics designed to better correlate with human assessments. Recently, it has been shown that reinforcement learning (RL) can directly optimize these metrics in tasks such as captioning. However, this is computationally costly and requires specifying a baseline reward at each step…

    Submitted 27 December, 2017; originally announced December 2017.

    Comments: 11 pages, 4 figures, 5 tables. Github repo at https://github.com/mynlp/cst_captioning

  45. arXiv:1711.09509  [pdf, other]

    cs.CV

    Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation

    Authors: Ryota Hinami, Shin'ichi Satoh

    Abstract: Thanks to the success of object detection technology, we can retrieve objects of the specified classes even from huge image collections. However, the current state-of-the-art object detectors (such as Faster R-CNN) can only handle pre-specified classes. In addition, large amounts of positive and negative visual samples are required for training. In this paper, we address the problem of open-vocabu…

    Submitted 4 September, 2018; v1 submitted 26 November, 2017; originally announced November 2017.

    Comments: Accepted to EMNLP 2018

  46. arXiv:1709.09121  [pdf, other]

    cs.CV

    Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge

    Authors: Ryota Hinami, Tao Mei, Shin'ichi Satoh

    Abstract: This paper addresses the problem of joint detection and recounting of abnormal events in videos. Recounting of abnormal events, i.e., explaining why they are judged to be abnormal, is an unexplored but critical task in video surveillance, because it helps human observers quickly judge if they are false alarms or not. To describe the events in the human-understandable form for event recounting, lea…

    Submitted 26 September, 2017; originally announced September 2017.

    Comments: To appear in ICCV 2017

  47. arXiv:1709.09106  [pdf, other]

    cs.MM cs.CV

    Region-Based Image Retrieval Revisited

    Authors: Ryota Hinami, Yusuke Matsui, Shin'ichi Satoh

    Abstract: Region-based image retrieval (RBIR) technique is revisited. In early attempts at RBIR in the late 90s, researchers found many ways to specify region-based queries and spatial relationships; however, the way to characterize the regions, such as by using color histograms, were very poor at that time. Here, we revisit RBIR by incorporating semantic specification of objects and intuitive specification…

    Submitted 26 September, 2017; originally announced September 2017.

    Comments: To appear in ACM Multimedia 2017 (Oral)

  48. arXiv:1706.02342  [pdf, other]

    cs.CV

    Active Learning for Structured Prediction from Partially Labeled Data

    Authors: Mehran Khodabandeh, Zhiwei Deng, Mostafa S. Ibrahim, Shinichi Satoh, Greg Mori

    Abstract: We propose a general purpose active learning algorithm for structured prediction, gathering labeled data for training a model that outputs a set of related labels for an image or video. Active learning starts with a limited initial training set, then iterates querying a user for labels on unlabeled data and retraining the model. We propose a novel algorithm for selecting data for labeling, choosin…

    Submitted 9 June, 2017; v1 submitted 7 June, 2017; originally announced June 2017.

  49. Embedding Watermarks into Deep Neural Networks

    Authors: Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, Shin'ichi Satoh

    Abstract: Deep neural networks have recently achieved significant progress. Sharing trained models of these deep neural networks is very important in the rapid progress of researching or developing deep neural network systems. At the same time, it is necessary to protect the rights of shared trained models. To this end, we propose to use a digital watermarking technology to protect intellectual property or…

    Submitted 20 April, 2017; v1 submitted 15 January, 2017; originally announced January 2017.

    Journal ref: ICMR '17 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pages 269-277

  50. arXiv:1610.06266  [pdf, other]

    cs.CV

    Adaptive Substring Extraction and Modified Local NBNN Scoring for Binary Feature-based Local Mobile Visual Search without False Positives

    Authors: Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

    Abstract: In this paper, we propose a stand-alone mobile visual search system based on binary features and the bag-of-visual words framework. The contribution of this study is three-fold: (1) We propose an adaptive substring extraction method that adaptively extracts informative bits from the original binary vector and stores them in the inverted index. These substrings are used to refine visual word-based…

    Submitted 19 October, 2016; originally announced October 2016.