Showing 1–50 of 69 results for author: Asano, Y

  1. Stable Tool-Use with Flexible Musculoskeletal Hands by Learning the Predictive Model of Sensor State Transition

    Authors: Kento Kawaharazuka, Kei Tsuzuki, Moritaka Onitsuka, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: The flexible under-actuated musculoskeletal hand is superior in its adaptability and impact resistance. On the other hand, since the relationship between sensors and actuators cannot be uniquely determined, almost all its controls are based on feedforward controls. When grasping and using a tool, the contact state of the hand gradually changes due to the inertia of the tool or impact of action, an…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at ICRA2020

  2. Musculoskeletal AutoEncoder: A Unified Online Acquisition Method of Intersensory Networks for State Estimation, Control, and Simulation of Musculoskeletal Humanoids

    Authors: Kento Kawaharazuka, Kei Tsuzuki, Moritaka Onitsuka, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: While the musculoskeletal humanoid has various biomimetic benefits, the modeling of its complex structure is difficult, and many learning-based systems have been developed so far. There are various methods, such as control methods using acquired relationships between joints and muscles represented by a data table or neural network, and state estimation methods using Extended Kalman Filter or table…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE Robotics and Automation Letters

  3. arXiv:2406.12658  [pdf, other]

    cs.CV cs.LG

    Federated Learning with a Single Shared Image

    Authors: Sunny Soni, Aaqib Saeed, Yuki M. Asano

    Abstract: Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained from each client model with the server. One popular method, FedDF, uses distillation to tackle this task with the use of a common, shared dataset on which pre…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 8 Pages, 3 Figures, Appendix 4 Pages, CVPRW 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7782-7790

  4. Toward Autonomous Driving by Musculoskeletal Humanoids: A Study of Developed Hardware and Learning-Based Software

    Authors: Kento Kawaharazuka, Kei Tsuzuki, Yuya Koga, Yusuke Omura, Tasuku Makabe, Koki Shinjo, Moritaka Onitsuka, Yuya Nagamatsu, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: This paper summarizes an autonomous driving project by musculoskeletal humanoids. The musculoskeletal humanoid, which mimics the human body in detail, has redundant sensors and a flexible body structure. These characteristics are suitable for motions with complex environmental contact, and the robot is expected to sit down on the car seat, step on the acceleration and brake pedals, and operate the…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE Robotics and Automation Magazine

  5. arXiv:2405.17423  [pdf, other]

    cs.CV cs.CL

    Privacy-Aware Visual Language Models

    Authors: Laurens Samson, Nimrod Barazani, Sennay Ghebreab, Yuki M. Asano

    Abstract: This paper aims to advance our understanding of how Visual Language Models (VLMs) handle privacy-sensitive information, a crucial concern as these technologies become integral to everyday life. To this end, we introduce a new benchmark, PrivBench, which contains images from 8 sensitive categories such as passports or fingerprints. We evaluate 10 state-of-the-art VLMs on this benchmark and observe…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: preprint

  6. arXiv:2405.14862  [pdf, other]

    cs.CL

    Bitune: Bidirectional Instruction-Tuning

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning…

    Submitted 23 May, 2024; originally announced May 2024.

  7. arXiv:2405.11092  [pdf, other]

    cs.HC cs.RO

    What metrics of participation balance predict outcomes of collaborative learning with a robot?

    Authors: Yuya Asano, Diane Litman, Quentin King-Shepard, Tristan Maidment, Tyree Langley, Teresa Davison, Timothy Nokes-Malach, Adriana Kovashka, Erin Walker

    Abstract: One of the keys to the success of collaborative learning is balanced participation by all learners, but this does not always happen naturally. Pedagogical robots have the potential to facilitate balance. However, it remains unclear what participation balance robots should aim at; various metrics have been proposed, but it is still an open question whether we should balance human participation in h…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: To appear in Seventeenth International Conference on Educational Data Mining (EDM 2024)

  8. arXiv:2404.17202  [pdf, other]

    cs.CV

    Self-supervised visual learning in the low-data regime: a comparative evaluation

    Authors: Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Yuki M. Asano, Efstratios Gavves, Georgios Th. Papadopoulos

    Abstract: Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a 'pretext task' that does not require ground-truth labels/annotation. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy in a 'downstream task' by exploi…

    Submitted 26 April, 2024; originally announced April 2024.

  9. A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

    Authors: Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

    Abstract: Tendon-driven musculoskeletal humanoids typically have complex structures similar to those of human beings, such as ball joints and the scapula, in which encoders cannot be installed. Therefore, joint angles cannot be directly obtained and need to be estimated using the changes in muscle lengths. In previous studies, methods using table search and the extended Kalman filter have been developed. These…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted at Humanoids2018

  10. TWIMP: Two-Wheel Inverted Musculoskeletal Pendulum as a Learning Control Platform in the Real World with Environmental Physical Contact

    Authors: Kento Kawaharazuka, Tasuku Makabe, Shogo Makino, Kei Tsuzuki, Yuya Nagamatsu, Yuki Asano, Takuma Shirai, Fumihito Sugai, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: With the recent spread of machine learning in the robotics field, a humanoid that can act, perceive, and learn in the real world through contact with the environment needs to be developed. In this study, as one of the choices, we propose a novel humanoid, TWIMP, which combines a human mimetic musculoskeletal upper limb with a two-wheel inverted pendulum. By combining the benefit of a musculoskeletal…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted at Humanoids2018

  11. arXiv:2404.13381  [pdf, other]

    cs.LG cs.CR cs.MA q-bio.PE

    DNA: Differentially private Neural Augmentation for contact tracing

    Authors: Rob Romijnders, Christos Louizos, Yuki M. Asano, Max Welling

    Abstract: The COVID-19 pandemic had enormous economic and societal consequences. Contact tracing is an effective way to reduce infection rates by detecting potential virus carriers early. However, this was not generally adopted in the recent pandemic, and privacy concerns are cited as the most important reason. We substantially improve the privacy guarantees of the current state of the art in decentralized c…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Privacy Regulation and Protection in Machine Learning Workshop at ICLR 2024

  12. Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

    Authors: Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

    Abstract: The body structures of tendon-driven musculoskeletal humanoids are complex, and accurate modeling is difficult, because they are made by imitating the body structures of human beings. For this reason, we have not been able to move them accurately like ordinary humanoids driven by actuators in each axis, and large internal muscle tension and slack of tendon wires have emerged by the model error bet…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at IEEE Robotics and Automation Letters, 2018

  13. Long-time Self-body Image Acquisition and its Application to the Control of Musculoskeletal Structures

    Authors: Kento Kawaharazuka, Kei Tsuzuki, Shogo Makino, Moritaka Onitsuka, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: The tendon-driven musculoskeletal humanoid has many benefits that human beings have, but the modeling of its complex muscle and bone structures is difficult and conventional model-based controls cannot realize intended movements. Therefore, a learning control mechanism that acquires nonlinear relationships between joint angles, muscle tensions, and muscle lengths from the actual robot is necessary…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at IEEE Robotics and Automation Letters, 2019

  14. Online Self-body Image Acquisition Considering Changes in Muscle Routes Caused by Softness of Body Tissue for Tendon-driven Musculoskeletal Humanoids

    Authors: Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Ayaka Fujii, Yuki Asano, Kei Okada, Masayuki Inaba

    Abstract: Tendon-driven musculoskeletal humanoids have many benefits in terms of the flexible spine, multiple degrees of freedom, and variable stiffness. At the same time, because of its body complexity, there are problems in controllability. First, due to the large difference between the actual robot and its geometric model, it cannot move as intended and large internal muscle tension may emerge. Second, m…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at IROS2018

  15. Development of Musculoskeletal Legs with Planar Interskeletal Structures to Realize Human Comparable Moving Function

    Authors: Moritaka Onitsuka, Manabu Nishiura, Kento Kawaharazuka, Kei Tsuzuki, Yasunori Toshimitsu, Yusuke Omura, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: Musculoskeletal humanoids have been developed by imitating humans and are expected to perform natural and dynamic motions as well as humans do. Achieving desired motions stably in current musculoskeletal humanoids is not easy because they cannot maintain a sufficient moment arm of muscles in various postures. In this research, we discuss planar structures that spread across joint structures such as li…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: accepted at Humanoids2020

  16. High-Power, Flexible, Robust Hand: Development of Musculoskeletal Hand Using Machined Springs and Realization of Self-Weight Supporting Motion with Humanoid

    Authors: Shogo Makino, Kento Kawaharazuka, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

    Abstract: Humans can not only support their bodies while standing or walking, but also support them by hand, for example when hanging from a bar. But most humanoid robots support their bodies only with their feet and use their hands just to manipulate objects, because their hands are too weak to support their bodies. Strong hands are supposed to enable humanoid robots to act in much broader scenes. Therefore,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: accepted at IROS2017

  17. Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

    Authors: Shogo Makino, Kento Kawaharazuka, Ayaka Fujii, Masaya Kawamura, Tasuku Makabe, Moritaka Onitsuka, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

    Abstract: Human hands can not only grasp objects of various shapes and sizes and manipulate them in hand, but also exert such a large gripping force that they can support the body in situations such as hanging from a bar and climbing a ladder. On the other hand, it is difficult for most robot hands to manage both. Therefore, in this paper we developed a hand which can grasp various objects and exert large gr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: accepted at IROS2018

  18. arXiv:2402.16844  [pdf, other]

    cs.LG cs.AI cs.CL

    Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

    Authors: Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi

    Abstract: Large language models (LLMs) have become ubiquitous in practice and are widely used for generation tasks such as translation, summarization and instruction following. However, their enormous size and reliance on autoregressive decoding increase deployment costs and complicate their use in latency-critical applications. In this work, we propose a hybrid approach that combines language models of dif…

    Submitted 26 February, 2024; originally announced February 2024.

  19. arXiv:2402.14957  [pdf, other]

    cs.CV cs.LG

    The Common Stability Mechanism behind most Self-Supervised Learning Approaches

    Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

    Abstract: The last couple of years have witnessed tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniqu…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

  20. arXiv:2402.08657  [pdf, other]

    cs.CV

    PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

    Authors: Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom,…

    Submitted 13 February, 2024; originally announced February 2024.

  21. arXiv:2401.11485  [pdf, other]

    cs.CV cs.GR eess.IV

    ColorVideoVDP: A visual difference predictor for image, video and display distortions

    Authors: Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro

    Abstract: ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision, for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions, geometric, and photometric characteristics of the display. It was trained to predict common…

    Submitted 2 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 28 pages

    Journal ref: SIGGRAPH 2024 Technical Papers, Article 129

  22. arXiv:2401.05735  [pdf, other]

    cs.CV cs.LG

    Object-Centric Diffusion for Efficient Video Editing

    Authors: Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

    Abstract: Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we c…

    Submitted 11 January, 2024; originally announced January 2024.

  23. arXiv:2312.17244  [pdf, other]

    cs.LG cs.CL

    The LLM Surgeon

    Authors: Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort

    Abstract: State-of-the-art language models are becoming increasingly large in an effort to achieve the highest performance on large corpora of available textual data. However, the sheer size of the Transformer architectures makes it difficult to deploy models within computational, environmental or device-specific constraints. We explore data-driven compression of existing pretrained models as an alternative…

    Submitted 20 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  24. arXiv:2312.11581  [pdf, other]

    cs.CR cs.AI cs.LG

    Protect Your Score: Contact Tracing With Differential Privacy Guarantees

    Authors: Rob Romijnders, Christos Louizos, Yuki M. Asano, Max Welling

    Abstract: The pandemic in 2020 and 2021 had enormous economic and societal consequences, and studies show that contact tracing algorithms can be key in the early containment of the virus. While large strides have been made towards more effective contact tracing algorithms, we argue that privacy concerns currently hold deployment back. The essence of a contact tracing algorithm constitutes the communication…

    Submitted 15 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  25. arXiv:2312.08895  [pdf, other]

    cs.CV

    Motion Flow Matching for Human Motion Synthesis and Editing

    Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

    Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: WIP

  26. arXiv:2312.08892  [pdf, other]

    cs.CV

    VaLID: Variable-Length Input Diffusion for Novel View Synthesis

    Authors: Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian

    Abstract: Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision. As this task is heavily under-constrained, some recent work, like Zero123, tries to solve this problem with generative modeling, specifically using pre-trained diffusion models. Although this strategy generalizes well…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: paper and supplementary material

  27. arXiv:2312.08825  [pdf, other]

    cs.CV

    Guided Diffusion from Self-Supervised Diffusion Features

    Authors: Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer

    Abstract: Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or classifier pretraining. That is why guidance was harnessed from self-supervised learning backbones, like DINO. However, recent studies have revealed that the feature representation derived from the diffusion model itself is discriminative for numerous downstream tasks a…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Work In Progress

  28. arXiv:2312.04539  [pdf, other]

    cs.CV

    Auto-Vocabulary Semantic Segmentation

    Authors: Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald

    Abstract: Open-ended image understanding tasks have gained significant attention from the research community, particularly with the emergence of Vision-Language Models. Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, they operate without the need for training or fine-tuning. However, OVS methods typically require…

    Submitted 20 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  29. arXiv:2311.17299  [pdf, other]

    cs.LG cs.CV cs.DC

    Federated Fine-Tuning of Foundation Models via Probabilistic Masking

    Authors: Vasileios Tsouvalas, Yuki Asano, Aaqib Saeed

    Abstract: Foundation Models (FMs) have revolutionized machine learning with their adaptability and high performance across tasks; yet, their integration into Federated Learning (FL) is challenging due to substantial communication overhead from their extensive parameterization. Current communication-efficient FL strategies, such as gradient compression, reduce bitrates to around 1 bit-per-parameter (bpp).…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 19 pages, 9 figures

  30. arXiv:2310.11454  [pdf, other]

    cs.CL

    VeRA: Vector-based Random Matrix Adaptation

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameter…

    Submitted 16 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024, website: https://dkopi.github.io/vera

  31. arXiv:2310.08584  [pdf, other]

    cs.CV

    Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

    Authors: Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis

    Abstract: Self-supervised learning has unlocked the potential of scaling up pretraining to billions of images, since annotation is unnecessary. But are we making the best use of data? How much more economical can we be? In this work, we attempt to answer this question by making two contributions. First, we investigate first-person videos and introduce a "Walking Tours" dataset. These videos are high-resolution,…

    Submitted 23 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024 (Best paper honorable mention). Project Page: https://shashankvkt.github.io/dora

  32. arXiv:2310.00500  [pdf, other]

    cs.CV

    Self-Supervised Open-Ended Classification with Small Visual Language Models

    Authors: Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring, Yuki M. Asano

    Abstract: We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models. Our approach imitates image captions in a self-supervised way based on clustering a large pool of images followed by assigning semantically-unrelated names to clusters. By doing so, we construct a training signal consisting of inter…

    Submitted 6 December, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  33. arXiv:2308.11796  [pdf, other]

    cs.CV

    Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

    Authors: Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consisten…

    Submitted 22 August, 2023; originally announced August 2023.

  34. arXiv:2308.07350  [pdf, other]

    cs.LG cs.AI

    Efficient Neural PDE-Solvers using Quantization Aware Training

    Authors: Winfried van den Dool, Tijmen Blankevoort, Max Welling, Yuki M. Asano

    Abstract: In the past years, the application of neural networks as an alternative to classical numerical methods to solve Partial Differential Equations has emerged as a potential paradigm shift in this century-old mathematical field. However, in terms of practical applicability, computational cost remains a substantial bottleneck. Classical approaches try to mitigate this challenge by limiting the spatial…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted at the ICCV 2023 Workshop on Resource Efficient Deep Learning for Computer Vision

  35. arXiv:2307.08727  [pdf, other]

    cs.CV

    Learning to Count without Annotations

    Authors: Lukas Knobel, Tengda Han, Yuki M. Asano

    Abstract: While recent supervised methods for reference-based object counting continue to improve the performance on benchmark datasets, they have to rely on small datasets due to the cost associated with manually annotating dozens of objects in images. We propose UnCounTR, a model that can learn this task without requiring any manual annotations. To this end, we construct "Self-Collages", images with vario…

    Submitted 29 March, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted at CVPR'24. Code available at https://github.com/lukasknobel/SelfCollages

  36. arXiv:2306.09643  [pdf, other]

    cs.LG cs.AI stat.ME

    BISCUIT: Causal Representation Learning from Binary Interactions

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Identifying the causal variables of an environment and how to intervene on them is of core value in applications such as robotics and embodied AI. While an agent can commonly interact with the environment and may implicitly perturb the behavior of some of these causal variables, often the targets it affects remain unknown. In this paper, we show that causal variables can still be identified for ma…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Published in: Uncertainty in Artificial Intelligence (UAI 2023). Project page: https://phlippe.github.io/BISCUIT/

  37. arXiv:2306.07302  [pdf, other]

    cs.HC cs.AI cs.CL

    Impact of Experiencing Misrecognition by Teachable Agents on Learning and Rapport

    Authors: Yuya Asano, Diane Litman, Mingzhi Yu, Nikki Lobczowski, Timothy Nokes-Malach, Adriana Kovashka, Erin Walker

    Abstract: While speech-enabled teachable agents have some advantages over typing-based ones, they are vulnerable to errors stemming from misrecognition by automatic speech recognition (ASR). These errors may propagate, resulting in unexpected changes in the flow of conversation. We analyzed how such changes are linked with learning gains and learners' rapport with the agents. Our results show they are not r…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to AIED 2023

  38. arXiv:2304.00961  [pdf, other]

    cs.CV

    Self-Ordering Point Clouds

    Authors: Pengwan Yang, Cees G. M. Snoek, Yuki M. Asano

    Abstract: In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard-to-obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call sel…

    Submitted 10 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  39. arXiv:2302.00353  [pdf, other]

    cs.LG cs.CV

    Towards Label-Efficient Incremental Learning: A Survey

    Authors: Mert Kilickaya, Joost van de Weijer, Yuki M. Asano

    Abstract: The current dominant paradigm when building a machine learning model is to iterate over a dataset over and over until convergence. Such an approach is non-incremental, as it assumes access to all images of all categories at once. However, for many applications, non-incremental learning is unrealistic. To that end, researchers study incremental learning, where a learner is required to adapt to an i…

    Submitted 11 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

  40. arXiv:2301.02240  [pdf, other]

    cs.CV

    Skip-Attention: Improving Vision Transformers by Paying Less Attention

    Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

    Abstract: This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to ap…

    Submitted 17 January, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  41. arXiv:2210.10820  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG

    VTC: Improving Video-Text Retrieval with User Comments

    Authors: Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht

    Abstract: Multi-modal retrieval is an important problem for many applications, such as recommendation and search. Current benchmarks and even datasets are often manually constructed and consist of mostly clean samples where all modalities are well-correlated with the content. Thus, current video-text retrieval literature largely focuses on video titles or audio transcripts, while ignoring user comments, sin…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted paper at the European Conference on Computer Vision (ECCV) 2022

  42. arXiv:2210.06466  [pdf, other]

    cs.CV

    Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers

    Authors: Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano

    Abstract: With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains. However, with model parameter counts reaching the billions, classical finetuning approaches are becoming increasingly limiting and even unfeasible when models become hosted as inference APIs, as in NLP. To this end, vi…

    Submitted 19 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Tech report, 12 pages. Code: https://github.com/jochemloedeman/PGN

  43. arXiv:2210.06462  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Guided Diffusion Models

    Authors: Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibili…

    Submitted 27 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: CVPR 2023

  44. arXiv:2209.11842  [pdf, other]

    cs.CL cs.HC cs.RO

    Comparison of Lexical Alignment with a Teachable Robot in Human-Robot and Human-Human-Robot Interactions

    Authors: Yuya Asano, Diane Litman, Mingzhi Yu, Nikki Lobczowski, Timothy Nokes-Malach, Adriana Kovashka, Erin Walker

    Abstract: Speakers build rapport in the process of aligning conversational behaviors with each other. Rapport engendered with a teachable agent while instructing domain material has been shown to promote learning. Past work on lexical alignment in the field of education suffers from limitations in both the measures used to quantify alignment and the types of interactions in which alignment with agents has b…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: To be published in SIGDial 2022

  45. arXiv:2209.03268  [pdf, other]

    cs.CV

    Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing

    Authors: Iro Laina, Yuki M. Asano, Andrea Vedaldi

    Abstract: Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual i…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Published at ICLR 2022. Appendix included, 26 pages

  46. arXiv:2206.06169  [pdf, other]

    cs.LG cs.AI stat.ML

    Causal Representation Learning for Instantaneous and Temporal Effects in Interactive Systems

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measur…

    Submitted 7 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Published at International Conference on Learning Representations (ICLR), 2023

  47. arXiv:2205.11374  [pdf, other]

    cs.CL cs.AI

    Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

    Authors: Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk

    Abstract: The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted for the 4th Workshop on Gender Bias in Natural Language Processing at NAACL 2022

  48. arXiv:2204.13101  [pdf, other]

    cs.CV

    Self-Supervised Learning of Object Parts for Semantic Segmentation

    Authors: Adrian Ziegler, Yuki M. Asano

    Abstract: Progress in self-supervised learning has brought strong general image representation learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from this trend as they require spatially-diverse representations. However, learning dense representations is challenging, as in the unsupervised context it is not…

    Submitted 20 June, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022

  49. arXiv:2204.08874  [pdf, other]

    cs.CV

    Less than Few: Self-Shot Video Instance Segmentation

    Authors: Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time. While proven effective, in many practical video settings even labelling a few examples appears unrealistic. This is especially true as the level of details in spatio-temporal video understanding and with it, the complexity of annotations continues to increase. Rather than performing few-…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 25 pages, 5 figures, 13 tables

  50. arXiv:2202.03169  [pdf, other]

    cs.LG cs.AI stat.ME

    CITRIS: Causal Identifiability from Temporal Intervened Sequences

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents reasoning in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. In contrast to the recen…

    Submitted 15 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2022