Showing 1–3 of 3 results for author: Bax, I

  1. arXiv:2407.08101

    cs.CV

    Live Fitness Coaching as a Testbed for Situated Interaction

    Authors: Sunny Panchal, Apratim Bhattacharyya, Guillaume Berger, Antoine Mercier, Cornelius Bohm, Florian Dietrichkeit, Reza Pourreza, Xuanlin Li, Pulkit Madan, Mingu Lee, Mark Todorovich, Ingo Bax, Roland Memisevic

    Abstract: Tasks at the intersection of vision and language have had a profound impact in advancing the capabilities of vision-language models such as dialog-based assistants. However, models trained on existing tasks are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions where an AI model may proactively deliver tim…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: The benchmark and dataset are available here: https://developer.qualcomm.com/software/ai-datasets/qevd

  2. arXiv:2305.08191

    cs.CV cs.LG

    Is end-to-end learning enough for fitness activity recognition?

    Authors: Antoine Mercier, Guillaume Berger, Sunny Panchal, Florian Letsch, Cornelius Boehm, Nahua Kang, Ingo Bax, Roland Memisevic

    Abstract: End-to-end learning has taken hold of many computer vision tasks, in particular, related to still images, with task-specific optimization yielding very strong performance. Nevertheless, human-centric action recognition is still largely dominated by hand-crafted pipelines, and only individual components are replaced by neural networks that typically operate on individual frames. As a testbed to stu…

    Submitted 14 May, 2023; originally announced May 2023.

    Comments: 9 pages, 4 figures, 4 tables

  3. arXiv:1706.04261

    cs.CV

    The "something something" video database for learning and evaluating visual common sense

    Authors: Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, Joanna Materzyńska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau, Ingo Bax, Roland Memisevic

    Abstract: Neural networks trained on datasets such as ImageNet have led to major advances in visual object classification. One obstacle that prevents networks from reasoning more deeply about complex scenes and situations, and from integrating visual knowledge with natural language, like humans do, is their lack of common sense knowledge about the physical world. Videos, unlike still images, contain a wealt…

    Submitted 15 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.