-
Live Fitness Coaching as a Testbed for Situated Interaction
Authors:
Sunny Panchal,
Apratim Bhattacharyya,
Guillaume Berger,
Antoine Mercier,
Cornelius Bohm,
Florian Dietrichkeit,
Reza Pourreza,
Xuanlin Li,
Pulkit Madan,
Mingu Lee,
Mark Todorovich,
Ingo Bax,
Roland Memisevic
Abstract:
Tasks at the intersection of vision and language have had a profound impact in advancing the capabilities of vision-language models such as dialog-based assistants. However, models trained on existing tasks are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions, where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real time, are an open challenge. In this work, we present the QEVD benchmark and dataset, which explores human-AI interaction in the challenging, yet controlled, real-world domain of fitness coaching, a task which intrinsically requires monitoring live user activity and providing timely feedback. It is the first benchmark that requires assistive vision-language models to recognize complex human actions, identify mistakes grounded in those actions, and provide appropriate feedback. Our experiments reveal the limitations of existing state-of-the-art vision-language models for such asynchronous situated interactions. Motivated by this, we propose a simple end-to-end streaming baseline that can respond asynchronously to human actions with appropriate feedback at the appropriate time.
Submitted 10 July, 2024;
originally announced July 2024.
-
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
Authors:
Antoine Mercier,
Ramin Nakhli,
Mahesh Reddy,
Rajeev Yasarla,
Hong Cai,
Fatih Porikli,
Guillaume Berger
Abstract:
Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of assets, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach which harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding latent triplane. We then decode these latents to generate a textured mesh. HexaGen3D does not require per-sample optimization, and can infer high-quality and diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs compared to existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects or compositions.
Submitted 15 January, 2024;
originally announced January 2024.
-
Efficient neural supersampling on a novel gaming dataset
Authors:
Antoine Mercier,
Ruan Erasmus,
Yashesh Savani,
Manik Dhingra,
Fatih Porikli,
Guillaume Berger
Abstract:
Real-time rendering for video games has become increasingly challenging due to the need for higher resolutions, framerates and photorealism. Supersampling has emerged as an effective solution to address this challenge. Our work introduces a novel neural algorithm for supersampling rendered content that is 4 times more efficient than existing methods while maintaining the same level of accuracy. Additionally, we introduce a new dataset which provides auxiliary modalities such as motion vectors and depth generated using graphics rendering features like viewport jittering and mipmap biasing at different resolutions. We believe that this dataset fills a gap in the current dataset landscape and can serve as a valuable resource to help measure progress in the field and advance the state-of-the-art in super-resolution techniques for gaming content.
Submitted 2 August, 2023;
originally announced August 2023.
-
Is end-to-end learning enough for fitness activity recognition?
Authors:
Antoine Mercier,
Guillaume Berger,
Sunny Panchal,
Florian Letsch,
Cornelius Boehm,
Nahua Kang,
Ingo Bax,
Roland Memisevic
Abstract:
End-to-end learning has taken hold of many computer vision tasks, particularly those involving still images, with task-specific optimization yielding very strong performance. Nevertheless, human-centric action recognition is still largely dominated by hand-crafted pipelines, and only individual components are replaced by neural networks that typically operate on individual frames. As a testbed to study the relevance of such pipelines, we present a new fully annotated video dataset of fitness activities. Any recognition capabilities in this domain are almost exclusively a function of human poses and their temporal dynamics, so pose-based solutions should perform well. We show that, with this labelled data, end-to-end learning on raw pixels can compete with state-of-the-art action recognition pipelines based on pose estimation. We also show that end-to-end learning can support temporally fine-grained tasks such as real-time repetition counting.
Submitted 14 May, 2023;
originally announced May 2023.
-
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
Authors:
Guillaume Berger,
Manik Dhingra,
Antoine Mercier,
Yashesh Savani,
Sunny Panchal,
Fatih Porikli
Abstract:
In this work, we present QuickSRNet, an efficient super-resolution architecture for real-time applications on mobile platforms. Super-resolution clarifies, sharpens, and upscales an image to higher resolution. Applications such as gaming and video playback along with the ever-improving display capabilities of TVs, smartphones, and VR headsets are driving the need for efficient upscaling solutions. While existing deep learning-based super-resolution approaches achieve impressive results in terms of visual quality, enabling real-time DL-based super-resolution on mobile devices with compute, thermal, and power constraints is challenging. To address these challenges, we propose QuickSRNet, a simple yet effective architecture that provides better accuracy-to-latency trade-offs than existing neural architectures for single-image super resolution. We present training tricks to speed up existing residual-based super-resolution architectures while maintaining robustness to quantization. Our proposed architecture produces 1080p outputs via 2x upscaling in 2.2 ms on a modern smartphone, making it ideal for high-fps real-time applications.
Submitted 14 May, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
Effective Resistance for Pandemics: Mobility Network Sparsification for High-Fidelity Epidemic Simulation
Authors:
Alexander M. Mercier,
Samuel V. Scarpino,
Cristopher Moore
Abstract:
Network science has increasingly become central to the field of epidemiology and our ability to respond to infectious disease threats. However, many networks derived from modern datasets are not just large, but dense, with a high ratio of edges to nodes. This includes human mobility networks where most locations have a large number of links to many other locations. Simulating large-scale epidemics requires substantial computational resources and in many cases is practically infeasible. One way to reduce the computational cost of simulating epidemics on these networks is sparsification, where a representative subset of edges is selected based on some measure of their importance. We test several sparsification strategies, ranging from naive thresholding to random sampling of edges, on mobility data from the U.S. Following recent work in computer science, we find that the most accurate approach uses the effective resistances of edges, which prioritizes edges that are the only efficient way to travel between their endpoints. The resulting sparse network preserves many aspects of the behavior of an SIR model, including both global quantities, like the epidemic size, and local details of stochastic events, including the probability each node becomes infected and its distribution of arrival times. This holds even when the sparse network preserves fewer than $10\%$ of the edges of the original network. In addition to its practical utility, this method helps illuminate which links of a weighted, undirected network are most important to disease spread.
Submitted 26 July, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
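As a rough illustration of the edge-importance measure named in the abstract above, the following minimal sketch computes the effective resistance of each edge in a small weighted, undirected graph and the product of weight and resistance commonly used as an importance score. It is not the authors' implementation: the function names (`solve`, `effective_resistances`, `importance_scores`), the toy graph, and the choice to ground one node and solve the reduced Laplacian system by Gaussian elimination are all assumptions made for illustration, and the sketch stops at scoring edges rather than performing any sampling or reweighting.

```python
def solve(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination
    with partial pivoting (hypothetical helper for this sketch)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: is the graph connected?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def effective_resistances(n, edges):
    """Return {(u, v): R_uv} for each edge (u, v, w) of a connected graph.

    Weights w are treated as conductances. Node n - 1 is grounded
    (potential 0), a unit current is injected at u and extracted at v,
    and R_uv is the resulting potential difference.
    """
    lap = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w
    grounded = [row[: n - 1] for row in lap[: n - 1]]
    result = {}
    for u, v, _ in edges:
        b = [0.0] * (n - 1)
        if u < n - 1:
            b[u] += 1.0
        if v < n - 1:
            b[v] -= 1.0
        potentials = solve(grounded, b) + [0.0]
        result[(u, v)] = potentials[u] - potentials[v]
    return result


def importance_scores(n, edges):
    """Score each edge by weight times effective resistance."""
    res = effective_resistances(n, edges)
    return {(u, v): w * res[(u, v)] for u, v, w in edges}


# Toy graph (assumed): a triangle 0-1-2 plus a bridge 2-3, unit weights.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
scores = importance_scores(4, toy_edges)
```

In this toy graph the bridge 2-3 receives the highest score because it is the only route between its endpoints, while each triangle edge has an alternative two-hop path and scores lower. This matches the abstract's description of prioritizing edges that are the only efficient way to travel between their endpoints.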
-
Contagion-Preserving Network Sparsifiers: Exploring Epidemic Edge Importance Utilizing Effective Resistance
Authors:
Alexander Mercier
Abstract:
Network epidemiology has become a vital tool in understanding the effects of high-degree vertices, geographic and demographic communities, and other inhomogeneities in social structure on the spread of disease. However, many networks derived from modern datasets are quite dense, such as mobility networks where each location has links to a large number of potential destinations. One way to reduce the computational effort of simulating epidemics on these networks is sparsification, where we select a representative subset of edges based on some measure of their importance. Recently an approach was proposed using an algorithm based on the effective resistance of the edges. We explore how effective resistance is correlated with the probability that an edge transmits disease in the SI model. We find that in some cases these two notions of edge importance are well correlated, making effective resistance a computationally efficient proxy for the importance of an edge to epidemic spread. In other cases, the correlation is weaker, and we discuss situations in which effective resistance is not a good proxy for epidemic importance.
Submitted 28 January, 2021;
originally announced January 2021.
-
Trusted Neural Networks for Safety-Constrained Autonomous Control
Authors:
Shalini Ghosh,
Amaury Mercier,
Dheeraj Pichapati,
Susmit Jha,
Vinod Yegneswaran,
Patrick Lincoln
Abstract:
We propose Trusted Neural Network (TNN) models, which are deep neural network models that satisfy safety constraints critical to the application domain. We investigate different mechanisms for incorporating rule-based knowledge in the form of first-order logic constraints into a TNN model, where rules that encode safety are accompanied by weights indicating their relative importance. This framework allows the TNN model to learn from knowledge available in the form of data as well as logical rules. We propose multiple approaches for solving this problem: (a) a multi-headed model structure that allows a trade-off between satisfying logical constraints and fitting training data in a unified training framework, and (b) creating a constrained optimization problem and solving it in dual formulation by posing a new constrained loss function and using a proximal gradient descent algorithm. We demonstrate the efficacy of our TNN framework through experiments using the open-source TORCS 3D simulator for self-driving cars. Experiments using our first approach of a multi-headed TNN model, on a dataset generated by a customized version of TORCS, show that (1) adding safety constraints to a neural network model results in increased performance and safety, and (2) the improvement increases with increasing importance of the safety constraints. Experiments were also performed using the second approach, a proximal algorithm for constrained optimization. They demonstrate how the proposed method ensures that (1) the overall TNN model satisfies the constraints even when the training data violates some of the constraints, and (2) the proximal gradient descent algorithm on the constrained objective converges faster than the unconstrained version.
Submitted 18 May, 2018;
originally announced May 2018.