subscribe to arXiv mailings

PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos

Authors: Steven Abreu, Tiffany D. Do, Karan Ahuja, Eric J. Gonzalez, Lee Payne, Daniel McDuff, Mar Gonzalez-Franco

Abstract: Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of actions that an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and e… ▽ More Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of actions that an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and evaluating these annotations. First, we used a prompt-engineered large language model (LLM) to generate context-aware action suggestions and identified over 18,000 action suggestions. While these synthetic action suggestions are valuable, the inherent limitations of LLMs necessitate human evaluation. To ensure high-quality and user-centered recommendations, we conducted a large-scale human annotation study that provides grounding in human preferences for all of PARSE-Ego4D. We analyze the inter-rater agreement and evaluate subjective preferences of participants. Based on our synthetic dataset and complete human annotations, we propose several new tasks for action suggestions based on ego-centric videos. We encourage novel solutions that improve latency and energy requirements. The annotations in PARSE-Ego4D will support researchers and developers who are working on building action recommendation systems for augmented and virtual reality systems. △ Less

Submitted 14 June, 2024; originally announced July 2024.

arXiv:2406.09579 [pdf, other]

Hovering Over the Key to Text Input in XR

Authors: Mar Gonzalez-Franco, Diar Abdlkarim, Arpit Bhatia, Stuart Macgregor, Jason Alexander Fotso-Puepi, Eric J Gonzalez, Hasti Seifi, Massimiliano Di Luca, Karan Ahuja

Abstract: Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond PC. Therefore there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboar… ▽ More Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond PC. Therefore there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2404.13274 [pdf, other]

Augmented Object Intelligence: Making the Analog World Interactable with XR-Objects

Authors: Mustafa Doga Dogan, Eric J. Gonzalez, Andrea Colaco, Karan Ahuja, Ruofei Du, Johnny Lee, Mar Gonzalez-Franco, David Kim

Abstract: Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm designed to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a… ▽ More Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm designed to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to vast digital functionalities. Our approach utilizes object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in rich and contextually relevant ways. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through a variety of use cases and a user study. △ Less

Submitted 22 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

arXiv:2309.10902 [pdf, other]

doi 10.3389/frvir.2023.1248915

VALID: A perceptually validated Virtual Avatar Library for Inclusion and Diversity

Authors: Tiffany D. Do, Steve Zelenty, Mar Gonzalez-Franco, Ryan P. McMahan

Abstract: As consumer adoption of immersive technologies grows, virtual avatars will play a prominent role in the future of social computing. However, as people begin to interact more frequently through virtual avatars, it is important to ensure that the research community has validated tools to evaluate the effects and consequences of such technologies. We present the first iteration of a new, freely avail… ▽ More As consumer adoption of immersive technologies grows, virtual avatars will play a prominent role in the future of social computing. However, as people begin to interact more frequently through virtual avatars, it is important to ensure that the research community has validated tools to evaluate the effects and consequences of such technologies. We present the first iteration of a new, freely available 3D avatar library called the Virtual Avatar Library for Inclusion and Diversity (VALID), which includes 210 fully rigged avatars with a focus on advancing racial diversity and inclusion. We present a detailed process for creating, iterating, and validating avatars of diversity. Through a large online study (n=132) with participants from 33 countries, we provide statistically validated labels for each avatar's perceived race and gender. Through our validation study, we also advance knowledge pertaining to the perception of an avatar's race. In particular, we found that avatars of some races were more accurately identified by participants of the same race. △ Less

Submitted 30 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Journal ref: Frontiers in Virtual Reality 4 (2023)

arXiv:2308.12469 [pdf, other]

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Authors: Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco

Abstract: Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations i… ▽ More Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation for any images. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at \url{https://sites.google.com/view/diffseg/home}. △ Less

Submitted 2 April, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to CVPR2024

Journal ref: Conference on Computer Vision and Pattern Recognition, 2024

arXiv:2108.12390 [pdf]

doi 10.1145/3510463

Two-In-One: A Design Space for Mapping Unimanual Input into Bimanual Interactions in VR for Users with Limited Movement

Authors: Momona Yamagami, Sasa Junuzovic, Mar Gonzalez-Franco, Eyal Ofek, Edward Cutrell, John R. Porter, Andrew D. Wilson, Martez E. Mott

Abstract: Virtual Reality (VR) applications often require users to perform actions with two hands when performing tasks and interacting with objects in virtual environments. Although bimanual interactions in VR can resemble real-world interactions -- thus increasing realism and improving immersion -- they can also pose significant accessibility challenges to people with limited mobility, such as for people… ▽ More Virtual Reality (VR) applications often require users to perform actions with two hands when performing tasks and interacting with objects in virtual environments. Although bimanual interactions in VR can resemble real-world interactions -- thus increasing realism and improving immersion -- they can also pose significant accessibility challenges to people with limited mobility, such as for people who have full use of only one hand. An opportunity exists to create accessible techniques that take advantage of users' abilities, but designers currently lack structured tools to consider alternative approaches. To begin filling this gap, we propose Two-in-One, a design space that facilitates the creation of accessible methods for bimanual interactions in VR from unimanual input. Our design space comprises two dimensions, bimanual interactions and computer assistance, and we provide a detailed examination of issues to consider when creating new unimanual input techniques that map to bimanual interactions in VR. We used our design space to create three interaction techniques that we subsequently implemented for a subset of bimanual interactions and received user feedback through a video elicitation study with 17 people with limited mobility. Our findings explore complex tradeoffs associated with autonomy and agency and highlight the need for additional settings and methods to make VR accessible to people with limited mobility. △ Less

Submitted 20 April, 2024; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 26 pages, 3 figures, 6 tables

arXiv:2108.10829 [pdf, other]

doi 10.1145/3472749.3474821

HapticBots: Distributed Encountered-type Haptics for VR with Multiple Shape-changing Mobile Robots

Authors: Ryo Suzuki, Eyal Ofek, Mike Sinclair, Daneil Leithinger, Mar Gonzalez-Franco

Abstract: HapticBots introduces a novel encountered-type haptic approach for Virtual Reality (VR) based on multiple tabletop-size shape-changing robots. These robots move on a tabletop and change their height and orientation to haptically render various surfaces and objects on-demand. Compared to previous encountered-type haptic approaches like shape displays or robotic arms, our proposed approach has an ad… ▽ More HapticBots introduces a novel encountered-type haptic approach for Virtual Reality (VR) based on multiple tabletop-size shape-changing robots. These robots move on a tabletop and change their height and orientation to haptically render various surfaces and objects on-demand. Compared to previous encountered-type haptic approaches like shape displays or robotic arms, our proposed approach has an advantage in deployability, scalability, and generalizability -- these robots can be easily deployed due to their compact form factor. They can support multiple concurrent touch points in a large area thanks to the distributed nature of the robots. We propose and evaluate a novel set of interactions enabled by these robots which include: 1) rendering haptics for VR objects by providing just-in-time touch-points on the user's hand, 2) simulating continuous surfaces with the concurrent height and position change, and 3) enabling the user to pick up and move VR objects through graspable proxy objects. Finally, we demonstrate HapticBots with various applications, including remote collaboration, education and training, design and 3D modeling, and gaming and entertainment. △ Less

Submitted 24 August, 2021; originally announced August 2021.

Comments: UIST 2021

arXiv:1602.01944 [pdf]

Immersive Augmented Reality Training for Complex Manufacturing Scenarios

Authors: Mar Gonzalez-Franco, Julio Cermeron, Katie Li, Rodrigo Pizarro, Jacob Thorn, Windo Hutabarat, Ashutosh Tiwari, Pablo Bermell-Garcia

Abstract: In the complex manufacturing sector a considerable amount of resources are focused on developing new skills and training workers. In that context, increasing the effectiveness of those processes and reducing the investment required is an outstanding issue. In this paper we present an experiment that shows how modern Human Computer Interaction (HCI) metaphors such as collaborative mixed-reality can… ▽ More In the complex manufacturing sector a considerable amount of resources are focused on developing new skills and training workers. In that context, increasing the effectiveness of those processes and reducing the investment required is an outstanding issue. In this paper we present an experiment that shows how modern Human Computer Interaction (HCI) metaphors such as collaborative mixed-reality can be used to transmit procedural knowledge and could eventually replace other forms of face-to-face training. We implement a real-time Immersive Augmented Reality (IAR) setup with see-through cameras that allows for collaborative interactions that can simulate conventional forms of training. The obtained results indicate that people who took the IAR training achieved the same performance than people in the conventional face-to-face training condition. These results, their implications for future training and the use of HCI paradigms in this context are discussed in this paper. △ Less

Submitted 16 November, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

Comments: 6 pages, 4 figures, Video: https://youtu.be/xCbvmGkL6Y4

arXiv:1602.00238 [pdf]

doi 10.1145/2964284.2967200

Assessing 3D scan quality in Virtual Reality through paired-comparisons psychophysics test

Authors: Jacob Thorn, Rodrigo Pizarro, Bernhard Spanlang, Pablo Bermell-Garcia, Mar Gonzalez-Franco

Abstract: Consumer 3D scanners and depth cameras are increasingly being used to generate content and avatars for Virtual Reality (VR) environments and avoid the inconveniences of hand modeling; however, it is sometimes difficult to evaluate quantitatively the mesh quality at which 3D scans should be exported, and whether the object perception might be affected by its shading. We propose using a paired-compa… ▽ More Consumer 3D scanners and depth cameras are increasingly being used to generate content and avatars for Virtual Reality (VR) environments and avoid the inconveniences of hand modeling; however, it is sometimes difficult to evaluate quantitatively the mesh quality at which 3D scans should be exported, and whether the object perception might be affected by its shading. We propose using a paired-comparisons test based on psychophysics of perception to do that evaluation. As psychophysics is not subject to opinion, skill level, mental state, or economic situation it can be considered a quantitative way to measure how people perceive the mesh quality. In particular, we propose using the psychophysical measure for the comparison of four different levels of mesh quality (1K, 5K, 10K and 20K triangles). We present two studies within subjects: in one we investigate the quality perception variations of seeing an object in a regular screen monitor against an stereoscopic Head Mounted Display (HMD); while in the second experiment we aim at detecting the effects of shading into quality perception. At each iteration of the pair-test comparisons participants pick the mesh that they think had higher quality; by the end of the experiment we compile a preference matrix. The matrix evidences the correlation between real quality and assessed quality. Regarding the shading mode, we find an interaction with quality and shading when the model has high definition. Furthermore, we assess the subjective realism of the most/least preferred scans using an Immersive Augmented Reality (IAR) video-see-through setup. Results show higher levels of realism were perceived through the HMD than when using a monitor, although the quality was similarly perceived in both systems. △ Less

Submitted 30 January, 2017; v1 submitted 31 January, 2016; originally announced February 2016.

Comments: 9 pages, 10 figures, video: https://youtu.be/vC3Tx07szXU

Journal ref: Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, New York, NY, USA, 147-151

Showing 1–9 of 9 results for author: Gonzalez-Franco, M