-
BusyHands: A Hand-Tool Interaction Database for Assembly Tasks Semantic Segmentation
Authors:
Roy Shilkrot,
Zhi Chai,
Minh Hoai
Abstract:
Visual segmentation has seen tremendous advancement recently with ready solutions for a wide variety of scene types, including human hands and other body parts. However, focus on segmentation of human hands while performing complex tasks, such as manual assembly, is still severely lacking. Segmenting hands from tools, workpieces, background and other body parts is extremely difficult because of self-occlusions and intricate hand grips and poses. In this paper we introduce BusyHands, a large open dataset of pixel-level annotated images of hands performing 13 different tool-based assembly tasks, from both real-world captures and virtual-world renderings. A total of 7906 samples are included in our first-of-its-kind dataset, with both RGB and depth images as obtained from a Kinect V2 camera and Blender. We evaluate several state-of-the-art semantic segmentation methods on our dataset as a proposed performance benchmark.
Submitted 19 February, 2019;
originally announced February 2019.
-
Enhanced Touchable Projector-depth System with Deep Hand Pose Estimation
Authors:
Zhi Chai,
Roy Shilkrot
Abstract:
Touchable projection with structured light range cameras is a prolific medium for large interaction surfaces, affording multiple simultaneous users and simple, cheap setup. However, robust touch detection in such projector-depth systems is difficult to achieve due to measurement noise. We propose a novel combination of surface touch detection and a deep network for hand pose estimation, which aids in detecting both on- and above-surface hand gestures, disambiguating multiple touch fingers, as well as recovering fingertip positions in the face of noisy input. We present the details of our GPU-accelerated system and an evaluation of its performance, as well as applications such as an enhanced virtual keyboard that utilizes the added features.
Submitted 28 December, 2018;
originally announced December 2018.
-
Increase Apparent Public Speaking Fluency By Speech Augmentation
Authors:
Sagnik Das,
Nisha Gandhi,
Tejas Naik,
Roy Shilkrot
Abstract:
Fluent and confident speech is desirable to every speaker, but professional speech delivery requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system that can help non-professional speakers produce fluent, professional-like speech content, in turn contributing towards better listener engagement and comprehension. We propose to achieve this by manipulating the disfluencies in human speech, such as the sounds 'uh' and 'um', filler words, and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses and adjust the duration of the imposed silence, as well as of other long ('disfluent') pauses, using a predictive model learned from a professional speech dataset. Finally, we output an audio stream in which the speaker sounds more fluent, confident and practiced than in the original recording. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
Submitted 3 August, 2019; v1 submitted 8 December, 2018;
originally announced December 2018.