Showing 1–45 of 45 results for author: Komura, T

  1. arXiv:2406.17988  [pdf, other]

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  2. arXiv:2405.13729  [pdf, other]

    cs.LG cs.AI cs.CV cs.GR

    ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

    Authors: Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

    Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2405.11690  [pdf, other]

    cs.CV

    InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios

    Authors: Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura

    Abstract: We address the problem of accurate capture and expressive modelling of interactive behaviors happening between two persons in daily scenarios. Different from previous works which either only consider one person or focus on conversational gestures, we propose to simultaneously model the activities of two persons, and target objective-driven, dynamic, and coherent interactions which often span long…

    Submitted 27 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  4. arXiv:2404.15661  [pdf, other]

    cs.GR cs.CG cs.CV

    CWF: Consolidating Weak Features in High-quality Mesh Simplification

    Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

    Abstract: In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 22 figures

  5. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  6. arXiv:2403.03641  [pdf, other]

    cs.GR

    Online Photon Guiding with 3D Gaussians for Caustics Rendering

    Authors: Jiawei Huang, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura

    Abstract: In production rendering systems, caustics are typically rendered via photon mapping and gathering, a process often hindered by insufficient photon density. In this paper, we propose a novel photon guiding method to improve the photon density and overall quality for caustic rendering. The key insight of our approach is the application of a global 3D Gaussian mixture model, used in conjunction with…

    Submitted 6 March, 2024; originally announced March 2024.

  7. arXiv:2401.12946  [pdf, other]

    cs.CV cs.CG cs.GR

    Coverage Axis++: Efficient Inner Point Selection for 3D Shape Skeletonization

    Authors: Zimeng Wang, Zhiyang Dou, Rui Xu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Shiqing Xin, Taku Komura, Xiaoming Yuan, Wenping Wang

    Abstract: We introduce Coverage Axis++, a novel and efficient approach to 3D shape skeletonization. The current state-of-the-art approaches for this task often rely on the watertightness of the input or suffer from substantial computational costs, thereby limiting their practicality. To address this challenge, Coverage Axis++ proposes a heuristic algorithm to select skeletal points, offering a high-accuracy…

    Submitted 10 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: SGP2024. Project Page: https://frank-zy-dou.github.io/projects/CoverageAxis++/index.html

  8. arXiv:2401.01391  [pdf, other]

    cs.CV cs.GR cs.LG

    On Optimal Sampling for Learning SDF Using MLPs Equipped with Positional Encoding

    Authors: Guying Lin, Lei Yang, Yuan Liu, Congyi Zhang, Junhui Hou, Xiaogang Jin, Taku Komura, John Keyser, Wenping Wang

    Abstract: Neural implicit fields, such as the neural signed distance field (SDF) of a shape, have emerged as a powerful representation for many applications, e.g., encoding a 3D shape and performing collision detection. Typically, implicit fields are encoded by Multi-layer Perceptrons (MLP) with positional encoding (PE) to capture high-frequency geometric details. However, a notable side effect of such PE-e…

    Submitted 2 January, 2024; originally announced January 2024.

  9. arXiv:2312.04036  [pdf, other]

    cs.CV cs.LG

    DiffusionPhase: Motion Diffusion in Frequency Domain

    Authors: Weilin Wan, Yiming Huang, Shutong Wu, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu

    Abstract: In this study, we introduce a learning-based method for generating high-quality human motion sequences from text descriptions (e.g., "A person walks forward"). Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences, due to limited text-to-motion datasets and the pose representations used that often lack expressiveness or compactne…

    Submitted 6 December, 2023; originally announced December 2023.

  10. arXiv:2312.02256  [pdf, other]

    cs.CV cs.AI cs.GR

    EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

    Authors: Wenyang Zhou, Zhiyang Dou, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, Lingjie Liu

    Abstract: We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a la…

    Submitted 14 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project Page: https://frank-zy-dou.github.io/projects/EMDM/index.html

  11. arXiv:2311.17510  [pdf, other]

    cs.CV

    StructRe: Rewriting for Structured Shape Modeling

    Authors: Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Taku Komura, Wenping Wang

    Abstract: Man-made 3D shapes are naturally organized in parts and hierarchies; such structures provide important constraints for shape reconstruction and generation. Modeling shape structures is difficult, because there can be multiple hierarchies for a given shape, causing ambiguity, and across different categories the shape structures are correlated with semantics, limiting generalization. We present Stru…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Our project page: https://jiepengwang.github.io/StructRe/

  12. arXiv:2311.17366  [pdf, other]

    cs.CV

    Generative Hierarchical Temporal Transformer for Hand Action Recognition and Motion Prediction

    Authors: Yilin Wen, Hao Pan, Takehiko Ohkawa, Lei Yang, Jia Pan, Yoichi Sato, Taku Komura, Wenping Wang

    Abstract: We present a novel framework that concurrently tackles hand action recognition and 3D future hand motion prediction. While previous works focus on either recognition or prediction, we propose a generative Transformer VAE architecture to jointly capture both aspects, facilitating realistic motion prediction by leveraging the short-term hand motion and long-term action consistency observed across ti…

    Submitted 24 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  13. arXiv:2311.17135  [pdf, other]

    cs.CV cs.GR

    TLControl: Trajectory and Language Control for Human Motion Synthesis

    Authors: Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu

    Abstract: Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion s…

    Submitted 12 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  14. arXiv:2311.17050  [pdf, other]

    cs.CV cs.GR

    Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

    Authors: Zhengming Yu, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang

    Abstract: We present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Previous methods explored shape generation with different representations and they suffer from limited topologies and poor geometry details. To generate high-quality surfaces of arbitrary topologies, we use the Unsigned Distance Field (UDF) as our surface representa…

    Submitted 22 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://yzmblog.github.io/projects/SurfD/

  15. arXiv:2310.06851  [pdf, other]

    cs.CV cs.AI cs.GR

    BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

    Authors: Kunkun Pang, Dafei Qin, Yingruo Fan, Julian Habekost, Takaaki Shiratori, Junichi Yamagishi, Taku Komura

    Abstract: Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framew…

    Submitted 6 September, 2023; originally announced October 2023.

    Comments: 12 pages, 13 figures

  16. arXiv:2309.11351  [pdf, other]

    cs.GR cs.AI cs.LG

    C$\cdot$ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters

    Authors: Zhiyang Dou, Xuelin Chen, Qingnan Fan, Taku Komura, Wenping Wang

    Abstract: We present C$\cdot$ASE, an efficient and effective framework that learns conditional Adversarial Skill Embeddings for physics-based characters. Our physically simulated character can learn a diverse repertoire of skills while providing controllability in the form of direct manipulation of the skills to be performed. C$\cdot$ASE divides the heterogeneous skill motions into distinct subsets containi…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH Asia 2023

  17. arXiv:2309.08878  [pdf, other]

    cs.GR

    Surface Extraction from Neural Unsigned Distance Fields

    Authors: Congyi Zhang, Guying Lin, Lei Yang, Xin Li, Taku Komura, Scott Schaefer, John Keyser, Wenping Wang

    Abstract: We propose a method, named DualMesh-UDF, to extract a surface from unsigned distance functions (UDFs), encoded by neural networks, or neural UDFs. Neural UDFs are becoming increasingly popular for surface representation because of their versatility in presenting surfaces with arbitrary topologies, as opposed to the signed distance function that is limited to representing a closed surface. However,…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  18. arXiv:2309.03453  [pdf, other]

    cs.CV cs.AI cs.GR

    SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

    Authors: Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, Wenping Wang

    Abstract: In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors for the generated images remains a challenge. To a…

    Submitted 15 April, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Spotlight. Project page: https://liuyuan-pal.github.io/SyncDreamer/ Code: https://github.com/liuyuan-pal/SyncDreamer

  19. arXiv:2308.13934  [pdf, other]

    cs.GR

    Patch-Grid: An Efficient and Feature-Preserving Neural Implicit Surface Representation

    Authors: Guying Lin, Lei Yang, Congyi Zhang, Hao Pan, Yuhan Ping, Guodong Wei, Taku Komura, John Keyser, Wenping Wang

    Abstract: Neural implicit representations are known to be more compact for depicting 3D shapes than traditional discrete representations. However, the neural representations tend to round sharp corners or edges and struggle to represent surfaces with open boundaries. Moreover, they are slow to train. We present a unified neural implicit representation, called Patch-Grid, that fits to complex shapes efficien…

    Submitted 26 August, 2023; originally announced August 2023.

  20. arXiv:2308.12751  [pdf]

    cs.GR cs.AI cs.LG cs.SE

    Motion In-Betweening with Phase Manifolds

    Authors: Paul Starke, Sebastian Starke, Taku Komura, Frank Steinicke

    Abstract: This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters by making use of phases variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model, in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in a…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 17 pages, 11 figures, conference

    MSC Class: 68U99 ACM Class: I.3.8; I.3.m

    Journal ref: ACM SIGGRAPH / Eurographics Symposium on Computer Animation (SCA), August 4-6, 2023, Los Angeles, CA, USA

  21. arXiv:2308.09400  [pdf, other]

    cs.GR

    GIPC: Fast and stable Gauss-Newton optimization of IPC barrier energy

    Authors: Kemeng Huang, Floyd Chitalu, Huancheng Lin, Taku Komura

    Abstract: Barrier functions are crucial for maintaining an intersection and inversion free simulation trajectory but existing methods which directly use distance can restrict implementation design and performance. We present an approach to rewriting the barrier function for arriving at an efficient and robust approximation of its Hessian. The key idea is to formulate a simplicial geometric measure of contac…

    Submitted 24 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    ACM Class: I.3.5

  22. arXiv:2305.17398  [pdf, other]

    cs.CV cs.GR

    NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

    Authors: Yuan Liu, Peng Wang, Cheng Lin, Xiaoxiao Long, Jiepeng Wang, Lingjie Liu, Taku Komura, Wenping Wang

    Abstract: We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment. Multiview reconstruction of reflective objects is extremely challenging because specular reflections are view-dependent and thus violate the multiview consistency, which is the cornerstone for most multiview reconstructi…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to SIGGRAPH 2023. Project page: https://liuyuan-pal.github.io/NeRO/ Codes: https://github.com/liuyuan-pal/NeRO

  23. arXiv:2305.08296  [pdf, other]

    cs.GR cs.AI

    Neural Face Rigging for Animating and Retargeting Facial Meshes in the Wild

    Authors: Dafei Qin, Jun Saito, Noam Aigerman, Thibault Groueix, Taku Komura

    Abstract: We propose an end-to-end deep-learning approach for automatic rigging and retargeting of 3D models of human faces in the wild. Our approach, called Neural Face Rigging (NFR), holds three key properties: (i) NFR's expression space maintains human-interpretable editing parameters for artistic controls; (ii) NFR is readily applicable to arbitrary facial meshes with different connectivity and expr…

    Submitted 14 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH 2023(Conference Track), 13 pages, 15 figures

  24. arXiv:2303.15951  [pdf, other]

    cs.CV cs.GR

    F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories

    Authors: Peng Wang, Yuan Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

    Abstract: This paper presents a novel grid-based NeRF called F2-NeRF (Fast-Free-NeRF) for novel view synthesis, which enables arbitrary input camera trajectories and only costs a few minutes for training. Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes. Existing two w…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://totoro97.github.io/projects/f2-nerf

  25. arXiv:2303.13796  [pdf, other]

    cs.CV

    Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

    Authors: Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

    Abstract: As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which can not tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body. The naive focal leng…

    Submitted 24 August, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  26. arXiv:2303.08064  [pdf, other]

    cs.CV cs.GR

    Online Neural Path Guiding with Normalized Anisotropic Spherical Gaussians

    Authors: Jiawei Huang, Akito Iizuka, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura

    Abstract: The variance reduction speed of physically-based rendering is heavily affected by the adopted importance sampling technique. In this paper we propose a novel online framework to learn the spatial-varying density model with a single small neural network using stochastic ray samples. To achieve this task, we propose a novel closed-form density model called the normalized anisotropic spherical gaussi…

    Submitted 27 February, 2024; v1 submitted 11 March, 2023; originally announced March 2023.

    ACM Class: I.3

  27. arXiv:2211.14173  [pdf, other]

    cs.CV

    NeuralUDF: Learning Unsigned Distance Fields for Multi-view Reconstruction of Surfaces with Arbitrary Topologies

    Authors: Xiaoxiao Long, Cheng Lin, Lingjie Liu, Yuan Liu, Peng Wang, Christian Theobalt, Taku Komura, Wenping Wang

    Abstract: We present a novel method, called NeuralUDF, for reconstructing surfaces with arbitrary topologies from 2D images via volume rendering. Recent advances in neural rendering based reconstruction have achieved compelling results. However, these methods are limited to objects with closed surfaces since they adopt Signed Distance Function (SDF) as surface representation which requires the target shape…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Visit our project page at https://www.xxlong.site/NeuralUDF/

  28. arXiv:2211.10705  [pdf, other]

    cs.CV

    TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer

    Authors: Zhiyang Dou, Qingxuan Wu, Cheng Lin, Zeyu Cao, Qiangqiang Wu, Weilin Wan, Taku Komura, Wenping Wang

    Abstract: In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Current SOTA performance is achieved by Transformer-based structures. However, they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two important aspects, i.e.,…

    Submitted 10 August, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

    Comments: Accepted to ICCV 2023

  29. arXiv:2209.09484  [pdf, other]

    cs.CV cs.RO

    Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos

    Authors: Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura, Wenping Wang

    Abstract: Understanding dynamic hand motions and actions from egocentric RGB videos is a fundamental yet challenging task due to self-occlusion and ambiguity. To address occlusion and ambiguity, we develop a transformer-based framework to exploit temporal information for robust estimation. Noticing the different temporal granularity of and the semantic correlation between hand pose estimation and action rec…

    Submitted 28 March, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted by CVPR 2023; Project page: https://fylwen.github.io/htt.html

  30. arXiv:2207.04465  [pdf, other]

    cs.CV cs.GR

    Progressively-connected Light Field Network for Efficient View Synthesis

    Authors: Peng Wang, Yuan Liu, Guying Lin, Jiatao Gu, Lingjie Liu, Taku Komura, Wenping Wang

    Abstract: This paper presents a Progressively-connected Light Field network (ProLiF), for the novel view synthesis of complex forward-facing scenes. ProLiF encodes a 4D light field, which allows rendering a large batch of rays in one training step for image- or patch-level losses. Directly learning a neural light field from images has difficulty in rendering multi-view consistent images due to its unawarene…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: Project page: https://totoro97.github.io/projects/prolif

  31. arXiv:2206.13597  [pdf, other]

    cs.CV

    NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors

    Authors: Jiepeng Wang, Peng Wang, Xiaoxiao Long, Christian Theobalt, Taku Komura, Lingjie Liu, Wenping Wang

    Abstract: Reconstructing 3D indoor scenes from 2D images is an important task in many computer vision and graphics applications. A main challenge in this task is that large texture-less areas in typical indoor scenes make existing methods struggle to produce satisfactory reconstruction results. We propose a new method, named NeuRIS, for high quality reconstruction of indoor scenes. The key idea of NeuRIS is…

    Submitted 16 October, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

  32. Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions

    Authors: Weilin Wan, Lei Yang, Lingjie Liu, Zhuoying Zhang, Ruixing Jia, Yi-King Choi, Jia Pan, Christian Theobalt, Taku Komura, Wenping Wang

    Abstract: Understanding human intentions during interactions has been a long-lasting theme, that has applications in human-robot interaction, virtual reality and surveillance. In this study, we focus on full-body human interactions with large-sized daily objects and aim to predict the future states of objects and humans given a sequential observation of human-object interaction. As there is no such dataset…

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: IEEE Robotics and Automation Letters ( Volume: 7, Issue: 2, April 2022)

  33. arXiv:2206.05737  [pdf, other]

    cs.CV

    SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views

    Authors: Xiaoxiao Long, Cheng Lin, Peng Wang, Taku Komura, Wenping Wang

    Abstract: We introduce SparseNeuS, a novel neural rendering based method for the task of surface reconstruction from multi-view images. This task becomes more difficult when only sparse images are provided as input, a scenario where existing neural reconstruction approaches usually produce incomplete or distorted results. Moreover, their inability of generalizing to unseen new scenes impedes their applicati…

    Submitted 2 August, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: Project page: https://www.xxlong.site/SparseNeuS/

  34. arXiv:2204.10776  [pdf, other]

    cs.CV

    Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images

    Authors: Yuan Liu, Yilin Wen, Sida Peng, Cheng Lin, Xiaoxiao Long, Taku Komura, Wenping Wang

    Abstract: In this paper, we present a generalizable model-free 6-DoF object pose estimator called Gen6D. Existing generalizable pose estimators either need high-quality object models or require additional depth maps or object masks in test time, which significantly limits their application scope. In contrast, our pose estimator only requires some posed images of the unseen object and is able to accurately p…

    Submitted 26 January, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: Camera ready version for ECCV2022, Project page: https://liuyuan-pal.github.io/Gen6D/ Code: https://github.com/liuyuan-pal/Gen6D

  35. arXiv:2201.04439  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG

    Real-Time Style Modelling of Human Locomotion via Feature-Wise Transformations and Local Motion Phases

    Authors: Ian Mason, Sebastian Starke, Taku Komura

    Abstract: Controlling the manner in which a character moves in a real-time animation system is a challenging task with useful applications. Existing style transfer systems require access to a reference content motion clip, however, in real-time systems the future motion content is unknown and liable to change with user input. In this work we present a style modelling system that uses an animation synthesis…

    Submitted 12 January, 2022; originally announced January 2022.

  36. arXiv:2112.05329  [pdf, other]

    cs.CV cs.GR

    FaceFormer: Speech-Driven 3D Facial Animation with Transformers

    Authors: Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura

    Abstract: Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data. Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, wh…

    Submitted 16 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022

  37. arXiv:2112.02214  [pdf, other]

    cs.CV cs.GR

    Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation

    Authors: Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura

    Abstract: Speech-driven 3D facial animation with accurate lip synchronization has been widely studied. However, synthesizing realistic motions for the entire face during speech has rarely been explored. In this work, we present a joint audio-text model to capture the contextual information for expressive speech-driven 3D facial animation. The existing datasets are collected to cover as many different phonem…

    Submitted 7 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  38. arXiv:2110.00965  [pdf, other]

    cs.GR

    Coverage Axis: Inner Point Selection for 3D Shape Skeletonization

    Authors: Zhiyang Dou, Cheng Lin, Rui Xu, Lei Yang, Shiqing Xin, Taku Komura, Wenping Wang

    Abstract: In this paper, we present a simple yet effective formulation called Coverage Axis for 3D shape skeletonization. Inspired by the set cover problem, our key idea is to cover all the surface points using as few inside medial balls as possible. This formulation inherently induces a compact and expressive approximation of the Medial Axis Transform (MAT) of a given shape. Different from previous methods…

    Submitted 26 January, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

  39. DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation

    Authors: Yilin Wen, Xiangyu Li, Hao Pan, Lei Yang, Zheng Wang, Taku Komura, Wenping Wang

    Abstract: Scalable 6D pose estimation for rigid objects from RGB images aims at handling multiple objects and generalizing to novel objects. Building on a well-known auto-encoding framework to cope with object symmetry and the lack of labeled training data, we achieve scalability by disentangling the latent representation of auto-encoder into shape and pose sub-spaces. The latent shape space models the simi…

    Submitted 21 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: Accepted by European Conference on Computer Vision, 2022; Project page: https://fylwen.github.io/disp6d.html

  40. arXiv:2106.10689  [pdf, other]

    cs.CV cs.GR

    NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

    Authors: Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, Wenping Wang

    Abstract: We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures…

    Submitted 1 February, 2023; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: 23 pages

  41. arXiv:2006.12075  [pdf, other]

    cs.CV cs.GR cs.LG

    MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

    Authors: Mingyi Shi, Kfir Aberman, Andreas Aristidou, Taku Komura, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen

    Abstract: We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted to Transactions on Graphics (ToG) 2020. Project page: https://rubbly.cn/publications/motioNet Video: https://youtu.be/8YubchlzvFA

    ACM Class: I.4.5

    Journal ref: ACM Transaction on Graphics, 40(1), Article 1, 2020

  42. arXiv:2006.11620  [pdf, other]

    cs.GR

    Technical Note: Generating Realistic Fighting Scenes by Game Tree

    Authors: Hubert P. H. Shum, Taku Komura

    Abstract: Recently, there have been a lot of researches to synthesize / edit the motion of a single avatar in the virtual environment. However, there has not been so much work of simulating continuous interactions of multiple avatars such as fighting. In this paper, we propose a new method to generate a realistic fighting scene based on motion capture data. We propose a new algorithm called the temporal exp…

    Submitted 20 June, 2020; originally announced June 2020.

    Comments: 7 pages, 7 figures

    ACM Class: I.3.3

  43. Learning natural locomotion behaviors for humanoid robots using human knowledge

    Authors: Chuanyu Yang, Kai Yuan, Shuai Heng, Taku Komura, Zhibin Li

    Abstract: This paper presents a new learning framework that leverages the knowledge from imitation learning, deep reinforcement learning, and control theories to achieve human-style locomotion that is natural, dynamic, and robust for humanoids. We proposed novel approaches to introduce human bias, i.e. motion capture data and a special Multi-Expert network structure. We used the Multi-Expert network structu…

    Submitted 11 February, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: university policy

  44. Learning Whole-body Motor Skills for Humanoids

    Authors: Chuanyu Yang, Kai Yuan, Wolfgang Merkt, Taku Komura, Sethu Vijayakumar, Zhibin Li

    Abstract: This paper presents a hierarchical framework for Deep Reinforcement Learning that acquires motor skills for a variety of push recovery and balancing behaviors, i.e., ankle, hip, foot tilting, and stepping strategies. The policy is trained in a physics simulator with realistic setting of robot model and low-level impedance control that are easy to transfer the learned skills to real robots. The adv…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)

  45. Emergence of Human-comparable Balancing Behaviors by Deep Reinforcement Learning

    Authors: Chuanyu Yang, Taku Komura, Zhibin Li

    Abstract: This paper presents a hierarchical framework based on deep reinforcement learning that learns a diversity of policies for humanoid balance control. Conventional zero moment point based controllers perform limited actions during under-actuation, whereas the proposed framework can perform human-like balancing behaviors such as active push-off of ankles. The learning is done through the design of an…

    Submitted 6 September, 2018; originally announced September 2018.