subscribe to arXiv mailings

doi 10.1145/3613904.3642628

Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Authors: Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, Xiaoyi Zhang

Abstract: On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML mo… ▽ More On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Proceedings of the 2024 ACM CHI Conference on Human Factors in Computing Systems

arXiv:2312.02189 [pdf, other]

StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma

Abstract: In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the… ▽ More In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the diffusion network, and the 3D model representation. To overcome these limitations, we present StableDreamer, a methodology incorporating three advances. First, inspired by InstructNeRF2NeRF, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss. This finding provides a novel tool to debug SDS, which we use to show the impact of time-annealing noise levels on reducing multi-faced geometries. Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition. Based on this observation, StableDreamer introduces a two-stage training strategy that effectively combines these aspects, resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D Gaussians representation, replacing Neural Radiance Fields (NeRFs), to enhance the overall quality, reduce memory usage during training, and accelerate rendering speeds, and better capture semi-transparent objects. StableDreamer reduces multi-face geometries, generates fine details, and converges stably. △ Less

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2307.08771 [pdf, other]

UPSCALE: Unconstrained Channel Pruning

Authors: Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

Abstract: As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower th… ▽ More As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower than the unpruned model. As a workaround, pruners conventionally constrain certain channels to be pruned together. This fully eliminates memory copies but, as we show, significantly impairs accuracy. We now have a dilemma: Remove constraints but increase latency, or add constraints and impair accuracy. In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. Using this insight, we design a generic algorithm UPSCALE to prune models with any pruning pattern. By removing constraints from existing pruners, we improve ImageNet accuracy for post-training pruned models by 2.1 points on average -- benefiting DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, by reordering channels, UPSCALE improves inference speeds by up to 2x over a baseline export. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: 29 pages, 26 figures, accepted to ICML 2023

arXiv:2303.17015 [pdf, other]

HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

Abstract: Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize… ▽ More Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over a implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within one single unified framework. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Project page: https://ziyaerkoc.com/hyperdiffusion/ Video: https://www.youtube.com/watch?v=wjFpsKdo-II

arXiv:2208.08120 [pdf, other]

Highly dynamic locomotion control of biped robot enhanced by swing arms

Authors: Weijie Wang, Song Liu, Qinfeng Shan, Lihao Jia

Abstract: Swing arms have an irreplaceable role in promoting highly dynamic locomotion on bipedal robots by a larger angular momentum control space from the viewpoint of biomechanics. Few bipedal robots utilize swing arms and its redundancy characteristic of multiple degrees of freedom due to the lack of appropriate locomotion control strategies to perfectly integrate modeling and control. This paper presen… ▽ More Swing arms have an irreplaceable role in promoting highly dynamic locomotion on bipedal robots by a larger angular momentum control space from the viewpoint of biomechanics. Few bipedal robots utilize swing arms and its redundancy characteristic of multiple degrees of freedom due to the lack of appropriate locomotion control strategies to perfectly integrate modeling and control. This paper presents a kind of control strategy by modeling the bipedal robot as a flywheel-spring loaded inverted pendulum (F-SLIP) to extract characteristics of swing arms and using the whole-body controller (WBC) to achieve these characteristics, and also proposes a evaluation system including three aspects of agility defined by us, stability and energy consumption for the highly dynamic locomotion of bipedal robots. We design several sets of simulation experiments and analyze the effects of swing arms according to the evaluation system during the jumping motion of Purple (Purple energy rises in the east)V1.0, a kind of bipedal robot designed to test high explosive locomotion. Results show that Purple's agility is increased by more than 10 percent, stabilization time is reduced by a factor of two, and energy consumption is reduced by more than 20 percent after introducing swing arms. △ Less

Submitted 17 August, 2022; originally announced August 2022.

Comments: 7 pages, 12 figures

arXiv:2208.02879 [pdf, other]

PointConvFormer: Revenge of the Point-based Convolution

Authors: Wenxuan Wu, Li Fuxin, Qi Shan

Abstract: We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention. In PointConvFormer, attention computed from feature difference between points in the neighbor… ▽ More We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention. In PointConvFormer, attention computed from feature difference between points in the neighborhood is used to modify the convolutional weights at each point. Hence, we preserved the invariances from point convolution, whereas attention helps to select relevant points in the neighborhood for convolution. PointConvFormer is suitable for multiple tasks that require details at the point level, such as segmentation and scene flow estimation tasks. We experiment on both tasks with multiple datasets including ScanNet, SemanticKitti, FlyingThings3D and KITTI. Our results show that PointConvFormer offers a better accuracy-speed tradeoff than classic convolutions, regular transformers, and voxelized sparse convolution approaches. Visualizations show that PointConvFormer performs similarly to convolution on flat areas, whereas the neighborhood selection effect is stronger on object boundaries, showing that it has got the best of both worlds. △ Less

Submitted 10 May, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: Accepted at CVPR 2023

arXiv:2205.07763 [pdf, other]

FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction

Authors: Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang

Abstract: Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy i… ▽ More Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses. The core of our approach is a fast and robust multi-view reconstruction algorithm to jointly refine 3D geometry and camera pose estimation using learnable neural network modules. We provide a thorough benchmark of state-of-the-art approaches for this problem on ShapeNet. Our approach achieves best-in-class results. It is also two orders of magnitude faster than the recent optimization-based approach IDR. Our code is released at \url{https://github.com/zhenpeiyang/FvOR/} △ Less

Submitted 16 May, 2022; originally announced May 2022.

Comments: CVPR 2022

arXiv:2204.02411 [pdf, other]

Texturify: Generating Textures on 3D Shape Surfaces

Authors: Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

Abstract: Texture cues on 3D objects are key to compelling visual representations, with the possibility to create high visual fidelity with inherent spatial consistency across different views. Since the availability of textured 3D shapes remains very limited, learning a 3D-supervised data-driven method that predicts a texture based on the 3D input is very challenging. We thus propose Texturify, a GAN-based… ▽ More Texture cues on 3D objects are key to compelling visual representations, with the possibility to create high visual fidelity with inherent spatial consistency across different views. Since the availability of textured 3D shapes remains very limited, learning a 3D-supervised data-driven method that predicts a texture based on the 3D input is very challenging. We thus propose Texturify, a GAN-based method that leverages a 3D shape dataset of an object class and learns to reproduce the distribution of appearances observed in real images by generating high-quality textures. In particular, our method does not require any 3D color supervision or correspondence between shape geometry and images to learn the texturing of 3D objects. Texturify operates directly on the surface of the 3D objects by introducing face convolutional operators on a hierarchical 4-RoSy parametrization to generate plausible object-specific textures. Employing differentiable rendering and adversarial losses that critique individual views and consistency across views, we effectively learn the high-quality surface texturing distribution from real-world images. Experiments on car and chair shape collections show that our approach outperforms state of the art by an average of 22% in FID score. △ Less

Submitted 5 April, 2022; originally announced April 2022.

Comments: Project Page: https://nihalsid.github.io/texturify

arXiv:2110.11919 [pdf, other]

Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration

Authors: Yichi Qian, Qiyi Shan, Hanjia Lyu, Jiebo Luo

Abstract: The emergence of social media has largely eased the way people receive information and participate in public discussions. However, in countries with strict regulations on discussions in the public space, social media is no exception. To limit the degree of dissent or inhibit the spread of "harmful" information, a common approach is to impose information operations such as censorship/suspension on… ▽ More The emergence of social media has largely eased the way people receive information and participate in public discussions. However, in countries with strict regulations on discussions in the public space, social media is no exception. To limit the degree of dissent or inhibit the spread of "harmful" information, a common approach is to impose information operations such as censorship/suspension on social media. In this paper, we focus on a study of censorship on Weibo, the counterpart of Twitter in China. Specifically, we 1) create a web-scraping pipeline and collect a large dataset solely focus on the reposts from Weibo; 2) discover the characteristics of users whose reposts contain censored information, in terms of gender, device, and account type; and 3) conduct a thematic analysis by extracting and analyzing topic information. Note that although the original posts are no longer visible, we can use comments users wrote when reposting the original post to infer the topic of the original content. We find that such efforts can recover the discussions around social events that triggered massive discussions but were later muted. Further, we show the variations of inferred topics across different user groups and time frames. △ Less

Submitted 23 July, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: Accepted for publication in Proceedings of the International Workshop on Social Sensing (SocialSens 2022): Special Edition on Belief Dynamics, 2022

arXiv:2107.05775 [pdf, other]

Fast and Explicit Neural View Synthesis

Authors: Pengsheng Guo, Miguel Angel Bautista, Alex Colburn, Liang Yang, Daniel Ulbricht, Joshua M. Susskind, Qi Shan

Abstract: We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radian… ▽ More We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radiance field representations have gained a lot of attention due to their expressive power, our simple approach obtains comparable or even better novel view reconstruction quality comparing with state-of-the-art baselines while increasing rendering speed by over 400x. Our model is trained in a category-agnostic manner and does not require scene-specific optimization. Therefore, it is able to generalize novel view synthesis to object categories not seen during training. In addition, we show that with our simple formulation, we can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision. △ Less

Submitted 8 December, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2104.13325 [pdf, other]

MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Authors: Zhenpei Yang, Zhile Ren, Qi Shan, Qixing Huang

Abstract: Deep learning has made significant impacts on multi-view stereo systems. State-of-the-art approaches typically involve building a cost volume, followed by multiple 3D convolution operations to recover the input image's pixel-wise depth. While such end-to-end learning of plane-sweeping stereo advances public benchmarks' accuracy, they are typically very slow to compute. We present \ouralg, a highly… ▽ More Deep learning has made significant impacts on multi-view stereo systems. State-of-the-art approaches typically involve building a cost volume, followed by multiple 3D convolution operations to recover the input image's pixel-wise depth. While such end-to-end learning of plane-sweeping stereo advances public benchmarks' accuracy, they are typically very slow to compute. We present \ouralg, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism. Since \ouralg only builds on 2D convolutions, it is at least $2\times$ faster than all the notable counterparts. Moreover, our algorithm produces precise depth estimations and 3D reconstructions, achieving state-of-the-art results on challenging benchmarks ScanNet, SUN3D, RGBD, and the classical DTU dataset. our algorithm also out-performs all other algorithms in the setting of inexact camera poses. Our code is released at \url{https://github.com/zhenpeiyang/MVS2D} △ Less

Submitted 11 December, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Our code is released at https://github.com/zhenpeiyang/MVS2D

arXiv:2104.00024 [pdf, other]

RetrievalFuse: Neural 3D Scene Reconstruction with a Database

Authors: Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

Abstract: 3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scen… ▽ More 3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scene geometry from the training database. First, we learn to synthesize an initial estimate for a 3D scene, constructed by retrieving a top-k set of volumetric chunks from the scene database. These candidates are then refined to a final scene generation with an attention-based refinement that can effectively select the most consistent set of geometry from the candidates and combine them together to create an output scene, facilitating transfer of coherent structures and local detail from train scene geometry. We demonstrate our neural scene reconstruction with a database for the tasks of 3D super resolution and surface reconstruction from sparse point clouds, showing that our approach enables generation of more coherent, accurate 3D scenes, improving on average by over 8% in IoU over state-of-the-art scene reconstruction. △ Less

Submitted 10 August, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

Comments: Project Page: https://nihalsid.github.io/retrieval-fuse/

arXiv:2101.04893 [pdf, other]

Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels

Authors: Xiaoyi Zhang, Lilian de Greef, Amanda Swearngin, Samuel White, Kyle Murray, Lisa Yu, Qi Shan, Jeffrey Nichols, Jason Wu, Chris Fleizach, Aaron Everitt, Jeffrey P. Bigham

Abstract: Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces of… ▽ More Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces often best reflect an app's full functionality. We trained a robust, fast, memory-efficient, on-device model to detect UI elements using a dataset of 77,637 screens (from 4,068 iPhone apps) that we collected and annotated. To further improve UI detections and add semantic information, we introduced heuristics (e.g., UI grouping and ordering) and additional models (e.g., recognize UI content, state, interactivity). We built Screen Recognition to generate accessibility metadata to augment iOS VoiceOver. In a study with 9 screen reader users, we validated that our approach improves the accessibility of existing mobile apps, enabling even previously inaccessible apps to be used. △ Less

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2006.07630 [pdf, other]

Equivariant Neural Rendering

Authors: Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Josh Susskind, Qi Shan

Abstract: We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer… ▽ More We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks. △ Less

Submitted 21 December, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: Add link to code

arXiv:1910.04099 [pdf, other]

Manhattan Room Layout Reconstruction from a Single 360 image: A Comparative Study of State-of-the-art Methods

Authors: Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem

Abstract: Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different d… ▽ More Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g. SegNet or ResNet), type of elements predicted (e.g. corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset [3], and introduce two depth-based evaluation metrics. △ Less

Submitted 25 December, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted by International Journal of Computer Vision (IJCV), 2021

arXiv:1803.08999 [pdf, other]

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

Authors: Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem

Abstract: We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to align… ▽ More We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts. △ Less

Submitted 23 March, 2018; originally announced March 2018.

Comments: CVPR2018

arXiv:1712.09004 [pdf, other]

RIDI: Robust IMU Double Integration

Authors: Hang Yan, Qi Shan, Yasutaka Furukawa

Abstract: This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear… ▽ More This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). The qualitatively and quantitatively evaluations have demonstrated that our algorithm has surprisingly shown comparable results to full Visual Inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research. △ Less

Submitted 30 December, 2017; v1 submitted 24 December, 2017; originally announced December 2017.

arXiv:1612.01256 [pdf, other]

Panoramic Structure from Motion via Geometric Relationship Detection

Authors: Satoshi Ikehata, Ivaylo Boyadzhiev, Qi Shan, Yasutaka Furukawa

Abstract: This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perfo… ▽ More This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perform such detection, and our approach utilizes rough surface normal estimates from an image-to-normal deep network to discover geometric relationships among lines. The detected relationships provide exact geometric constraints in our line-based linear SfM formulation. A constrained linear least squares is used to reconstruct a 3D model and camera motions, followed by the bundle adjustment. We have validated our algorithm on challenging datasets, outperforming various state-of-the-art reconstruction techniques. △ Less

Submitted 5 December, 2016; originally announced December 2016.

arXiv:1608.05137 [pdf, other]

IM2CAD

Authors: Hamid Izadinia, Qi Shan, Steven M. Seitz

Abstract: Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodel… ▽ More Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method in standard scene understanding benchmarks where we obtain significant improvement. △ Less

Submitted 23 April, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

Comments: To appear at CVPR 2017

Showing 1–19 of 19 results for author: Shan, Q