
Showing 1–19 of 19 results for author: Shan, Q

  1. arXiv:2404.03085

    cs.HC cs.AI cs.LG

    Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

    Authors: Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, Xiaoyi Zhang

    Abstract: On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML mo…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Proceedings of the 2024 ACM CHI Conference on Human Factors in Computing Systems

  2. arXiv:2312.02189

    cs.CV cs.AI

    StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

    Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma

    Abstract: In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the…

    Submitted 1 December, 2023; originally announced December 2023.

  3. arXiv:2307.08771

    cs.CV

    UPSCALE: Unconstrained Channel Pruning

    Authors: Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

    Abstract: As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower th…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 29 pages, 26 figures, accepted to ICML 2023

  4. arXiv:2303.17015

    cs.CV cs.LG

    HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

    Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

    Abstract: Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Project page: https://ziyaerkoc.com/hyperdiffusion/ Video: https://www.youtube.com/watch?v=wjFpsKdo-II

  5. arXiv:2208.08120

    cs.RO

    Highly dynamic locomotion control of biped robot enhanced by swing arms

    Authors: Weijie Wang, Song Liu, Qinfeng Shan, Lihao Jia

    Abstract: Swing arms have an irreplaceable role in promoting highly dynamic locomotion on bipedal robots by a larger angular momentum control space from the viewpoint of biomechanics. Few bipedal robots utilize swing arms and its redundancy characteristic of multiple degrees of freedom due to the lack of appropriate locomotion control strategies to perfectly integrate modeling and control. This paper presen…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 7 pages, 12 figures

  6. arXiv:2208.02879

    cs.CV cs.LG

    PointConvFormer: Revenge of the Point-based Convolution

    Authors: Wenxuan Wu, Li Fuxin, Qi Shan

    Abstract: We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention. In PointConvFormer, attention computed from feature difference between points in the neighbor…

    Submitted 10 May, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: Accepted at CVPR 2023

  7. arXiv:2205.07763

    cs.CV

    FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction

    Authors: Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang

    Abstract: Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy i…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: CVPR 2022

  8. arXiv:2204.02411

    cs.CV cs.GR

    Texturify: Generating Textures on 3D Shape Surfaces

    Authors: Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

    Abstract: Texture cues on 3D objects are key to compelling visual representations, with the possibility to create high visual fidelity with inherent spatial consistency across different views. Since the availability of textured 3D shapes remains very limited, learning a 3D-supervised data-driven method that predicts a texture based on the 3D input is very challenging. We thus propose Texturify, a GAN-based…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Project Page: https://nihalsid.github.io/texturify

  9. arXiv:2110.11919

    cs.SI

    Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration

    Authors: Yichi Qian, Qiyi Shan, Hanjia Lyu, Jiebo Luo

    Abstract: The emergence of social media has largely eased the way people receive information and participate in public discussions. However, in countries with strict regulations on discussions in the public space, social media is no exception. To limit the degree of dissent or inhibit the spread of "harmful" information, a common approach is to impose information operations such as censorship/suspension on…

    Submitted 23 July, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in Proceedings of the International Workshop on Social Sensing (SocialSens 2022): Special Edition on Belief Dynamics, 2022

  10. arXiv:2107.05775

    cs.CV cs.GR cs.LG

    Fast and Explicit Neural View Synthesis

    Authors: Pengsheng Guo, Miguel Angel Bautista, Alex Colburn, Liang Yang, Daniel Ulbricht, Joshua M. Susskind, Qi Shan

    Abstract: We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radian…

    Submitted 8 December, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  11. arXiv:2104.13325

    cs.CV

    MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

    Authors: Zhenpei Yang, Zhile Ren, Qi Shan, Qixing Huang

    Abstract: Deep learning has made significant impacts on multi-view stereo systems. State-of-the-art approaches typically involve building a cost volume, followed by multiple 3D convolution operations to recover the input image's pixel-wise depth. While such end-to-end learning of plane-sweeping stereo advances public benchmarks' accuracy, they are typically very slow to compute. We present MVS2D, a highly…

    Submitted 11 December, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Our code is released at https://github.com/zhenpeiyang/MVS2D

  12. arXiv:2104.00024

    cs.CV

    RetrievalFuse: Neural 3D Scene Reconstruction with a Database

    Authors: Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

    Abstract: 3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scen…

    Submitted 10 August, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

    Comments: Project Page: https://nihalsid.github.io/retrieval-fuse/

  13. arXiv:2101.04893

    cs.HC

    Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels

    Authors: Xiaoyi Zhang, Lilian de Greef, Amanda Swearngin, Samuel White, Kyle Murray, Lisa Yu, Qi Shan, Jeffrey Nichols, Jason Wu, Chris Fleizach, Aaron Everitt, Jeffrey P. Bigham

    Abstract: Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces of…

    Submitted 13 January, 2021; originally announced January 2021.

  14. arXiv:2006.07630

    cs.CV stat.ML

    Equivariant Neural Rendering

    Authors: Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Josh Susskind, Qi Shan

    Abstract: We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer…

    Submitted 21 December, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Add link to code

  15. arXiv:1910.04099

    cs.CV cs.LG eess.IV

    Manhattan Room Layout Reconstruction from a Single 360 image: A Comparative Study of State-of-the-art Methods

    Authors: Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem

    Abstract: Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different d…

    Submitted 25 December, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Accepted by International Journal of Computer Vision (IJCV), 2021

  16. arXiv:1803.08999

    cs.CV cs.AI

    LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

    Authors: Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem

    Abstract: We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to align…

    Submitted 23 March, 2018; originally announced March 2018.

    Comments: CVPR2018

  17. arXiv:1712.09004

    cs.CV

    RIDI: Robust IMU Double Integration

    Authors: Hang Yan, Qi Shan, Yasutaka Furukawa

    Abstract: This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear…

    Submitted 30 December, 2017; v1 submitted 24 December, 2017; originally announced December 2017.

  18. arXiv:1612.01256

    cs.CV

    Panoramic Structure from Motion via Geometric Relationship Detection

    Authors: Satoshi Ikehata, Ivaylo Boyadzhiev, Qi Shan, Yasutaka Furukawa

    Abstract: This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perfo…

    Submitted 5 December, 2016; originally announced December 2016.

  19. arXiv:1608.05137

    cs.CV

    IM2CAD

    Authors: Hamid Izadinia, Qi Shan, Steven M. Seitz

    Abstract: Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodel…

    Submitted 23 April, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

    Comments: To appear at CVPR 2017