doi 10.1145/3658226

fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence

Authors: Francis Williams, Jiahui Huang, Jonathan Swartz, Gergely Klár, Vijay Thakkar, Matthew Cong, Xuanchi Ren, Ruilong Li, Clement Fuji-Tsang, Sanja Fidler, Eftychios Sifakis, Ken Museth

Abstract: We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives to build deep learning architectures for common tasks in 3D learning such as convolution, pooling, attention, ray-tracing, meshing, etc. fVDB simultaneously provides a much larger feature set (primitives and operators) than established frameworks with no loss in efficiency: our operators match or exceed the performance of other frameworks with narrower scope. Furthermore, fVDB can process datasets with much larger footprint and spatial resolution than prior works, while providing a competitive memory footprint on small inputs. To achieve this combination of versatility and performance, fVDB relies on a single novel VDB index grid acceleration structure paired with several key innovations including GPU accelerated sparse grid construction, convolution using tensorcores, fast ray tracing kernels using a Hierarchical Digital Differential Analyzer algorithm (HDDA), and jagged tensors. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines, and we demonstrate its effectiveness on a number of representative tasks such as large-scale point-cloud segmentation, high resolution 3D generative modeling, unbounded scale Neural Radiance Fields, and large-scale point cloud reconstruction.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2404.08396 [pdf, other]

Joint Computation Offloading and Target Tracking in Integrated Sensing and Communication Enabled UAV Networks

Authors: Trinh Van Chien, Mai Dinh Cong, Nguyen Cong Luong, Tri Nhu Do, Dong In Kim, Symeon Chatzinotas

Abstract: In this paper, we investigate joint computation offloading and target tracking in an Integrated Sensing and Communication (ISAC)-enabled unmanned aerial vehicle (UAV) network. Therein, the UAV has a computing task that is partially offloaded to the ground UE for execution. Meanwhile, the UAV uses the offloading bit sequence to estimate the velocity of a ground target based on an autocorrelation function. The performance of the velocity estimation, represented by the Cramér-Rao lower bound (CRB), depends on the length of the offloading bit sequence and the UAV's location. Thus, we jointly optimize the task size for offloading and the UAV's location to minimize the overall computation latency and the CRB of the mean square error for velocity estimation subject to the UAV's budget. The problem is non-convex, and we propose a genetic algorithm to solve it. Simulation results are provided to demonstrate the effectiveness of the proposed algorithm.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures, 1 table. Accepted by IEEE Communications Letters

arXiv:2403.09996 [pdf, other]

MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings

Authors: Yu Du, Yu Song, Ce Guo, Xiaojing Tian, Dong Liu, Ming Cong

Abstract: Due to their complex spatial structure and diverse geometric features, achieving high-precision and robust point cloud registration for complex die castings has been a significant challenge in the die-casting industry. Existing point cloud registration methods primarily optimize network models using well-established high-quality datasets, often neglecting practical application in real scenarios. To address this gap, this paper proposes a high-precision adaptive registration method called Multiscale Efficient Deep Closest Point (MEDPNet) and introduces a die-casting point cloud dataset, DieCastCloud, specifically designed to tackle the challenges of point cloud registration in the die-casting industry. The MEDPNet method performs coarse die-casting point cloud data registration using the Efficient-DCP method, followed by precision registration using the Multiscale feature fusion dual-channel registration (MDR) method. We enhance the modeling capability and computational efficiency of the model by replacing the attention mechanism of the Transformer in DCP with Efficient Attention and implementing a collaborative scale mechanism through the combination of serial and parallel blocks. Additionally, we propose the MDR method, which utilizes multilayer perceptrons (MLP), Normal Distributions Transform (NDT), and Iterative Closest Point (ICP) to achieve learnable adaptive fusion, enabling high-precision, scalable, and noise-resistant global point cloud registration. Our proposed method demonstrates excellent performance compared to state-of-the-art geometric and learning-based registration methods when applied to complex die-casting point cloud data.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2305.03216 [pdf, other]

Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution

Authors: Hyojoon Park, Sangeetha Grama Srinivasan, Matthew Cong, Doyub Kim, Byungsoo Kim, Jonathan Swartz, Ken Museth, Eftychios Sifakis

Abstract: We present a neural network-based simulation super-resolution framework that can efficiently and realistically enhance a facial performance produced by a low-cost, realtime physics-based simulation to a level of detail that closely approximates that of a reference-quality off-line simulator with much higher resolution (26x element count in our examples) and accurate physical modeling. Our approach is rooted in our ability to construct - via simulation - a training set of paired frames, from the low- and high-resolution simulators respectively, that are in semantic correspondence with each other. We use face animation as an exemplar of such a simulation domain, where creating this semantic congruence is achieved by simply dialing in the same muscle actuation controls and skeletal pose in the two simulators. Our proposed neural network super-resolution framework generalizes from this training set to unseen expressions, compensates for modeling discrepancies between the two simulations due to limited resolution or cost-cutting approximations in the real-time variant, and does not require any semantic descriptors or parameters to be provided as input, other than the result of the real-time simulation. We evaluate the efficacy of our pipeline on a variety of expressive performances and provide comparisons and ablation experiments for plausible variations and alternatives to our proposed scheme.

Submitted 9 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2112.15272 [pdf, ps, other]

ViNMT: Neural Machine Translation Toolkit

Authors: Nguyen Hoang Quan, Nguyen Thanh Dat, Nguyen Hoang Minh Cong, Nguyen Van Vinh, Ngo Thi Vinh, Nguyen Phuong Thai, Tran Hong Viet

Abstract: We present an open-source toolkit for neural machine translation (NMT). The new toolkit is mainly based on vaulted Transformer (Vaswani et al., 2017) along with many other improvements detailed below, in order to create a self-contained, simple to use, consistent and comprehensive framework for Machine Translation tasks of various domains. It is tooled to support both bilingual and multilingual translation tasks, starting from building the model from respective corpora, to inferring new predictions or packaging the model to serving-capable JIT format.

Submitted 8 March, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

arXiv:2008.06680 [pdf, other]

A VCG-based Fair Incentive Mechanism for Federated Learning

Authors: Mingshu Cong, Han Yu, Xi Weng, Jiabao Qu, Yang Liu, Siu Ming Yiu

Abstract: The enduring value of the Vickrey-Clarke-Groves (VCG) mechanism has been highlighted due to its adoption by Facebook ad auctions. Our research delves into its utility in the collaborative virtual goods production (CVGP) game, which finds application in realms like federated learning and crowdsourcing, in which bidders take on the roles of suppliers rather than consumers. We introduce the Procurement-VCG (PVCG) sharing rule into existing VCG mechanisms such that they can handle capacity limits and the continuous strategy space characteristic of the reverse auction setting in CVGP games. Our main theoretical contribution provides mathematical proofs to show that PVCG is the first in the CVGP game context to simultaneously achieve truthfulness, Pareto efficiency, individual rationality, and weak budget balance. These properties suggest the potential for Pareto-efficient production in the digital planned economy. Moreover, to compute the PVCG payments in a noisy economic environment, we propose the Report-Interpolation-Maximization (RIM) method. RIM facilitates the learning of the optimal procurement level and PVCG payments through iterative interactions with suppliers.

Submitted 17 June, 2024; v1 submitted 15 August, 2020; originally announced August 2020.

arXiv:2007.14780 [pdf, other]

Optimal Procurement Auction for Cooperative Production of Virtual Products: Vickrey-Clarke-Groves Meet Cremer-McLean

Authors: Mingshu Cong, Xi Weng, Han Yu, Jiabao Qu, Siu Ming Yiu

Abstract: We set up a supply-side game-theoretic model for the cooperative production of virtual products. In our model, a group of producers collaboratively produce a virtual product by contributing costly input resources to a production coalition. Producers are capacitated, i.e., they cannot contribute more resources than their capacity limits. Our model is an abstraction of emerging internet-based business models such as federated learning and crowd computing. To maintain an efficient and stable production coalition, the coordinator should share with producers the income brought by the virtual product. Besides the demand-side information asymmetry, two further sources of supply-side information asymmetry are intertwined in this problem: 1) the capacity limit of each producer and 2) the cost incurred by each producer. In this paper, we rigorously prove that a supply-side mechanism from the VCG family, PVCG, can overcome such multiple information asymmetry and guarantee truthfulness. Furthermore, with some reasonable assumptions, PVCG simultaneously attains truthfulness, ex-post allocative efficiency, ex-post individual rationality, and ex-post weak budget balancedness on the supply side, easing the well-known tension between these four objectives in the mechanism design literature.

Submitted 29 July, 2020; originally announced July 2020.

arXiv:1903.00119 [pdf, other]

Local Geometric Indexing of High Resolution Data for Facial Reconstruction from Sparse Markers

Authors: Matthew Cong, Lana Lan, Ronald Fedkiw

Abstract: When considering sparse motion capture marker data, one typically struggles to balance its overfitting via a high dimensional blendshape system versus underfitting caused by smoothness constraints. With the current trend towards using more and more data, our aim is not to fit the motion capture markers with a parameterized (blendshape) model or to smoothly interpolate a surface through the marker positions, but rather to find an instance in the high resolution dataset that contains local geometry to fit each marker. Just as is true for typical machine learning applications, this approach benefits from a plethora of data, and thus we also consider augmenting the dataset via specially designed physical simulations that target the high resolution dataset such that the simulation output lies on the same so-called manifold as the data targeted.

Submitted 4 September, 2019; v1 submitted 28 February, 2019; originally announced March 2019.

Comments: 8 pages. Includes figures which were previously redacted. Added acknowledgements section and minor changes to text

arXiv:1812.02836 [pdf, other]

High-Quality Face Capture Using Anatomical Muscles

Authors: Michael Bao, Matthew Cong, Stéphane Grabli, Ronald Fedkiw

Abstract: Muscle-based systems have the potential to provide both anatomical accuracy and semantic interpretability as compared to blendshape models; however, a lack of expressivity and differentiability has limited their impact. Thus, we propose modifying a recently developed rather expressive muscle-based system in order to make it fully-differentiable; in fact, our proposed modifications allow this physically robust and anatomically accurate muscle model to conveniently be driven by an underlying blendshape basis. Our formulation is intuitive, natural, as well as monolithically and fully coupled such that one can differentiate the model from end to end, which makes it viable for both optimization and learning-based approaches for a variety of applications. We illustrate this with a number of examples including both shape matching of three-dimensional geometry as well as the automatic determination of a three-dimensional facial pose from a single two-dimensional RGB image without using markers or depth information.

Submitted 6 December, 2018; originally announced December 2018.

arXiv:1307.0044 [pdf, other]

Movers and Shakers: Kinetic Energy Harvesting for the Internet of Things

Authors: Maria Gorlatova, John Sarik, Guy Grebla, Mina Cong, Ioannis Kymissis, Gil Zussman

Abstract: Numerous energy harvesting wireless devices that will serve as building blocks for the Internet of Things (IoT) are currently under development. However, there is still only limited understanding of the properties of various energy sources and their impact on energy harvesting adaptive algorithms. Hence, we focus on characterizing the kinetic (motion) energy that can be harvested by a wireless node with an IoT form factor and on developing energy allocation algorithms for such nodes. In this paper, we describe methods for estimating harvested energy from acceleration traces. To characterize the energy availability associated with specific human activities (e.g., relaxing, walking, cycling), we analyze a motion dataset with over 40 participants. Based on acceleration measurements that we collected for over 200 hours, we study energy generation processes associated with day-long human routines. We also briefly summarize our experiments with moving objects. We develop energy allocation algorithms that take into account practical IoT node design considerations, and evaluate the algorithms using the collected measurements. Our observations provide insights into the design of motion energy harvesters, IoT nodes, and energy harvesting adaptive algorithms.

Submitted 14 May, 2014; v1 submitted 28 June, 2013; originally announced July 2013.

Comments: 15 pages, 11 figures

arXiv:1106.4632 [pdf, other]

Inferring 3D Articulated Models for Box Packaging Robot

Authors: Heran Yang, Tiffany Low, Matthew Cong, Ashutosh Saxena

Abstract: Given a point cloud, we consider inferring kinematic models of 3D articulated objects such as boxes for the purpose of manipulating them. While previous work has shown how to extract a planar kinematic model (often represented as a linear chain), such planar models do not apply to 3D objects that are composed of segments often linked to the other segments in cyclic configurations. We present an approach for building a model that captures the relation between the input point cloud features and the object segment as well as the relation between the neighboring object segments. We use a conditional random field that allows us to model the dependencies between different segments of the object. We test our approach on inferring the kinematic structure from partial and noisy point cloud data for a wide variety of boxes including cake boxes, pizza boxes, and cardboard cartons of several sizes. The inferred structure enables our robot to successfully close these boxes by manipulating the flaps.

Submitted 23 June, 2011; originally announced June 2011.

Comments: For: RSS 2011 Workshop on Mobile Manipulation: Learning to Manipulate

Showing 1–11 of 11 results for author: Cong, M