Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space

Authors: Mohamed Amine Ketata, Nicholas Gao, Johanna Sommer, Tom Wollschläger, Stephan Günnemann

Abstract: We introduce a new framework for molecular graph generation with 3D molecular generative models. Our Synthetic Coordinate Embedding (SyCo) framework maps molecular graphs to Euclidean point clouds via synthetic conformer coordinates and learns the inverse map using an E(n)-Equivariant Graph Neural Network (EGNN). The induced point cloud-structured latent space is well-suited to apply existing 3D molecular generative models. This approach simplifies the graph generation problem - without relying on molecular fragments nor autoregressive decoding - into a point cloud generation problem followed by node and edge classification tasks. Further, we propose a novel similarity-constrained optimization scheme for 3D diffusion models based on inpainting and guidance. As a concrete implementation of our framework, we develop EDM-SyCo based on the E(3) Equivariant Diffusion Model (EDM). EDM-SyCo achieves state-of-the-art performance in distribution learning of molecular graphs, outperforming the best non-autoregressive methods by more than 30% on ZINC250K and 16% on the large-scale GuacaMol dataset while improving conditional generation by up to 3.9 times.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2405.14762 [pdf, other]

Neural Pfaffians: Solving Many Many-Electron Schrödinger Equations

Authors: Nicholas Gao, Stephan Günnemann

Abstract: Neural wave functions accomplished unprecedented accuracies in approximating the ground state of many-electron systems, though at a high computational cost. Recent works proposed amortizing the cost by learning generalized wave functions across different structures and compounds instead of solving each problem independently. Enforcing the permutation antisymmetry of electrons in such generalized neural wave functions remained challenging as existing methods require discrete orbital selection via non-learnable hand-crafted algorithms. This work tackles the problem by defining overparametrized, fully learnable neural wave functions suitable for generalization across molecules. We achieve this by relying on Pfaffians rather than Slater determinants. The Pfaffian allows us to enforce the antisymmetry on arbitrary electronic systems without any constraint on electronic spin configurations or molecular structure. Our empirical evaluation finds that a single neural Pfaffian calculates the ground state and ionization energies with chemical accuracy across various systems. On the TinyMol dataset, we outperform the 'gold-standard' CCSD(T) CBS reference energies by 1.9 m$E_h$ and reduce energy errors compared to previous generalized neural wave functions by up to an order of magnitude.
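
The antisymmetry property that the abstract attributes to the Pfaffian can be checked on a small example. The sketch below is only an illustration of the mathematical object, not the paper's neural wave function: it evaluates the Pfaffian of a skew-symmetric matrix by expansion along the first row and shows that exchanging two indices (rows and columns together, as when two particles are swapped) flips the sign. The helper names `pfaffian` and `swap_indices` and the matrix `A` are hypothetical and chosen for this example.

```python
def pfaffian(a):
    """Pfaffian of a skew-symmetric matrix (list of lists) via first-row expansion.

    Exponential cost; intended only for tiny illustrative matrices.
    """
    n = len(a)
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0  # the Pfaffian of an odd-dimensional skew-symmetric matrix vanishes
    total = 0
    for j in range(1, n):
        # Remove rows/columns 0 and j, then recurse on the remaining minor.
        keep = [k for k in range(1, n) if k != j]
        minor = [[a[r][c] for c in keep] for r in keep]
        total += (-1) ** (j + 1) * a[0][j] * pfaffian(minor)
    return total


def swap_indices(a, i, j):
    """Return a copy of `a` with rows i, j and columns i, j exchanged."""
    perm = list(range(len(a)))
    perm[i], perm[j] = perm[j], perm[i]
    return [[a[r][c] for c in perm] for r in perm]


# Hypothetical 4x4 skew-symmetric matrix (A[i][j] == -A[j][i]).
A = [
    [0, 1, 2, 3],
    [-1, 0, 4, 5],
    [-2, -4, 0, 6],
    [-3, -5, -6, 0],
]

print(pfaffian(A))                      # a01*a23 - a02*a13 + a03*a12
print(pfaffian(swap_indices(A, 0, 1)))  # same magnitude, opposite sign
```

For a 4x4 matrix the expansion reduces to a01*a23 - a02*a13 + a03*a12, which makes the sign flip under an index exchange easy to verify by hand.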

Submitted 6 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.14665 [pdf, other]

Illuminating the Unseen: Investigating the Context-induced Harms in Behavioral Sensing

Authors: Han Zhang, Vedant Das Swain, Leijie Wang, Nan Gao, Yilun Sheng, Xuhai Xu, Flora D. Salim, Koustuv Saha, Anind K. Dey, Jennifer Mankoff

Abstract: Behavioral sensing technologies are rapidly evolving across a range of well-being applications. Despite their potential, concerns about the responsible use of such technology are escalating. In response, recent research on sensing technology has started to address these issues. While promising, these efforts primarily focus on broad demographic categories and overlook more nuanced, context-specific identities. These approaches lack grounding within domain-specific harms that arise from deploying sensing technology in diverse social, environmental, and technological settings. Additionally, existing frameworks for evaluating harms are designed for a generic ML life cycle, and fail to adapt to the dynamic and longitudinal considerations for behavioral sensing technology. To address these gaps, we introduce a framework specifically designed for evaluating behavioral sensing technologies. This framework emphasizes a comprehensive understanding of context, particularly the situated identities of users and the deployment settings of the sensing technology. It also highlights the necessity for iterative harm mitigation and continuous maintenance to adapt to the evolving nature of technology and its use. We demonstrate the feasibility and generalizability of our framework through post-hoc evaluations on two real-world behavioral sensing studies conducted in different international contexts, involving varied population demographics and machine learning tasks. Our evaluations provide empirical evidence of both situated identity-based harm and more domain-specific harms, and discuss the trade-offs introduced by implementing bias mitigation techniques.

Submitted 5 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 26 pages, 8 tables, and 1 figure (excluding appendix)

MSC Class: 68U35

ACM Class: H.5.0; I.2.m

arXiv:2404.07200 [pdf, other]

Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

Authors: Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang

Abstract: In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low-frequencies. This empirical evidence also unveils FNO's distinct low-frequency bias, which limits FNO's effectiveness in learning high-frequency information from PDE data. To tackle this challenge, we introduce SpecBoost, an ensemble learning framework that employs multiple FNOs to better capture high-frequency information. Specifically, a secondary FNO is utilized to learn the overlooked high-frequency information from the prediction residual of the initial FNO. Experiments demonstrate that SpecBoost noticeably enhances FNO's prediction accuracy on diverse PDE applications, achieving an up to 71% improvement.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.02411 [pdf, other]

A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion

Authors: Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang

Abstract: Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely focus on providing rich editing capabilities for content creators other than high-level specialized measures like style conditioning. To resolve this, we propose a unified framework utilizing diffusion inversion that enables multi-level editing capabilities for co-speech gesture generation without re-training. The method takes advantage of two key capabilities of invertible diffusion models. The first is that through inversion, we can reconstruct the intermediate noise from gestures and regenerate new gestures from the noise. This can be used to obtain gestures with high-level similarities to the original gestures for different speech conditions. The second is that this reconstruction reduces activation caching requirements during gradient calculation, making the direct optimization on input noises possible on current hardware with limited memory. With different loss functions designed for, e.g., joint rotation or velocity, we can control various low-level details by automatically tweaking the input noises through optimization. Extensive experiments on multiple use cases show that this framework succeeds in unifying high-level and low-level co-speech gesture editing.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.10098 [pdf, other]

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration

Authors: Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He

Abstract: Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods have low generalization across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Diffusion (DID) framework to tackle diffusion manifold hallucination correction (DiffMAC), which achieves high-generalization face restoration in diverse degraded scenes and heterogeneous domains. Specifically, the first diffusion stage aligns the restored face with spatial feature embedding of the low-quality face based on AdaIN, which synthesizes degradation-removal results but with uncontrollable artifacts for some hard cases. Based on Stage I, Stage II considers information compression using manifold information bottleneck (MIB) and finetunes the first diffusion model to improve facial fidelity. DiffMAC effectively fights against blind degradation patterns and synthesizes high-quality faces with attribute and identity consistencies. Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings. The source code and models will be public.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 15 pages, 12 figures

arXiv:2403.05249 [pdf, other]

On Representing Electronic Wave Functions with Sign Equivariant Neural Networks

Authors: Nicholas Gao, Stephan Günnemann

Abstract: Recent neural networks demonstrated impressively accurate approximations of electronic ground-state wave functions. Such neural networks typically consist of a permutation-equivariant neural network followed by a permutation-antisymmetric operation to enforce the electronic exchange symmetry. While accurate, such neural networks are computationally expensive. In this work, we explore the flipped approach, where we first compute antisymmetric quantities based on the electronic coordinates and then apply sign equivariant neural networks to preserve the antisymmetry. While this approach promises acceleration thanks to the lower-dimensional representation, we demonstrate that it reduces to a Jastrow factor, a commonly used permutation-invariant multiplicative factor in the wave function. Our empirical results support this further, finding little to no improvements over baselines. We conclude with neither theoretical nor empirical advantages of sign equivariant functions for representing electronic wave functions within the evaluation of this work.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Published at Workshop on AI4DifferentialEquations in Science at ICLR 2024

arXiv:2401.13221 [pdf, other]

Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration

Authors: Yimin Xu, Nanxi Gao, Zhongyun Shan, Fei Chao, Rongrong Ji

Abstract: In contrast to traditional image restoration methods, all-in-one image restoration techniques are gaining increased attention for their ability to restore images affected by diverse and unknown corruption types and levels. However, contemporary all-in-one image restoration methods omit task-wise difficulties and employ the same networks to reconstruct images afflicted by diverse degradations. This practice leads to an underestimation of the task correlations and suboptimal allocation of computational resources. To elucidate task-wise complexities, we introduce a novel concept positing that intricate image degradation can be represented in terms of elementary degradation. Building upon this foundation, we propose an innovative approach, termed the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS). The WAB incorporates several nested sub-networks with varying widths, which facilitates the selection of the most apt computations tailored to each task, thereby striking a balance between accuracy and computational efficiency during runtime. For different inputs, the WS automatically selects the most appropriate sub-network width, taking into account both task-specific and sample-specific complexities. Extensive experiments across a variety of image restoration tasks demonstrate that the proposed U-WADN achieves better performance while simultaneously reducing up to 32.3% of FLOPs and providing approximately 15.7% real-time acceleration. The code has been made available at https://github.com/xuyimin0926/U-WADN.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2311.15496 [pdf, ps, other]

Critiquing Self-report Practices for Human Mental and Wellbeing Computing at Ubicomp

Authors: Nan Gao, Soundariya Ananthan, Chun Yu, Yuntao Wang, Flora D. Salim

Abstract: Computing human mental and wellbeing is crucial to various domains, including health, education, and entertainment. However, the reliance on self-reporting in traditional research to establish ground truth often leads to methodological inconsistencies and susceptibility to response biases, thus hindering the effectiveness of modelling. This paper presents the first systematic methodological review of self-reporting practices in Ubicomp within the context of human mental and wellbeing computing. Drawing from existing survey research, we establish guidelines for self-reporting in human wellbeing studies and identify shortcomings in current practices in the Ubicomp community. Furthermore, we explore the reliability of self-report as a means of ground truth and propose directions for improving ground truth measurement in this field. Ultimately, we emphasize the urgent need for methodological advancements to enhance human mental and wellbeing computing.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.05457 [pdf, other]

Automated Mobile Sensing Strategies Generation for Human Behaviour Understanding

Authors: Nan Gao, Zhuolei Yu, Chun Yu, Yuntao Wang, Flora D. Salim, Yuanchun Shi

Abstract: Mobile sensing plays a crucial role in generating digital traces to understand human daily lives. However, studying behaviours like mood or sleep quality in smartphone users requires carefully designed mobile sensing strategies such as sensor selection and feature construction. This process is time-consuming, burdensome, and requires expertise in multiple domains. Furthermore, the resulting sensing framework lacks generalizability, making it difficult to apply to different scenarios. To address these challenges, we propose an automated mobile sensing strategy for human behaviour understanding. First, we establish a knowledge base and consolidate rules for effective feature construction, data collection, and model selection. Then, we introduce the multi-granular human behaviour representation and design procedures for leveraging large language models to generate strategies. Our approach is validated through blind comparative studies and usability evaluation. Ultimately, our approach holds the potential to revolutionise the field of mobile sensing and its applications.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.13304 [pdf, other]

"Living Within Four Walls": Exploring Emotional and Social Dynamics in Mobile Usage During Home Confinement

Authors: Nan Gao, Sam Nolan, Kaixin Ji, Shakila Khan Rumi, Judith Simone Heinisch, Christoph Anderson, Klaus David, Flora D. Salim

Abstract: Home confinement, a situation experienced by individuals for reasons ranging from medical quarantines, rehabilitation needs, and disability accommodations to remote working, is a common yet impactful aspect of modern life. While essential in various scenarios, confinement within the home environment can profoundly influence psychological well-being and digital device usage. In this study, we delve into these effects, utilising the COVID-19 lockdown as a special case study to draw insights extending to various homebound situations. We conducted an in-situ study with 32 participants living in states affected by COVID-19 lockdowns for three weeks and analysed their emotions, well-being, social roles, and mobile usage behaviours. We extracted user activity from app usage records in an unsupervised manner, and experimental results revealed that app usage behaviours are effective indicators of emotional well-being in confined environments. Our research has great potential for developing supportive strategies and remote programs, not only for people facing similar medical isolation situations, but also for individuals in long-term home confinement, such as those with chronic illnesses, recovering from surgery, or adapting to permanent remote work arrangements.

Submitted 8 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

arXiv:2309.07736 [pdf, other]

RIS-Assisted Wireless Link Signatures for Specific Emitter Identification

Authors: Ning Gao, Shuchen Meng, Cen Li, Shengguo Meng, Wankai Tang, Shi Jin, Michail Matthaiou

Abstract: The physical layer authentication (PLA) is a promising technology which can enhance the access security of a massive number of devices in the near future. In this paper, we propose a reconfigurable intelligent surface (RIS)-assisted PLA system, in which the legitimate transmitter can customize the channel fingerprints during PLA by controlling the ON-OFF state of the RIS. Without loss of generality, we use the received signal strength (RSS) based spoofing detection approach to analyze the feasibility of the proposed architecture. Specifically, based on the RSS, we derive the statistical properties of PLA and give some interesting insights, which showcase that the RIS-assisted PLA is theoretically feasible. Then, we derive the optimal detection threshold to maximize the performance in the context of the presented performance metrics. Next, the actual feasibility of the proposed system is verified via proof-of-concept experiments on a RIS-assisted PLA prototype platform. The experiment results show that there are 3.5% and 76% performance improvements when the transmission sources are at different locations and at the same location, respectively.

Submitted 7 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.16528 [pdf, other]

SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects

Authors: Ning Gao, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

Abstract: To enable meaningful robotic manipulation of objects in the real world, 6D pose estimation is one of the critical aspects. Most existing approaches have difficulty extending predictions to scenarios where novel object instances are continuously introduced, especially with heavy occlusions. In this work, we propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object using only a small number of cluttered reference images. Unlike existing methods, SA6D does not require object-centric reference images or any additional object information, making it a more generalizable and scalable solution across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.

Submitted 31 August, 2023; originally announced August 2023.

Journal ref: Conference on Robot Learning (CoRL), 2023

arXiv:2308.11369 [pdf, other]

Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization

Authors: Ning Gao, Bernard Hohmann, Gerhard Neumann

Abstract: Object-centric representations using slots have shown advances towards efficient, flexible and interpretable abstraction from low-level perceptual features in a compositional scene. Current approaches randomize the initial state of slots followed by an iterative refinement. As we show in this paper, the random slot initialization significantly affects the accuracy of the final slot prediction. Moreover, current approaches require a predetermined number of slots from prior knowledge of the data, which limits the applicability in the real world. In our work, we initialize the slot representations with clustering algorithms conditioned on the perceptual input features. This requires an additional layer in the architecture to initialize the slots given the identified clusters. We design permutation invariant and permutation equivariant versions of this layer to enable the exchangeable slot representations after clustering. Additionally, we employ mean-shift clustering to automatically identify the number of slots for a given scene. We evaluate our method on object discovery and novel view synthesis tasks with various datasets. The results show that our method outperforms prior works consistently, especially for complex scenes.
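
The abstract's point that mean-shift can determine the number of clusters (and hence slots) without it being fixed in advance can be seen in a toy setting. The following is a minimal sketch under stated assumptions, not the paper's layer or pipeline: it runs flat-kernel mean-shift on hypothetical one-dimensional "features", and the function name `mean_shift_1d`, the bandwidth, and the sample data are all made up for illustration.

```python
def mean_shift_1d(points, bandwidth, iters=50):
    """Flat-kernel mean-shift on 1D data; returns the discovered cluster centres.

    The number of centres is an output of the procedure, not an input.
    """
    modes = list(points)
    for _ in range(iters):
        shifted = []
        for m in modes:
            # Move each candidate mode to the mean of the data points within its window.
            window = [p for p in points if abs(p - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        modes = shifted
    # Merge candidate modes that converged to (almost) the same location.
    centres = []
    for m in sorted(modes):
        if not centres or m - centres[-1] > bandwidth / 2:
            centres.append(m)
    return centres


# Hypothetical 1D features forming two well-separated groups.
features = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centres = mean_shift_1d(features, bandwidth=1.0)
print(len(centres), centres)  # the number of centres would play the role of the slot count
```

The bandwidth controls how many modes are found, so it takes the place of the slot count as the quantity that has to be chosen.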

Submitted 22 August, 2023; originally announced August 2023.

Journal ref: The 34th British Machine Vision Conference (BMVC), 2023

arXiv:2307.08946 [pdf, other]

EsaNet: Environment Semantics Enabled Physical Layer Authentication

Authors: Ning Gao, Qiying Huang, Cen Li, Shi Jin, Michail Matthaiou

Abstract: Wireless networks are vulnerable to physical layer spoofing attacks due to their broadcast nature; thus, integrating communications and security (ICAS) is urgently needed for 6G endogenous security. In this letter, we propose an environment semantics enabled physical layer authentication network based on deep learning, namely EsaNet, to authenticate the spoofing from the underlying wireless protocol. Specifically, the frequency independent wireless channel fingerprint (FiFP) is extracted from the channel state information (CSI) of a massive multi-input multi-output (MIMO) system based on environment semantics knowledge. Then, we transform the received signal into a two-dimensional red green blue (RGB) image and apply the you only look once (YOLO), a single-stage object detection network, to quickly capture the FiFP. Next, a lightweight classification network is designed to distinguish the legitimate from the illegitimate users. Finally, the experimental results show that the proposed EsaNet can effectively detect physical layer spoofing attacks and is robust in time-varying wireless environments.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.08423 [pdf, other]

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.15670 [pdf, other]

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Authors: Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang

Abstract: 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes. However, prevailing methodologies primarily focus on voxel-wise feature aggregation, while neglecting instance semantics and scene context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts), that delves into the integration of instance queries to orchestrate 2D-to-3D reconstruction and 3D scene modeling. Leveraging our proposed Serial Instance-Propagated Attentions, Symphonies dynamically encodes instance-centric semantics, facilitating intricate interactions between image-based and volumetric domains. Simultaneously, Symphonies enables holistic scene comprehension by capturing context through the efficient fusion of instance queries, alleviating geometric ambiguity such as occlusion and perspective errors through contextual scene reasoning. Experimental results demonstrate that Symphonies achieves state-of-the-art performance on challenging benchmarks SemanticKITTI and SSCBench-KITTI-360, yielding remarkable mIoU scores of 15.04 and 18.58, respectively. These results showcase the paradigm's promising advancements. The code is available at https://github.com/hustvl/Symphonies.

Submitted 22 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Technical report. Code and models at: https://github.com/hustvl/Symphonies

arXiv:2306.14916 [pdf, other]

Uncertainty Estimation for Molecules: Desiderata and Methods

Authors: Tom Wollschläger, Nicholas Gao, Bertrand Charpentier, Mohamed Amine Ketata, Stephan Günnemann

Abstract: Graph Neural Networks (GNNs) are promising surrogates for quantum mechanical calculations as they establish unprecedented low errors on collections of molecular dynamics (MD) trajectories. Thanks to their fast inference times they promise to accelerate computational chemistry applications. Unfortunately, despite low in-distribution (ID) errors, such GNNs might be horribly wrong for out-of-distribution (OOD) samples. Uncertainty estimation (UE) may aid in such situations by communicating the model's certainty about its prediction. Here, we take a closer look at the problem and identify six key desiderata for UE in molecular force fields, three 'physics-informed' and three 'application-focused' ones. To overview the field, we survey existing methods from the field of UE and analyze how they fit to the set desiderata. By our analysis, we conclude that none of the previous works satisfies all criteria. To fill this gap, we propose Localized Neural Kernel (LNK), a Gaussian Process (GP)-based extension to existing GNNs satisfying the desiderata. In our extensive experimental evaluation, we test four different UE methods with three different backbones and two datasets. In out-of-equilibrium detection, we find LNK yielding up to 2.5 and 2.1 times lower errors in terms of AUC-ROC score than dropout or evidential regression-based methods while maintaining high predictive performance.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Published as conference paper at ICML 2023

arXiv:2305.15817 [pdf, other]

Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term

Authors: Yun Yue, Jiadi Jiang, Zhiling Ye, Ning Gao, Yongchao Liu, Ke Zhang

Abstract: Deep Neural Networks (DNNs) generalization is known to be closely related to the flatness of minima, leading to the development of Sharpness-Aware Minimization (SAM) for seeking flatter minima and better generalization. In this paper, we revisit the loss of SAM and propose a more general method, called WSAM, by incorporating sharpness as a regularization term. We prove its generalization bound through the combination of PAC and Bayes-PAC techniques, and evaluate its performance on various public datasets. The results demonstrate that WSAM achieves improved generalization, or is at least highly competitive, compared to the vanilla optimizer, SAM and its variants. The code is available at https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.

Submitted 9 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 10 pages. Accepted as a conference paper at KDD '23

arXiv:2305.08604 [pdf, other]

A Survey of Blockchain and Artificial Intelligence for 6G Wireless Communications

Authors: Yiping Zuo, Jiajia Guo, Ning Gao, Yongxu Zhu, Shi Jin, Xiao Li

Abstract: The research on the sixth-generation (6G) wireless communications for the development of future mobile communication networks has been officially launched around the world. 6G networks face multifarious challenges, such as resource-constrained mobile devices, difficult wireless resource management, high complexity of heterogeneous network architectures, explosive computing and storage requirements, privacy and security threats. To address these challenges, deploying blockchain and artificial intelligence (AI) in 6G networks may realize new breakthroughs in advancing network performances in terms of security, privacy, efficiency, cost, and more. In this paper, we provide a detailed survey of existing works on the application of blockchain and AI to 6G wireless communications. More specifically, we start with a brief overview of blockchain and AI. Then, we mainly review the recent advances in the fusion of blockchain and AI, and highlight the inevitable trend of deploying both blockchain and AI in wireless communications. Furthermore, we extensively explore integrating blockchain and AI for wireless communication systems, involving secure services and Internet of Things (IoT) smart applications. Particularly, some of the most talked-about key services based on blockchain and AI are introduced, such as spectrum management, computation allocation, content caching, and security and privacy. Moreover, we also focus on some important IoT smart applications supported by blockchain and AI, covering smart healthcare, smart transportation, smart grid, and unmanned aerial vehicles (UAVs). We also analyze the open issues and research challenges for the joint deployment of blockchain and AI in 6G wireless communications. Lastly, based on lots of existing meaningful works, this paper aims to provide a comprehensive survey of blockchain and AI in 6G networks.

Submitted 7 September, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.03708 [pdf, other]

Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

Authors: Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu , et al. (5 additional authors not shown)

Abstract: Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first Pulmonary ARtery SEgmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at https://parse2022.grand-challenge.org/Parse2022/.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.13013 [pdf, other]

doi 10.1109/LRA.2024.3359544

GesGPT: Speech Gesture Synthesis With Text Parsing from ChatGPT

Authors: Nan Gao, Zeyu Zhao, Zhi Zeng, Shuwu Zhang, Dongdong Weng, Yihua Bao

Abstract: Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models (LLMs), such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.

Submitted 27 May, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

Journal ref: IEEE Robotics and Automation Letters 9 (2024) 3

arXiv:2303.04791 [pdf, other]

Ewald-based Long-Range Message Passing for Molecular Graphs

Authors: Arthur Kosmala, Johannes Gasteiger, Nicholas Gao, Stephan Günnemann

Abstract: Neural architectures that learn potential energy surfaces from molecular data have undergone fast improvement in recent years. A key driver of this success is the Message Passing Neural Network (MPNN) paradigm. Its favorable scaling with system size partly relies upon a spatial distance limit on messages. While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well-founded in the Ewald summation method. It can serve as an augmentation on top of existing MPNN architectures as it is computationally inexpensive and agnostic to architectural details. We test the approach with four baseline models and two datasets containing diverse periodic (OC20) and aperiodic structures (OE62). We observe robust improvements in energy mean absolute errors across all models and datasets, averaging 10% on OC20 and 16% on OE62. Our analysis shows an outsize impact of these improvements on structures with high long-range contributions to the ground truth energy.

Submitted 6 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Published at the 40th International Conference on Machine Learning (ICML 2023)

arXiv:2302.04168 [pdf, other]

Generalizing Neural Wave Functions

Authors: Nicholas Gao, Stephan Günnemann

Abstract: Recent neural network-based wave functions have achieved state-of-the-art accuracies in modeling ab-initio ground-state potential energy surface. However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules. Globe learns representations of local electronic structures that generalize across molecules via spatial message passing by connecting molecular orbitals to covalent bonds. Further, we propose a size-consistent wave function Ansatz, the Molecular orbital network (Moon), tailored to jointly solve Schrödinger equations of different molecules. In our experiments, we find Moon converging in 4.5 times fewer steps to similar accuracy as previous methods or to lower energies given the same time. Further, our analysis shows that Moon's energy estimate scales additively with increased system sizes, unlike previous work where we observe divergence. In both computational chemistry and machine learning, we are the first to demonstrate that a single wave function can solve the Schrödinger equation of molecules with different atoms jointly.

Submitted 31 May, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: Published at the 40th International Conference on Machine Learning (ICML 2023)

arXiv:2301.01882 [pdf, other]

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Authors: Fei He, Haoyang Zhang, Naiyu Gao, Jian Jia, Yanhu Shan, Xin Zhao, Kaiqi Huang

Abstract: Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them by either additional tracking heads or complex instance matching algorithms. This explicit instance association approach increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast and yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework can perform accurate instance association implicitly. Specifically, we generate frame-level object instances based on a set of instance query-proposal pairs propagated from previous frames. This instance query-proposal pair is learned to bind with one specific object across frames through conscientiously developed strategies. When using such a pair to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and elegantly take advantage of temporal clues in videos. To show the effectiveness of our method InsPro, we evaluate it on two popular VIS benchmarks, i.e., YouTube-VIS 2019 and YouTube-VIS 2021. Without bells-and-whistles, our InsPro with ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks respectively, outperforming all other online VIS methods.

Submitted 4 January, 2023; originally announced January 2023.

Comments: NeurIPS 2022

arXiv:2212.03396 [pdf, other]

Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling

Authors: Yifei Zhang, Neng Gao, Cunqing Ma

Abstract: Prototype-based interpretability methods provide intuitive explanations of model prediction by comparing samples to a reference set of memorized exemplars or typical representatives in terms of similarity. In the field of sequential data modeling, similarity calculations of prototypes are usually based on encoded representation vectors. However, due to highly recursive functions, there is usually a non-negligible disparity between the prototype-based explanations and the original input. In this work, we propose a Self-Explaining Selective Model (SESM) that uses a linear combination of prototypical concepts to explain its own predictions. The model employs the idea of case-based reasoning by selecting sub-sequences of the input that mostly activate different concepts as prototypical parts, which users can compare to sub-sequences selected from different example inputs to understand model decisions. For better interpretability, we design multiple constraints including diversity, stability, and locality as training objectives. Extensive experiments in different domains demonstrate that our method exhibits promising interpretability and competitive accuracy.

Submitted 16 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: To appear in AAAI 2023

arXiv:2212.01461 [pdf, other]

Learning Disentangled Label Representations for Multi-label Classification

Authors: Jian Jia, Fei He, Naiyu Gao, Xiaotang Chen, Kaiqi Huang

Abstract: Although various methods have been proposed for multi-label classification, most approaches still follow the feature learning mechanism of the single-label (multi-class) classification, namely, learning a shared image feature to classify multiple labels. However, we find this One-shared-Feature-for-Multiple-Labels (OFML) mechanism is not conducive to learning discriminative label features and makes the model non-robust. For the first time, we mathematically prove that the inferiority of the OFML mechanism is that the optimal learned image feature cannot maintain high similarities with multiple classifiers simultaneously in the context of minimizing cross-entropy loss. To address the limitations of the OFML mechanism, we introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning (DLFL) framework to learn a disentangled representation for each label. The specificity of the framework lies in a feature disentangle module, which contains learnable semantic queries and a Semantic Spatial Cross-Attention (SSCA) module. Specifically, learnable semantic queries maintain semantic consistency between different images of the same label. The SSCA module localizes the label-related spatial regions and aggregates located region features into the corresponding label feature to achieve feature disentanglement. We achieve state-of-the-art performance on eight datasets of three tasks, i.e., multi-label classification, pedestrian attribute recognition, and continual multi-label learning.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 17 pages, 9 figures

arXiv:2210.04747 [pdf, other]

An NLoS-based Enhanced Sensing Method for MmWave Communication System

Authors: Shiwen He, Kangli Cai, Shiyue Huang, Zhenyu Anz, Wei Huang, Ning Gao

Abstract: The millimeter-wave (mmWave)-based Wi-Fi sensing technology has recently attracted extensive attention since it provides a possibility to realize higher sensing accuracy. However, current works mainly concentrate on sensing scenarios where the line-of-sight (LoS) path exists, which significantly limits their applications. To address the problem, we propose an enhanced mmWave sensing algorithm in the 3D non-line-of-sight environment (mm3NLoS), aiming to sense the direction and distance of the target when the LoS path is weak or blocked. Specifically, we first adopt the directional beam to estimate the azimuth/elevation angle of arrival (AoA) and angle of departure (AoD) of the reflection path. Then, the distance of the related path is measured by the fine timing measurement protocol. Finally, we transform the AoA and AoD of the multiple non-line-of-sight (NLoS) paths into the direction vector and then obtain the information of targets based on the geometric relationship. The simulation results demonstrate that mm3NLoS can achieve a centimeter-level error with a 2m spacing. Compared to the prior work, it can significantly reduce the performance degradation under the NLoS condition.

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2210.02337 [pdf, other]

When Physical Layer Key Generation Meets RIS: Opportunities, Challenges, and Road Ahead

Authors: Ning Gao, Yu Han, Nannan Li, Shi Jin, Michail Matthaiou

Abstract: Physical layer key generation (PLKG) is a promising technology to obtain symmetric keys between a pair of wireless communication users in a plug-and-play manner. The shared entropy source almost entirely comes from the intrinsic randomness of the radio channel, which is highly dependent on the wireless environments. However, in some static/block fading wireless environments, the intrinsic randomness of the wireless channel is hard to guarantee. Very recently, thanks to reconfigurable intelligent surfaces (RISs) with their excellent ability on electromagnetic wave control, the wireless channel environment can be customized. In this article, we overview the RIS-aided PLKG in static indoor environments, including its channel model and hardware architectures. Then, we propose potential application scenarios and analyze the design challenges of RIS-aided PLKG, including channel reciprocity, RIS reconfiguration speed and RIS deployment via proof-of-concept experiments on a RIS-aided PLKG prototype system. In particular, our experimental results show that the key generation rate is 15-fold higher than that without RIS in a static indoor environment. Next, we design a RIS jamming attack via a prototype experiment and discuss its possible attack-defense countermeasures. Finally, several conclusions and future directions are identified.

Submitted 3 July, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2208.06210 [pdf, other]

doi 10.1103/PhysRevLett.130.170201

Measuring incompatibility and clustering quantum observables with a quantum switch

Authors: Ning Gao, Dantong Li, Anchit Mishra, Junchen Yan, Kyrylo Simonov, Giulio Chiribella

Abstract: The existence of incompatible observables is a cornerstone of quantum mechanics and a valuable resource in quantum technologies. Here we introduce a measure of incompatibility, called the mutual eigenspace disturbance (MED), which quantifies the amount of disturbance induced by the measurement of a sharp observable on the eigenspaces of another. The MED provides a metric on the space of von Neumann measurements, and can be efficiently estimated by letting the measurement processes act in an indefinite order, using a setup known as the quantum switch, which also allows one to quantify the noncommutativity of arbitrary quantum processes. Thanks to these features, the MED can be used in quantum machine learning tasks. We demonstrate this application by providing an unsupervised algorithm that clusters unknown von Neumann measurements. Our algorithm is robust to noise and can be used to identify groups of observers that share approximately the same measurement context.

Submitted 9 May, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

Comments: 13 pages, 2 figures

Journal ref: Phys. Rev. Lett. 130, 170201 (2023)

arXiv:2207.10959 [pdf, other]

doi 10.1609/aaai.v36i1.19965

QueryProp: Object Query Propagation for High-Performance Video Object Detection

Authors: Fei He, Naiyu Gao, Jian Jia, Xin Zhao, Kaiqi Huang

Abstract: Video object detection has been an important yet challenging topic in computer vision. Traditional methods mainly focus on designing the image-level or box-level feature propagation strategies to exploit temporal information. This paper argues that with a more effective and efficient feature propagation framework, video object detectors can gain improvement in terms of both accuracy and speed. For this purpose, this paper studies object-level feature propagation, and proposes an object query propagation (QueryProp) framework for high-performance video object detection. The proposed QueryProp contains two propagation strategies: 1) query propagation is performed from sparse key frames to dense non-key frames to reduce the redundant computation on non-key frames; 2) query propagation is performed from previous key frames to the current key frame to improve feature representation by temporal context modeling. To further facilitate query propagation, an adaptive propagation gate is designed to achieve flexible key frame selection. We conduct extensive experiments on the ImageNet VID dataset. QueryProp achieves comparable accuracy with state-of-the-art methods and strikes a decent accuracy/speed trade-off. Code is available at https://github.com/hf1995/QueryProp.

Submitted 22 July, 2022; originally announced July 2022.

Comments: This paper is accepted to AAAI2022

arXiv:2207.03405 [pdf, other]

Investigating the Effects of Mood & Usage Behaviour on Notification Response Time

Authors: Judith S. Heinisch, Nan Gao, Christoph Anderson, Shohreh Deldari, Klaus David, Flora Salim

Abstract: Notifications are one of the most prevailing mechanisms on smartphones and personal computers to convey timely and important information. Despite these benefits, smartphone notifications demand individuals' attention and can cause stress and frustration when delivered at inopportune timings. This paper investigates the effect of individuals' smartphone usage behavior and mood on notification response time. We conduct an in-the-wild study with more than 18 participants for five weeks. Extensive experiment results show that the proposed regression model is able to accurately predict the response time of smartphone notifications using the current user's mood and physiological signals. We explored the effect of different features for each participant to choose the most important user-oriented features in order to achieve a meaningful and personalised notification response prediction. On average, our regression model achieved over all participants an MAE of 0.7764 ms and RMSE of 1.0527 ms. We also investigate how physiological signals (collected from E4 wristbands) are used as an indicator for mood and discuss the individual differences in application usage and categories of smartphone applications on the response time of notifications. Our research sheds light on the future intelligent notification management system.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.07162 [pdf, other]

Category-Agnostic 6D Pose Estimation with Conditional Neural Processes

Authors: Yumeng Li, Ning Gao, Hanna Ziesche, Gerhard Neumann

Abstract: We present a novel meta-learning approach for 6D pose estimation on unknown objects. In contrast to "instance-level" and "category-level" pose estimation methods, our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories. Specifically, we employ a neural process-based meta-learning approach to train an encoder to capture texture and geometry of an object in a latent representation, based on very few RGB-D images and ground-truth keypoints. The latent representation is then used by a simultaneously meta-trained decoder to predict the 6D pose of the object in new images. Furthermore, we propose a novel geometry-aware decoder for the keypoint prediction using a Graph Neural Network (GNN), which explicitly takes geometric constraints specific to each object into consideration. To evaluate our algorithm, extensive experiments are conducted on the LINEMOD dataset, and on our new fully-annotated synthetic datasets generated from Multiple Categories in Multiple Scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with very different shapes and appearances. Remarkably, our model also shows robust performance on occluded scenes although trained fully on data without occlusion. To our knowledge, this is the first work exploring cross-category level 6D pose estimation.

Submitted 19 October, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted at CVPR2022 workshop: Women in Computer Vision (WiCV)

Journal ref: CVPR2022 workshop: Women in Computer Vision (WiCV)

arXiv:2206.00468 [pdf, other]

PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation

Authors: Naiyu Gao, Fei He, Jian Jia, Yanhu Shan, Haoyang Zhang, Xin Zhao, Kaiqi Huang

Abstract: This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct 3D scene with instance-level semantics from one single image. Prior works address this problem by simply adding a dense depth regression head to panoptic segmentation (PS) networks, resulting in two independent task branches. This neglects the mutually-beneficial relations between these two tasks, thus failing to exploit handy instance-level semantic cues to boost depth accuracy while also producing sub-optimal depth maps. To overcome these limitations, we propose a unified framework for the DPS task by applying a dynamic convolution technique to both the PS and depth prediction tasks. Specifically, instead of predicting depth for all pixels at a time, we generate instance-specific kernels to predict depth and segmentation masks for each instance. Moreover, leveraging the instance-wise depth estimation scheme, we add additional instance-level depth cues to assist with supervising the depth learning via a new depth loss. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our method. We hope our unified solution to DPS can lead a new paradigm in this area. Code is available at https://github.com/NaiyuGao/PanopticDepth.

Submitted 1 June, 2022; originally announced June 2022.

Comments: CVPR2022

arXiv:2205.14962 [pdf, other]

Sampling-free Inference for Ab-Initio Potential Energy Surface Networks

Authors: Nicholas Gao, Stephan Günnemann

Abstract: Recently, it has been shown that neural networks not only approximate the ground-state wave functions of a single molecular system well but can also generalize to multiple geometries. While such generalization significantly speeds up training, each energy evaluation still requires Monte Carlo integration which limits the evaluation to a few geometries. In this work, we address the inference shortcomings by proposing the Potential learning from ab-initio Networks (PlaNet) framework, in which we simultaneously train a surrogate model in addition to the neural wave function. At inference time, the surrogate avoids expensive Monte-Carlo integration by directly estimating the energy, accelerating the process from hours to milliseconds. In this way, we can accurately model high-resolution multi-dimensional energy surfaces for larger systems that previously were unobtainable via neural wave functions. Finally, we explore an additional inductive bias by introducing physically-motivated restricted neural wave function models. We implement such a function with several additional improvements in the new PESNet++ model. In our experimental evaluation, PlaNet accelerates inference by 7 orders of magnitude for larger molecules like ethanol while preserving accuracy. Compared to previous energy surface networks, PESNet++ reduces energy errors by up to 74%.

Submitted 6 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Published as a conference paper at ICLR 2023

arXiv:2205.11110 [pdf, other]

Meta-Learning Regrasping Strategies for Physical-Agnostic Objects

Authors: Ning Gao, Jingyu Zhang, Ruijie Chen, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

Abstract: Grasping inhomogeneous objects in real-world applications remains a challenging task due to the unknown physical properties such as mass distribution and coefficient of friction. In this study, we propose a meta-learning algorithm called ConDex, which incorporates Conditional Neural Processes (CNP) with DexNet-2.0 to autonomously discern the underlying physical properties of objects using depth images. ConDex efficiently acquires physical embeddings from limited trials, enabling precise grasping point estimation. Furthermore, ConDex is capable of updating the predicted grasping quality iteratively from new trials in an online fashion. To the best of our knowledge, we are the first who generate two object datasets focusing on inhomogeneous physical properties with varying mass distributions and friction coefficients. Extensive evaluations in simulation demonstrate ConDex's superior performance over DexNet-2.0 and existing meta-learning-based grasping pipelines. Furthermore, ConDex shows robust generalization to previously unseen real-world objects despite training solely in the simulation. The synthetic and real-world datasets will be published as well.

Submitted 14 September, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted as spotlight in ICRA 2022 Workshop: Scaling Robot Learning

arXiv:2205.07646 [pdf, other]

A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

Authors: Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao

Abstract: Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. The joint learning of both tasks can improve inference accuracy and is popular in recent works. However, most joint models ignore the inference latency and cannot meet the need to deploy dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling tasks, guaranteeing both accuracy and latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN inferences fifteen utterances per second with a small accuracy drop, showing its effectiveness and efficiency on edge devices.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 9 pages, 4 figures

arXiv:2203.04905 [pdf, other]

What Matters For Meta-Learning Vision Regression Tasks?

Authors: Ning Gao, Hanna Ziesche, Ngo Anh Vien, Michael Volpp, Gerhard Neumann

Abstract: Meta-learning is widely used in few-shot classification and function regression due to its ability to quickly adapt to unseen tasks. However, it has not yet been well explored on regression tasks with high dimensional inputs such as images. This paper makes two main contributions that help understand this barely explored area. First, we design two new types of cross-category level vision regression tasks, namely object discovery and pose estimation of unprecedented complexity in the meta-learning domain for computer vision. To this end, we (i) exhaustively evaluate common meta-learning techniques on these tasks, and (ii) quantitatively analyze the effect of various deep learning techniques commonly used in recent meta-learning algorithms in order to strengthen the generalization capability: data augmentation, domain randomization, task augmentation and meta-regularization. Finally, we (iii) provide some insights and practical recommendations for training meta-learning algorithms on vision regression tasks. Second, we propose the addition of functional contrastive learning (FCL) over the task representations in Conditional Neural Processes (CNPs) and train in an end-to-end fashion. The experimental results show that the results of prior work are misleading as a consequence of a poor choice of the loss function as well as too small meta-training sets. Specifically, we find that CNPs outperform MAML on most tasks without fine-tuning. Furthermore, we observe that naive task augmentation without a tailored design results in underfitting.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted at CVPR 2022

arXiv:2112.12342 [pdf, other]

doi 10.1145/3550335

Individual and Group-wise Classroom Seating Experience: Effects on Student Engagement in Different Courses

Authors: Nan Gao, Mohammad Saiedur Rahaman, Wei Shao, Kaixin Ji, Flora D. Salim

Abstract: Seating location in the classroom can affect student engagement, attention and academic performance by providing better visibility, improved movement, and participation in discussions. Existing studies typically explore how traditional seating arrangements (e.g. grouped tables or traditional rows) influence students' perceived engagement, without considering group seating behaviours under more flexible seating arrangements. Furthermore, survey-based measures of student engagement are prone to subjectivity and various response bias. Therefore, in this research, we investigate how individual and group-wise classroom seating experiences affect student engagement using wearable physiological sensors. We conducted a field study at a high school and collected survey and wearable data from 23 students in 10 courses over four weeks. We aim to answer the following research questions: 1. How does the seating proximity between students relate to their perceived learning engagement? 2. How do students' group seating behaviours relate to their physiologically-based measures of engagement (i.e. physiological arousal and physiological synchrony)? Experiment results indicate that the individual and group-wise classroom seating experience is associated with perceived student engagement and physiologically-based engagement measured from electrodermal activity. We also find that students who sit close together are more likely to have similar learning engagement and tend to have high physiological synchrony. This research opens up opportunities to explore the implications of flexible seating arrangements and has great potential to maximize student engagement by suggesting intelligent seating choices in the future.

Submitted 23 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: The manuscript has been accepted by IMWUT

Journal ref: IMWUT. 6(3), 1-23 (2022)

arXiv:2110.05064 [pdf, other]

Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions

Authors: Nicholas Gao, Stephan Günnemann

Abstract: Solving the Schrödinger equation is key to many quantum mechanical properties. However, an analytical solution is only tractable for single-electron systems. Recently, neural networks succeeded at modeling wave functions of many-electron systems. Together with the variational Monte-Carlo (VMC) framework, this led to solutions on par with the best known classical methods. Still, these neural methods require tremendous amounts of computational resources as one has to train a separate model for each molecular geometry. In this work, we combine a Graph Neural Network (GNN) with a neural wave function to simultaneously solve the Schrödinger equation for multiple geometries via VMC. This enables us to model continuous subsets of the potential energy surface with a single training pass. Compared to existing state-of-the-art networks, our Potential Energy Surface Network PESNet speeds up training for multiple geometries by up to 40 times while matching or surpassing their accuracy. This may open the path to accurate and orders of magnitude cheaper quantum mechanical calculations.

Submitted 29 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: Published as a conference paper at ICLR 2022

arXiv:2107.00389 [pdf, other]

Investigating the Reliability of Self-report Data in the Wild: The Quest for Ground Truth

Authors: Nan Gao, Mohammad Saiedur Rahaman, Wei Shao, Flora D. Salim

Abstract: Inferring human mental state (e.g., emotion, depression, engagement) with sensing technology is one of the most valuable challenges in the affective computing area, which has a profound impact in all industries interacting with humans. The self-report survey is the most common way to quantify how people think, but prone to subjectivity and various responses bias. It is usually used as the ground truth for human mental state prediction. In recent years, many data-driven machine learning models are built based on self-report annotations as the target value. In this research, we investigate the reliability of self-report surveys in the wild by studying the confidence level of responses and survey completion time. We conduct a case study (i.e., student engagement inference) by recruiting 23 students in a high school setting over a period of 4 weeks. Our participants volunteered 488 self-reported responses and data from their wearable sensors. We also find the physiologically measured student engagement and perceived student engagement are not always consistent. The findings from this research have great potential to benefit future studies in predicting engagement, depression, stress, and other emotion-related states in the field of affective computing and sensing technologies.

Submitted 29 November, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2105.06637 [pdf, other]

Understanding occupants' behaviour, engagement, emotion, and comfort indoors with heterogeneous sensors and wearables

Authors: Nan Gao, Max Marschall, Jane Burry, Simon Watkins, Flora D. Salim

Abstract: We conducted a field study at a K-12 private school in the suburbs of Melbourne, Australia. The data capture contained two elements: First, a 5-month longitudinal field study In-Gauge using two outdoor weather stations, as well as indoor weather stations in 17 classrooms and temperature sensors on the vents of occupant-controlled room air-conditioners; these were collated into individual datasets for each classroom at a 5-minute logging frequency, including additional data on occupant presence. The dataset was used to derive predictive models of how occupants operate room air-conditioning units. Second, we tracked 23 students and 6 teachers in a 4-week cross-sectional study En-Gage, using wearable sensors to log physiological data, as well as daily surveys to query the occupants' thermal comfort, learning engagement, emotions and seating behaviours. Overall, the combined dataset could be used to analyse the relationships between indoor/outdoor climates and students' behaviours/mental states on campus, which provide opportunities for the future design of intelligent feedback systems to benefit both students and staff.

Submitted 22 April, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: This paper has been accepted by Nature Scientific Data. The link for the datasets: https://rmit.figshare.com/articles/dataset/In-Gauge_and_En-Gage_Datasets/14578908

arXiv:2105.00950 [pdf, other]

3-D Deployment of UAV Swarm for Massive MIMO Communications

Authors: Ning Gao, Xiao Li, Shi Jin, Michail Matthaiou

Abstract: We consider the uplink transmission between a multi-antenna ground station and an unmanned aerial vehicle (UAV) swarm. The UAVs are assumed as intelligent agents, which can explore their optimal three dimensional (3-D) deployment to maximize the channel capacity of the multiple input multiple output (MIMO) system. Specifically, considering the limitations of each UAV in accessing the global information of the network, we focus on a decentralized control strategy by noting that each UAV in the swarm can only utilize the local information to achieve the optimal 3-D deployment. In this case, the optimization problem can be divided into several optimization sub-problems with respect to the rank function. Due to the non-convex nature of the rank function and the fact that the optimization sub-problems are coupled, the original problem is NP-hard and, thus, cannot be solved with standard convex optimization solvers. Interestingly, we can relax the constraint condition of each sub-problem and solve the optimization problem by a formulated UAVs channel capacity maximization game. We analyze such game according to the designed reward function and the potential function. Then, we discuss the existence of the pure Nash equilibrium in the game. To achieve the best Nash equilibrium of the MIMO system, we develop a decentralized learning algorithm, namely decentralized UAVs channel capacity learning. The details of the algorithm are provided, and then, the convergence, the effectiveness and the computational complexity are analyzed, respectively. Moreover, we give some insightful remarks based on the proofs and the theoretical analysis. Also, extensive simulations illustrate that the developed learning algorithm can achieve a high MIMO channel capacity by optimizing the 3-D UAV swarm deployment with the local information.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2009.13342 [pdf, other]

doi 10.1109/TIP.2021.3090522

Learning Category- and Instance-Aware Pixel Embedding for Fast Panoptic Segmentation

Authors: Naiyu Gao, Yanhu Shan, Xin Zhao, Kaiqi Huang

Abstract: Panoptic segmentation (PS) is a complex scene understanding task that requires providing high-quality segmentation for both thing objects and stuff regions. Previous methods handle these two classes with semantic and instance segmentation modules separately, following with heuristic fusion or additional modules to resolve the conflicts between the two outputs. This work simplifies this pipeline of PS by consistently modeling the two classes with a novel PS framework, which extends a detection model with an extra module to predict category- and instance-aware pixel embedding (CIAE). CIAE is a novel pixel-wise embedding feature that encodes both semantic-classification and instance-distinction information. At the inference process, PS results are simply derived by assigning each pixel to a detected instance or a stuff class according to the learned embedding. Our method not only demonstrates fast inference speed but also the first one-stage method to achieve comparable performance to two-stage methods on the challenging COCO benchmark.

Submitted 15 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2008.08903 [pdf, other]

doi 10.1145/3474838

Generative Adversarial Networks for Spatio-temporal Data: A Survey

Authors: Nan Gao, Hao Xue, Wei Shao, Sichen Zhao, Kyle Kai Qin, Arian Prabowo, Mohammad Saiedur Rahaman, Flora D. Salim

Abstract: Generative Adversarial Networks (GANs) have shown remarkable success in producing realistic-looking images in the computer vision area. Recently, GAN-based techniques are shown to be promising for spatio-temporal-based applications such as trajectory prediction, events generation and time-series data imputation. While several reviews for GANs in computer vision have been presented, no one has considered addressing the practical applications and challenges relevant to spatio-temporal data. In this paper, we have conducted a comprehensive review of the recent developments of GANs for spatio-temporal data. We summarise the application of popular GAN architectures for spatio-temporal data and the common practices for evaluating the performance of spatio-temporal applications with GANs. Finally, we point out future research directions to benefit researchers in this area.

Submitted 29 July, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

Comments: This paper has been accepted by ACM Transactions on Intelligent Systems and Technology (TIST)

arXiv:2007.04831 [pdf, other]

n-Gage: Predicting in-class Emotional, Behavioural and Cognitive Engagement in the Wild

Authors: Nan Gao, Wei Shao, Mohammad Saiedur Rahaman, Flora D. Salim

Abstract: The study of student engagement has attracted growing interests to address problems such as low academic performance, disaffection, and high dropout rates. Existing approaches to measuring student engagement typically rely on survey-based instruments. While effective, those approaches are time-consuming and labour-intensive. Meanwhile, both the response rate and quality of the survey are usually poor. As an alternative, in this paper, we investigate whether we can infer and predict engagement at multiple dimensions, just using sensors. We hypothesize that multidimensional student engagement can be translated into physiological responses and activity changes during the class, and also be affected by the environmental changes. Therefore, we aim to explore the following questions: Can we measure the multiple dimensions of high school student's learning engagement including emotional, behavioural and cognitive engagement with sensing data in the wild? Can we derive the activity, physiological, and environmental factors contributing to the different dimensions of student engagement? If yes, which sensors are the most useful in differentiating each dimension of the engagement? Then, we conduct an in-situ study in a high school from 23 students and 6 teachers in 144 classes over 11 courses for 4 weeks. We present the n-Gage, a student engagement sensing system using a combination of sensors from wearables and environments to automatically detect student in-class multidimensional learning engagement. Experiment results show that n-Gage can accurately predict multidimensional student engagement in real-world scenarios with an average MAE of 0.788 and RMSE of 0.975 using all the sensors. We also show a set of interesting findings of how different factors (e.g., combinations of sensors, school subjects, CO2 level) affect each dimension of the student learning engagement.

Submitted 22 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: This paper has been accepted by the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) volume 4 issue 3, 2020

arXiv:2006.12631 [pdf, other]

Fast and Flexible Temporal Point Processes with Triangular Maps

Authors: Oleksandr Shchur, Nicholas Gao, Marin Biloš, Stephan Günnemann

Abstract: Temporal point process (TPP) models combined with recurrent neural networks provide a powerful framework for modeling continuous-time event data. While such models are flexible, they are inherently sequential and therefore cannot benefit from the parallelism of modern hardware. By exploiting the recent developments in the field of normalizing flows, we design TriTPP -- a new class of non-recurrent TPP models, where both sampling and likelihood computation can be done in parallel. TriTPP matches the flexibility of RNN-based methods but permits orders of magnitude faster sampling. This enables us to use the new model for variational inference in continuous-time discrete-state systems. We demonstrate the advantages of the proposed framework on synthetic and real-world datasets.

Submitted 10 November, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.07680 [pdf, other]

doi 10.1145/3394486.3403138

High-Dimensional Similarity Search with Quantum-Assisted Variational Autoencoder

Authors: Nicholas Gao, Max Wilson, Thomas Vandal, Walter Vinci, Ramakrishna Nemani, Eleanor Rieffel

Abstract: Recent progress in quantum algorithms and hardware indicates the potential importance of quantum computing in the near future. However, finding suitable application areas remains an active area of research. Quantum machine learning is touted as a potential approach to demonstrate quantum advantage within both the gate-model and the adiabatic schemes. For instance, the Quantum-assisted Variational Autoencoder has been proposed as a quantum enhancement to the discrete VAE. We extend on previous work and study the real-world applicability of a QVAE by presenting a proof-of-concept for similarity search in large-scale high-dimensional datasets. While exact and fast similarity search algorithms are available for low dimensional datasets, scaling to high-dimensional data is non-trivial. We show how to construct a space-efficient search index based on the latent space representation of a QVAE. Our experiments show a correlation between the Hamming distance in the embedded space and the Euclidean distance in the original space on the Moderate Resolution Imaging Spectroradiometer (MODIS) dataset. Further, we find real-world speedups compared to linear search and demonstrate memory-efficient scaling to half a billion data points.

Submitted 13 June, 2020; originally announced June 2020.

arXiv:2005.14260 [pdf]

doi 10.1007/s11661-020-06008-4

Overview: Computer vision and machine learning for microstructural characterization and analysis

Authors: Elizabeth A. Holm, Ryan Cohn, Nan Gao, Andrew R. Kitahara, Thomas P. Matson, Bo Lei, Srujana Rao Yarasi

Abstract: The characterization and analysis of microstructure is the foundation of microstructural science, connecting the materials structure to its composition, process history, and properties. Microstructural quantification traditionally involves a human deciding a priori what to measure and then devising a purpose-built method for doing so. However, recent advances in data science, including computer vision (CV) and machine learning (ML) offer new approaches to extracting information from microstructural images. This overview surveys CV approaches to numerically encode the visual information contained in a microstructural image, which then provides input to supervised or unsupervised ML algorithms that find associations and trends in the high-dimensional image representation. CV/ML systems for microstructural characterization and analysis span the taxonomy of image analysis tasks, including image classification, semantic segmentation, object detection, and instance segmentation. These tools enable new approaches to microstructural analysis, including the development of new, rich visual metrics and the discovery of processing-microstructure-property relationships.

Submitted 28 May, 2020; originally announced May 2020.

Comments: submitted to Materials and Metallurgical Transactions A

arXiv:2004.14382 [pdf, other]

Transfer Learning for Thermal Comfort Prediction in Multiple Cities

Authors: Nan Gao, Wei Shao, Mohammad Saiedur Rahaman, Jun Zhai, Klaus David, Flora D. Salim

Abstract: HVAC (Heating, Ventilation and Air Conditioning) system is an important part of a building, which constitutes up to 40% of building energy usage. The main purpose of HVAC, maintaining appropriate thermal comfort, is crucial for the best utilisation of energy usage. Besides, thermal comfort is also crucial for well-being, health, and work productivity. Recently, data-driven thermal comfort models have got better performance than traditional knowledge-based methods (e.g. Predicted Mean Vote Model). An accurate thermal comfort model requires a large amount of self-reported thermal comfort data from indoor occupants which undoubtedly remains a challenge for researchers. In this research, we aim to tackle this data-shortage problem and boost the performance of thermal comfort prediction. We utilise sensor data from multiple cities in the same climate zone to learn thermal comfort patterns. We present a transfer learning based multilayer perceptron model from the same climate zone (TL-MLP-C*) for accurate thermal comfort prediction. Extensive experimental results on ASHRAE RP-884, the Scales Project and Medium US Office datasets show that the performance of the proposed TL-MLP-C* exceeds the state-of-the-art methods in accuracy, precision and F1-score.

Submitted 20 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Showing 1–50 of 57 results for author: Gao, N