Observation Time Difference: an Online Dynamic Objects Removal Method for Ground Vehicles

Authors: Rongguang Wu, Chenglin Pang, Xuankang Wu, Zheng Fang

Abstract: In the process of urban environment mapping, the sequential accumulations of dynamic objects will leave a large number of traces in the map. These traces will usually have bad influences on the localization accuracy and navigation performance of the robot. Therefore, dynamic objects removal plays an important role in creating a clean map. However, conventional dynamic objects removal methods usually run offline. That is, the map is reprocessed after it is constructed, which undoubtedly increases additional time costs. To tackle the problem, this paper proposes a novel method for online dynamic objects removal for ground vehicles. According to the observation time difference between the object and the ground where it is located, dynamic objects are classified into two types: suddenly appear and suddenly disappear. For these two kinds of dynamic objects, we propose downward retrieval and upward retrieval methods to eliminate them respectively. We validate our method on the SemanticKITTI dataset and an author-collected dataset with highly dynamic objects. Compared with other state-of-the-art methods, our method is more efficient and robust, and reduces the running time per frame by more than 60% on average.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.13243 [pdf, ps, other]

Abelian Group Codes for Classical and Classical-Quantum Channels: One-shot and Asymptotic Rate Bounds

Authors: James Chin-Jen Pang, Sandeep Pradhan, Hessam Mahdavifar

Abstract: We study the problem of transmission of information over classical and classical-quantum channels in the one-shot regime where the underlying codes are constrained to be group codes. In the achievability part, we introduce a new input probability distribution that incorporates the encoding homomorphism and the underlying channel law. Using a random coding argument, we characterize the performance of group codes in terms of hypothesis testing relative-entropic quantities. In the converse part, we establish bounds by leveraging a hypothesis testing-based approach. Furthermore, we apply the one-shot result to the asymptotic stationary memoryless setting, and establish a single-letter lower bound on group capacities for both classes of channels. Moreover, we derive a matching upper bound on the asymptotic group capacity.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 41 pages

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04113 [pdf, other]

Uncovering Limitations of Large Language Models in Information Seeking from Tables

Authors: Chaoxu Pang, Yixuan Cao, Chunhao Yang, Ping Luo

Abstract: Tables are recognized for their high information density and widespread usage, serving as essential sources of information. Seeking information from tables (TIS) is a crucial capability for Large Language Models (LLMs), serving as the foundation of knowledge-based Q&A systems. However, this field presently suffers from an absence of thorough and reliable evaluation. This paper introduces a more reliable benchmark for Table Information Seeking (TabIS). To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format. We establish an effective pipeline for generating options, ensuring their difficulty and quality. Experiments conducted on 12 LLMs reveal that while the performance of GPT-4-turbo is marginally satisfactory, both other proprietary and open-source models perform inadequately. Further analysis shows that LLMs exhibit a poor understanding of table structures, and struggle to balance between TIS performance and robustness against pseudo-relevant tables (common in retrieval-augmented systems). These findings uncover the limitations and potential challenges of LLMs in seeking information from tables. We release our data and code to facilitate further research in this field.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024

arXiv:2405.12376 [pdf]

A silicon photonics waveguide-coupled colloidal quantum dot photodiode sensitive beyond 1.6 um

Authors: Chao Pang, Yu-Hao Deng, Ezat Kheradmand, Luis Moreno Hagelsieb, Yujie Guo, David Cheyns, Pieter Geiregat, Zeger Hens, Dries Van Thourhout

Abstract: Silicon photonics faces a persistent challenge in extending photodetection capabilities beyond the 1.6 um wavelength range, primarily due to the lack of appropriate epitaxial materials. Colloidal quantum dots (QDs) present a promising solution here, offering distinct advantages such as infrared wavelength tunability, cost-effectiveness, and facile deposition. Their unique properties position them as a potential candidate for enabling photodetection in silicon photonics beyond the conventional telecom wavelength, thereby expanding the potential applications and capabilities within this domain. In this study, we have successfully integrated lead sulfide (PbS) colloidal quantum dot photodiodes (QDPDs) onto silicon waveguides using standard process techniques. The integrated photodiodes exhibit a remarkable responsivity of 1.3 A/W (with an external quantum efficiency of 74.8%) at a wavelength of 2.1 um, a low dark current of only 106 nA and a bandwidth of 1.1 MHz under a -3 V bias. To demonstrate the scalability of our integration approach, we have developed a compact 8-channel spectrometer incorporating an array of QDPDs. This achievement marks a significant step toward realizing a cost-effective photodetector solution for silicon photonics, particularly tailored for a wide range of sensing applications around the 2 um wavelength range.

Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.07765 [pdf, other]

TANQ: An open domain dataset of table answered questions

Authors: Mubashara Akhtar, Chenxi Pang, Andreea Marzoca, Yasemin Altun, Julian Martin Eisenschlos

Abstract: Language models, potentially augmented with tool usage such as retrieval, are becoming the go-to means of answering questions. Understanding and answering questions in real-world settings often requires retrieving information from different sources, processing and aggregating data to extract insights, and presenting complex findings in the form of structured artifacts such as novel tables, charts, or infographics. In this paper, we introduce TANQ, the first open domain question answering dataset where the answers require building tables from information across multiple sources. We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups. Our best-performing baseline, GPT4, reaches an overall F1 score of 29.1, lagging behind human performance by 19.7 points. We analyse baselines' performance across different dataset attributes such as different skills required for this task, including multi-hop reasoning, math operations, and unit conversions. We further discuss common failures in model-generated answers, suggesting that TANQ is a complex task with many challenges ahead.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2404.16157 [pdf, ps, other]

Convergence of stochastic integrals with applications to transport equations and conservation laws with noise

Authors: Kenneth H. Karlsen, Peter H. C. Pang

Abstract: Convergence of stochastic integrals driven by Wiener processes $W_n$, with $W_n \to W$ almost surely in $C_t$, is crucial in analyzing SPDEs. Our focus is on the convergence of the form $\int_0^T V_n\, \mathrm{d} W_n \to \int_0^T V\, \mathrm{d} W$, where $\{V_n\}$ is bounded in $L^p(Ω\times [0,T];X)$ for a Banach space $X$ and some finite $p > 2$. This is challenging when $V_n$ converges to $V$ weakly in the temporal variable. We supply convergence results to handle stochastic integral limits when strong temporal convergence is lacking. A key tool is a uniform mean $L^1$ time translation estimate on $V_n$, an estimate that is easily verified in many SPDEs. However, this estimate alone does not guarantee strong compactness of $(ω,t)\mapsto V_n(ω,t)$. Our findings, especially pertinent to equations exhibiting singular behavior, are substantiated by establishing several stability results for stochastic transport equations and conservation laws.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 31 pages

MSC Class: Primary: 60H15; 60G46; Secondary: 60F25

arXiv:2403.20213 [pdf, other]

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

Authors: Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

Abstract: Generic large Vision-Language Models (VLMs) are rapidly developing, but still perform poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs. Existing Remote Sensing specific Vision Language Models (RSVLMs) still have considerable potential for improvement, primarily owing to the lack of large-scale, high-quality RS vision-language datasets. We constructed HqDC-1.4M, the large scale High quality and Detailed Captions for RS images, containing 1.4 million image-caption pairs, which not only enhance the RSVLM's understanding of RS images but also significantly improve the model's spatial perception abilities, such as localization and counting, thereby increasing the helpfulness of the RSVLM. Moreover, to address the inevitable "hallucination" problem in RSVLM, we developed RSSA, the first dataset aimed at enhancing the Self-Awareness capability of RSVLMs. By incorporating a variety of unanswerable questions into typical RS visual question-answering tasks, RSSA effectively improves the truthfulness and reduces the hallucinations of the model's outputs, thereby enhancing the honesty of the RSVLM. Based on these datasets, we proposed the H2RSVLM, the Helpful and Honest Remote Sensing Vision Language Model. H2RSVLM has achieved outstanding performance on multiple RS public datasets and is capable of recognizing and refusing to answer the unanswerable questions, effectively mitigating the incorrect generations. We will release the code, data and model weights at https://github.com/opendatalab/H2RSVLM.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Equal contribution: Chao Pang, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He

arXiv:2403.19427 [pdf]

Dynamic Phase Enabled Topological Mode Steering in Composite Su-Schrieffer-Heeger Waveguide Arrays

Authors: Min Tang, Chi Pang, Christian N. Saggau, Haiyun Dong, Ching Hua Lee, Ronny Thomale, Sebastian Klembt, Ion Cosma Fulga, Jeroen Van Den Brink, Yana Vaynzof, Oliver G. Schmidt, Jiawei Wang, Libo Ma

Abstract: Topological boundary states localize at interfaces whenever the interface implies a change of the associated topological invariant encoded in the geometric phase. The generically present dynamic phase, however, which is energy and time dependent, has been known to be non-universal, and hence not to intertwine with any topological geometric phase. Using the example of topological zero modes in composite Su-Schrieffer-Heeger (c-SSH) waveguide arrays with a central defect, we report on the selective excitation and transition of topological boundary mode based on dynamic phase-steered interferences. Our work thus provides a new knob for the control and manipulation of topological states in composite photonic devices, indicating promising applications where topological modes and their bandwidth can be jointly controlled by the dynamic phase, geometric phase, and wavelength in on-chip topological devices.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.19386 [pdf, ps, other]

The viscous variational wave equation with transport noise

Authors: Peter H. C. Pang

Abstract: This article considers the variational wave equation with viscosity and transport noise as a system of three coupled nonlinear stochastic partial differential equations. We prove pathwise global existence, uniqueness, and temporal continuity of solutions to this system in $L^2_x$. Martingale solutions are extracted from a two-level Galerkin approximation via the Skorokhod--Jakubowski theorem. We use the apparatus of Dudley maps to streamline this stochastic compactness method, bypassing the usual martingale identification argument. Pathwise uniqueness for the system is established through a renormalisation procedure that involves double commutator estimates and a delicate handling of noise and nonlinear terms. New model-specific commutator estimates are proven.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 40 pages

MSC Class: Primary: 35R60; 35F55; Secondary: 35D30

arXiv:2402.18132 [pdf, other]

Understanding the Role of Pathways in a Deep Neural Network

Authors: Lei Lyu, Chen Pang, Jihua Wang

Abstract: Deep neural networks have demonstrated superior performance in artificial intelligence applications, but the opaqueness of their inner working mechanism is one major drawback in their application. The prevailing unit-based interpretation is a statistical observation of stimulus-response data, which fails to show a detailed internal process of inherent mechanisms of neural networks. In this work, we analyze a convolutional neural network (CNN) trained in the classification task and present an algorithm to extract the diffusion pathways of individual pixels to identify the locations of pixels in an input image associated with object classes. The pathways allow us to test the causal components which are important for classification and the pathway-based representations are clearly distinguishable between categories. We find that the few largest pathways of an individual pixel from an image tend to cross the feature maps in each layer that is important for classification. And the large pathways of images of the same category are more consistent in their trends than those of different categories. We also apply the pathways to understanding adversarial attacks, object completion, and movement perception. Further, the total number of pathways on feature maps in all layers can clearly discriminate the original, deformed, and target samples.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.14380 [pdf, other]

doi 10.1609/aaai.v38i5.28240

RadarMOSEVE: A Spatial-Temporal Transformer Network for Radar-Only Moving Object Segmentation and Ego-Velocity Estimation

Authors: Changsong Pang, Xieyuanli Chen, Yimin Liu, Huimin Lu, Yuwei Cheng

Abstract: Moving object segmentation (MOS) and Ego velocity estimation (EVE) are vital capabilities for mobile systems to achieve full autonomy. Several approaches have attempted to achieve MOSEVE using a LiDAR sensor. However, LiDAR sensors are typically expensive and susceptible to adverse weather conditions. Instead, millimeter-wave radar (MWR) has gained popularity in robotics and autonomous driving for real applications due to its cost-effectiveness and resilience to bad weather. Nonetheless, publicly available MOSEVE datasets and approaches using radar data are limited. Some existing methods adopt point convolutional networks from LiDAR-based approaches, ignoring the specific artifacts and the valuable radial velocity information of radar measurements, leading to suboptimal performance. In this paper, we propose a novel transformer network that effectively addresses the sparsity and noise issues and leverages the radial velocity measurements of radar points using our devised radar self- and cross-attention mechanisms. Based on that, our method achieves accurate EVE of the robot and performs MOS using only radar data simultaneously. To thoroughly evaluate the MOSEVE performance of our method, we annotated the radar points in the public View-of-Delft (VoD) dataset and additionally constructed a new radar dataset in various environments. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods. The code is available at https://github.com/ORCA-Uboat/RadarMOSEVE.

Submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted at AAAI-24

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence.38(2024)4424-4432

arXiv:2402.10629 [pdf, ps, other]

M1 Radiative and spin-nonflip $ππ$ transitions of $B_c$ states in the Cornell potential model

Authors: Zhi-bin Gao, Yan-yue Fan, Hao Chen, Cheng-qun Pang

Abstract: In this paper, we mainly predict the rates of M1 radiative and spin-nonflip $ππ$ transitions of $B_{c}$-meson under the non-relativistic Cornell potential model with a screening potential effect. We employ the numerical wave function to determine the M1 radiative transition widths of $B_c$ excited states and utilize the Kuang-Yan proposed method for the spin-nonflip $ππ$ transitions among $B_c$ states. Our theoretical results are valuable for studying the M1 radiative and spin-nonflip $ππ$ transition processes of $B_c$ states in experiments.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 4 figures. arXiv admin note: text overlap with arXiv:2205.05950 by other authors

arXiv:2402.04400 [pdf, other]

CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines

Authors: Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, Karthik Natarajan

Abstract: Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.

Submitted 5 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.10752 [pdf, other]

HiCD: Change Detection in Quality-Varied Images via Hierarchical Correlation Distillation

Authors: Chao Pang, Xingxing Weng, Jiang Wu, Qiang Wang, Gui-Song Xia

Abstract: Advanced change detection techniques primarily target image pairs of equal and high quality. However, variations in imaging conditions and platforms frequently lead to image pairs with distinct qualities: one image being high-quality, while the other being low-quality. These disparities in image quality present significant challenges for understanding image pairs semantically and extracting change features, ultimately resulting in a notable decline in performance. To tackle this challenge, we introduce an innovative training strategy grounded in knowledge distillation. The core idea revolves around leveraging task knowledge acquired from high-quality image pairs to guide the model's learning process when dealing with image pairs that exhibit differences in quality. Additionally, we develop a hierarchical correlation distillation approach (involving self-correlation, cross-correlation, and global correlation). This approach compels the student model to replicate the correlations inherent in the teacher model, rather than focusing solely on individual features. This ensures effective knowledge transfer while maintaining the student model's training flexibility.

Submitted 19 January, 2024; originally announced January 2024.

Comments: accepted by TGRS

arXiv:2401.09085 [pdf]

3D orientation super-resolution spatial-frequency-shift microscopy

Authors: Xiaowei Liu, Mingwei Tang, Ning Zhou, Chenlei Pang, Zhong Wen, Xu Liu, Qing Yang

Abstract: Super-resolution mapping of the 3D orientation of fluorophores reveals the alignment of biological structures where the fluorophores are tightly attached, and thus plays a vital role in studying the organization and dynamics of bio-complexes. However, current super-resolution imaging techniques are either limited to 2D orientation mapping or suffer from slow speed and the requirement of special labels in 3D orientation mapping. Here, we propose a novel polarized virtual spatial-frequency-shift effect to overcome these restrictions to achieve a universal 3D orientation super-resolution mapping capability. To demonstrate the mechanism, we simulate the imaging process and reconstruct the spatial-angular information for sparsely distributed dipoles with random 3D orientations and microfilament-like structures decorated with fluorophores oriented parallel to them. The 3D orientation distribution can be recovered with a doubled spatial resolution and an average angular precision of up to 2.39 degrees. The performance of the approach with noise has also been analyzed considering real implementation.

Submitted 22 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: 22 pages, 5 figures

arXiv:2401.07607 [pdf]

SnS2 thin film with in-situ and controllable Sb doping via atomic layer deposition for optoelectronic applications

Authors: Dong-Ho Shin, Jun Yang, Samik Mukherjee, Amin Bahrami, Sebastian Lehmann, Noushin Nasiri, Fabian Krahl, Chi Pang, Angelika Wrzesińska-Lashkova, Yana Vaynzof, Steve Wohlrab, Alexey Popov, Kornelius Nielsch

Abstract: SnS2 stands out as a highly promising two-dimensional material with significant potential for applications in the field of electronics. Numerous attempts have been undertaken to modulate the physical properties of SnS2 by doping with various metal ions. Here, we deposited a series of Sb-doped SnS2 via atomic layer deposition (ALD) super-cycle process and compared its crystallinity, composition, and optical properties to those of pristine SnS2. We found that the increase in the concentration of Sb is accompanied by a gradual reduction in the Sn and S binding energies. The work function is increased upon Sb doping from 4.32 eV (SnS2) to 4.75 eV (Sb-doped SnS2 with 9:1 ratio). When integrated into photodetectors, the Sb-doped SnS2 showed improved performances, demonstrating increased peak photoresponsivity values from 19.5 A/W to 27.8 A/W at 405 nm, accompanied by an improvement in response speed. These results offer valuable insights into next-generation optoelectronic applications based on SnS2.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 18 pages, 5 Figures, Journal

arXiv:2312.17077 [pdf, ps, other]

Projected Langevin Monte Carlo algorithms in non-convex and super-linear setting

Authors: Chenxu Pang, Xiaojie Wang, Yue Wu

Abstract: It is of significant interest in many applications to sample from a high-dimensional target distribution $π$ with the density $π(\text{d} x) \propto e^{-U(x)} (\text{d} x) $, based on the temporal discretization of the Langevin stochastic differential equations (SDEs). In this paper, we propose an explicit projected Langevin Monte Carlo (PLMC) algorithm with non-convex potential $U$ and super-linear gradient of $U$ and investigate the non-asymptotic analysis of its sampling error in total variation distance. Equipped with time-independent regularity estimates for the corresponding Kolmogorov equation, we derive the non-asymptotic bounds on the total variation distance between the target distribution of the Langevin SDEs and the law induced by the PLMC scheme with order $\mathcal{O}(h |\ln h|)$. Moreover, for a given precision $ε$, the smallest number of iterations of the classical Langevin Monte Carlo (LMC) scheme with the non-convex potential $U$ and the globally Lipschitz gradient of $U$ can be guaranteed by order ${\mathcal{O}}\big(\tfrac{d^{3/2}}ε \cdot \ln (\tfrac{d}ε) \cdot \ln (\tfrac{1}ε) \big)$. Numerical experiments are provided to confirm the theoretical findings.

Submitted 1 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 31 pages, 6 figures

MSC Class: 60H35; 65C05; 65C30
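
As a rough illustration of the idea summarized in the abstract above, the following is a minimal one-dimensional sketch of a Langevin Monte Carlo step with a projection applied before the drift is evaluated. It is not the authors' scheme: the double-well potential, the function names (`grad_U`, `project`, `plmc_chain`), and in particular the projection radius `h ** -0.25` are assumptions chosen only to make the example self-contained.

```python
import math
import random


def grad_U(x):
    """Gradient of an illustrative non-convex potential U(x) = x**4/4 - x**2/2.

    The gradient x**3 - x grows super-linearly. This potential is an
    assumption for the sketch, not taken from the paper.
    """
    return x**3 - x


def project(x, radius):
    """Project x onto the interval [-radius, radius] (a ball in 1D)."""
    if abs(x) <= radius:
        return x
    return math.copysign(radius, x)


def plmc_chain(x0, h, n_steps, seed=0):
    """Run a projected Euler-type discretization of dX = -grad U(X) dt + sqrt(2) dW."""
    rng = random.Random(seed)
    radius = h ** -0.25  # hypothetical step-size-dependent radius (assumption)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = project(x, radius)  # tame large states before evaluating the drift
        x = x - h * grad_U(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

Calling `plmc_chain(10.0, 0.01, 1000)` starts far outside the wells; the projection keeps the super-linear drift from blowing up the explicit step, which is the motivation for projecting in this kind of scheme.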

arXiv:2312.14557 [pdf, other]

Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

Authors: Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan

Abstract: Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at https://github.com/WangRongsheng/Aurora

Submitted 1 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 10 pages, 2 figures

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.06682 [pdf, other]

Learning to Denoise Unreliable Interactions for Link Prediction on Biomedical Knowledge Graph

Authors: Tengfei Ma, Yujie Chen, Wen Tao, Dashun Zheng, Xuan Lin, Patrick Cheong-lao Pang, Yiping Liu, Yijun Wang, Bosheng Song, Xiangxiang Zeng

Abstract: Link prediction in biomedical knowledge graphs (KGs) aims at predicting unknown interactions between entities, including drug-target interaction (DTI) and drug-drug interaction (DDI), which is critical for drug discovery and therapeutics. Previous methods prefer to utilize the rich semantic relations and topological structure of the KG to predict missing links, yielding promising outcomes. However, all these works only focus on improving the predictive performance without considering the inevitable noise and unreliable interactions existing in the KGs, which limits the development of KG-based computational methods. To address these limitations, we propose a Denoised Link Prediction framework, called DenoisedLP. DenoisedLP obtains reliable interactions based on the local subgraph by denoising noisy links in a learnable way, providing a universal module for mining underlying task-relevant relations. To collaborate with the smoothed semantic information, DenoisedLP introduces the semantic subgraph by blurring conflict relations around the predicted link. By maximizing the mutual information between the reliable structure and smoothed semantic relations, DenoisedLP emphasizes the informative interactions for predicting relation-specific links. Experimental results on real-world datasets demonstrate that DenoisedLP outperforms state-of-the-art methods on DTI and DDI prediction tasks, and verify the effectiveness and robustness of denoising unreliable interactions on the contaminated KGs.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2311.09258 [pdf, other]

Single-Chip Silicon Photonic Processor for Analog Optical and Microwave Signals

Authors: Hong Deng, Jing Zhang, Emadreza Soltanian, Xiangfeng Chen, Chao Pang, Nicolas Vaissiere, Delphine Neel, Joan Ramirez, Jean Decobert, Nishant Singh, Guy Torfs, Gunther Roelkens, Wim Bogaerts

Abstract: The explosion of data volume in communications, AI training, and cloud computing requires efficient data handling, which is typically stored as digital electrical information and transmitted as wireless radio frequency (RF) signals or light waves in optical fibres. Today's communications systems mostly treat the RF and optical signals separately, which results in unnecessary conversion losses and increased cost. In this work, we report the first fully on-chip signal processor for high-speed RF and optical signals based on a silicon photonic circuit. Our chip is capable of both generation and detection of analog electrical and optical signals, and can program a user-defined filter response in both domains. The single silicon photonic chip integrates all essential components like modulators, optical filters, and photodetectors, as well as tunable lasers enabled by transfer-printed Indium Phosphide (InP) optical amplifiers. The system's configuration is locally programmed through thermo-optic phase shifters and monitored by photodetectors. We demonstrate our chip's capabilities with different combinations of RF and optical signal processing functions, including optical and RF signal generation and filtering. This represents a key step towards compact microwave photonic systems for future wireless communication and sensing applications.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.17901 [pdf, other]

Improving the Knowledge Gradient Algorithm

Authors: Yang Le, Gao Siyang, Ho Chin Pang

Abstract: The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations, causing the algorithm not to be asymptotically optimal. We next provide a remedy for it, by following the manner of one-step look ahead of KG, but instead choosing the measurement that yields the greatest one-step improvement in the probability of selecting the best arm. The new policy is called improved knowledge gradient (iKG). iKG can be shown to be asymptotically optimal. In addition, we show that compared to KG, it is easier to extend iKG to variant problems of BAI, with the $ε$-good arm identification and feasible arm identification as two examples. The superior performances of iKG on these problems are further demonstrated using numerical examples.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 32 pages, 42 figures
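
To make the one-step look-ahead described in the abstract above concrete, here is a minimal sketch of the classical KG policy (not the paper's iKG) under an assumed model: independent normal beliefs on each arm with known Gaussian measurement noise. The function names and the model choice are assumptions for illustration; the abstract does not state the belief model.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu, var, noise_var):
    """Expected one-step gain in the best posterior mean for each arm.

    mu, var: posterior means and variances (var > 0); noise_var: known
    measurement noise variances. Needs at least two arms.
    """
    factors = []
    for i in range(len(mu)):
        # Standard deviation of the change in arm i's mean after one sample.
        sigma_tilde = var[i] / math.sqrt(var[i] + noise_var[i])
        best_other = max(m for j, m in enumerate(mu) if j != i)
        zeta = -abs(mu[i] - best_other) / sigma_tilde
        factors.append(sigma_tilde * (zeta * _cdf(zeta) + _pdf(zeta)))
    return factors


def kg_choice(mu, var, noise_var):
    """Index of the arm that classical KG would measure next."""
    factors = kg_factors(mu, var, noise_var)
    return max(range(len(factors)), key=factors.__getitem__)
```

With equal means, `kg_choice([0.0, 0.0], [1.0, 4.0], [1.0, 1.0])` picks the second arm, since its larger posterior variance gives the larger expected improvement.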

arXiv:2310.05066 [pdf, other]

Guideline Learning for In-context Information Extraction

Authors: Chaoxu Pang, Yixuan Cao, Qiang Ding, Ping Luo

Abstract: Large language models (LLMs) can perform a new task by merely conditioning on task instructions and a few input-output examples, without optimizing any parameters. This is called In-Context Learning (ICL). In-context Information Extraction (IE) has recently garnered attention in the research community. However, the performance of In-context IE generally lags behind the state-of-the-art supervised expert models. We highlight a key reason for this shortfall: underspecified task description. The limited-length context struggles to thoroughly express the intricate IE task instructions and various edge cases, leading to misalignment in task comprehension with humans. In this paper, we propose a Guideline Learning (GL) framework for In-context IE which reflectively learns and follows guidelines. During the learning phase, GL automatically synthesizes a set of guidelines based on a few error cases, and during inference, GL retrieves helpful guidelines for better ICL. Moreover, we propose a self-consistency-based active learning method to enhance the efficiency of GL. Experiments on event extraction and relation extraction show that GL can significantly improve the performance of in-context IE.

Submitted 21 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 main conference

arXiv:2310.02815 [pdf, other]

CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

Authors: Hao Shi, Chengshan Pang, Jiaming Zhang, Kailun Yang, Yuhao Wu, Huajian Ni, Yining Lin, Rainer Stiefelhagen, Kaiwei Wang

Abstract: Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses precise geometric cues, whereas the height feature is primarily focused on distinguishing between various categories of height intervals, essentially providing semantic context. This insight motivates the development of Complementary-BEV (CoBEV), a novel end-to-end monocular 3D object detection framework that integrates depth and height to construct robust BEV representations. In essence, CoBEV estimates each pixel's depth and height distribution and lifts the camera features into 3D space for lateral fusion using the newly proposed two-stage complementary feature selection (CFS) module. A BEV feature distillation framework is also seamlessly integrated to further enhance the detection accuracy from the prior knowledge of the fusion-modal CoBEV teacher.
We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D, as well as the private Supremind-Road dataset, demonstrating that CoBEV not only achieves the accuracy of the new state-of-the-art, but also significantly advances the robustness of previous methods in challenging long-distance scenarios and noisy camera disturbance, and enhances generalization by a large margin in heterologous settings with drastic changes in scene and camera parameters. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode. The source code will be made publicly available at https://github.com/MasterHow/CoBEV.

Submitted 17 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The source code will be made publicly available at https://github.com/MasterHow/CoBEV

arXiv:2309.02208 [pdf, ps, other]

Convergent finite difference schemes for stochastic transport equations

Authors: Ulrik S. Fjordholm, Kenneth H. Karlsen, Peter H. C. Pang

Abstract: We present difference schemes for stochastic transport equations with low-regularity velocity fields. We establish $L^2$ stability and convergence of the difference approximations under conditions that are less strict than those required for deterministic transport equations. The $L^2$ estimate, crucial for the analysis, is obtained through a discrete duality argument and a comprehensive examination of a class of backward parabolic difference schemes.

Submitted 3 July, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 42 pages; adjustments in Section 2.2, other typos amended

MSC Class: 60H15; 65M12; 60H50; 65M80

arXiv:2308.12886 [pdf, ps, other]

doi 10.1016/j.jco.2024.101842

Linear implicit approximations of invariant measures of semi-linear SDEs with non-globally Lipschitz coefficients

Authors: Chenxu Pang, Xiaojie Wang, Yue Wu

Abstract: This article investigates the weak approximation towards the invariant measure of semi-linear stochastic differential equations (SDEs) under non-globally Lipschitz coefficients. For this purpose, we propose a linear-theta-projected Euler (LTPE) scheme, which also admits an invariant measure, to handle the potential influence of the linear stiffness. Under certain assumptions, both the SDE and the corresponding LTPE method are shown to converge exponentially to the underlying invariant measures, respectively. Moreover, with time-independent regularity estimates for the corresponding Kolmogorov equation, the weak error between the numerical invariant measure and the original one can be guaranteed with convergence of order one. In terms of computational complexity, the proposed ergodicity preserving scheme with the nonlinearity explicitly treated has a significant advantage over the ergodicity preserving implicit Euler method in the literature. Numerical experiments are provided to verify our theoretical findings.

Submitted 17 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 37 pages, 7 figures

MSC Class: 60H35; 37M25; 65C30

Journal ref: Journal of Complexity, Volume 83, August 2024, 101842

arXiv:2307.10512 [pdf, other]

IvyGPT: InteractiVe Chinese pathwaY language model in medical domain

Authors: Rongsheng Wang, Yaofei Duan, ChanTong Lam, Jiexi Chen, Jiangsheng Xu, Haoming Chen, Xiaohong Liu, Patrick Cheong-Iao Pang, Tao Tan

Abstract: General large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes, due to poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot perform like a doctor in other aspects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to human. In the training, we used QLoRA to train 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT has outperformed other medical GPT models.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 5 pages, 3 figures

arXiv:2305.12992 [pdf, ps, other]

Antithetic multilevel Monte Carlo method for approximations of SDEs with non-globally Lipschitz continuous coefficients

Authors: Chenxu Pang, Xiaojie Wang

Abstract: In the field of computational finance, it is common for the quantity of interest to be expected values of functions of random variables via stochastic differential equations (SDEs). For SDEs with globally Lipschitz coefficients and commutative diffusion coefficients, the explicit Milstein scheme, relying on only Brownian increments and thus easily implementable, can be combined with the multilevel Monte Carlo (MLMC) method proposed by Giles \cite{giles2008multilevel} to give the optimal overall computational cost $\mathcal{O}(ε^{-2})$, where $ε$ is the required target accuracy. For multi-dimensional SDEs that do not satisfy the commutativity condition, a kind of one-half order truncated Milstein-type scheme without Lévy areas is introduced by Giles and Szpruch \cite{giles2014antithetic}, which combined with the antithetic MLMC gives the optimal computational cost under globally Lipschitz conditions. In the present work, we turn to SDEs with non-globally Lipschitz continuous coefficients, for which a family of modified Milstein-type schemes without Lévy areas is proposed. The expected one-half order of strong convergence is recovered in a non-globally Lipschitz setting, where the diffusion coefficients are allowed to grow superlinearly. This helps us to analyze the relevant variance of the multilevel estimator and the optimal computational cost is finally achieved for the antithetic MLMC.
The analysis of both the convergence rate and the desired variance in the non-globally Lipschitz setting is highly non-trivial and non-standard arguments are developed to overcome some essential difficulties. Numerical experiments are provided to confirm the theoretical findings.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 39 pages, 4 figures

MSC Class: 65C05; 60H15; 65C30

arXiv:2305.07328 [pdf, other]

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

Authors: Kai Cheng, Xinhua Zeng, Yang Liu, Tian Wang, Chengxin Pang, Jing Teng, Zhaoyang Xia, Jing Liu

Abstract: Video anomaly detection (VAD) is a vital task with great practical applications in industrial surveillance, security system, and traffic control. Unlike previous unsupervised VAD methods that adopt a fixed structure to learn normality without considering different detection demands, we design a spatial-temporal hierarchical architecture (STHA) as a configurable architecture to flexibly detect different degrees of anomaly. The comprehensive structure of the STHA is delineated into a tripartite hierarchy, encompassing the following tiers: the stream level, the stack level, and the block level. Specifically, we design several auto-encoder-based blocks that possess varying capacities for extracting normal patterns. Then, we stack blocks according to the complexity degrees with both intra-stack and inter-stack residual links to learn hierarchical normality gradually. Considering the multisource knowledge of videos, we also model the spatial normality of video frames and temporal normality of RGB difference by designing two parallel streams consisting of stacks. Thus, STHA can provide various representation learning abilities by expanding or contracting hierarchically to detect anomalies of different degrees. Since the anomaly set is complicated and unbounded, our STHA can adjust its detection ability to adapt to the human detection demands and the complexity degree of anomaly that happened in the history of a scene.
We conduct experiments on three benchmarks and perform extensive analysis, and the results demonstrate that our method performs comparably to the state-of-the-art methods. In addition, we design a toy dataset to prove that our model can better balance the learning ability to adapt to different detection demands.

Submitted 12 May, 2023; originally announced May 2023.

Comments: submitted to IEEE TCSVT, under peer review

arXiv:2304.03981 [pdf, other]

Uncertainty-inspired Open Set Learning for Retinal Anomaly Identification

Authors: Meng Wang, Tian Lin, Lianyu Wang, Aidi Lin, Ke Zou, Xinxing Xu, Yi Zhou, Yuanyuan Peng, Qingquan Meng, Yiming Qian, Guoyao Deng, Zhiqun Wu, Junhong Chen, Jianhong Lin, Mingzhi Zhang, Weifang Zhu, Changqing Zhang, Daoqiang Zhang, Rick Siow Mong Goh, Yong Liu, Chi Pui Pang, Xinjian Chen, Haoyu Chen, Huazhu Fu

Abstract: Failure to recognize samples from the classes unseen during training is a major limitation of artificial intelligence in the real-world implementation for recognition and classification of retinal anomalies. We established an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 retinal conditions. Besides assessing the probability of each category, UIOS also calculated an uncertainty score to express its confidence. Our UIOS model with thresholding strategy achieved an F1 score of 99.55%, 97.01% and 91.91% for the internal testing set, external target categories (TC)-JSIEC dataset and TC-unseen testing set, respectively, compared to the F1 score of 92.20%, 80.69% and 64.74% by the standard AI model. Furthermore, UIOS correctly predicted high uncertainty scores, which would prompt the need for a manual check in the datasets of non-target categories retinal diseases, low-quality fundus images, and non-fundus images. UIOS provides a robust method for real-world screening of retinal anomalies.

Submitted 29 August, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

arXiv:2304.03852 [pdf, other]

doi 10.1145/3544548.3580912

StoryChat: Designing a Narrative-Based Viewer Participation Tool for Live Streaming Chatrooms

Authors: Ryan Yen, Li Feng, Brinda Mehra, Ching Christie Pang, Siying Hu, Zhicong Lu

Abstract: Live streaming platforms and existing viewer participation tools enable users to interact and engage with an online community, but the anonymity and scale of chat usually result in the spread of negative comments. However, only a few existing moderation tools investigate the influence of proactive moderation on viewers' engagement and prosocial behavior. To address this, we developed StoryChat, a narrative-based viewer participation tool that utilizes a dynamic graphical plot to reflect chatroom negativity. We crafted the narrative through a viewer-centered (N=65) iterative design process and evaluated the tool with 48 experienced viewers in a deployment study. We discovered that StoryChat encouraged viewers to contribute prosocial comments, increased viewer engagement, and fostered viewers' sense of community. Viewers reported a closer connection between streamers and other viewers because of the narrative design, suggesting that narrative-based viewer engagement tools have the potential to encourage community engagement and prosocial behaviors.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.09511 [pdf, other]

Capacity-achieving Polar-based Codes with Sparsity Constraints on the Generator Matrices

Authors: James Chin-Jen Pang, Hessam Mahdavifar, S. Sandeep Pradhan

Abstract: In this paper, we leverage polar codes and the well-established channel polarization to design capacity-achieving codes with a certain constraint on the weights of all the columns in the generator matrix (GM) while having a low-complexity decoding algorithm. We first show that given a binary-input memoryless symmetric (BMS) channel $W$ and a constant $s \in (0, 1]$, there exists a polarization kernel such that the corresponding polar code is capacity-achieving with the \textit{rate of polarization} $s/2$, and the GM column weights being bounded from above by $N^s$. To improve the sparsity versus error rate trade-off, we devise a column-splitting algorithm and two coding schemes for BEC and then for general BMS channels. The \textit{polar-based} codes generated by the two schemes inherit several fundamental properties of polar codes with the original $2 \times 2$ kernel including the decay in error probability, decoding complexity, and the capacity-achieving property. Furthermore, they demonstrate the additional property that their GM column weights are bounded from above sublinearly in $N$, while the original polar codes have some column weights that are linear in $N$. In particular, for any BEC and $β<0.5$, the existence of a sequence of capacity-achieving polar-based codes where all the GM column weights are bounded from above by $N^λ$ with $λ\approx 0.585$, and with the error probability bounded by $O(2^{-N^β} )$ under a decoder with complexity $O(N\log N)$, is shown.
The existence of similar capacity-achieving polar-based codes with the same decoding complexity is shown for any BMS channel and $β<0.5$ with $λ\approx 0.631$.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 31 pages, single column. arXiv admin note: substantial text overlap with arXiv:2012.13977

arXiv:2302.10329 [pdf, other]

doi 10.1145/3593013.3594033

Harms from Increasingly Agentic Algorithmic Systems

Authors: Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

Abstract: Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency -- notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.

Submitted 11 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted at FAccT 2023

arXiv:2302.05582 [pdf, other]

ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

Authors: Daniel Hao Xian Yuen, Andrew Yong Chen Pang, Zhou Yang, Chun Yong Chong, Mei Kuan Lim, David Lo

Abstract: Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated Speech Recognition Differential Testing Framework for testing ASR systems. ASDF extends an existing ASR testing tool, CrossASR++, which synthesizes test cases from a text corpus. However, CrossASR++ fails to make use of the text corpus efficiently and provides limited information on how the failed test cases can improve ASR systems. To address these limitations, our tool incorporates two novel features: (1) a text transformation module to boost the number of generated test cases and uncover more errors in ASR systems and (2) a phonetic analysis module to identify the phonemes on which the ASR system tends to produce errors. ASDF generates more high-quality test cases by applying various text transformation methods (e.g., change tense) to the texts in failed test cases. By doing so, ASDF can utilize a small text corpus to generate a large number of audio test cases, something which CrossASR++ is not capable of. In addition, ASDF implements more metrics to evaluate the performance of ASR systems from multiple perspectives. ASDF performs phonetic analysis on the identified failed test cases to identify the phonemes that ASR systems tend to transcribe incorrectly, providing useful information for developers to improve ASR systems. The demonstration video of our tool is available online at https://www.youtube.com/watch?v=DzVwfc3h9As. The implementation is available at https://github.com/danielyuenhx/asdf-differential-testing.

Submitted 10 February, 2023; originally announced February 2023.

Comments: Accepted by ICST 2023 Tool Demo Track

arXiv:2302.04456 [pdf, other]

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

Authors: Pengfei Zhu, Chao Pang, Yekun Chai, Lei Li, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu

Abstract: In recent years, the burgeoning interest in diffusion models has led to significant advances in image and speech generation. Nevertheless, the direct synthesis of music waveforms from unrestricted textual prompts remains a relatively underexplored domain. In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models. Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process within the diffusion model framework. Addressing the challenge of limited text-music parallel data, we undertake the creation of a dataset by harnessing web resources, a task facilitated by weak supervision techniques. Furthermore, a rigorous empirical inquiry is undertaken to contrast the efficacy of two distinct prompt formats for text conditioning, namely, music tags and unconstrained textual descriptions. The outcomes of this comparative analysis affirm the superior performance of our proposed model in terms of enhancing text-music relevance. Finally, our work culminates in a demonstrative exhibition of the excellent capabilities of our model in text-to-music generation. We further demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.

Submitted 21 September, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: Accepted by AACL demo 2023

arXiv:2301.11495 [pdf, other]

Skeleton-based Action Recognition through Contrasting Two-Stream Spatial-Temporal Networks

Authors: Chen Pang, Xuequan Lu, Lei Lyu

Abstract: For pursuing accurate skeleton-based action recognition, most prior methods use the strategy of combining Graph Convolution Networks (GCNs) with attention-based methods in a serial way. However, they regard the human skeleton as a complete graph, resulting in less variations between different actions (e.g., the connection between the elbow and head in action ``clapping hands''). For this, we propose a novel Contrastive GCN-Transformer Network (ConGT) which fuses the spatial and temporal modules in a parallel way. The ConGT involves two parallel streams: Spatial-Temporal Graph Convolution stream (STG) and Spatial-Temporal Transformer stream (STT). The STG is designed to obtain action representations maintaining the natural topology structure of the human skeleton. The STT is devised to acquire action representations containing the global relationships among joints. Since the action representations produced from these two streams contain different characteristics, and each of them knows little information of the other, we introduce the contrastive learning paradigm to guide their output representations of the same sample to be as close as possible in a self-supervised manner. Through the contrastive learning, they can learn information from each other to enrich the action features by maximizing the mutual information between the two types of action representations. To further improve action recognition accuracy, we introduce the Cyclical Focal Loss (CFL) which can focus on confident training samples in early training epochs, with an increasing focus on hard samples during the middle epochs. We conduct experiments on three benchmark datasets, which demonstrate that our model achieves state-of-the-art performance in action recognition.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 14 pages, 9 figures

arXiv:2301.10922 [pdf, other]

Detecting Building Changes with Off-Nadir Aerial Images

Authors: Chao Pang, Jiang Wu, Jian Ding, Can Song, Gui-Song Xia

Abstract: The tilted viewing nature of the off-nadir aerial images brings severe challenges to the building change detection (BCD) problem: the mismatch of the nearby buildings and the semantic ambiguity of the building facades. To tackle these challenges, we present a multi-task guided change detection network model, named as MTGCD-Net. The proposed model approaches the specific BCD problem by designing three auxiliary tasks, including: (1) a pixel-wise classification task to predict the roofs and facades of buildings; (2) an auxiliary task for learning the roof-to-footprint offsets of each building to account for the misalignment between building roof instances; and (3) an auxiliary task for learning the identical roof matching flow between bi-temporal aerial images to tackle the building roof mismatch problem. These auxiliary tasks provide indispensable and complementary building parsing and matching information. The predictions of the auxiliary tasks are finally fused to the main building change detection branch with a multi-modal distillation module. To train and test models for the BCD problem with off-nadir aerial images, we create a new benchmark dataset, named BANDON. Extensive experiments demonstrate that our model achieves superior performance over the previous state-of-the-art competitors.

Submitted 25 January, 2023; originally announced January 2023.

Journal ref: SCIENCE CHINA Information Sciences (SCIS) 2023

arXiv:2301.06349 [pdf, ps, other]

Second order commutator estimates in renormalisation theory for SPDEs with gradient-type noise

Authors: Peter H. C. Pang

Abstract: An important step in standard renormalisation arguments involves convolution against a standard mollifier. As pointed out in (Punshon-Smith--Smith 2018), this generates second order commutator terms in equations with gradient-type noise. These are commutators similar to commutators in the well-known ``folklore lemma'' of Di Perna--Lions (Di Perna--Lions 1989, Lemma II.1), but not covered by standard renormalisation theory. In this note we establish the vanishing of these commutators for gradient-type noises on $\mathbb{T}^d$ not necessarily possessing divergence-free structure.

Submitted 16 January, 2023; originally announced January 2023.

Comments: 10 pages, proceedings of HYP2022

MSC Class: 35-06; 35A25; 35R60; 60H15

arXiv:2301.06096

Weak convergence of stochastic integrals

Authors: Kenneth H. Karlsen, Peter H. C. Pang

Abstract: The convergence of stochastic integrals driven by a sequence of Wiener processes $W_n\to W$ (with convergence in $C_t$) is crucial in the analysis of stochastic partial differential equations (SPDEs). The convergence we focus on in this paper is of the form $\int_0^T V_n\, {\rm d} W_n \to \int_0^T V\,{\rm d} W$, where $V_n$ takes values in $L^p([0,T];X)$ for some finite $p\ge 2$ and a Banach space $X$. Standard methods do not directly apply when $V_n$ only converges weakly in the temporal variable to $V$. We provide (weak) convergence results that address the need to take limits of stochastic integrals when only weak temporal convergence is available. This is particularly relevant for SPDEs with singular behaviour.

Submitted 23 August, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: This paper was withdrawn due to an error in the proof of the main theorem

MSC Class: Primary: 60H15; 60G46; Secondary: 60F25

arXiv:2212.10505 [pdf, other]

DePlot: One-shot visual language reasoning by plot-to-table translation

Authors: Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Abstract: Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.

Submitted 23 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023 (Findings)

arXiv:2212.09662 [pdf, other]

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Authors: Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Abstract: Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.

Submitted 23 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2212.06742 [pdf, other]

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Authors: Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu

Abstract: Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric. In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling that learns patterns from monolingual NL or PL; and pivot-based translation language modeling that relies on parallel data of many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage of zero-shot prompting on multilingual code summarization and text-to-text translation. We release our code and pre-trained checkpoints.

Submitted 19 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: Accepted at ACL 2023 (Findings)

arXiv:2212.01575 [pdf]

Multi-view deep learning based molecule design and structural optimization accelerates the SARS-CoV-2 inhibitor discovery

Authors: Chao Pang, Yu Wang, Yi Jiang, Ruheng Wang, Ran Su, Leyi Wei

Abstract: In this work, we propose MEDICO, a Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery. To the best of our knowledge, MEDICO is the first-of-this-kind graph generative model that can generate molecular graphs similar to the structure of targeted molecules, with a multi-view representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that our MEDICO significantly outperforms the state-of-the-art methods in generating valid, unique, and novel molecules under benchmarking comparisons. In particular, we showcase the multi-view deep learning model enables us to generate not only the molecules structurally similar to the targeted molecules but also the molecules with desired chemical properties, demonstrating the strong capability of our model in exploring the chemical space deeply. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecule docking into our model as chemical priori, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of Covid-19 drugs. Further, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2211.14921 [pdf]

Padded Helmet Shell Covers in American Football: A Comprehensive Laboratory Evaluation with Preliminary On-Field Findings

Authors: Nicholas J. Cecchi, Ashlyn A. Callan, Landon P. Watson, Yuzhe Liu, Xianghao Zhan, Ramanand V. Vegesna, Collin Pang, Enora Le Flao, Gerald A. Grant, Michael M. Zeineh, David B. Camarillo

Abstract: Protective headgear effects measured in the laboratory may not always translate to the field. In this study, we evaluated the impact attenuation capabilities of a commercially available padded helmet shell cover in the laboratory and field. In the laboratory, we evaluated the efficacy of the padded helmet shell cover in attenuating impact magnitude across six impact locations and three impact velocities when equipped to three different helmet models. In a preliminary on-field investigation, we used instrumented mouthguards to monitor head impact magnitude in collegiate linebackers during practice sessions while not wearing the padded helmet shell covers (i.e., bare helmets) for one season and whilst wearing the padded helmet shell covers for another season. The addition of the padded helmet shell cover was effective in attenuating the magnitude of angular head accelerations and two brain injury risk metrics (DAMAGE, HARM) across most laboratory impact conditions, but did not significantly attenuate linear head accelerations for all helmets. Overall, HARM values were reduced in laboratory impact tests by an average of 25% at 3.5 m/s (range: 9.7 - 39.6%), 18% at 5.5 m/s (range: -5.5 - 40.5%), and 10% at 7.4 m/s (range: -6.0 - 31.0%). However, on the field, no significant differences in any measure of head impact magnitude were observed between the bare helmet impacts and padded helmet impacts. Further laboratory tests were conducted to evaluate the ability of the padded helmet shell cover to maintain its performance after exposure to repeated, successive impacts and across a range of temperatures. This research provides a detailed assessment of padded helmet shell covers and supports the continuation of in vivo helmet research to validate laboratory testing results.

Submitted 27 November, 2022; originally announced November 2022.

Comments: 49 references, 8 figures

arXiv:2211.09023 [pdf, ps, other]

Can the three new states around 2.2 GeV assign to $ω(3D)$

Authors: Ya-rong Wang, Yang Ma, Cheng-qun Pang

Abstract: Recently, the BESIII Collaboration reported three resonances: $X(2232)$ with $M = 2232 \pm 19 \pm 27$ MeV and $Γ= 93 \pm 53 \pm 20$ MeV, $X(2200)$ whose mass $M = 2200 \pm 11 \pm 17$ MeV and width $Γ= 74 \pm 20 \pm 24$ MeV as well as $X(2222)$ which has mass of $2222 \pm 7 \pm 2$ MeV and the width of $59 \pm 30 \pm 6$ MeV. The mass spectrum of $ω$ meson family is studied utilizing the modified Godfrey-Isgur model, and the two-body strong decays of $X(2232)$, $X(2200)$ and $X(2222)$ within two different approaches of the $^3P_0$ model. We find that the newly discovered states $X(2232)$, $X(2200)$ and $X(2222)$ may be the same and are most likely to be the $ω(3D)$ state. The discovery could be useful in establishing entire $ω$ mesons.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures

arXiv:2211.07454 [pdf, other]

LGN-Net: Local-Global Normality Network for Video Anomaly Detection

Authors: Mengyang Zhao, Xinhua Zeng, Yang Liu, Jing Liu, Di Li, Xing Hu, Chengxin Pang

Abstract: Video anomaly detection (VAD) has been intensively studied for years because of its potential applications in intelligent video systems. Existing unsupervised VAD methods tend to learn normality from training sets consisting of only normal videos and regard instances deviating from such normality as anomalies. However, they often consider only local or global normality in the temporal dimension. Some of them focus on learning local spatiotemporal representations from consecutive frames to enhance the representation for normal events. But powerful representation allows these methods to represent some anomalies and causes missed detections. In contrast, the other methods are devoted to memorizing prototypical normal patterns of whole training videos to weaken the generalization for anomalies, which also restricts them from representing diverse normal patterns and causes false alarms. To this end, we propose a two-branch model, Local-Global Normality Network (LGN-Net), to simultaneously learn local and global normality. Specifically, one branch learns the evolution regularities of appearance and motion from consecutive frames as local normality utilizing a spatiotemporal prediction network, while the other branch memorizes prototype features of the whole videos as global normality by a memory module. LGN-Net achieves a balance of representing normal and abnormal instances by fusing local and global normality. In addition, the fused normality enables LGN-Net to generalize to various scenes more than exploiting single normality. Experiments demonstrate the effectiveness and superior performance of our method. The code is available online: https://github.com/Myzhao1999/LGN-Net.

Submitted 8 January, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.07046 [pdf, ps, other]

doi 10.1016/j.jde.2023.12.021

Global existence of dissipative solutions to the Camassa--Holm equation with transport noise

Authors: Luca Galimberti, Helge Holden, Kenneth H. Karlsen, Peter H. C. Pang

Abstract: We consider a nonlinear stochastic partial differential equation (SPDE) that takes the form of the Camassa--Holm equation perturbed by a convective, position-dependent, noise term. We establish the first global-in-time existence result for dissipative weak martingale solutions to this SPDE, with general finite-energy initial data. The solution is obtained as the limit of classical solutions to parabolic SPDEs. The proof combines model-specific statistical estimates with stochastic propagation of compactness techniques, along with the systematic use of tightness and a.s. representations of random variables on specific quasi-Polish spaces. The spatial dependence of the noise function makes more difficult the analysis of a priori estimates and various renormalisations, giving rise to nonlinear terms induced by the martingale part of the equation and the second-order Stratonovich--Itô correction term.

Submitted 1 January, 2024; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: 86 pages

MSC Class: Primary: 35R60; 35G25; Secondary: 35A01; 35D30

arXiv:2211.03885 [pdf, other]

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, et al. (13 additional authors not shown)

Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.03545 [pdf, other]

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Authors: Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

Abstract: Speech representation learning has improved both speech understanding and speech synthesis tasks for single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both the training and the inference without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.

Submitted 4 December, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

Showing 1–50 of 170 results for author: Pang, C