-
Gate-controlled neuromorphic functional transition in an electrochemical graphene transistor
Authors:
Chenglin Yu,
Shaorui Li,
Zhoujie Pan,
Yanming Liu,
Yongchao Wang,
Siyi Zhou,
Zhiting Gao,
He Tian,
Kaili Jiang,
Yayu Wang,
Jinsong Zhang
Abstract:
Neuromorphic devices have gained significant attention as potential building blocks for the next generation of computing technologies owing to their ability to emulate the functionalities of biological nervous systems. The essential components of artificial neural networks, such as synapses and neurons, are predominantly implemented by dedicated devices with specific functionalities. In this work, we present a gate-controlled transition of neuromorphic functions between artificial neurons and synapses in monolayer graphene transistors that can be employed as memtransistors or synaptic transistors as required. By harnessing the reliability of reversible electrochemical reactions between C atoms and hydrogen ions, the electrical conductivity of graphene transistors can be effectively manipulated, resulting in a high on/off resistance ratio, well-defined set/reset voltages, and prolonged retention time. Overall, the on-demand switching of neuromorphic functions in a single graphene transistor provides a promising opportunity to develop adaptive neural networks for the upcoming era of artificial intelligence and machine learning.
Submitted 31 December, 2023; v1 submitted 8 December, 2023;
originally announced December 2023.
-
Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models
Authors:
Yijie Zhang,
Zhangyang Gao,
Cheng Tan,
Stan Z. Li
Abstract:
Predicting protein stability changes induced by single-point mutations has been a persistent challenge over the years, attracting immense interest from numerous researchers. The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry, including drug development, protein evolution analysis, and enzyme synthesis. Although multiple methodologies have been proposed to address this issue, few approaches have achieved optimal performance coupled with high computational efficiency. Two principal hurdles contribute to the existing challenges in this domain. The first is the complexity of extracting and aggregating sufficiently representative features from proteins. The second is the limited availability of experimental data for protein mutation analysis, which further complicates the comprehensive evaluation of model performance on unseen data samples. With the advent of Large Language Models (LLMs), such as the ESM models in protein research, profound interpretation of protein features is now accessible, aided by enormous training data. LLMs can therefore facilitate a wide range of protein research. In our study, we introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in proteins upon single-point mutations. Furthermore, we have curated a dataset meticulously designed to preclude data leakage, corresponding to two extensively employed test datasets, to facilitate a more equitable model comparison.
Submitted 6 December, 2023;
originally announced December 2023.
-
Mapping the Information Journey: Unveiling the Documentation Experience of Software Developers in China
Authors:
Zhijun Gao,
Jiangying Wang,
Meina Wang
Abstract:
This research delves into understanding the behaviors and characteristics of Chinese developers in relation to their use of technical documentation, which is crucial for creating high-quality developer documentation. We conducted interviews with 25 software developers and surveyed 177 participants, using the preliminary interview findings to inform the survey design. Our approach encompassed traditional user research methods, including persona and user journey mapping, to develop typical personas and information journeys based on the qualitative data from the interviews and quantitative results from the survey. Our results revealed distinct characteristics and differences between junior and senior developers in terms of their use of technical documentation, broadly categorized into personality traits, learning habits, and working habits. We observed that the information journey of both groups typically encompasses four stages: Exploration, Understanding, Practice, and Application. Consequently, we created two distinct personas and information journey maps to represent these two developer groups. Our findings highlight that developers prioritize the content, organization, and maintenance aspects of documentation. In conclusion, we recommend organizing documentation content to align with developers' information journeys, tailoring documentation to meet the needs of developers at various levels, and focusing on the content, organization, and maintenance aspects of documentation.
Submitted 5 December, 2023;
originally announced December 2023.
-
On the Trade-Off between Stability and Representational Capacity in Graph Neural Networks
Authors:
Zhan Gao,
Amanda Prorok,
Elvin Isufi
Abstract:
Analyzing the stability of graph neural networks (GNNs) under topological perturbations is key to understanding their transferability and the role of each architecture component. However, stability has been investigated only for particular architectures, questioning whether it holds for a broader spectrum of GNNs or only for a few instances. To answer this question, we study the stability of EdgeNet: a general GNN framework that unifies more than twenty solutions including the convolutional and attention-based classes, as well as graph isomorphism networks and hybrid architectures. We prove that all GNNs within the EdgeNet framework are stable to topological perturbations. By studying the effect of different EdgeNet categories on the stability, we show that GNNs with fewer degrees of freedom in their parameter space, linked to a lower representational capacity, are more stable. The key factor yielding this trade-off is the eigenvector misalignment between the EdgeNet parameter matrices and the graph shift operator. For example, graph convolutional neural networks that assign a single scalar per signal shift (hence, with a perfect alignment) are more stable than the more involved node or edge-varying counterparts. Extensive numerical results corroborate our theoretical findings and highlight the role of different architecture components in the trade-off.
Submitted 4 December, 2023;
originally announced December 2023.
-
Bootstrapping SparseFormers from Vision Foundation Models
Authors:
Ziteng Gao,
Zhan Tong,
Kevin Qinghong Lin,
Joya Chen,
Mike Zheng Shou
Abstract:
The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual tokens via adjusting RoIs, greatly reducing computational costs while still achieving promising performance. However, training SparseFormers from scratch is still expensive, and scaling up the number of parameters can be challenging. In this paper, we propose to bootstrap SparseFormers from ViT-based vision foundation models in a simple and efficient way. Since the majority of SparseFormer blocks are the standard transformer ones, we can inherit weights from large-scale pre-trained vision transformers and freeze them as much as possible. Therefore, we only need to train the SparseFormer-specific lightweight focusing transformer to adjust token RoIs and fine-tune a few early pre-trained blocks to align the final token representation. In such a way, we can bootstrap SparseFormer architectures from various large-scale pre-trained models (e.g., IN-21K pre-trained AugRegs or CLIPs) using a rather smaller amount of training samples (e.g., IN-1K) and without labels or captions within just a few hours. As a result, the bootstrapped unimodal SparseFormer (from AugReg-ViT-L/16-384) can reach 84.9% accuracy on IN-1K with only 49 tokens, and the multimodal SparseFormer from CLIPs also demonstrates notable zero-shot performance with highly reduced computational cost without seeing any caption during the bootstrapping procedure. In addition, CLIP-bootstrapped SparseFormers, which align the output space with language without seeing a word, can serve as efficient vision encoders in multimodal large language models. Code and models are available at https://github.com/showlab/sparseformer
Submitted 4 April, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Survey on deep learning in multimodal medical imaging for cancer detection
Authors:
Yan Tian,
Zhaocheng Xu,
Yujun Ma,
Weiping Ding,
Ruili Wang,
Zhihong Gao,
Guohua Cheng,
Linyang He,
Xuran Zhao
Abstract:
The task of multimodal cancer detection is to determine the locations and categories of lesions by using different imaging techniques, which is one of the key research methods for cancer diagnosis. Recently, deep learning-based object detection has made significant developments due to its strength in semantic feature extraction and nonlinear function fitting. However, multimodal cancer detection remains challenging due to morphological differences in lesions, interpatient variability, difficulty in annotation, and imaging artifacts. In this survey, we mainly investigate over 150 papers in recent years with respect to multimodal cancer detection using deep learning, with a focus on datasets and solutions to various challenges such as data annotation, variance between classes, small-scale lesions, and occlusion. We also provide an overview of the advantages and drawbacks of each approach. Finally, we discuss the current scope of work and provide directions for the future development of multimodal cancer detection.
Submitted 3 December, 2023;
originally announced December 2023.
-
Quantification of cardiac capillarization in single-immunostained myocardial slices using weakly supervised instance segmentation
Authors:
Zhao Zhang,
Xiwen Chen,
William Richardson,
Bruce Z. Gao,
Abolfazl Razi,
Tong Ye
Abstract:
Decreased myocardial capillary density has been reported as an important histopathological feature associated with various heart disorders. Quantitative assessment of cardiac capillarization typically involves double immunostaining of cardiomyocytes (CMs) and capillaries in myocardial slices. In contrast, single immunostaining of basement membrane components is a straightforward approach to simultaneously label CMs and capillaries, presenting fewer challenges in background staining. However, subsequent image analysis always requires manual work in identifying and segmenting CMs and capillaries. Here, we developed an image analysis tool, AutoQC, to automatically identify and segment CMs and capillaries in immunofluorescence images of collagen type IV, a predominant basement membrane protein within the myocardium. In addition, commonly used capillarization-related measurements can be derived from segmentation masks. AutoQC features a weakly supervised instance segmentation algorithm by leveraging the power of a pre-trained segmentation model via prompt engineering. AutoQC outperformed YOLOv8-Seg, a state-of-the-art instance segmentation model, in both instance segmentation and capillarization assessment. Furthermore, the training of AutoQC required only a small dataset with bounding box annotations instead of pixel-wise annotations, leading to a reduced workload during network training. AutoQC provides an automated solution for quantifying cardiac capillarization in basement-membrane-immunostained myocardial slices, eliminating the need for manual image analysis once it is trained.
Submitted 29 November, 2023;
originally announced November 2023.
-
An HCAI Methodological Framework: Putting It Into Action to Enable Human-Centered AI
Authors:
Wei Xu,
Zaifeng Gao,
Marvin Dainoff
Abstract:
Human-centered AI (HCAI), as a design philosophy, advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI technology to humans and avoid its potential adverse effects. While HCAI has gained momentum, the lack of guidance on methodology in its implementation makes its adoption challenging. After assessing the needs for a methodological framework for HCAI, this paper first proposes a comprehensive and interdisciplinary HCAI methodological framework integrated with seven components, including design goals, design principles, implementation approaches, design paradigms, interdisciplinary teams, methods, and processes. The implications of the framework are also discussed. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe the proposed framework is systematic and executable, which can overcome the weaknesses in current frameworks and the challenges currently faced in implementing HCAI. Thus, the framework can help put it into action to develop, transfer, and implement HCAI in practice, eventually enabling the design, development, and deployment of HCAI-based intelligent systems.
Submitted 30 November, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
A Deep-learning Real-time Bias Correction Method for Significant Wave Height Forecasts in the Western North Pacific
Authors:
Wei Zhang,
Yu Sun,
Yapeng Wu,
Junyu Dong,
Xiaojiang Song,
Zhiyi Gao,
Renbo Pang,
Boyu Guoan
Abstract:
Significant wave height (SWH) is one of the most important parameters characterizing ocean waves, and accurate numerical ocean wave forecasting is crucial for coastal protection and shipping. However, due to the randomness and nonlinearity of the wind fields that generate ocean waves and the complex interaction between wave and wind fields, current forecasts of numerical ocean waves have biases. In this study, a spatiotemporal deep-learning method was employed to correct gridded SWH forecasts from the ECMWF-IFS. This method was built on the trajectory gated recurrent unit deep neural network, and it conducts real-time rolling correction for the 0-240h SWH forecasts from ECMWF-IFS. The correction model is co-driven by wave and wind fields, providing better results than those based on wave fields alone. A novel pixel-switch loss function was developed. The pixel-switch loss function can dynamically fine-tune the pre-trained correction model, focusing on pixels with large biases in SWH forecasts. According to the seasonal characteristics of SWH, four correction models were constructed separately, for spring, summer, autumn, and winter. The experimental results show that, compared with the original ECMWF SWH predictions, the correction was most effective in spring, when the mean absolute error decreased by 12.972~46.237%. Although winter had the worst performance, the mean absolute error decreased by 13.794~38.953%. The corrected results improved the original ECMWF SWH forecasts under both normal and extreme weather conditions, indicating that our SWH correction model is robust and generalizable.
Submitted 25 November, 2023;
originally announced November 2023.
-
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Authors:
Cheng Tan,
Jingxuan Wei,
Zhangyang Gao,
Linzhuang Sun,
Siyuan Li,
Ruifeng Guo,
Bihui Yu,
Stan Z. Li
Abstract:
Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.
Submitted 2 July, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
How Far Have We Gone in Vulnerability Detection Using Large Language Models
Authors:
Zeyu Gao,
Hao Wang,
Yuchen Zhou,
Wenyu Zhu,
Chao Zhang
Abstract:
As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.
Submitted 22 December, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Opportunities for Gas-Phase Science at Short-Wavelength Free-Electron Lasers with Undulator-Based Polarization Control
Authors:
Markus Ilchen,
Enrico Allaria,
Primož Rebernik Ribič,
Heinz-Dieter Nuhn,
Alberto Lutman,
Evgeny Schneidmiller,
Markus Tischer,
Mikail Yurkov,
Marco Calvi,
Eduard Prat,
Sven Reiche,
Thomas Schmidt,
Gianluca Aldo Geloni,
Suren Karabekyan,
Jiawei Yan,
Svitozar Serkez,
Zhangfeng Gao,
Bangjie Deng,
Chao Feng,
Haixiao Deng,
Wolfram Helml,
Lars Funke,
Mats Larsson,
Vitali Zhaunerchyk
, et al. (22 additional authors not shown)
Abstract:
Free-electron lasers (FELs) are the world's most brilliant light sources with rapidly evolving technological capabilities in terms of ultrabright and ultrashort pulses over a large range of accessible photon energies. Their revolutionary and innovative developments have opened new fields of science regarding nonlinear light-matter interaction, the investigation of ultrafast processes from specific observer sites, and approaches to imaging matter with atomic resolution. A core aspect of FEL science is the study of isolated and prototypical systems in the gas phase with the possibility of addressing well-defined electronic transitions or particular atomic sites in molecules. Notably for polarization-controlled short-wavelength FELs, the gas phase offers new avenues for investigations of nonlinear and ultrafast phenomena in spin orientated systems, for decoding the function of the chiral building blocks of life as well as steering reactions and particle emission dynamics in otherwise inaccessible ways. This roadmap comprises descriptions of technological capabilities of facilities worldwide, innovative diagnostics and instrumentation, as well as recent scientific highlights, novel methodology and mathematical modeling. The experimental and theoretical landscape of using polarization controllable FELs for dichroic light-matter interaction in the gas phase will be discussed and comprehensively outlined to stimulate and strengthen global collaborative efforts of all disciplines.
Submitted 19 November, 2023;
originally announced November 2023.
-
An Improved Neural Network Model Based On CNN Using For Fruit Sugar Degree Detection
Authors:
Boyang Deng,
Xin Wen,
Zhan Gao
Abstract:
Artificial Intelligence (AI) is widely applied in image classification and recognition, text understanding, and natural language processing, where it has made great progress. In this paper, we introduce AI into the field of fruit quality detection. We designed a fruit sugar degree regression model using an artificial neural network based on spectra of fruits within the visible/near-infrared (V/NIR) range. After analyzing the fruit spectra, we propose a new neural network structure: the low layers consist of a Multilayer Perceptron (MLP), the middle layer is a 2-dimensional correlation matrix layer, and the high layers consist of several Convolutional Neural Network (CNN) layers. In this study, we used fruit sugar value as the detection target, collected samples of two fruits, Gan Nan Navel and Tian Shan Pear, ran experiments on each, and compared the results. We used Analysis of Variance (ANOVA) to evaluate the reliability of the dataset we collected. Then, we tried multiple strategies to process the spectrum data and evaluated their effects. We tried adding Wavelet Decomposition (WD) to reduce feature dimensions and a Genetic Algorithm (GA) to find good features. Then, we compared neural network models with traditional Partial Least Squares (PLS) based models. We also compared the neural network structure we designed (MLP-CNN) with other traditional neural network structures. Finally, we propose a new evaluation standard derived from the dataset standard deviation (STD) for evaluating detection performance, validating the viability of using an artificial neural network model for nondestructive fruit sugar degree detection.
Submitted 18 November, 2023;
originally announced November 2023.
-
Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation
Authors:
Bingnan Li,
Zhitong Gao,
Xuming He
Abstract:
Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization. However, most existing methods have difficulty in handling local variations in domain shift and typically require a significant amount of data for training, which hinders their usage in practice. To address these problems, we propose a novel adaptive domain generalization framework, which integrates a learning-free cross-domain representation based on image gradient maps and a class prior-informed test-time adaptation strategy for mitigating local domain shift. We validate our approach on two multi-modal MRI datasets with six cross-modal segmentation tasks. Across all the task settings, our method consistently outperforms competing approaches and shows a stable performance even with limited training data.
Submitted 16 November, 2023;
originally announced November 2023.
-
Role of the isospin diffusion on cluster transfer in $^{12,14}$C + $^{209}$Bi reactions
Authors:
Zepeng Gao,
Yinu Zhang,
Long Zhu,
Zehong Liao,
Yu Yang,
Chenchen Guo,
Jun Su
Abstract:
Heavy-ion collisions at near-barrier energies provide a crucial pathway for investigating nucleon correlations and clustering structures.
Recent experimental results showed that the valence neutrons in light projectiles clearly enhance the $α$ transfer. This finding is extremely puzzling and fascinating, because it unexpectedly violates the ground-state $Q$ value systematics. In this work, the time-dependent Hartree-Fock approach is utilized to investigate the cluster transfer. By comparing the reactions $^{12,14}$C + $^{209}$Bi, we discover that the above puzzling behavior arises from the strong correlation between isospin diffusion and clustering. Our calculations clearly show that the equilibration of the neutron-to-proton ratio strongly inhibits the clustering. This work opens a prospect for investigating clustering in open quantum systems.
Submitted 16 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Compressive Sensing-Based Grant-Free Massive Access for 6G Massive Communication
Authors:
Zhen Gao,
Malong Ke,
Yikun Mei,
Li Qiao,
Sheng Chen,
Derrick Wing Kwan Ng,
H. Vincent Poor
Abstract:
The advent of the sixth-generation (6G) of wireless communications has given rise to the necessity to connect vast quantities of heterogeneous wireless devices, which requires advanced system capabilities far beyond existing network architectures. In particular, such massive communication has been recognized as a prime driver that can empower the 6G vision of future ubiquitous connectivity, supporting Internet of Human-Machine-Things for which massive access is critical. This paper surveys the most recent advances toward massive access in both academic and industry communities, focusing primarily on the promising compressive sensing-based grant-free massive access paradigm. We first specify the limitations of existing random access schemes and reveal that the practical implementation of massive communication relies on a dramatically different random access paradigm from the current ones mainly designed for human-centric communications. Then, a compressive sensing-based grant-free massive access roadmap is presented, where the evolutions from single-antenna to large-scale antenna array-based base stations, from single-station to cooperative massive multiple-input multiple-output systems, and from unsourced to sourced random access scenarios are detailed. Finally, we discuss the key challenges and open issues to shed light on the potential future research directions of grant-free massive access.
Submitted 12 November, 2023;
originally announced November 2023.
-
Enabling Human-Centered AI: A Methodological Perspective
Authors:
Wei Xu,
Zaifeng Gao
Abstract:
Human-centered AI (HCAI) is a design philosophy that advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI to humans and avoid potential adverse impacts. While HCAI continues to gain influence, the lack of practical guidance on methodology makes its adoption challenging. This paper proposes a comprehensive HCAI framework based on our previous work with integrated components, including design goals, design principles, implementation approaches, interdisciplinary teams, HCAI methods, and HCAI processes. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe this systematic and executable framework can overcome the weaknesses in current HCAI frameworks and the challenges currently faced in practice, and that putting it into action can further enable HCAI.
Submitted 14 November, 2023; v1 submitted 11 November, 2023;
originally announced November 2023.
-
AI-accelerated Discovery of Altermagnetic Materials
Authors:
Ze-Feng Gao,
Shuai Qu,
Bocheng Zeng,
Yang Liu,
Ji-Rong Wen,
Hao Sun,
Peng-Jie Guo,
Zhong-Yi Lu
Abstract:
Altermagnetism, a new magnetic phase, has been theoretically proposed and experimentally verified to be distinct from ferromagnetism and antiferromagnetism. Although altermagnets have been found to possess many exotic physical properties, the very limited availability of known altermagnetic materials (e.g., 14 confirmed materials) hinders the study of such properties. Hence, discovering more types of altermagnetic materials is crucial for a comprehensive understanding of altermagnetism and thus for facilitating new applications in next-generation information technologies, e.g., storage devices and high-sensitivity sensors. Here, we report 25 new altermagnetic materials that cover metals, semiconductors, and insulators, discovered by an AI search engine unifying symmetry analysis, graph neural network pre-training, optimal transport theory, and first-principles electronic structure calculation. The wide range of electronic structural characteristics reveals that various novel physical properties manifest in these newly discovered altermagnetic materials, e.g., anomalous Hall effect, anomalous Kerr effect, and topological property. Notably, we discovered 8 i-wave altermagnetic materials for the first time. Overall, the AI search engine performs much better than human experts and suggests a set of new altermagnetic materials with unique properties, outlining its potential for accelerated discovery of materials with targeted properties.
Submitted 12 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Electrically empowered microcomb laser
Authors:
Jingwei Ling,
Zhengdong Gao,
Shixin Xue,
Qili Hu,
Mingxiao Li,
Kaibo Zhang,
Usman A. Javid,
Raymond Lopez-Rios,
Jeremy Staffa,
Qiang Lin
Abstract:
Optical frequency comb underpins a wide range of applications from communication, metrology, to sensing. Its development on a chip-scale platform -- so called soliton microcomb -- provides a promising path towards system miniaturization and functionality integration via photonic integrated circuit (PIC) technology. Although extensively explored in recent years, challenges remain in key aspects of microcomb such as complex soliton initialization, high threshold, low power efficiency, and limited comb reconfigurability. Here we present an on-chip laser that directly outputs microcomb and resolves all these challenges, with a distinctive mechanism created from synergetic interaction among resonant electro-optic effect, optical Kerr effect, and optical gain inside the laser cavity. Realized with integration between a III-V gain chip and a thin-film lithium niobate (TFLN) PIC, the laser is able to directly emit mode-locked microcomb on demand with robust turnkey operation inherently built in, with individual comb linewidth down to 600 Hz, whole-comb frequency tuning rate exceeding $\rm 2.4\times10^{17}$ Hz/s, and 100% utilization of optical power fully contributing to comb generation. The demonstrated approach unifies architecture and operation simplicity, high-speed reconfigurability, and multifunctional capability enabled by TFLN PIC, opening up a great avenue towards on-demand generation of mode-locked microcomb that is expected to have profound impact on broad applications.
Submitted 30 October, 2023;
originally announced October 2023.
-
Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation
Authors:
Zhaowei Gao,
Mingyang Song,
Christopher Schroers,
Yang Zhang
Abstract:
Due to old CRT display technology and limited transmission bandwidth, early film and TV broadcasts commonly used interlaced scanning. This meant each field contained only half of the information. Since modern displays require full frames, this has spurred research into deinterlacing, i.e. restoring the missing information in legacy video content. In this paper, we present a deep-learning-based method for deinterlacing animated and live-action content. Our proposed method supports bidirectional spatio-temporal information propagation across multiple scales to leverage information in both space and time. More specifically, we design a Flow-guided Refinement Block (FRB) which performs feature refinement including alignment, fusion, and rectification. Additionally, our method can process multiple fields simultaneously, reducing per-frame processing time, and potentially enabling real-time processing. Our experimental results demonstrate that our proposed method achieves superior performance compared to existing methods.
Submitted 5 December, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Rare Event Probability Learning by Normalizing Flows
Authors:
Zhengqi Gao,
Dinghuai Zhang,
Luca Daniel,
Duane S. Boning
Abstract:
A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow assisted importance sampling, termed NOFIS. NOFIS first learns a sequence of proposal distributions associated with predefined nested subset events by minimizing KL divergence losses. Next, it estimates the rare event probability by utilizing importance sampling in conjunction with the last proposal. The efficacy of our NOFIS method is substantiated through comprehensive qualitative visualizations, affirming the optimality of the learned proposal distribution, as well as a series of quantitative experiments encompassing $10$ distinct test cases, which highlight NOFIS's superiority over baseline approaches.
Submitted 29 October, 2023;
originally announced October 2023.
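The importance-sampling step that the abstract above builds on can be illustrated with a minimal sketch. This is not NOFIS itself: the learned normalizing-flow proposal is swapped for a fixed, hand-picked Gaussian proposal centered on the threshold, and the function names `rare_event_prob_is` and `exact_tail` are hypothetical, introduced only for this illustration. The target is the tail probability P(X > t) for a standard normal X.

```python
import math
import random


def rare_event_prob_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Assumption: the proposal is N(threshold, 1), chosen by hand. NOFIS would
    instead learn the proposal with a normalizing flow.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal q
        if x > threshold:  # indicator of the rare event
            # Importance weight p(x) / q(x) for p = N(0, 1), q = N(threshold, 1):
            # exp(-x^2 / 2) / exp(-(x - t)^2 / 2) = exp(-t * x + t^2 / 2)
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def exact_tail(threshold):
    """Closed-form Gaussian tail probability, used only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    estimate = rare_event_prob_is(4.0, 100_000)
    print(f"importance sampling: {estimate:.3e}")
    print(f"closed form:         {exact_tail(4.0):.3e}")
```

Plain Monte Carlo with the same budget would see only a handful of samples beyond the threshold, which is the inefficiency the abstract refers to. Shifting the proposal onto the rare region and reweighting keeps the estimator unbiased while reducing its variance.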
-
DPSS-based Codebook Design for Near-Field XL-MIMO Channel Estimation
Authors:
Shicong Liu,
Xianghao Yu,
Zhen Gao,
Derrick Wing Kwan Ng
Abstract:
Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. While accurate channel estimation is essential for beamforming and data detection, the unique characteristics of near-field channels pose additional challenges to the effective acquisition of channel state information. In this paper, we propose a novel codebook design, which allows efficient near-field channel estimation with significantly reduced codebook size. Specifically, we consider the eigen-problem based on the near-field electromagnetic wave transmission model. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Based on the proposed near-field codebook design, we further introduce a two-step channel estimation scheme. Simulation results demonstrate that the proposed codebook design not only achieves superior sparsification performance of near-field channels with a lower leakage effect, but also significantly improves the accuracy in compressive sensing channel estimation.
Submitted 27 October, 2023;
originally announced October 2023.
-
Adaptive operator learning for infinite-dimensional Bayesian inverse problems
Authors:
Zhiwei Gao,
Liang Yan,
Tao Zhou
Abstract:
The fundamental computational issues in Bayesian inverse problems (BIP) governed by partial differential equations (PDEs) stem from the requirement of repeated forward model evaluations. A popular strategy to reduce such costs is to replace expensive model simulations with computationally efficient approximations using operator learning, motivated by recent progress in deep learning. However, using the approximated model directly may introduce a modeling error, exacerbating the inherent ill-posedness of inverse problems. Thus, balancing accuracy and efficiency is essential for the effective implementation of such approaches. To this end, we develop an adaptive operator learning framework that can reduce modeling error gradually by forcing the surrogate to be accurate in local areas. This is accomplished by adaptively fine-tuning the pre-trained approximate model with training points chosen by a greedy algorithm during the posterior computational process. To validate our approach, we use DeepOnet to construct the surrogate and unscented Kalman inversion (UKI) to approximate the BIP solution. Furthermore, we present a rigorous convergence guarantee in the linear case using the UKI framework. The approach is tested on a number of benchmarks, including the Darcy flow, the heat source inversion problem, and the reaction-diffusion problem. The numerical results show that our method can significantly reduce computational costs while maintaining inversion accuracy.
Submitted 4 March, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Authors:
Zhaoyang Liu,
Zeqiang Lai,
Zhangwei Gao,
Erfei Cui,
Ziheng Li,
Xizhou Zhu,
Lewei Lu,
Qifeng Chen,
Yu Qiao,
Jifeng Dai,
Wenhai Wang
Abstract:
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a \textit{task decomposer} that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a \textit{Thoughts-on-Graph (ToG) paradigm} that searches the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an \textit{execution engine with a rich toolbox} that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods. The code is at https://github.com/OpenGVLab/ControlLLM.
Submitted 18 December, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
Authors:
Yongxin Zhu,
Zhujin Gao,
Xinyuan Zhou,
Zhongyi Ye,
Linli Xu
Abstract:
While Diffusion Generative Models have achieved great success on image generation tasks, how to efficiently and effectively incorporate them into speech generation especially translation tasks remains a non-trivial problem. Specifically, due to the low information density of speech data, the transformed discrete speech unit sequence is much longer than the corresponding text transcription, posing significant challenges to existing auto-regressive models. Furthermore, it is not optimal to brutally apply discrete diffusion on the speech unit sequence while disregarding the continuous space structure, which will degrade the generation performance significantly. In this paper, we propose a novel diffusion model by applying the diffusion forward process in the \textit{continuous} speech representation space, while employing the diffusion backward process in the \textit{discrete} speech unit space. In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models. We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves comparable results to the computationally intensive auto-regressive baselines (500 steps on average) with significantly fewer decoding steps (50 steps).
Submitted 26 October, 2023;
originally announced October 2023.
-
General Point Model with Autoencoding and Autoregressive
Authors:
Zhe Li,
Zhangyang Gao,
Cheng Tan,
Stan Z. Li,
Laurence T. Yang
Abstract:
The pre-training architectures of large language models encompass various types, including autoencoding models, autoregressive models, and encoder-decoder models. We posit that any modality can potentially benefit from a large language model, as long as it undergoes vector quantization to become discrete tokens. Inspired by GLM, we propose a General Point Model (GPM) which seamlessly integrates autoencoding and autoregressive tasks in point cloud transformer. This model is versatile, allowing fine-tuning for downstream point cloud representation tasks, as well as unconditional and conditional generation tasks. GPM enhances masked prediction in autoencoding through various forms of mask padding tasks, leading to improved performance in point cloud understanding. Additionally, GPM demonstrates highly competitive results in unconditional point cloud generation tasks, even exhibiting the potential for conditional generation tasks by modifying the input's conditional information. Compared to models like Point-BERT, MaskPoint and PointMAE, our GPM achieves superior performance in point cloud understanding tasks. Furthermore, the integration of autoregressive and autoencoding within the same transformer underscores its versatility across different downstream tasks.
Submitted 25 October, 2023;
originally announced October 2023.
-
Graph Agent: Explicit Reasoning Agent for Graphs
Authors:
Qinyong Wang,
Zhenxiang Gao,
Rong Xu
Abstract:
Graph embedding methods such as Graph Neural Networks (GNNs) and Graph Transformers have contributed to the development of graph reasoning algorithms for various tasks on knowledge graphs. However, the lack of interpretability and explainability of graph embedding methods has limited their applicability in scenarios requiring explicit reasoning. In this paper, we introduce the Graph Agent (GA), an intelligent agent methodology that leverages large language models (LLMs), inductive-deductive reasoning modules, and long-term memory for knowledge graph reasoning tasks. GA integrates aspects of symbolic reasoning and existing graph embedding methods to provide an innovative approach for complex graph reasoning tasks. By converting graph structures into textual data, GA enables LLMs to process, reason, and provide predictions alongside human-interpretable explanations. The effectiveness of the GA was evaluated on node classification and link prediction tasks. Results showed that GA reached state-of-the-art performance, demonstrating accuracy of 90.65%, 95.48%, and 89.32% on Cora, PubMed, and PrimeKG datasets, respectively. Compared to existing GNN and transformer models, GA offered the advantages of explicit reasoning ability, training-free operation, and easy adaptation to various graph reasoning tasks.
Submitted 25 October, 2023;
originally announced October 2023.
-
KirchhoffNet: A Scalable Ultra Fast Analog Neural Network
Authors:
Zhengqi Gao,
Fan-Keng Sun,
Ron Rohrer,
Duane S. Boning
Abstract:
In this paper, we leverage a foundational principle of analog electronic circuitry, Kirchhoff's current and voltage laws, to introduce a distinctive class of neural network models termed KirchhoffNet. Essentially, KirchhoffNet is an analog circuit that can function as a neural network, utilizing its initial node voltages as the neural network input and the node voltages at a specific time point as the output. The evolution of node voltages within the specified time is dictated by learnable parameters on the edges connecting nodes. We demonstrate that KirchhoffNet is governed by a set of ordinary differential equations (ODEs), and notably, even in the absence of traditional layers (such as convolution layers), it attains state-of-the-art performances across diverse and complex machine learning tasks. Most importantly, KirchhoffNet can be potentially implemented as a low-power analog integrated circuit, leading to an appealing property -- irrespective of the number of parameters within a KirchhoffNet, its on-chip forward calculation can always be completed within a short time. This characteristic makes KirchhoffNet a promising and fundamental paradigm for implementing large-scale neural networks, opening a new avenue in analog neural networks for AI.
Submitted 6 May, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction
Authors:
Chih-Yu Lai,
Fan-Keng Sun,
Zhengqi Gao,
Jeffrey H. Lang,
Duane S. Boning
Abstract:
Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.
Submitted 23 October, 2023;
originally announced October 2023.
-
Adaptive Tuning of Robotic Polishing Skills based on Force Feedback Model
Authors:
Yu Wang,
Zhouyi Zheng,
Chen Chen,
Zezheng Wang,
Zhitao Gao,
Fangyu Peng,
Xiaowei Tang,
Rong Yan
Abstract:
Acquiring human skills offers an efficient approach to tackle complex task planning challenges. When performing a learned skill model for a continuous contact task, such as robot polishing in an uncertain environment, the robot needs to be able to adaptively modify the skill model to suit the environment and perform the desired task. The environmental perturbation of the polishing task is mainly reflected in the variation of contact force. Therefore, adjusting the task skill model by providing feedback on the contact force deviation is an effective way to meet the task requirements. In this study, a phase-modulated diagonal recurrent neural network (PMDRNN) is proposed for force feedback model learning in the robotic polishing task. The contact between the tool and the workpiece in the polishing task can be considered a dynamic system. In comparison to the existing feedforward neural network phase-modulated neural network (PMNN), PMDRNN combines the diagonal recurrent network structure with the phase-modulated neural network layer to improve the learning performance of the feedback model for dynamic systems. Specifically, data from real-world robot polishing experiments are used to learn the feedback model. PMDRNN demonstrates a significant reduction in the training error of the feedback model when compared to PMNN. Building upon this, the combination of PMDRNN and dynamic movement primitives (DMPs) can be used for real-time adjustment of skills for polishing tasks and effectively improve the robustness of the task skill model. Finally, real-world robotic polishing experiments are conducted to demonstrate the effectiveness of the approach.
Submitted 22 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
VR PreM+ : An Immersive Pre-learning Branching Visualization System for Museum Tours
Authors:
Ze Gao,
Xiang Li,
Changkun Liu,
Xian Wang,
Anqi Wang,
Liang Yang,
Yuyang Wang,
Pan Hui,
Tristan Braud
Abstract:
We present VR PreM+, an innovative VR system designed to enhance web exploration beyond traditional computer screens. Unlike static 2D displays, VR PreM+ leverages 3D environments to create an immersive pre-learning experience. Using keyword-based information retrieval allows users to manage and connect various content sources in a dynamic 3D space, improving communication and data comparison. We conducted preliminary and user studies that demonstrated efficient information retrieval, increased user engagement, and a greater sense of presence. These findings yielded three design guidelines for future VR information systems: display, interaction, and user-centric design. VR PreM+ bridges the gap between traditional web browsing and immersive VR, offering an interactive and comprehensive approach to information acquisition. It holds promise for research, education, and beyond.
Submitted 1 November, 2023; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey
Authors:
Lijuan Zhou,
Xiang Meng,
Zhihuan Liu,
Mengqi Wu,
Zhimin Gao,
Pichao Wang
Abstract:
Human pose analysis has garnered significant attention within both the research community and practical applications, owing to its expanding array of uses, including gaming, video surveillance, sports performance analysis, and human-computer interactions, among others. The advent of deep learning has significantly improved the accuracy of pose capture, making pose-based applications increasingly practical. This paper presents a comprehensive survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation involves the determination of human joint positions from images or image sequences. Pose tracking is an emerging research direction aimed at generating consistent human pose trajectories over time. Action recognition, on the other hand, targets the identification of action types using pose estimation or tracking data. These three tasks are intricately interconnected, with the latter often reliant on the former. In this survey, we comprehensively review related works, spanning from single-person pose estimation to multi-person pose estimation, from 2D pose estimation to 3D pose estimation, from single image to video, from mining temporal context gradually to pose tracking, and lastly from tracking to pose-based action recognition. As a survey centered on the application of deep learning to pose analysis, we explicitly discuss both the strengths and limitations of existing techniques. Notably, we emphasize methodologies for integrating these three tasks into a unified framework within video sequences. Additionally, we explore the challenges involved and outline potential directions for future research.
Submitted 19 October, 2023;
originally announced October 2023.
-
Almost Optimal Locality Sensitive Orderings in Euclidean Space
Authors:
Zhimeng Gao,
Sariel Har-Peled
Abstract:
$ \newcommand{\Re}{\mathbb{R}} \newcommand{\reals}{\mathbb{R}} \newcommand{\SetX}{\mathsf{X}} \newcommand{\rad}{r} \newcommand{\Eps}{\Mh{\mathcal{E}}} \newcommand{\p}{\Mh{p}} \newcommand{\q}{\Mh{q}} \newcommand{\Mh}[1]{#1} \newcommand{\query}{q} \newcommand{\eps}{\varepsilon} \newcommand{\VorX}[1]{\mathcal{V} \pth{#1}} \newcommand{\Polygon}{\mathsf{P}} \newcommand{\IntRange}[1]{[ #1 ]} \newcommand…
▽ More
$ \newcommand{\Re}{\mathbb{R}} \newcommand{\reals}{\mathbb{R}} \newcommand{\SetX}{\mathsf{X}} \newcommand{\rad}{r} \newcommand{\Eps}{\Mh{\mathcal{E}}} \newcommand{\p}{\Mh{p}} \newcommand{\q}{\Mh{q}} \newcommand{\Mh}[1]{#1} \newcommand{\query}{q} \newcommand{\eps}{\varepsilon} \newcommand{\VorX}[1]{\mathcal{V} \pth{#1}} \newcommand{\Polygon}{\mathsf{P}} \newcommand{\IntRange}[1]{[ #1 ]} \newcommand{\Space}{\overline{\mathsf{m}}} \newcommand{\pth}[2][\!]{#1\left({#2}\right)} \newcommand{\polylog}{\mathrm{polylog}} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\pt}{p} \newcommand{\distY}[2]{\left\| {#1} - {#2} \right\|} \newcommand{\ptq}{q} \newcommand{\pts}{s}$
For a parameter $\eps \in (0,1)$, we present a new construction of $\eps$-locality-sensitive orderings (LSOs) in $\Re^d$ of size $M = O(\Eps^{d-1} \log \Eps)$, where $\Eps = 1/\eps$. This improves over previous work by a factor of $\Eps$, and is optimal up to a factor of $\log \Eps$. Such a set of LSOs has the property that for any two points, $\p, \q \in [0,1]^d$, there exists an order in the set such that all the points between $\p$ and $\q$ in the order are $\eps$-close to either $\p$ or $\q$.
The existence of such LSOs is a fundamental property of low dimensional Euclidean space, conceptually similar to the existence of well-separated pairs decomposition, so the question of how to compute (near) optimal construction of LSOs is quite natural.
As a consequence we get a flotilla of improved dynamic geometric algorithms, such as maintaining bichromatic closest pair, and spanners, among others. In particular, for geometric dynamic spanners the new result matches (up to the aforementioned $\log \Eps$ factor) the lower bound, thus offering a near-optimal simple dynamic data-structure for maintaining spanners under insertions and deletions.
Submitted 21 February, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Developing 3D Virtual Safety Risk Terrain for UAS Operations in Complex Urban Environments
Authors:
Zhenyu Gao,
John-Paul Clarke,
Javid Mardanov,
Karen Marais
Abstract:
Unmanned Aerial Systems (UAS), an integral part of the Advanced Air Mobility (AAM) vision, are capable of performing a wide spectrum of tasks in urban environments. The societal integration of UAS is a pivotal challenge, as these systems must operate harmoniously within the constraints imposed by regulations and societal concerns. In complex urban environments, UAS safety has been a perennial obstacle to their large-scale deployment. To mitigate UAS safety risk and facilitate risk-aware UAS operations planning, we propose a novel concept called \textit{3D virtual risk terrain}. This concept converts public risk constraints in an urban environment into 3D exclusion zones that UAS operations should avoid to adequately reduce risk to Entities of Value (EoV). To implement the 3D virtual risk terrain, we develop a conditional probability framework that comprehensively integrates most existing basic models for UAS ground risk. To demonstrate the concept, we build risk terrains on a Chicago downtown model and observe their characteristics under different conditions. We believe that the 3D virtual risk terrain has the potential to become a new routine tool for risk-aware UAS operations planning, urban airspace management, and policy development. The same idea can also be extended to other forms of societal impacts, such as noise, privacy, and perceived risk.
Submitted 18 October, 2023;
originally announced October 2023.
-
AI Nushu: An Exploration of Language Emergence in Sisterhood -Through the Lens of Computational Linguistics
Authors:
Yuqian Sun,
Yuying Tang,
Ze Gao,
Zhijun Pan,
Chuyan Xu,
Yurou Chen,
Kejiang Qian,
Zhigang Wang,
Tristan Braud,
Chang Hee Lee,
Ali Asadipour
Abstract:
This paper presents "AI Nushu," an emerging language system inspired by Nushu (women's scripts), the unique language created and used exclusively by ancient Chinese women who were thought to be illiterate under a patriarchal society. In this interactive installation, two artificial intelligence (AI) agents are trained in the Chinese dictionary and the Nushu corpus. By continually observing their environment and communicating, these agents collaborate towards creating a standard writing system to encode Chinese. It offers an artistic interpretation of the creation of a non-western script from a computational linguistics perspective, integrating AI technology with Chinese cultural heritage and a feminist viewpoint.
Submitted 18 October, 2023;
originally announced October 2023.
-
Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction
Authors:
Yufei Huang,
Siyuan Li,
Jin Su,
Lirong Wu,
Odin Zhang,
Haitao Lin,
Jingqi Qi,
Zihan Liu,
Zhangyang Gao,
Yuyang Liu,
Jiangbin Zheng,
Stan. ZQ. Li
Abstract:
Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. Existing methods rely heavily on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) have been utilized as alternatives. However, we observed that current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. While similar phenomena have been extensively studied in general fields (e.g., Computer Vision) as model robustness, their impact on protein property prediction remains unexplored. In this paper, we first investigate the reason behind the performance decrease when utilizing predicted structures, attributing it to the structure embedding bias from the perspective of structure representation learning. To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures. Extensive experiments have shown that our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures. The benchmark datasets and codes will be released to benefit the community.
Submitted 19 October, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
Authors:
Zijun Gao,
Lingbo Li
Abstract:
Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 ± 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with....
Submitted 9 April, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Staged Depthwise Correlation and Feature Fusion for Siamese Object Tracking
Authors:
Dianbo Ma,
Jianqiang Xiao,
Ziyan Gao,
Satoshi Yamane
Abstract:
In this work, we propose a novel staged depthwise correlation and feature fusion network, named DCFFNet, to further optimize the feature extraction for visual tracking. We build our deep tracker upon a siamese network architecture, which is trained offline from scratch on multiple large-scale datasets in an end-to-end manner. The model contains a core component, the depthwise correlation and feature fusion module (correlation-fusion module), which helps the model learn a set of optimal weights for a specific object by utilizing ensembles of multi-level features from lower and higher layers and multi-channel semantics on the same layer. We combine a modified ResNet-50 with the proposed correlation-fusion layer to constitute the feature extractor of our model. During training, we find that the model trains more stably, which we attribute to the correlation-fusion module. For a comprehensive evaluation of performance, we test our tracker on popular benchmarks, including OTB100, VOT2018 and LaSOT. Extensive experimental results demonstrate that our proposed method achieves competitive performance against many leading trackers in terms of accuracy and precision, while satisfying the real-time requirements of applications.
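As a rough illustration of the depthwise correlation named in the abstract above, the sketch below computes a per-channel cross-correlation between a search-region feature map and a template feature map, which is the generic operation used in siamese trackers. It is not the DCFFNet module itself: the function name `depthwise_xcorr`, the `(C, H, W)` array layout, and the absence of any fusion step are assumptions made for this example.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Per-channel ("depthwise") cross-correlation.

    search:   array of shape (C, H, W), features of the search region
    template: array of shape (C, h, w), features of the target template
    Returns an array of shape (C, H - h + 1, W - w + 1); channel c of the
    output is the valid cross-correlation of search[c] with template[c].
    Hypothetical helper for illustration only.
    """
    search = np.asarray(search, dtype=float)
    template = np.asarray(template, dtype=float)
    C, H, W = search.shape
    c, h, w = template.shape
    if c != C or h > H or w > W:
        raise ValueError("template must match channels and fit inside search")

    out = np.empty((C, H - h + 1, W - w + 1), dtype=float)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            # Slide the template over the search map; each channel is
            # correlated only with its own template channel.
            window = search[:, i:i + h, j:j + w]
            out[:, i, j] = np.sum(window * template, axis=(1, 2))
    return out


response = depthwise_xcorr(np.ones((2, 4, 4)), np.ones((2, 2, 2)))
```

Each output channel is a separate response map, so a later stage can weight or fuse the channels rather than collapsing them into a single score map up front.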
Submitted 15 October, 2023;
originally announced October 2023.
-
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Authors:
Dongsheng Jiang,
Yuchen Liu,
Songlin Liu,
Jin'e Zhao,
Hao Zhang,
Zhen Gao,
Xiaopeng Zhang,
Jin Li,
Hongkai Xiong
Abstract:
Multi-modal Large Language Models (MLLMs) have made significant strides in expanding the capabilities of Large Language Models (LLMs) through the incorporation of visual perception interfaces. Despite the emergence of exciting applications and the availability of diverse instruction tuning data, existing approaches often rely on CLIP or its variants as the visual branch, and merely extract features from the deep layers. However, these methods lack a comprehensive analysis of the visual encoders in MLLMs. In this paper, we conduct an extensive investigation into the effectiveness of different vision encoders within MLLMs. Our findings reveal that the shallow layer features of CLIP offer particular advantages for fine-grained tasks such as grounding and region understanding. Surprisingly, the vision-only model DINO, which is not pretrained with text-image alignment, demonstrates promising performance as a visual branch within MLLMs. By simply equipping it with an MLP layer for alignment, DINO surpasses CLIP in fine-grained related perception tasks. Building upon these observations, we propose a simple yet effective feature merging strategy, named COMM, that integrates CLIP and DINO with Multi-level features Merging, to enhance the visual capabilities of MLLMs. We evaluate COMM through comprehensive experiments on a wide range of benchmarks, including image captioning, visual question answering, visual grounding, and object hallucination. Experimental results demonstrate the superior performance of COMM compared to existing methods, showcasing its enhanced visual capabilities within MLLMs.
Submitted 7 March, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Adaptive Storey's null proportion estimator
Authors:
Zijun Gao
Abstract:
False discovery rate (FDR) is a commonly used criterion in multiple testing and the Benjamini-Hochberg (BH) procedure is arguably the most popular approach with FDR guarantee. To improve power, the adaptive BH procedure has been proposed by incorporating various null proportion estimators, among which Storey's estimator has gained substantial popularity. The performance of Storey's estimator hinges on a critical hyper-parameter, where a pre-fixed configuration lacks power and existing data-driven hyper-parameters compromise the FDR control. In this work, we propose a novel class of adaptive hyper-parameters and establish the FDR control of the associated BH procedure using a martingale argument. Within this class of data-driven hyper-parameters, we present a specific configuration designed to maximize the number of rejections and characterize the convergence of this proposal to the optimal hyper-parameter under a commonly-used mixture model. We evaluate our adaptive Storey's null proportion estimator and the associated BH procedure on extensive simulated data and a motivating protein dataset. Our proposal exhibits significant power gains when dealing with a considerable proportion of weak non-nulls or a conservative null distribution.
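For orientation, the sketch below shows the classical fixed-hyper-parameter baseline that the abstract above builds on: Storey's null proportion estimator with a pre-fixed threshold, plugged into the BH step-up procedure. It does not implement the adaptive, data-driven hyper-parameter proposed in the paper. The function names, the default `lam=0.5`, and the sample p-values are assumptions chosen for illustration.

```python
import numpy as np


def storey_pi0(pvals, lam=0.5):
    """Storey's null proportion estimate with a fixed threshold lam in (0, 1).

    Uses the common conservative form (1 + #{p_i > lam}) / (m * (1 - lam)),
    capped at 1. Hypothetical helper for illustration.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    return min(1.0, (1 + np.sum(p > lam)) / (m * (1 - lam)))


def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """BH step-up procedure with m replaced by m * pi0_hat (simplified sketch)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = storey_pi0(p, lam)
    order = np.argsort(p)
    # Step-up thresholds alpha * k / (m * pi0_hat) for k = 1..m.
    thresholds = alpha * np.arange(1, m + 1) / (m * pi0)
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing its threshold
        rejected[order[:k + 1]] = True
    return rejected


pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.6, 0.9]
pi0_hat = storey_pi0(pvals, lam=0.5)
rejections = adaptive_bh(pvals, alpha=0.05, lam=0.5)
```

The estimate depends directly on `lam`, which is the hyper-parameter whose choice the abstract describes as critical.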
Submitted 10 October, 2023;
originally announced October 2023.
-
Pair-breaking scattering interference as a mechanism for superconducting gap modulation
Authors:
Zhi-Qiang Gao,
Yu-Ping Lin,
Dung-Hai Lee
Abstract:
We propose the "pair-breaking scattering interference" as a general source of coherence peak modulations in superconductors. Assuming this mechanism, we present a simple physical picture for the coherence peak modulations in overdoped cuprate Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+δ}$ (Bi-2223), ferromagnetic iron pnictide EuRbFe$_4$As$_4$ (Eu-1144), and kagome metals $A$V$_3$Sb$_5$ ($A=$ K, Rb, and Cs). Specifically, we explain the wavevectors, the particle-hole symmetry, and the dependence on the internal or external Zeeman-field of the coherence peak modulations. This work is intended as a cautious reminder to the scientific community when asserting the existence of a pair density wave phenomenon in the absence of tunneling conductance modulations in the normal state.
Submitted 9 October, 2023;
originally announced October 2023.
-
Revisiting the Temporal Modeling in Spatio-Temporal Predictive Learning under A Unified View
Authors:
Cheng Tan,
Jue Wang,
Zhangyang Gao,
Siyuan Li,
Lirong Wu,
Jun Xia,
Stan Z. Li
Abstract:
Spatio-temporal predictive learning plays a crucial role in self-supervised learning, with wide-ranging applications across a diverse range of fields. Previous approaches for temporal modeling fall into two categories: recurrent-based and recurrent-free methods. The former, while meticulously processing frames one by one, neglect short-term spatio-temporal information redundancies, leading to inefficiencies. The latter naively stack frames sequentially, overlooking the inherent temporal dependencies. In this paper, we re-examine the two dominant temporal modeling approaches within the realm of spatio-temporal predictive learning, offering a unified perspective. Building upon this analysis, we introduce USTEP (Unified Spatio-TEmporal Predictive learning), an innovative framework that reconciles the recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales. Extensive experiments on a wide range of spatio-temporal predictive learning demonstrate that USTEP achieves significant improvements over existing temporal modeling approaches, thereby establishing it as a robust solution for a wide range of spatio-temporal applications.
Submitted 9 October, 2023;
originally announced October 2023.
-
VQPL: Vector Quantized Protein Language
Authors:
Zhangyang Gao,
Cheng Tan,
Stan Z. Li
Abstract:
Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. To represent protein sequence-structure as discrete symbols, we propose a VQProteinformer to project residue types and structures into a discrete space, supervised by a reconstruction loss to ensure information preservation. The sequential latent codes of residues introduce a new quantized protein language, transforming the protein sequence-structure into a unified modality. We demonstrate the potential of the created protein language on predictive and generative tasks, which may not only advance protein research but also establish a connection between the protein-related and NLP-related fields. The proposed method will be continually improved to unify more protein modalities, including text and point cloud.
Submitted 7 October, 2023;
originally announced October 2023.
-
Embodied Cognition Guides Virtual-Real Interaction Design to Help Yicheng Flower Drum Intangible Cultural Heritage Dissemination
Authors:
Yuhan Ma,
Weiran Zhao,
Xiaolin Zhang,
Ze Gao
Abstract:
To make the intangible cultural heritage (ICH) of Yicheng Flower Drum more relevant to the digital era and to promote its dissemination and inheritance, we study the design and application of gesture recognition and virtual reality technologies, guided by embodied cognition theory, in ICH dissemination. The work also aims to enhance the interaction between people and ICH, stimulate the audience's interest in understanding and spreading ICH, and raise awareness of preserving it. With embodied cognition as the theoretical guide, we expand the unidirectional communication mode through human-computer interaction that is close to natural behavior and that works with multisensory channels for receiving information, so as to construct an embodied and immersive interactive atmosphere in which participants naturally form an understanding of the traditional culture during interaction. Taking the theory of embodied cognition as the entry point, and drawing on virtual and real scenes of Yicheng Flower Drum and an immersive experience, the approach supports virtual-real interaction design for ICH dissemination and provides a new method for research on the digital design of ICH.
Submitted 7 October, 2023;
originally announced October 2023.
-
Importance of physical information on the prediction of heavy-ion fusion cross section with machine learning
Authors:
Zhilong Li,
Zepeng Gao,
Ling Liu,
Yongjia Wang,
Long Zhu,
Qingfeng Li
Abstract:
In this work, the Light Gradient Boosting Machine (LightGBM), a modern decision-tree-based machine-learning algorithm, is used to study the fusion cross section (CS) of heavy-ion reactions. Several basic quantities (e.g., mass number and proton number of projectile and target) and the CS obtained from a phenomenological formula are fed into the LightGBM algorithm to predict the CS. It is found that, on the validation set, the mean absolute error (MAE), which measures the average magnitude of the absolute difference between $\log_{10}$ of the predicted CS and of the experimental CS, is 0.129 when only the basic quantities are used as input; this value is smaller than the 0.154 obtained from the empirical coupled channel model. The MAE can be further reduced to 0.08 by including a physics-informed input feature. The MAE on the test set (which consists of 280 data points from 18 reaction systems that are not included in the training set) is about 0.19 and 0.53 when including and excluding the physics-informed feature, respectively. We further verify the LightGBM predictions by comparing them with the CS of $^{ 40,48}{\rm Ca }$+$^{78}{\rm Ni}$ obtained from the density-constrained time-dependent Hartree-Fock approach. Our study demonstrates the importance of physical information in predicting the fusion cross section of heavy-ion reactions with machine learning.
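The error metric described in the abstract above can be written out in a few lines. This is a minimal sketch, assuming the cross sections are given as positive numbers in the same units; the function name `log10_mae` and the sample values are made up for illustration and are not data from the paper.

```python
import numpy as np


def log10_mae(pred_cs, exp_cs):
    """Mean absolute difference between log10 of predicted and experimental CS.

    Both inputs must be positive and of equal length. Hypothetical helper.
    """
    pred = np.asarray(pred_cs, dtype=float)
    exp = np.asarray(exp_cs, dtype=float)
    if pred.shape != exp.shape:
        raise ValueError("inputs must have the same shape")
    if np.any(pred <= 0) or np.any(exp <= 0):
        raise ValueError("cross sections must be positive")
    return float(np.mean(np.abs(np.log10(pred) - np.log10(exp))))


# Illustrative values only: one prediction off by a factor of 10, one exact.
example_mae = log10_mae([10.0, 100.0], [100.0, 100.0])
```

Because the error is taken on a logarithmic scale, an MAE of 1 corresponds to predictions that are off by one order of magnitude on average, which suits cross sections spanning many decades.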
Submitted 7 October, 2023;
originally announced October 2023.
-
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Authors:
Zhihao Du,
Jiaming Wang,
Qian Chen,
Yunfei Chu,
Zhifu Gao,
Zerui Li,
Kai Hu,
Xiaohuan Zhou,
Jin Xu,
Ziyang Ma,
Wen Wang,
Siqi Zheng,
Chang Zhou,
Zhijie Yan,
Shiliang Zhang
Abstract:
Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks, and have shown great potential as backbones for audio-and-text large language models (LLMs). Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features. In this paper, we propose LauraGPT, a novel unified audio-and-text GPT-based LLM for audio recognition, understanding, and generation. LauraGPT is a versatile LLM that can process both audio and text inputs and generate outputs in either modality. We propose a novel data representation that combines continuous and discrete features for audio: LauraGPT encodes input audio into continuous representations using an audio encoder and generates output audio from discrete codec codes. We propose a one-step codec vocoder to overcome the prediction challenge caused by the multimodal distribution of codec tokens. We fine-tune LauraGPT using supervised multi-task learning. Extensive experiments show that LauraGPT consistently achieves comparable to superior performance compared to strong baselines on a wide range of audio tasks related to content, semantics, paralinguistics, and audio-signal analysis, such as automatic speech recognition, speech-to-text translation, text-to-speech synthesis, speech enhancement, automated audio captioning, speech emotion recognition, and spoken language understanding.
Submitted 2 July, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Multi-alpha Boson Gas state in Fusion Evaporation Reaction and Three-body Force
Authors:
Taofeng Wang,
Ziming Li,
R. B. Wiringa,
Minliang Liu,
Jiansong Wang,
Yanyun Yang,
Qinghua He,
Zhiyu Sun,
Chengjian Lin,
M. Assié,
Y. Ayyad,
D. Beaumel,
Zhen Bai,
Fangfang Duan,
Zhihao Gao,
Song Guo,
Yue Hu,
Wei Jiang,
F. Kobayashi,
Chengui Lu,
Junbing Ma,
Peng Ma,
P. Napolitani,
G. Verde,
Jianguo Wang
, et al. (11 additional authors not shown)
Abstract:
The experimental evidence for the $α$ Boson gas state in the $^{11}$C+$^{12}$C$\rightarrow$$^{23}$Mg$^{\ast}$ fusion evaporation reaction is presented. By measuring the $α$ emission spectrum with multiplicity 2 and 3, we provide insight into the existence of a three-body force among $α$ particles. The observed spectrum exhibited distinct tails corresponding to $α$ particles emitted in pairs and triplets, consistent with the model calculations of AV18-UX and the chiral effective field theory of NV2-3-la*, indicating the formation of $α$ clusters with three-body force in the Boson gas state.
Submitted 6 October, 2023;
originally announced October 2023.
-
Aspect of Clusters Correlation at Light Nuclei Excited State
Authors:
Ziming Li,
Jie Zhu,
Taofeng Wang,
Minliang Liu,
Jiansong Wang,
Yanyun Yang,
Chengjian Lin,
Zhiyu Sun,
Qinghua He,
M. Assié,
Y. Ayyad,
D. Beaumel,
Zhen Bai,
Fangfang Duan,
Zhihao Gao,
Song Guo,
Yue Hu,
Wei Jiang,
F. Kobayashi,
Chengui Lu,
Junbing Ma,
Peng Ma,
P. Napolitani,
G. Verde,
Jianguo Wang
, et al. (11 additional authors not shown)
Abstract:
The $αα$ correlation was probed, for the first time, by measuring the transverse momentum $p_{T}$ and width $δp_{T}$ of one $α$, which represent the spatial and dynamical essentials of the initial coupling state in the $^{8}$Be nucleus. The weighted interaction vertex of 3$α$, reflected by the magnitudes of their relative momenta and relative emission angles, proves the isosceles triangle configuration for 3$α$ in the highly excited Hoyle-analog states.
Submitted 6 October, 2023;
originally announced October 2023.
-
Variation of Tensor Force due to Nuclear Medium Effect
Authors:
Ziming Li,
Jie Zhu,
Taofeng Wang,
Minliang Liu,
Jiansong Wang,
Yanyun Yang,
Chengjian Lin,
Zhiyu Sun,
Qinghua He,
M. Assié,
Y. Ayyad,
D. Beaumel,
Zhen Bai,
Fangfang Duan,
Zhihao Gao,
Song Guo,
Yue Hu,
Wei Jiang,
F. Kobayashi,
Chengui Lu,
Junbing Ma,
Peng Ma,
P. Napolitani,
G. Verde,
Jianguo Wang
, et al. (11 additional authors not shown)
Abstract:
The enhancement of the $J^π(T)$=3$^{+}$(0) state with isospin $T=0$, excited by the tensor force in the free $^{6}$Li nucleus, has been observed for the first time, relative to a shrinkable excitation in the $^{6}$Li cluster component inside its host nucleus. Comparatively, the excitation of the $J^π(T)$=0$^{+}$(1) state with isospin $T=1$ for these two $^{6}$Li formations takes on an approximately equal excitation strength. The mechanism of this tensor force effect is proposed to be due to the strong role of the nuclear medium in the isospin $T$=0 state.
Submitted 6 October, 2023;
originally announced October 2023.
-
Alpha-Fair Routing in Urban Air Mobility with Risk-Aware Constraints
Authors:
Yue Yu,
Zhenyu Gao,
Sarah H. Q. Li,
Qinshuang Wei,
John-Paul Clarke,
Ufuk Topcu
Abstract:
In the vision of urban air mobility, air transport systems serve the demands of urban communities by routing flight traffic in networks formed by vertiports and flight corridors. We develop a routing algorithm to ensure that the air traffic flow fairly serves the demand of multiple communities subject to stochastic network capacity constraints. This algorithm guarantees that the flight traffic volume allocated to different communities satisfies the \emph{alpha-fairness conditions}, a commonly used family of fairness conditions in resource allocation. It further ensures robust satisfaction of stochastic network capacity constraints by bounding the coherent risk measures of capacity violation. We prove that implementing the proposed algorithm is equivalent to solving a convex optimization problem. We demonstrate the proposed algorithm using a case study based on the city of Austin. Compared with one that maximizes the total served demands, the proposed algorithm promotes even distributions of served demands for different communities.
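To make the alpha-fairness family mentioned in the abstract above concrete, the sketch below evaluates the standard alpha-fair utility from the resource-allocation literature: $x^{1-\alpha}/(1-\alpha)$ for $\alpha \neq 1$ and $\log x$ for $\alpha = 1$. It only scores given allocations and does not reproduce the paper's routing algorithm or its risk constraints. The function names and the two sample allocations are assumptions for illustration.

```python
import math


def alpha_fair_utility(x, alpha):
    """Standard alpha-fair utility of a positive allocation x, for alpha >= 0."""
    if x <= 0:
        raise ValueError("allocation must be positive")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if math.isclose(alpha, 1.0):
        return math.log(x)  # proportional fairness
    return x ** (1 - alpha) / (1 - alpha)


def total_utility(allocations, alpha):
    """Sum of alpha-fair utilities over all communities (hypothetical helper)."""
    return sum(alpha_fair_utility(x, alpha) for x in allocations)


# Two made-up allocations of the same total served demand (10 units).
uneven = [1.0, 9.0]
even = [5.0, 5.0]

throughput_uneven = total_utility(uneven, alpha=0.0)  # alpha = 0: total demand
throughput_even = total_utility(even, alpha=0.0)
prop_uneven = total_utility(uneven, alpha=1.0)
prop_even = total_utility(even, alpha=1.0)
```

With `alpha=0` the objective is plain total served demand and cannot distinguish the two allocations, while larger `alpha` values score the even split higher, matching the abstract's contrast with a method that only maximizes total served demand.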
Submitted 29 September, 2023;
originally announced October 2023.