-
A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations
Authors:
Xiangzhu Kong,
Tianqi Ning,
Hao Huang,
Zhijian Ou
Abstract:
Recently, multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR remains underexplored. Additionally, recent studies call attention to the gap between in-distribution (ID) and out-of-distribution (OOD) tests and to the need for realistic evaluations. This paper focuses on two research problems: realizing streaming ME2E ASR and improving OOD generalization. We propose the CUSIDE-array method, which integrates the recent CUSIDE methodology (Chunking, Simulating Future Context and Decoding) into the neural beamformer approach of ME2E ASR. It enables streaming processing of both front-end and back-end with a total latency of 402 ms. The CUSIDE-array ME2E models are shown to achieve superior streaming results in both ID and OOD tests. Realistic evaluations confirm the advantage of CUSIDE-array in its capability to consume single-channel data to improve OOD generalization via back-end pre-training and ME2E fine-tuning.
Submitted 13 July, 2024;
originally announced July 2024.
-
Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework
Authors:
Haoqin Sun,
Shiwan Zhao,
Shaokai Li,
Xiangyu Kong,
Xuechen Wang,
Aobo Kong,
Jiaming Zhou,
Yong Chen,
Wenjia Zeng,
Yong Qin
Abstract:
Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.
Submitted 12 July, 2024;
originally announced July 2024.
-
Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training
Authors:
Yunshu Wu,
Yingtao Luo,
Xianghao Kong,
Evangelos E. Papalexakis,
Greg Ver Steeg
Abstract:
Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.
Submitted 11 July, 2024;
originally announced July 2024.
-
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Authors:
Xianghao Kong,
Jinyu Chen,
Wenguan Wang,
Hang Su,
Xiaolin Hu,
Yi Yang,
Si Liu
Abstract:
Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation environment. Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes a chain-of-thought-style prompt for style-controllable and content-controllable instruction generation. First, we propose a Chain of Thought with Landmarks (CoTL) mechanism, which guides the LLM to identify key landmarks and then generate complete instructions. CoTL makes generated instructions easier to follow and offers greater controllability over the manipulation of landmark objects. Furthermore, we present a Spatial Topology Modeling Task to facilitate the understanding of the spatial structure of the environment. Finally, we introduce a Style-Mixed Training policy, harnessing the prior knowledge of LLMs to enable style control for instruction generation based on different prompts within a single model instance. Extensive experiments demonstrate that instructions generated by C-Instructor outperform those generated by previous methods in text metrics, navigation guidance evaluation, and user studies.
Submitted 16 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Authors:
Zhenyu Guan,
Xiangyu Kong,
Fangwei Zhong,
Yizhou Wang
Abstract:
Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/agents involve various abilities such as social reasoning, negotiation, and long-term strategy planning. Previous AI agents have proved capable of handling multi-step games and larger action spaces on tasks involving multiple agents. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. Recently, LLM agents have shown their potential for extending the boundary of previous agents on a couple of applications; however, this is still not enough to handle a very long planning period in a complex multi-agent environment. Empowered with cutting-edge LLM technology, we make a first attempt to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission by combining three core capabilities for stronger LLM-based societal agents: 1) a strategic planner with memory and reflection; 2) goal-oriented negotiation with social reasoning; 3) memory augmentation through self-play games, enabling self-evolution without any human in the loop.
Submitted 9 July, 2024;
originally announced July 2024.
-
Causality-driven Sequence Segmentation for Enhancing Multiphase Industrial Process Data Analysis and Soft Sensing
Authors:
Yimeng He,
Le Yao,
Xinmin Zhang,
Xiangyin Kong,
Zhihuan Song
Abstract:
The dynamic characteristics of multiphase industrial processes present significant challenges in the field of industrial big data modeling. Traditional soft sensing models frequently neglect the process dynamics and have difficulty in capturing transient phenomena like phase transitions. To address this issue, this article introduces a causality-driven sequence segmentation (CDSS) model. This model first identifies the local dynamic properties of the causal relationships between variables, which are also referred to as causal mechanisms. It then segments the sequence into different phases based on the sudden shifts in causal mechanisms that occur during phase transitions. Additionally, a novel metric, similarity distance, is designed to evaluate the temporal consistency of causal mechanisms, which includes both causal similarity distance and stable similarity distance. The discovered causal relationships in each phase are represented as a temporal causal graph (TCG). Furthermore, a soft sensing model called temporal-causal graph convolutional network (TC-GCN) is trained for each phase, using the time-extended data and the adjacency matrix of the TCG. Numerical examples are used to validate the proposed CDSS model, and the segmentation results demonstrate that CDSS performs well in segmenting both stable and unstable multiphase series. In particular, it achieves higher accuracy in separating non-stationary time series than other methods. The effectiveness of the proposed CDSS model and the TC-GCN model is also verified on a penicillin fermentation process. Experimental results indicate that the breakpoints discovered by CDSS align well with the reaction mechanisms and that TC-GCN achieves excellent predictive accuracy.
Submitted 30 June, 2024;
originally announced July 2024.
-
Light-SLAM: A Robust Deep-Learning Visual SLAM System Based on LightGlue under Challenging Lighting Conditions
Authors:
Zhiqi Zhao,
Chang Wu,
Xiaotong Kong,
Zejie Lv,
Xiaoqi Du,
Qiyan Li
Abstract:
Simultaneous Localization and Mapping (SLAM) has become a critical technology for intelligent transportation systems and autonomous robots and is widely used in autonomous driving. However, traditional manual feature-based methods in challenging lighting environments make it difficult to ensure robustness and accuracy. Some deep learning-based methods show potential but still have significant drawbacks. To address this problem, we propose a novel hybrid system for visual SLAM based on the LightGlue deep learning network. It uses deep local feature descriptors to replace traditional hand-crafted features and a more efficient and accurate deep network to achieve fast and precise feature matching. Thus, we use the robustness of deep learning to improve the whole system. We have combined traditional geometry-based approaches to introduce a complete visual SLAM system for monocular, binocular, and RGB-D sensors. We thoroughly tested the proposed system on four public datasets: KITTI, EuRoC, TUM, and 4Season, as well as on actual campus scenes. The experimental results show that the proposed method exhibits better accuracy and robustness in adapting to low-light and strongly light-varying environments than traditional manual features and deep learning-based methods. It can also run on GPU in real time.
Submitted 10 May, 2024;
originally announced July 2024.
-
Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models
Authors:
Xiangrui Kong,
Wenxiao Zhang,
Jin Hong,
Thomas Braunl
Abstract:
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and solving mathematical problems, leading to advancements in various fields. We propose an LLM-embodied path planning framework for mobile agents, focusing on solving high-level coverage path planning issues and low-level control. Our proposed multi-layer architecture uses prompted LLMs in the path planning phase and integrates them with the mobile agents' low-level actuators. To evaluate the performance of various LLMs, we propose a coverage-weighted path planning metric to assess the performance of the embodied models. Our experiments show that the proposed framework improves LLMs' spatial inference abilities. We demonstrate that the proposed multi-layer framework significantly enhances the efficiency and accuracy of these tasks by leveraging the natural language understanding and generative capabilities of LLMs. Our experiments show that this framework can improve LLMs' 2D plane reasoning abilities and complete coverage path planning tasks. We also tested three LLM kernels: gpt-4o, gemini-1.5-flash, and claude-3.5-sonnet. The experimental results show that claude-3.5 can complete the coverage planning task in different scenarios, and its indicators are better than those of the other models.
Submitted 3 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
A photon-interfaced ten qubit quantum network node
Authors:
M. Canteri,
Z. X. Koong,
J. Bate,
A. Winkler,
V. Krutyanskiy,
B. P. Lanyon
Abstract:
We entangle each individual matter-qubit in a register of ten to a separate travelling photon. The qubits are encoded in a string of cotrapped atomic ions. By switching the trap confinement, ions are brought one at a time into the waist of an optical cavity and emit a photon via a laser-driven cavity-mediated Raman transition. The result is a train of photonic-qubits, each near-maximally entangled by their polarisation with a different ion-qubit in the string. An average ion-photon Bell state fidelity of 92(1)% is achieved, for an average probability for detecting each single photon of 9.1(8)%. The technique is directly scalable to larger ion-qubit registers and opens up the near-term possibility of entangling distributed networks of trapped-ion quantum processors, sensing arrays and clocks.
Submitted 13 June, 2024;
originally announced June 2024.
-
MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results
Authors:
Xin Jin,
Chunle Guo,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangchen Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Ruoqi Li,
Chang Liu,
Ziyi Wang,
Yao Du,
Jingjing Yang,
Long Bao,
Heng Sun,
Xiangyu Kong,
Xiaoxia Xing,
Jinlong Wu,
Yuanyang Xue,
Hyunhee Park,
Sejun Song,
Changho Kim,
Jingfan Tan
, et al. (17 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Superalignment Framework in Autonomous Driving with Large Language Models
Authors:
Xiangrui Kong,
Thomas Braunl,
Marco Fahmi,
Yue Wang
Abstract:
Over the last year, significant advancements have been made in the realms of large language models (LLMs) and multi-modal large language models (MLLMs), particularly in their application to autonomous driving. These models have showcased remarkable abilities in processing and interacting with complex information. In autonomous driving, LLMs and MLLMs are extensively used, requiring access to sensitive vehicle data such as precise locations, images, and road conditions. These data are transmitted to an LLM-based inference cloud for advanced analysis. However, concerns arise regarding data security, as the protection against data and privacy breaches primarily depends on the LLM's inherent security measures, without additional scrutiny or evaluation of the LLM's inference outputs. Despite its importance, the security aspect of LLMs in autonomous driving remains underexplored. Addressing this gap, our research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach. This framework is designed to safeguard sensitive information associated with autonomous vehicles from potential leaks, while also ensuring that LLM outputs adhere to driving regulations and align with human values. It includes mechanisms to filter out irrelevant queries and verify the safety and reliability of LLM outputs. Utilizing this framework, we evaluated the security, privacy, and cost aspects of eleven large language model-driven autonomous driving cues. Additionally, we performed QA tests on these driving prompts, which successfully demonstrated the framework's efficacy.
Submitted 9 June, 2024;
originally announced June 2024.
-
Prevalence of non-standard collapsing of strong Langmuir turbulence in solar corona plasmas
Authors:
Yaokun Li,
Haomin Sun,
Hao Ning,
Sulan Ni,
Xiangliang Kong,
Jiansen He,
Yao Chen
Abstract:
We present a fully-kinetic simulation of the full life cycle of strong Langmuir turbulence (SLT) excited by electron beams that are accelerated under solar corona conditions. We find that (1) most packets ($\sim$80%) are affected by their neighbors during their collapse; as a result, their spatial scale variations present non-standard evolutionary features, i.e., they deviate from what was predicted by the Zakharov model; (2) the collapsing cavity is too shallow to trap the wave packet due to the growth of the Coulomb force; as a result, a majority ($\sim$70%) of the packet energy runs away and a secondary localization may occur. The study indicates that non-standard Langmuir collapse may play an important role in coronal plasmas interacting with an intense electron beam, which may eventually be confirmed by humanity's first mission to fly through the corona.
Submitted 8 June, 2024;
originally announced June 2024.
-
Large Language Model-guided Document Selection
Authors:
Xiang Kong,
Tom Gunter,
Ruoming Pang
Abstract:
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget, yet recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs. Inspired by efforts suggesting that domain-specific training document selection is in fact an interpretable process [Gunasekar et al., 2023], as well as research showing that instruction-finetuned LLMs are adept zero-shot data labelers [Gilardi et al., 2023], we explore a promising direction for scalable general-domain document selection: employing a prompted LLM as a document grader, we distill quality labels into a classifier model, which is applied autonomously at scale to a large, and already heavily-filtered, web-crawl-derived corpus. Following the guidance of this classifier, we drop 75% of the corpus and train LLMs on the remaining data. Results across multiple benchmarks show that: 1. Filtering allows us to quality-match a model trained on the full corpus across diverse benchmarks with at most 70% of the FLOPs, 2. More capable LLM labelers and classifier models lead to better results that are less sensitive to the labeler's prompt, 3. In-context learning helps to boost the performance of less-capable labeling models. In all cases we use open-source datasets, models, recipes, and evaluation frameworks, so that results can be reproduced by the community.
Submitted 7 June, 2024;
originally announced June 2024.
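The filtering step described in the abstract above (score every document with a classifier, then drop 75% of the corpus) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `select_top_fraction` and `toy_quality_score` are hypothetical, and the toy scorer is a trivial placeholder, not the LLM-distilled classifier the authors describe.

```python
from typing import Callable, Sequence


def select_top_fraction(
    documents: Sequence[str],
    score_fn: Callable[[str], float],
    keep_fraction: float = 0.25,
) -> list[str]:
    """Keep the highest-scoring fraction of documents, preserving corpus order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = [score_fn(doc) for doc in documents]
    n_keep = max(1, round(len(documents) * keep_fraction)) if documents else 0
    # Rank indices by score (descending); ties are broken by original position.
    ranked = sorted(range(len(documents)), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:n_keep])
    return [documents[i] for i in kept]


def toy_quality_score(doc: str) -> float:
    """Hypothetical stand-in for a quality classifier: share of alphabetic characters."""
    return sum(ch.isalpha() for ch in doc) / max(len(doc), 1)


corpus = [
    "Click here!!! $$$ 1234 5678",
    "Gradient descent updates parameters along the negative gradient.",
    "@@@@ #### ????",
    "The mitochondrion is the site of oxidative phosphorylation.",
]
# keep_fraction=0.25 mirrors "drop 75% of the corpus".
kept = select_top_fraction(corpus, toy_quality_score, keep_fraction=0.25)
print(kept)
```

In a realistic pipeline the scorer would be a trained model applied in batches, and the threshold would more likely be a score quantile computed over a sample than a full sort of the corpus.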
-
An adaptive parameter estimator for poor-quality spectral data of white dwarfs
Authors:
Duo Xie,
Jiangchuan Zhang,
Yude Bu,
Zhenping Yi,
Meng Liu,
Xiaoming Kong
Abstract:
White dwarfs represent the end stage for 97% of stars, making precise parameter measurement crucial for understanding stellar evolution. Traditional estimation methods involve fitting spectra or photometry, which require high-quality data. In recent years, machine learning has played a crucial role in processing spectral data due to its speed, automation, and accuracy. However, two common issues have been identified. First, most studies rely on data with high signal-to-noise ratios (SNR > 10), leaving many poor-quality datasets underutilized. Second, existing machine learning models, primarily based on convolutional networks, recurrent networks, and their variants, cannot simultaneously capture both the spatial and sequential information of spectra. To address these challenges, we designed the Estimator Network (EstNet), an advanced algorithm integrating multiple techniques, including Residual Networks, Squeeze and Excitation Attention, Gated Recurrent Units, Adaptive Loss, and Monte-Carlo Dropout Layers. We conducted parameter estimation on 5,965 poor-quality white dwarf spectra (R~1800, SNR~1.17), achieving average percentage errors of 14.86% for effective temperature and 3.97% for surface gravity. These results are significantly superior to other mainstream algorithms and consistent with the outcomes of traditional theoretical spectrum fitting methods. In the future, our algorithms will be applied for large-scale parameter estimation on the Chinese Space Station Telescope and the Large Synoptic Survey Telescope.
Submitted 6 June, 2024;
originally announced June 2024.
-
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models
Authors:
Qiang Sun,
Yuanyi Luo,
Wenxiao Zhang,
Sirui Li,
Jichunyang Li,
Kai Niu,
Xiangrui Kong,
Wei Liu
Abstract:
Even by a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual appeal that reduces the human cognitive load, become the winning candidate for heterogeneous data integration and knowledge representation.
In this paper, we introduce Docs2KG, a novel framework designed to extract multimodal information from diverse and heterogeneous unstructured documents, including emails, web pages, PDF files, and Excel files. By dynamically generating a unified knowledge graph that represents the extracted key information, Docs2KG enables efficient querying and exploration of document data lakes. Unlike existing approaches that focus on domain-specific data sources or pre-designed schemas, Docs2KG offers a flexible and extensible solution that can adapt to various document structures and content types. The proposed framework unifies data processing, supporting a multitude of downstream tasks with improved domain interpretability. Docs2KG is publicly accessible at https://docs2kg.ai4wa.com, and a demonstration video is available at https://docs2kg.ai4wa.com/Video.
Submitted 5 June, 2024;
originally announced June 2024.
-
Energetic Electrons Accelerated and Trapped in a Magnetic Bottle above a Solar Flare Arcade
Authors:
Bin Chen,
Xiangliang Kong,
Sijie Yu,
Chengcai Shen,
Xiaocan Li,
Fan Guo,
Yixian Zhang,
Lindsay Glesener,
Säm Krucker
Abstract:
Where and how flares efficiently accelerate charged particles remains an unresolved question. Recent studies revealed that a "magnetic bottle" structure, which forms near the bottom of a large-scale reconnection current sheet above the flare arcade, is an excellent candidate for confining and accelerating charged particles. However, further understanding its role requires linking the various observational signatures to the underlying coupled plasma and particle processes. Here we present the first study combining multi-wavelength observations with data-informed macroscopic magnetohydrodynamics and particle modeling in a realistic eruptive flare geometry. The presence of an above-the-looptop magnetic bottle structure is strongly supported by the observations, which feature not only a local minimum of magnetic field strength but also abruptly slowing down plasma downflows. It also coincides with a compact hard X-ray source and an extended microwave source that bestrides above the flare arcade. Spatially resolved spectral analysis suggests that nonthermal electrons are highly concentrated in this region. Our model returns synthetic emission signatures that are well-matched to the observations. The results suggest that the energetic electrons are strongly trapped in the magnetic bottle region due to turbulence, with only a small fraction managing to escape. The electrons are primarily accelerated by plasma compression and facilitated by a fast-mode termination shock via the Fermi mechanism. Our results provide concrete support for the magnetic bottle as the primary electron acceleration site in eruptive solar flares. They also offer new insights into understanding the previously reported small population of flare-accelerated electrons entering interplanetary space.
Submitted 31 May, 2024;
originally announced June 2024.
-
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
Authors:
Yechen Xu,
Xinhao Kong,
Tingjun Chen,
Danyang Zhuo
Abstract:
The complexity of large language model (LLM) serving workloads has substantially increased due to the integration with external tool invocations, such as ChatGPT plugins. In this paper, we identify a new opportunity for efficient LLM serving for requests that trigger tools: tool partial execution alongside LLM decoding. To this end, we design Conveyor, an efficient LLM serving system optimized for handling requests involving external tools. We introduce a novel interface for tool developers to expose partial execution opportunities to the LLM serving system and a request scheduler that facilitates partial tool execution. Our results demonstrate that tool partial execution can improve request completion latency by up to 38.8%.
Submitted 4 June, 2024; v1 submitted 29 May, 2024;
originally announced June 2024.
-
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
Authors:
Xianzhi Du,
Tom Gunter,
Xiang Kong,
Mark Lee,
Zirui Wang,
Aonan Zhang,
Nan Du,
Ruoming Pang
Abstract:
Mixture-of-Experts (MoE) enjoys a performance gain by increasing model capacity while keeping computation cost constant. When comparing MoE to dense models, prior work typically adopts the following setting: 1) use FLOPs or activated parameters as a measure of model complexity; 2) train all models to the same number of tokens. We argue that this setting favors MoE as FLOPs and activated parameters do not accurately measure the communication overhead in sparse layers, leading to a larger actual training budget for MoE. In this work, we revisit the settings by adopting step time as a more accurate measure of model complexity, and by determining the total compute budget under the Chinchilla compute-optimal settings. To efficiently run MoE on modern accelerators, we adopt a 3D sharding method that keeps the dense-to-MoE step time increase within a healthy range. We evaluate MoE and dense LLMs on a set of nine 0-shot and two 1-shot English tasks, as well as MMLU 5-shot and GSM8K 8-shot across three model scales at 6.4B, 12.6B, and 29.6B. Experimental results show that even under these settings, MoE consistently outperforms dense LLMs on the speed-accuracy trade-off curve with meaningful gaps. Our full model implementation and sharding strategy have been released at https://github.com/apple/axlearn.
Submitted 28 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl
Authors:
Zheyu Lin,
Ning Jiang,
Tinggui Wang,
Xu Kong,
Dongyue Li,
Han He,
Yibo Wang,
Jiazheng Zhu,
Wentao Li,
Ji-an Jiang,
Avinash Singh,
Rishabh Singh Teja,
D. K. Sahu,
Chichuan Jin,
Keiichi Maeda,
Shifeng Huang
Abstract:
The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from multiple stars can also produce similar flares. In this letter, we report the discovery of a repeated pTDE, AT 2022dbl. In a quiescent galaxy at z=0.0284, two separate optical/UV flares have been observed in 2022 and 2024, with no bright X-ray, radio or mid-infrared counterparts. Compared to the first flare, the second flare has a similar blackbody temperature of ~26,000 K, slightly lower peak luminosity, and slower rise and fall phases. Compared to the ZTF TDEs, their blackbody parameters, bolometric energies and light curve shapes are all similar. The spectra taken during the second flare show a steeper continuum than the late-time spectra of the previous flare, consistent with a newly risen flare. More importantly, the possibility of two independent TDEs can be largely ruled out because the optical spectra taken around the peak of the two flares exhibit highly similar broad Balmer, N III and possible He II emission lines, especially the extreme ~4100Å emission lines. This represents the first robust spectroscopic evidence for a repeated pTDE, which can soon be verified by observing the third flare, given its short orbital period.
Submitted 17 May, 2024;
originally announced May 2024.
-
Magnetic fluctuation and dominant superconducting pairing symmetry near the tunable Van Hove singularity
Authors:
Xiaohan Kong,
Boyang Wen,
Kaiyi Guo,
Ying Liang,
Tianxing Ma
Abstract:
We have investigated the magnetism and pairing correlations of the triangular lattice based on the Hubbard model using the determinant quantum Monte Carlo and constrained path Monte Carlo methods. The results show that the presence of the next-nearest-neighbor hopping integral $t^{\prime}$ introduces an additional energy scale to the system, and through $t^{\prime}$, one can regulate the shape of the density of states and thus the position of the van Hove singularity point. Increasing inverse temperature $β$ and on-site interaction $U$ favor the formation of ferromagnetic correlation in a rather large filling region, and the calculations for different lattice sizes show that the range of the ferromagnetic correlations is smaller than the smallest lattice simulated at the investigated temperatures. We study the different pairing correlations of the triangular lattice near several typical fillings and show that the $f$-wave pairing dominates the system in the filling region near the van Hove singularity point with a high density of states, where the ferromagnetic correlation is also enhanced. When the filling is close to half-filling, the $f$-wave pairing susceptibility is suppressed and the $f_n$-wave pairing susceptibility is enhanced; however, the effective pairing interactions for both $f$ wave and $f_n$ wave are negative, which indicates that neither $f$-wave nor $f_n$-wave superconductivity may exist. Finally, we find that the pairing channels of different symmetry in the system may be closely related to the magnetic properties. Ferromagnetic fluctuation favors the formation of $f$-wave pairing, while antiferromagnetic fluctuation tends to promote $f_n$-wave pairing.
Submitted 14 May, 2024;
originally announced May 2024.
-
A Multi-Peak Solar Flare with a High Turnover Frequency of The Gyrosynchrotron Spectra from the Loop-Top Source
Authors:
Zhao Wu,
Alexey Kuznetsov,
Sergey Anfinogentov,
Victor Melnikov,
Robert Sych,
Bing Wang,
Ruisheng Zheng,
Xiangliang Kong,
Baolin Tan,
Zongjun Ning,
Yao Chen
Abstract:
The origin of multiple peaks in lightcurves of various wavelengths during flares remains elusive. Here we discuss the flare of SOL2023-05-09T03:54M6.5 with six flux peaks as recorded by a tandem of new microwave and Hard X-ray instruments. According to its microwave spectra, the flare represents a high-turnover frequency (>15 GHz) event. The rather-complete microwave and HXR spectral coverage provides a rare opportunity to uncover the origin of such an event together with simultaneous EUV images. We concluded that (1) the microwave source originates around the top section of the flaring loops with a trend of source spatial dispersion with frequency; (2) the visible movement of the microwave source from peak to peak originates from the process of new flaring loops appearing sequentially along the magnetic neutral line; (3) the optically-thin microwave spectra are hard with the indices varying from -1.2 to -0.4, and the turnover frequency always exceeds 15 GHz; (4) higher turnover/peak frequency corresponds to stronger peak intensity and harder optically-thin spectra. Using the Fokker-Planck and GX simulator codes we obtained a good fit to the observed microwave spectra and spatial distribution of the sources at all peaks, assuming that the radiating energetic electrons have the same spatial distribution and single-power-law spectra but with the number density varying in a range of 30%. We conclude that the particle acceleration in this flare happens in a compact region near the looptop. These results provide new constraints on the acceleration of energetic electrons and the underlying flare intermittent reconnection process.
Submitted 5 May, 2024;
originally announced May 2024.
-
Explainable Interface for Human-Autonomy Teaming: A Survey
Authors:
Xiangqi Kong,
Yang Xing,
Antonios Tsourdos,
Ziyue Wang,
Weisi Guo,
Adolfo Perrusquia,
Andreas Wikander
Abstract:
Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
Submitted 4 May, 2024;
originally announced May 2024.
-
Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition
Authors:
Yunbing Jia,
Xiaoyu Kong,
Fan Tang,
Yixing Gao,
Weiming Dong,
Yi Yang
Abstract:
In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations would contribute to reducing feature discrimination, thereby diminishing the open-set criteria. Although knowledge distillation could impair the feature via imitation, the mixed feature with ambiguous semantics hinders the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the benefit of the teacher. Moreover, a joint mutual information loss and a selective relabel strategy are utilized to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set recognition and outperforms SOTAs by 2%~3% AUROC on the Tiny-ImageNet dataset. Experiments on the large-scale dataset ImageNet-21K demonstrate the generalization of our method.
Submitted 28 April, 2024;
originally announced April 2024.
-
USmorph: An Updated Framework of Automatic Classification of Galaxy Morphologies and Its Application to Galaxies in the COSMOS Field
Authors:
Jie Song,
GuanWen Fang,
Shuo Ba,
Zesen Lin,
Yizhou Gu,
Chichun Zhou,
Tao Wang,
Cai-Na Hao,
Guilin Liu,
Hongxin Zhang,
Yao Yao,
Xu Kong
Abstract:
Morphological classification conveys abundant information on the formation, evolution, and environment of galaxies. In this work, we refine the two-step galaxy morphological classification framework (USmorph), which employs a combination of unsupervised machine learning (UML) and supervised machine learning (SML) techniques, along with a self-consistent and robust data preprocessing step. The updated method is applied to the galaxies with $I_{\rm mag}<25$ at $0.2<z<1.2$ in the COSMOS field. Based on their HST/ACS I-band images, we classify them into five distinct morphological types: spherical (SPH, 15,200), early-type disk (ETD, 17,369), late-type disk (LTD, 21,143), irregular disk (IRR, 28,965), and unclassified (UNC, 17,129). In addition, we have conducted both parametric and nonparametric morphological measurements. For galaxies with stellar masses exceeding $10^{9}M_{\odot}$, a gradual increase in effective radius from SPHs to IRRs is observed, accompanied by a decrease in the Sérsic index. Nonparametric morphologies reveal distinct distributions of galaxies across the $Gini-M_{20}$ and $C-A$ parameter spaces for different categories. Moreover, different categories exhibit significant dissimilarity in their $G_2$ and $Ψ$ distributions. We find morphology to be strongly correlated with redshift and stellar mass. The consistency of these classification results with expected correlations among multiple parameters underscores the validity and reliability of our classification method, rendering it a valuable tool for future studies.
Submitted 24 April, 2024;
originally announced April 2024.
-
Batch Array Codes
Authors:
Xiangliang Kong,
Chen Wang,
Yiwei Zhang
Abstract:
Batch codes are a type of code specifically designed for coded distributed storage systems and private information retrieval protocols. These codes have received much attention in recent years due to their ability to enable efficient and secure storage in distributed systems.
In this paper, we study an array code version of the batch codes, which is called the \emph{batch array code} (BAC). Under the setting of BAC, each node stores a bucket containing multiple code symbols and responds with a locally computed linear combination of the symbols in its bucket during the recovery of a requested symbol. We demonstrate that BACs can support the same type of requests as the original batch codes but with reduced redundancy. Specifically, we establish information theoretic lower bounds on the code lengths and provide several code constructions that confirm the tightness of the lower bounds for certain parameter regimes.
Submitted 17 April, 2024;
originally announced April 2024.
-
FEASTS Combined with Interferometry (I): Overall Properties of Diffuse HI and Implications for Gas Accretion in Nearby Galaxies
Authors:
Jing Wang,
Xuchen Lin,
Dong Yang,
Lister Staveley-Smith,
Fabian Walter,
Q. Daniel Wang,
Ran Wang,
A. J. Battisti,
Barbara Catinella,
Hsiao-Wen Chen,
Luca Cortese,
D. B. Fisher,
Luis C. Ho,
Suoqing Ji,
Peng Jiang,
Guinevere Kauffmann,
Xu Kong,
Ziming Liu,
Li Shao,
Jie Wang,
Lile Wang,
Shun Wang
Abstract:
We present a statistical study of the properties of diffuse HI in ten nearby galaxies, comparing the HI detected by the single-dish telescope FAST (FEASTS program) and the interferometer VLA (THINGS program), respectively. The THINGS observations missed a median of 23% of the HI due to the short-spacing problem of interferometry and limited sensitivity. We extract the diffuse HI by subtracting the dense HI, which is obtained from the THINGS data with a uniform flux-density threshold, from the total HI detected by FAST. Among the sample, the median diffuse-HI fraction is 34%, and more diffuse HI is found in galaxies exhibiting more prominent tidal-interaction signatures. The diffuse HI we detected seems to be distributed in disk-like layers within a typical thickness of $1\,\text{kpc}$, different from the more halo-like diffuse HI detected around NGC 4631 in a previous study. Most of the diffuse HI is cospatial with the dense HI and has a typical column density of $10^{17.7}$-$10^{20.1}\,\text{cm}^{-2}$. The diffuse and dense HI exhibit a similar rotational motion, but the former lags by a median of 25% in at least the inner disks, and its velocity dispersions are typically twice as high. Based on a simplified estimation of circum-galactic medium properties and assuming pressure equilibrium, the volume density of diffuse HI appears to be constant within each individual galaxy, implying its role as a cooling interface. Compared with existing models, these results are consistent with a possible link between tidal interactions, the formation of diffuse HI, and gas accretion.
Submitted 14 April, 2024;
originally announced April 2024.
-
Complete moment convergence of moving average processes for $m$-widely acceptable sequence under sub-linear expectations
Authors:
Mingzhou Xu,
Xuhang Kong
Abstract:
In this article, the complete moment convergence for the partial sum of moving average processes $\{X_n=\sum_{i=-\infty}^{\infty}a_iY_{i+n},n\ge 1\}$ is established under some proper conditions, where $\{Y_i,-\infty<i<\infty\}$ is a sequence of $m$-widely acceptable ($m$-WA) random variables, which is stochastically dominated by a random variable $Y$ in sub-linear expectations space $(Ω,\HH,\ee)$ and $\{a_i,-\infty<i<\infty\}$ is an absolutely summable sequence of real numbers. The results extend the relevant results in probability space to those under sub-linear expectations.
Submitted 27 March, 2024;
originally announced March 2024.
-
Ensemble Adversarial Defense via Integration of Multiple Dispersed Low Curvature Models
Authors:
Kaikang Zhao,
Xi Chen,
Wei Huang,
Liuxin Ding,
Xianglong Kong,
Fan Zhang
Abstract:
The integration of an ensemble of deep learning models has been extensively explored to enhance defense against adversarial attacks. The diversity among sub-models increases the attack cost required to deceive the majority of the ensemble, thereby improving the adversarial robustness. While existing approaches mainly center on increasing diversity in feature representations or dispersion of first-order gradients with respect to input, the limited correlation between these diversity metrics and adversarial robustness constrains the performance of ensemble adversarial defense. In this work, we aim to enhance ensemble diversity by reducing attack transferability. We identify second-order gradients, which depict the loss curvature, as a key factor in adversarial robustness. Computing the Hessian matrix involved in second-order gradients is computationally expensive. To address this, we approximate the Hessian-vector product using differential approximation. Given that low curvature provides better robustness, our ensemble model was designed to consider the influence of curvature among different sub-models. We introduce a novel regularizer to train multiple more-diverse low-curvature network models. Extensive experiments across various datasets demonstrate that our ensemble model exhibits superior robustness against a range of attacks, underscoring the effectiveness of our approach.
Submitted 24 March, 2024;
originally announced March 2024.
-
Radio-to-Submillimetre Spectral Energy Distributions of NGC 1365
Authors:
Guangwen Chen,
George J. Bendo,
Gary A. Fuller,
Hong-Xin Zhang,
Xu Kong
Abstract:
We analyse the radio-to-submillimetre spectral energy distribution (SED) for the central pseudobulge of NGC 1365 using archival data from the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Large Array (VLA). This analysis shows that free-free emission dominates the continuum emission at 50–120 GHz and produces about 75 per cent of the 103 GHz continuum emission. However, the fraction of 103 GHz continuum emission originating from free-free emission varies significantly among different subregions in the pseudobulge, particularly for an outflow from the AGN on the eastern pseudobulge where the synchrotron emission produces half of the 103 GHz continuum emission. Free-free emission also dominates at 103 GHz within the central 400 pc diameter region, but this emission is associated with the AGN rather than star formation. The star formation rate (SFR) within the pseudobulge derived from the ALMA free-free emission is $8.9 \pm 1.1$ M$_\odot$ yr$^{-1}$. This is comparable to the SFR from the mid-infrared emission but higher than the SFR from the extinction-corrected H$α$ line emission, mainly because the pseudobulge is heavily dust obscured. The 1.5 GHz emission yields a comparable SFR for the pseudobulge but may have lower SFRs within subregions of the pseudobulge because of the diffusion outside of these regions of the electrons producing the synchrotron radiation. We propose that applying a correction factor of 75 per cent to the 80–110 GHz continuum emission could provide valuable estimates of the free-free emission without performing any SED decomposition, which could derive extinction-free SFRs within 20 per cent accuracy.
Submitted 22 March, 2024;
originally announced March 2024.
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Authors:
Brandon McKinzie,
Zhe Gan,
Jean-Philippe Fauconnier,
Sam Dodge,
Bowen Zhang,
Philipp Dufter,
Dhruti Shah,
Xianzhi Du,
Futang Peng,
Floris Weers,
Anton Belyi,
Haotian Zhang,
Karanjeet Singh,
Doug Kang,
Ankur Jain,
Hongyu Hè,
Max Schwarzer,
Tom Gunter,
Xiang Kong,
Aonan Zhang,
Jianyu Wang,
Chong Wang,
Nan Du,
Tao Lei,
Sam Wiseman
, et al. (7 additional authors not shown)
Abstract:
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, including both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.
Submitted 18 April, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation
Authors:
Hairong Shi,
Songhao Han,
Shaofei Huang,
Yue Liao,
Guanbin Li,
Xiangxing Kong,
Hua Zhu,
Xiaomu Wang,
Si Liu
Abstract:
Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent studies have attempted to enhance SAM with medical expertise by pre-training on large-scale medical segmentation datasets. However, challenges still exist in 3D tumor lesion segmentation owing to tumor complexity and the imbalance in foreground and background regions. Therefore, we introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks, facilitating the generation of more precise segmentation masks. Furthermore, an iterative refinement scheme is implemented in M-SAM to refine the segmentation masks progressively, leading to improved performance. Extensive experiments on seven tumor lesion segmentation datasets indicate that our M-SAM not only achieves high segmentation accuracy but also exhibits robust generalization. The code is available at https://github.com/nanase1025/M-SAM.
Submitted 11 July, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission
Authors:
Shifeng Huang,
Ning Jiang,
Jiazheng Zhu,
Yibo Wang,
Tinggui Wang,
Shan-Qin Wang,
Wen-Pei Gan,
En-Wei Liang,
Yu-Jing Qin,
Zheyu Lin,
Lin-Na Xu,
Min-Xuan Cai,
Ji-An Jiang,
Xu Kong,
Jiaxun Li,
Long Li,
Jian-Guo Wang,
Ze-Lin Xu,
Yongquan Xue,
Ye-Fei Yuan,
Jingquan Cheng,
Lulu Fan,
Jie Gao,
Lei Hu,
Weida Hu
, et al. (20 additional authors not shown)
Abstract:
High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The bump represents the longest separation time from the main peak among known TDEs to date. The main UV/optical outburst declines as $t^{-4.10}$, making it one of the fastest decaying optically selected TDEs. Furthermore, we detected sporadic X-ray emission 30 days after the UV/optical peak, accompanied by a reduction in the period of inactivity. It is proposed that the UV/optical bump could be caused by the self-intersection of the stream debris, whereas the primary peak is generated by the reprocessed emission of the accretion process. In addition, our results suggest that episodic X-ray radiation during the initial phase of decline may be due to the patched obscurer surrounding the accretion disk, a phenomenon associated with the inhomogeneous reprocessing process. The double TDE scenario, in which two stars are disrupted in sequence, is also a possible explanation for producing the observed early bump and main peak. We anticipate that the multicolor light curves of TDEs, especially in the very early stages, and the underlying physics can be better understood in the near future with the assistance of dedicated surveys such as the deep high-cadence survey of the 2.5-meter Wide Field Survey Telescope (WFST).
Submitted 26 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
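To give a feel for how steep the $t^{-4.10}$ decline quoted in the abstract above is, the following minimal sketch compares it with the canonical $t^{-5/3}$ fallback rate commonly used as a reference for TDEs. The function name, the choice of time origin, and the 30-day and 60-day epochs are illustrative assumptions and are not taken from the paper.

```python
def power_law_flux_ratio(t_days: float, t_ref_days: float, index: float) -> float:
    """Return F(t) / F(t_ref) for a decline of the form F proportional to t**(-index).

    Times are measured from an assumed common origin (hypothetical here);
    the paper's actual fitting convention is not specified in the abstract.
    """
    if t_days <= 0 or t_ref_days <= 0:
        raise ValueError("times must be positive")
    return (t_days / t_ref_days) ** (-index)


# Flux remaining after the elapsed time doubles (30 d -> 60 d, illustrative epochs).
steep = power_law_flux_ratio(60.0, 30.0, 4.10)            # index from the abstract
canonical = power_law_flux_ratio(60.0, 30.0, 5.0 / 3.0)   # classic fallback index

print(f"t^-4.10 : {steep:.3f} of the reference flux")
print(f"t^-5/3  : {canonical:.3f} of the reference flux")
```

Under these assumptions, doubling the elapsed time leaves roughly 6% of the reference flux for the $t^{-4.10}$ law, against roughly 31% for $t^{-5/3}$.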
-
A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications
Authors:
Jiaqi Han,
Jiacheng Cen,
Liming Wu,
Zongzhao Li,
Xiangzhe Kong,
Rui Jiao,
Ziyang Yu,
Tingyang Xu,
Fandi Wu,
Zihe Wang,
Hongteng Xu,
Zhewei Wei,
Yang Liu,
Yu Rong,
Wenbing Huang
Abstract:
A geometric graph is a special kind of graph with geometric features, which is vital for modeling many scientific problems. Unlike generic graphs, geometric graphs often exhibit physical symmetries of translations, rotations, and reflections, so current Graph Neural Networks (GNNs) process them ineffectively. To tackle this issue, researchers have proposed a variety of Geometric Graph Neural Networks equipped with invariant/equivariant properties to better characterize the geometry and topology of geometric graphs. Given the current progress in this field, it is imperative to conduct a comprehensive survey of data structures, models, and applications related to geometric GNNs. In this paper, based on the necessary but concise mathematical preliminaries, we provide a unified view of existing models from the geometric message passing perspective. Additionally, we summarize the applications as well as the related datasets to facilitate later research for methodology development and experimental evaluation. We also discuss the challenges and potential future directions of Geometric GNNs at the end of this survey.
Submitted 1 March, 2024;
originally announced March 2024.
-
Full-Atom Peptide Design with Geometric Latent Diffusion
Authors:
Xiangzhe Kong,
Yinjun Jia,
Wenbing Huang,
Yang Liu
Abstract:
Peptide design plays a pivotal role in therapeutics, opening brand-new possibilities to leverage target binding sites that were previously undruggable. Most existing methods are either inefficient or only concerned with the target-agnostic design of 1D sequences. In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD). We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and the literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second challenge, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation.
Submitted 21 May, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Equivariant Pretrained Transformer for Unified Geometric Learning on Multi-Domain 3D Molecules
Authors:
Rui Jiao,
Xiangzhe Kong,
Ziyang Yu,
Wenbing Huang,
Yang Liu
Abstract:
Pretraining on a large number of unlabeled 3D molecules has showcased superiority in various scientific applications. However, prior efforts typically focus on pretraining models on a specific domain, either proteins or small molecules, missing the opportunity to leverage cross-domain knowledge. To close this gap, we introduce the Equivariant Pretrained Transformer (EPT), a novel pretraining framework designed to harmonize the geometric learning of small molecules and proteins. Specifically, EPT unifies the geometric modeling of multi-domain molecules via a block-enhanced representation that can attend to a broader context of each atom. Built upon the transformer framework, EPT is further enhanced with E(3) equivariance to facilitate the accurate representation of 3D structures. Another key innovation of EPT is its block-level pretraining task, which allows for joint pretraining on datasets comprising both small molecules and proteins. Experimental evaluations on a diverse group of benchmarks, including ligand binding affinity prediction, molecular property prediction, and protein property prediction, show that EPT significantly outperforms previous SOTA methods for affinity prediction, and achieves the best or comparable performance with existing domain-specific pretraining models for other tasks.
Submitted 19 February, 2024;
originally announced February 2024.
-
Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation
Authors:
Aiwei Liu,
Haoping Bai,
Zhiyun Lu,
Xiang Kong,
Simon Wang,
Jiulong Shan,
Meng Cao,
Lijie Wen
Abstract:
Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA). First, we use contrastive prompt pairs to automatically generate preference data. Then, we continue to evaluate the generated preference data using contrastive prompt pairs and calculate a self-rewarding score. Finally, we use the DPO algorithm to effectively align LLMs by combining this self-rewarding score. In the experimental stage, our DLMA method could surpass the RLHF method without relying on human-annotated preference data.
Submitted 19 February, 2024;
originally announced February 2024.
-
EscherNet: A Generative Model for Scalable View Synthesis
Authors:
Xin Kong,
Shikun Liu,
Xiaoyang Lyu,
Marwan Taher,
Xiaojuan Qi,
Andrew J. Davison
Abstract:
We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis -- it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis, but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single, cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: https://kxhit.github.io/EscherNet.
Submitted 19 March, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Spontaneous nucleation and growth of GaN nanowires: Fundamental role of crystal polarity
Authors:
Sergio Fernández-Garrido,
Xiang Kong,
Tobias Gotschke,
Raffaella Calarco,
Lutz Geelhaar,
Achim Trampert,
Oliver Brandt
Abstract:
We experimentally investigate whether crystal polarity affects the growth of GaN nanowires in plasma-assisted molecular beam epitaxy and whether their formation has to be induced by defects. For this purpose, we prepare smooth and coherently strained AlN layers on 6H-SiC(0001) and SiC(000$\bar{1}$) substrates to ensure a well-defined polarity and an absence of structural and morphological defects. On N-polar AlN, a homogeneous and dense N-polar GaN nanowire array forms, evidencing that GaN nanowires form spontaneously in the absence of defects. On Al-polar AlN, we do not observe the formation of Ga-polar GaN NWs. Instead, sparse N-polar GaN nanowires grow embedded in a Ga-polar GaN layer. These N-polar GaN nanowires are shown to be accidental in that the necessary polarity inversion is induced by the formation of Si$_{x}$N. The present findings thus demonstrate that spontaneously formed GaN nanowires are irrevocably N-polar. Due to the strong impact of the polarity on the properties of GaN-based devices, these results are not only essential to understand the spontaneous formation of GaN nanowires but also of high technological relevance.
Submitted 30 January, 2024;
originally announced February 2024.
-
HiFAST: an HI data calibration and imaging pipeline for FAST
Authors:
Yingjie Jing,
Jie Wang,
Chen Xu,
Ziming Liu,
Qingze Chen,
Tiantian Liang,
Jinlong Xu,
Yixian Cao,
Jing Wang,
Huijie Hu,
Chuan-Peng Zhang,
Qi Guo,
Liang Gao,
Mei Ai,
Hengqian Gan,
Xuyang Gao,
Jinlin Han,
Ligang Hou,
Zhipeng Hou,
Peng Jiang,
Xu Kong,
Fujia Li,
Zerui Liu,
Li Shao,
Hengxing Pan
, et al. (8 additional authors not shown)
Abstract:
The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has the largest aperture and a 19-beam L-band receiver, making it powerful for investigating the neutral hydrogen atomic gas (HI) in the universe. We present HiFAST (https://hifast.readthedocs.io), a dedicated, modular, and self-contained calibration and imaging pipeline for processing the HI data of FAST. The pipeline consists of frequency-dependent noise diode calibration, baseline fitting, standing wave removal using an FFT-based method, flux density calibration, stray radiation correction, and gridding to produce data cubes. These modules can be combined as needed to process the data from most FAST observation modes: tracking, drift scanning, On-The-Fly mapping, and most of their variants. With HiFAST, the RMS noises of the calibrated spectra from all 19 beams were only slightly (~5%) higher than the theoretical expectation. The results for the extended source M33 and the point sources are consistent with the results from Arecibo. The moment maps (0, 1, and 2) of M33 agree well with the results from the Arecibo Galaxy Environment Survey (AGES) with a fractional difference of less than 10%. For a common sample of 221 sources with signal-to-noise ratio S/N > 10 from the Arecibo Legacy Fast ALFA (ALFALFA) survey, the mean value of fractional difference in the integrated flux density, $S_{\mathrm{int}}$, between the two datasets is approximately 0.005%, with a dispersion of 15.4%. Further checks on the integrated flux density of 23 sources with seven observations indicate that the variance in the flux density of luminous sources ($S_{\mathrm{int}} > 2.5$ Jy km s$^{-1}$) is less than 5%. Our tests suggest that the FAST telescope, with the efficient, precise, and user-friendly pipeline HiFAST, will yield numerous significant scientific findings in the investigation of the HI in the universe.
Submitted 30 January, 2024;
originally announced January 2024.
-
Polarity-induced selective area epitaxy of GaN nanowires
Authors:
Ziani de Souza Schiaber,
Gabriele Calabrese,
Xiang Kong,
Achim Trampert,
Bernd Jenichen,
José Humberto Dias da Silva,
Lutz Geelhaar,
Oliver Brandt,
Sergio Fernández-Garrido
Abstract:
We present a conceptually novel approach to achieve selective area epitaxy of GaN nanowires. The approach is based on the fact that these nanostructures do not form in plasma-assisted molecular beam epitaxy on structurally and chemically uniform cation-polar substrates. By in situ depositing and nitridating Si on a Ga-polar GaN film, we locally reverse the polarity to induce the selective area epitaxy of N-polar GaN nanowires. We show that the nanowire number density can be controlled over several orders of magnitude by varying the amount of pre-deposited Si. Using this growth approach, we demonstrate the synthesis of single-crystalline and uncoalesced nanowires with diameters as small as 20 nm. The achievement of nanowire number densities low enough to prevent the shadowing of the nanowire sidewalls from the impinging fluxes paves the way for the realization of homogeneous core-shell heterostructures without the need of using ex situ pre-patterned substrates.
Submitted 29 January, 2024;
originally announced January 2024.
-
MobFuzz: Adaptive Multi-objective Optimization in Gray-box Fuzzing
Authors:
Gen Zhang,
Pengfei Wang,
Tai Yue,
Xiangdong Kong,
Shan Huang,
Xu Zhou,
Kai Lu
Abstract:
Coverage-guided gray-box fuzzing (CGF) is an efficient software testing technique. There are usually multiple objectives to optimize in CGF. However, existing CGF methods cannot successfully find the optimal values for multiple objectives simultaneously. In this paper, we propose a gray-box fuzzer for multi-objective optimization (MOO) called MobFuzz. We model the multi-objective optimization process as a multi-player multi-armed bandit (MPMAB). First, it adaptively selects the objective combination that contains the most appropriate objectives for the current situation. Second, our model deals with the power schedule, which adaptively allocates energy to the seeds under the chosen objective combination. In MobFuzz, we propose an evolutionary algorithm called NIC to optimize our chosen objectives simultaneously without incurring additional performance overhead. To prove the effectiveness of MobFuzz, we conduct experiments on 12 real-world programs and the MAGMA data set. Experiment results show that multi-objective optimization in MobFuzz outperforms single-objective fuzzing in the baseline fuzzers. In contrast to them, MobFuzz can select the optimal objective combination and increase the values of multiple objectives up to 107%, with at most a 55% reduction in the energy consumption. Moreover, MobFuzz has up to 6% more program coverage and finds 3x more unique bugs than the baseline fuzzers. The NIC algorithm has at least a 2x improvement with a performance overhead of approximately 3%.
Submitted 29 January, 2024;
originally announced January 2024.
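The MobFuzz abstract above frames the choice of objective combination as a bandit problem. As a loose illustration of that idea only, the sketch below uses a plain single-player UCB1 bandit, which is a simpler stand-in and not the multi-player multi-armed bandit (MPMAB) model or the NIC algorithm described by the authors. The arm names and the fixed payoffs are hypothetical.

```python
import math


class UCB1:
    """Single-player UCB1 bandit (a simplified stand-in, not MobFuzz's MPMAB)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}    # times each arm was chosen
        self.totals = {a: 0.0 for a in self.arms}  # summed reward per arm
        self.rounds = 0

    def select(self):
        # Try every arm once before applying the confidence-bound rule.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.rounds) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward
        self.rounds += 1


def run(payoffs, steps):
    """Play `steps` rounds against fixed (deterministic) payoffs in [0, 1]."""
    bandit = UCB1(payoffs)
    for _ in range(steps):
        arm = bandit.select()
        bandit.update(arm, payoffs[arm])
    return bandit.counts


# Hypothetical objective combinations with made-up payoffs.
PAYOFFS = {"coverage+speed": 0.9, "coverage+memory": 0.5, "speed+memory": 0.1}
counts = run(PAYOFFS, 200)
print(counts)
```

In a real fuzzer the reward would come from observed feedback such as newly covered edges rather than a fixed table; the fixed payoffs here only keep the sketch deterministic.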
-
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
Authors:
Jinchang Hou,
Chang Ao,
Haihong Wu,
Xiangtao Kong,
Zhigang Zheng,
Daijia Tang,
Chengming Li,
Xiping Hu,
Ruifeng Xu,
Shiwen Ni,
Min Yang
Abstract:
With the accelerating development of Large Language Models (LLMs), many LLMs are beginning to be used in the Chinese K-12 education domain. The integration of LLMs and education is becoming ever closer; however, there is currently no benchmark for evaluating LLMs that focuses on the Chinese K-12 education domain. Therefore, there is an urgent need for a comprehensive natural language processing benchmark to accurately assess the capabilities of various LLMs in the Chinese K-12 education domain. To address this, we introduce E-EVAL, the first comprehensive evaluation benchmark specifically designed for the Chinese K-12 education field. E-EVAL consists of 4,351 multiple-choice questions at the primary, middle, and high school levels across a wide range of subjects, including Chinese, English, Politics, History, Ethics, Physics, Chemistry, Mathematics, and Geography. We conducted a comprehensive evaluation of advanced LLMs on E-EVAL, including both English-dominant and Chinese-dominant models. Findings show that Chinese-dominant models perform well compared to English-dominant models, with many scoring even above GPT 4.0. However, almost all models perform poorly in complex subjects such as mathematics. We also found that most Chinese-dominant LLMs did not achieve higher scores at the primary school level compared to the middle school level. We observe that the mastery of higher-order knowledge by the model does not necessarily imply the mastery of lower-order knowledge as well. Additionally, the experimental results indicate that the Chain of Thought (CoT) technique is effective only for the challenging science subjects, while Few-shot prompting is more beneficial for liberal arts subjects. With E-EVAL, we aim to analyze the strengths and limitations of LLMs in educational applications, and to contribute to the progress and development of Chinese K-12 education and LLMs.
Submitted 29 January, 2024;
originally announced January 2024.
-
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Authors:
Fanghua Yu,
Jinjin Gu,
Zheyuan Li,
Jinfan Hu,
Xiangtao Kong,
Xintao Wang,
Jingwen He,
Yu Qiao,
Chao Dong
Abstract:
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issue encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.
Submitted 3 April, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
Authors:
Siyuan Qi,
Shuo Chen,
Yexin Li,
Xiangyu Kong,
Junqi Wang,
Bangcheng Yang,
Pring Wong,
Yifan Zhong,
Xiaoyuan Zhang,
Zhaowei Zhang,
Nian Liu,
Wei Wang,
Yaodong Yang,
Song-Chun Zhu
Abstract:
The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.
Submitted 12 March, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Physics-Informed Convolutional Decoder (PICD): A novel approach for direct inversion of heterogeneous subsurface flow
Authors:
Nanzhe Wang,
Xiang-Zhao Kong,
Dongxiao Zhang
Abstract:
In this study, we present the development and application of the physics-informed convolutional decoder (PICD) framework for inverse modeling of heterogeneous groundwater flow. PICD stands out as a direct inversion method, eliminating the need for repeated forward model simulations. The framework leverages both data-driven and physics-driven approaches by integrating monitoring data and domain knowledge (governing equation, boundary conditions, and initial conditions) into the inversion process. PICD utilizes a convolutional decoder to effectively approximate the spatial distribution of hydraulic heads, while the Karhunen-Loève expansion (KLE) is employed to parameterize hydraulic conductivities. During the training process, the stochastic vector in KLE and the parameters of the convolutional decoder are adjusted simultaneously, ensuring that the predictions align with available measurements and adhere to domain-specific knowledge. The final optimized stochastic vectors correspond to the estimation of hydraulic conductivities, and the trained convolutional decoder demonstrates the ability to predict the evolution and distribution of hydraulic heads in heterogeneous fields. To validate the effectiveness of the proposed PICD framework, various scenarios of groundwater flow are examined. Results demonstrate the framework's capability to accurately estimate heterogeneous hydraulic conductivities and to deliver satisfactory predictions of hydraulic heads, even with sparse measurements. The proposed PICD framework emerges as a promising tool for efficient and informed groundwater flow inverse modeling.
Submitted 12 January, 2024;
originally announced January 2024.
-
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Authors:
Tianyu Cui,
Yanling Wang,
Chuanpu Fu,
Yong Xiao,
Sijia Li,
Xinhao Deng,
Yunpeng Liu,
Qinglin Zhang,
Ziyi Qiu,
Peiyang Li,
Zhixing Tan,
Junwu Xiong,
Xinyu Kong,
Zujie Wen,
Ke Xu,
Qi Li
Abstract:
Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become the major obstacle to their widespread application. Many studies have extensively investigated risks in LLM systems and developed the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta, and Anthropic have also made lots of efforts on responsible LLMs. Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community. In this paper, we delve into four essential modules of an LLM system, including an input module for receiving prompts, a language model trained on extensive corpora, a toolchain module for development and deployment, and an output module for exporting LLM-generated content. Based on this, we propose a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies. Furthermore, we review prevalent benchmarks, aiming to facilitate the risk assessment of LLM systems. We hope that this paper can help LLM participants embrace a systematic perspective to build their responsible LLM systems.
Submitted 11 January, 2024;
originally announced January 2024.
-
SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation
Authors:
Mengmeng Zhang,
Fan Wu,
Yude Bu,
Shanshan Li,
Zhenping Yi,
Meng Liu,
Xiaoming Kong
Abstract:
The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlapping isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel framework, Spectral Transformer (SPT), to predict the age and mass of red giants aligned with asteroseismology from their spectra. A key component of SPT, the Multi-head Hadamard Self-Attention mechanism, designed specifically for spectra, can capture complex relationships across different wavelengths. Further, we introduced a Mahalanobis distance-based loss function to address scale imbalance and interaction mode loss, and incorporated Monte Carlo dropout for quantitative analysis of prediction uncertainty. Trained and tested on 3,880 red giant spectra from LAMOST, the SPT achieved remarkable age and mass estimations with average percentage errors of 17.64% and 6.61%, respectively, and provided uncertainties for each corresponding prediction. The results significantly outperform those of traditional machine learning algorithms and demonstrate a high level of consistency with asteroseismology methods and isochrone fitting techniques. In the future, our work will leverage datasets from the Chinese Space Station Telescope and the Large Synoptic Survey Telescope to enhance the precision of the model and broaden its applicability in the field of astronomy and astrophysics.
Submitted 9 January, 2024;
originally announced January 2024.
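The SPT abstract above mentions a Mahalanobis distance-based loss for handling targets on different scales (age and mass). The abstract does not give the exact loss, so the sketch below only shows the textbook squared Mahalanobis distance for a two-component residual; the function name and the example covariance values are assumptions made for illustration.

```python
def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T C^{-1} r for a 2-vector r.

    `residual` is (prediction - target) for two quantities, e.g. age and mass.
    `cov` is a 2x2 covariance matrix given as ((a, b), (c, d)).
    This is the standard definition, not necessarily the loss used in SPT.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    r0, r1 = residual
    return (r0 * (inv[0][0] * r0 + inv[0][1] * r1)
            + r1 * (inv[1][0] * r0 + inv[1][1] * r1))


# Hypothetical example: the first target has variance 4, the second variance 1.
# A residual of 2 on the high-variance axis counts the same as 1 on the other.
cov = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_sq_2d((2.0, 0.0), cov))
print(mahalanobis_sq_2d((0.0, 1.0), cov))
```

Dividing by the covariance is what keeps a target with a large numerical spread from dominating the loss, which is one plausible reading of the "scale imbalance" the abstract refers to.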
-
Projected rotational velocities for LAMOST stars with effective temperature lower than 9000 K
Authors:
Fang Zuo,
A-Li Luo,
Bing Du,
Yinbi Li,
Hugh R. A. Jones,
Yi-han Song,
Xiao Kong,
Yan-xin Guo
Abstract:
In Data Release 9 of LAMOST, we present measurements of v sin i for a total of 121,698 stars measured using the Medium Resolution Spectrograph (MRS) and 80,108 stars using the Low Resolution Spectrograph (LRS). These values were obtained through a $\chi^2$ minimisation process, comparing LAMOST spectra with corresponding grids of synthetically broadened spectra. Due to the resolution and the spectral range of LAMOST, v sin i measurements are limited to stars with effective temperature (Teff) ranging from 5000 K to 8500 K for MRS and 7000 K to 9000 K for LRS. The detectable v sin i for MRS is set between 27 km/s and 350 km/s, and for LRS between 110 km/s and 350 km/s. The upper limit arises because the convolved reference spectra become less informative beyond 350 km/s. The intrinsic precision of v sin i, determined from multi-epoch observations, is approximately 4.0 km/s for MRS and 10.0 km/s for LRS at signal-to-noise ratio (S/N) greater than 50. Our v sin i values are consistent with those from APOGEE17, displaying a scatter of 8.79 km/s. They are also in agreement with measurements from the Gaia DR3 and SUN catalogs. An observed trend in LAMOST MRS data is the decrease in v sin i with dropping Teff, with the transition occurring around 7000 K for dwarfs and 6500 K for giants, primarily observed in stars with near-solar abundances.
Submitted 8 January, 2024;
originally announced January 2024.
-
Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy
Authors:
Xiangtao Kong,
Chao Dong,
Lei Zhang
Abstract:
While single-task image restoration (IR) has achieved significant successes, it remains challenging to train a single model that can tackle multiple IR tasks. In this work, we investigate in depth the multiple-in-one (MiO) IR problem, which comprises seven popular IR tasks. We point out that MiO IR faces two pivotal challenges: the optimization of diverse objectives and the adaptation to multiple tasks. To tackle these challenges, we present two simple yet effective strategies. The first strategy, referred to as sequential learning, addresses how to optimize the diverse objectives: it guides the network to learn individual IR tasks incrementally, in a sequential manner, rather than mixing them together. The second strategy, prompt learning, addresses how to adapt to the different IR tasks: it helps the network understand the specific task and improves its generalization ability. By evaluating on 19 test sets, we demonstrate that the sequential and prompt learning strategies can significantly enhance the MiO performance of commonly used CNN and Transformer backbones. Our experiments also reveal that the two strategies can complement each other to learn better degradation representations and enhance model robustness. We expect that our proposed MiO IR formulation and strategies could facilitate research on how to train IR models with higher generalization capabilities.
Submitted 20 March, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Optical conductivity of overdoped cuprates from ab-initio out-of-plane impurity potentials
Authors:
D. M. Broun,
H. U. Özdemir,
Vivek Mishra,
N. R. Lee-Hone,
Xiangru Kong,
T. Berlijn,
P. J. Hirschfeld
Abstract:
Dopant impurity potentials determined by ab-initio supercell DFT calculations are used to calculate the optical conductivity of overdoped LSCO and Tl-2201 in the superconducting and normal states. Vertex corrections are included, to account for the effect of forward scattering on two-particle properties. This approach was previously shown to provide good, semiquantitative agreement with measurements of superfluid density in LSCO. Here we compare calculations of conductivity with measurements of THz conductivity on LSCO using identical impurity, band, and correlation parameters, and find similarly good correspondence with experiment. In the process, we delineate the impact of the different disorder mechanisms on single-particle and transport relaxation processes. In particular, we reveal the critical role of apical oxygen vacancies in transport scattering and show that transport relaxation rates in LSCO are significantly reduced when apical oxygen vacancies are annealed out. These considerations are shown to be crucial for understanding the variability of experimental results on overdoped LSCO in samples of nominally identical doping but different types. Finally, we give predictions for Tl-2201 THz conductivity experiments.
Submitted 27 December, 2023;
originally announced December 2023.