-
V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems
Authors:
Qianxin Qu,
Yijin Xiong,
Xin Wu,
Hanyu Li,
Shichun Guo
Abstract:
Cooperative vehicle and infrastructure LiDAR systems hold great potential, yet their implementation faces numerous challenges. Calibration of LiDAR systems across heterogeneous vehicle and infrastructure endpoints is a critical step to ensure the accuracy and consistency of perception system data, necessitating calibration methods that are real-time and stable. To this end, this paper introduces a novel calibration method for cooperative vehicle and road infrastructure LiDAR systems, which exploits spatial association information between detection boxes. The method centers around a novel Overall IoU metric that reflects the correlation of targets between vehicle and infrastructure, enabling real-time monitoring of calibration results. We search for common matching boxes between vehicle and infrastructure nodes by constructing an affinity matrix. Subsequently, these matching boxes undergo extrinsic parameter computation and optimization. Comparative and ablation experiments on the DAIR-V2X dataset confirm the superiority of our method. To better reflect the differences in calibration results, we have categorized the calibration tasks on the DAIR-V2X dataset based on their level of difficulty, enriching the dataset's utility for future research. Our project is available at https://github.com/MassimoQu/v2i-calib .
Submitted 14 July, 2024;
originally announced July 2024.
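The box-matching step described in this abstract (building an affinity matrix between vehicle and infrastructure detection boxes, then picking common matches) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the boxes are 2D, axis-aligned, and already expressed in a common frame, the affinity is plain pairwise IoU, and the matching is greedy with a hypothetical `threshold`. It does not reproduce the paper's Overall IoU metric or its actual affinity construction.

```python
import numpy as np


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_x * overlap_y
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def affinity_matrix(vehicle_boxes, infra_boxes):
    """Pairwise IoU: rows are vehicle boxes, columns are infrastructure boxes."""
    return np.array([[iou_2d(v, i) for i in infra_boxes] for v in vehicle_boxes])


def greedy_match(affinity, threshold=0.3):
    """Greedy one-to-one matching: repeatedly take the best remaining pair."""
    remaining = affinity.astype(float).copy()
    pairs = []
    while remaining.size and remaining.max() >= threshold:
        row, col = np.unravel_index(np.argmax(remaining), remaining.shape)
        pairs.append((int(row), int(col)))
        remaining[row, :] = -1.0  # each box may be matched at most once
        remaining[:, col] = -1.0
    return pairs


# Toy data (hypothetical): two boxes per side, listed in different order.
vehicle_boxes = [(0, 0, 2, 2), (5, 5, 7, 7)]
infra_boxes = [(5, 5, 7, 8), (0, 0, 2, 2)]
matches = greedy_match(affinity_matrix(vehicle_boxes, infra_boxes))
```

In this toy case the first vehicle box pairs with the second infrastructure box and vice versa. In a real pipeline the matched pairs would then feed the extrinsic parameter estimation mentioned in the abstract.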
-
DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands
Authors:
Zhengshen Zhang,
Lei Zhou,
Chenchen Liu,
Zhiyang Liu,
Chengran Yuan,
Sheng Guo,
Ruiteng Zhao,
Marcelo H. Ang Jr.,
Francis EH Tay
Abstract:
The versatility and adaptability of human grasping catalyze advancing dexterous robotic manipulation. While significant strides have been made in dexterous grasp generation, current research endeavors pivot towards optimizing object manipulation while ensuring functional integrity, emphasizing the synthesis of functional grasps following desired affordance instructions. This paper addresses the challenge of synthesizing functional grasps tailored to diverse dexterous robotic hands by proposing DexGrasp-Diffusion, an end-to-end modularized diffusion-based pipeline. DexGrasp-Diffusion integrates MultiHandDiffuser, a novel unified data-driven diffusion model for multi-dexterous hands grasp estimation, with DexDiscriminator, which employs a Physics Discriminator and a Functional Discriminator with open-vocabulary setting to filter physically plausible functional grasps based on object affordances. The experimental evaluation conducted on the MultiDex dataset provides substantiating evidence supporting the superior performance of MultiHandDiffuser over the baseline model in terms of success rate, grasp diversity, and collision depth. Moreover, we demonstrate the capacity of DexGrasp-Diffusion to reliably generate functional grasps for household objects aligned with specific affordance instructions.
Submitted 13 July, 2024;
originally announced July 2024.
-
STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs
Authors:
Yiheng Huang,
Xiaowei Mao,
Shengnan Guo,
Yubin Chen,
Youfang Lin,
Huaiyu Wan
Abstract:
Spatial-temporal forecasting and imputation are important for real-world dynamic systems such as intelligent transportation, urban planning, and public health. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While large language models (LLMs) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their development in understanding spatial-temporal data has been constrained by insufficient modeling of complex correlations such as the temporal correlations, spatial connectivity, non-pairwise and high-order spatial-temporal correlations within data. In this paper, we propose STD-LLM for understanding both spatial and temporal properties of Spatial-Temporal Data with LLMs, which is capable of implementing both spatial-temporal forecasting and imputation tasks. STD-LLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers as well as virtual nodes. Topology-aware node embeddings are designed for LLMs to comprehend and exploit the topology structure of data. Additionally, to capture the non-pairwise and higher-order correlations, we design a hypergraph learning module for LLMs, which can enhance the overall performance and improve efficiency. Extensive experiments demonstrate that STD-LLM exhibits strong performance and generalization capabilities across the forecasting and imputation tasks on various datasets. Moreover, STD-LLM achieves promising results on both few-shot and zero-shot learning tasks.
Submitted 12 July, 2024;
originally announced July 2024.
-
AUITestAgent: Automatic Requirements Oriented GUI Function Testing
Authors:
Yongxiang Hu,
Xuan Wang,
Yingchuan Wang,
Yu Zhang,
Shiyu Guo,
Chaoyi Chen,
Xin Wang,
Yangfan Zhou
Abstract:
The Graphical User Interface (GUI) is how users interact with mobile apps. Testing engineers have to make sure it functions as intended, based on test requirements that are typically written in natural language. While widely adopted manual testing and script-based methods are effective, they demand substantial effort due to the vast number of GUI pages and rapid iterations in modern mobile apps. This paper introduces AUITestAgent, the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. Since test requirements typically contain interaction commands and verification oracles, AUITestAgent can extract GUI interactions from test requirements via dynamically organized agents. Then, AUITestAgent employs a multi-dimensional data extraction strategy to retrieve data relevant to the test requirements from the interaction trace and perform verification. Experiments on customized benchmarks demonstrate that AUITestAgent outperforms existing tools in the quality of generated GUI interactions and achieves a verification accuracy of 94%. Moreover, field deployment in Meituan has shown AUITestAgent's practical usability: it detected 4 new functional bugs during 10 regression tests in two months.
Submitted 12 July, 2024;
originally announced July 2024.
-
Spatially-Variant Degradation Model for Dataset-free Super-resolution
Authors:
Shaojie Guo,
Haofei Song,
Qingli Li,
Yan Wang
Abstract:
This paper focuses on dataset-free Blind Image Super-Resolution (BISR). Unlike existing dataset-free BISR methods that focus on obtaining a degradation kernel for the entire image, we are the first to explicitly design a spatially-variant degradation model for each pixel. Our method also benefits from having a significantly smaller number of learnable parameters compared to data-driven spatially-variant BISR methods. Concretely, each pixel's degradation kernel is expressed as a linear combination of a learnable dictionary composed of a small number of spatially-variant atom kernels. The coefficient matrices of the atom degradation kernels are derived using membership functions of fuzzy set theory. We construct a novel Probabilistic BISR model with tailored likelihood function and prior terms. Subsequently, we employ the Monte Carlo EM algorithm to infer the degradation kernels for each pixel. Our method achieves a significant improvement over other state-of-the-art BISR methods, with an average improvement of 1 dB (2x). Code will be released at https://github.com/shaojieguoECNU/SVDSR.
Submitted 11 July, 2024;
originally announced July 2024.
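The statement that each pixel's degradation kernel is a linear combination of a small dictionary of atom kernels can be sketched as a single tensor contraction. This is a minimal sketch under stated assumptions: the two atoms (an identity kernel and a box blur), the image size, and the left-to-right ramp used for the coefficients are all hypothetical stand-ins. The abstract derives the coefficients from fuzzy membership functions and learns the atoms, neither of which is reproduced here.

```python
import numpy as np


def per_pixel_kernels(atoms, coeffs):
    """Combine atom kernels into one kernel per pixel.

    atoms:  (J, k, k) dictionary of J atom kernels.
    coeffs: (H, W, J) per-pixel mixing coefficients.
    Returns an (H, W, k, k) array with kernel[h, w] = sum_j coeffs[h, w, j] * atoms[j].
    """
    return np.einsum("hwj,jab->hwab", coeffs, atoms)


# Hypothetical dictionary with two 3x3 atoms, each summing to 1.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0                  # identity (no blur)
box = np.full((3, 3), 1.0 / 9.0)   # uniform blur
atoms = np.stack([delta, box])

# Hypothetical coefficients: blur strength ramps from left to right.
H, W = 4, 5
ramp = np.linspace(0.0, 1.0, W)
coeffs = np.empty((H, W, 2))
coeffs[..., 0] = 1.0 - ramp
coeffs[..., 1] = ramp

kernels = per_pixel_kernels(atoms, coeffs)
```

Because the coefficients at each pixel sum to 1 and each atom sums to 1, every resulting kernel also sums to 1. The stored parameters are the J atoms plus J coefficient maps rather than a full k-by-k kernel per pixel, which is one way to read the abstract's point about a small number of learnable parameters.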
-
Solving General Natural-Language-Description Optimization Problems with Large Language Models
Authors:
Jihai Zhang,
Wei Wang,
Siyan Guo,
Li Wang,
Fangquan Lin,
Cheng Yang,
Wotao Yin
Abstract:
Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and program code, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).
Submitted 9 July, 2024;
originally announced July 2024.
-
On a planar Pierce--Yung operator
Authors:
David Beltran,
Shaoming Guo,
Jonathan Hickman
Abstract:
We show that the operator \begin{equation*}
\mathcal{C} f(x,y) := \sup_{v\in \mathbb{R}} \Big|\mathrm{p.v.} \int_{\mathbb{R}} f(x-t, y-t^2) e^{i v t^3} \frac{\mathrm{d} t}{t} \Big| \end{equation*} is bounded on $L^p(\mathbb{R}^2)$ for every $1 < p < \infty$. This gives an affirmative answer to a question of Pierce and Yung.
Submitted 10 July, 2024;
originally announced July 2024.
-
Oscillatory integral operators and variable Schrödinger propagators: beyond the universal estimates
Authors:
Mingfeng Chen,
Shengwen Gan,
Shaoming Guo,
Jonathan Hickman,
Marina Iliopoulou,
James Wright
Abstract:
We consider a class of Hörmander-type oscillatory integral operators in $\mathbb{R}^n$ for $n \geq 3$ odd with real analytic phase. We derive weak conditions on the phase which ensure $L^p$ bounds beyond the universal $p \geq 2 \cdot \frac{n+1}{n-1}$ range guaranteed by Stein's oscillatory integral theorem. This expands and elucidates pioneering work of Bourgain from the early 1990s. We also consider a closely related class of variable coefficient Schrödinger propagator-type operators, and show that the corresponding theory differs significantly from that of the Hörmander-type operators. The main ingredient in the proof is a curved Kakeya/Nikodym maximal function estimate. This is established by combining the polynomial method with certain uniform sublevel set estimates for real analytic functions. The sublevel set estimates are the main novelty in the argument and can be interpreted as a form of quantification of linear independence in the $C^ω$ category.
Submitted 9 July, 2024;
originally announced July 2024.
-
AutoTask: Task Aware Multi-Faceted Single Model for Multi-Task Ads Relevance
Authors:
Shouchang Guo,
Sonam Damani,
Keng-hao Chang
Abstract:
Ads relevance models are crucial in determining the relevance between user search queries and ad offers, often framed as a classification problem. The complexity of modeling increases significantly with multiple ad types and varying scenarios that exhibit both similarities and differences. In this work, we introduce a novel multi-faceted attention model that performs task-aware feature combination and cross-task interaction modeling. Our technique formulates the feature combination problem as "language" modeling with auto-regressive attentions across both feature and task dimensions. Specifically, we introduce a new dimension of task ID encoding for task representations, thereby enabling precise relevance modeling across diverse ad scenarios with substantial improvement in generalization capability for unseen tasks. We demonstrate that our model not only effectively handles the increased computational and maintenance demands as scenarios proliferate, but also outperforms generalized DNN models and even task-specific models across a spectrum of ad applications using a single unified model.
Submitted 9 July, 2024;
originally announced July 2024.
-
Novel Models for High-Dimensional Imaging: High-Resolution fMRI Acceleration and Quantification
Authors:
Shouchang Guo
Abstract:
The goals of functional Magnetic Resonance Imaging (fMRI) include high spatial and temporal resolutions with a high signal-to-noise ratio (SNR). To simultaneously improve spatial and temporal resolutions and maintain the high SNR advantage of OSSI, we present novel pipelines for fast acquisition and high-resolution fMRI reconstruction and physics parameter quantification. We propose a patch-tensor low-rank model, a physics-based manifold model, and a voxel-wise attention network. With novel models for acquisition and reconstruction, we demonstrate that we can improve SNR and resolution simultaneously without compromising scan time. All the proposed models outperform other comparison approaches with higher resolution and more functional information.
Submitted 8 July, 2024;
originally announced July 2024.
-
Multiple boundary states in bilayer and decorated Su-Schrieffer-Heeger-like models
Authors:
Shengqun Guo,
Jinke Huang,
Ruimin Huang,
Fengjiang Zhuang,
Zhili Lin,
Weibin Qiu
Abstract:
Topological boundary states have attracted widespread fascination due to their series of intriguing properties. In this paper, we investigate the multiple boundary states within the two kinds of extended Su-Schrieffer-Heeger (SSH) models. The coexistence of boundary states that exist both in the bulk and band gaps is realized based on the bilayer SSH-like model, which consists of two conventional square-root SSH models that are directly coupled. We further show the square-root topology within the decorated SSH-like model, which supports multiple boundary states that could be embedded into the bulk continuum by tuning the hopping parameters. In addition, the connection between the decorated SSH-like model and its effectively decomposed counterparts is revealed. Our results broaden insight into the multiple boundary states and open up an exciting avenue for the future exploration of square-root topology.
Submitted 7 July, 2024;
originally announced July 2024.
-
Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack
Authors:
Xuan Liu,
Siqi Cai,
Qihua Zhou,
Song Guo,
Ruibin Li,
Kaiwei Lin
Abstract:
Recent years have witnessed the vulnerability of Federated Learning (FL) against gradient leakage attacks, where the private training data can be recovered from the exchanged gradients, making gradient protection a critical issue for the FL training process. Existing solutions often resort to perturbation-based mechanisms, such as differential privacy, where each participating client injects a specific amount of noise into local gradients before aggregating to the server, and the global distribution variation finally conceals the gradient privacy. However, perturbation is not always the panacea for gradient protection since the robustness heavily relies on the injected noise. This intuition raises an interesting question: is it possible to deactivate existing protection mechanisms by removing the perturbation inside the gradients? In this paper, we answer yes and propose the Perturbation-resilient Gradient Leakage Attack (PGLA), the first attempt to recover the perturbed gradients, without additional access to the original model structure or third-party data. Specifically, we leverage the inherent diffusion property of gradient perturbation protection and construct a novel diffusion-based denoising model to implement PGLA. Our insight is that capturing the disturbance level of perturbation during the diffusion reverse process can release the gradient denoising capability, which promotes the diffusion model to generate approximate gradients as the original clean version through adaptive sampling steps. Extensive experiments demonstrate that PGLA effectively recovers the protected gradients and exposes the FL training process to the threat of gradient leakage, achieving the best quality in gradient denoising and data recovery compared to existing models. We hope to draw public attention to PGLA and its defense.
Submitted 7 July, 2024;
originally announced July 2024.
-
GriDB: Scaling Blockchain Database via Sharding and Off-Chain Cross-Shard Mechanism
Authors:
Zicong Hong,
Song Guo,
Enyuan Zhou,
Wuhui Chen,
Huawei Huang,
Albert Zomaya
Abstract:
Blockchain databases have attracted widespread attention but suffer from poor scalability due to underlying non-scalable blockchains. While blockchain sharding is necessary for a scalable blockchain database, it poses a new challenge named on-chain cross-shard database services. Each cross-shard database service (e.g., cross-shard queries or inter-shard load balancing) involves massive cross-shard data exchanges, while the existing cross-shard mechanisms need to process each cross-shard data exchange via the consensus of all nodes in the related shards (i.e., on-chain) to resist a Byzantine environment of blockchain, which eliminates sharding benefits. To tackle the challenge, this paper presents GriDB, the first scalable blockchain database, by designing a novel off-chain cross-shard mechanism for efficient cross-shard database services. Borrowing the idea of off-chain payments, GriDB delegates massive cross-shard data exchange to a few nodes, each of which is randomly picked from a different shard. Considering the Byzantine environment, the untrusted delegates cooperate to generate succinct proof for cross-shard data exchanges, while the consensus is only responsible for the low-cost proof verification. However, different from payments, the database services' verification has more requirements (e.g., completeness, correctness, freshness, and availability); thus, we introduce several new authenticated data structures (ADS). Particularly, we utilize consensus to extend the threat model and reduce the complexity of traditional accumulator-based ADS for verifiable cross-shard queries with a rich set of relational operators. Moreover, we study the necessity of inter-shard load balancing for a scalable blockchain database and design an off-chain and live approach for both efficiency and availability during balancing.
Submitted 4 July, 2024;
originally announced July 2024.
-
Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach
Authors:
Gang Liu,
Zhenxiang Wang,
Ziyang He,
Shanshan Guo,
Rui Zhang,
Dezhong Yao
Abstract:
Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict kinematic parameters. Finger flexion is recorded as 1, extension as -1, and a near-linear model is employed to interpolate intermediate values, offering a viable alternative for kinematic data. We validated the approach with twenty participants through offline analysis and online experiments. The offline analysis confirmed the model's capability to fill in intermediate points, and the online experiments demonstrated that participants could control gestures and adjust force accurately. This study significantly reduces the complexity of collecting dynamic parameters in EMG-based regression prosthetics, thus enhancing usability for prosthetic hands.
Submitted 1 May, 2024;
originally announced July 2024.
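The two-point labelling idea in this abstract (flexion recorded as 1, extension as -1, intermediate values interpolated) can be sketched as follows. This is only an illustration under assumptions: it uses strictly linear interpolation via `numpy.interp` as a stand-in for the paper's near-linear model, and the anchor times and sampling grid are hypothetical.

```python
import numpy as np


def two_point_labels(sample_times, anchor_times, anchor_values):
    """Fill in kinematic labels between recorded end-point poses.

    anchor_times must be increasing; anchor_values holds +1 for full flexion
    and -1 for full extension at those times. Linear interpolation is an
    assumed simplification of the near-linear model named in the abstract.
    """
    return np.interp(sample_times, anchor_times, anchor_values)


# Hypothetical recording: extended at t=0 s, flexed at t=1 s, extended at t=2 s.
anchor_times = np.array([0.0, 1.0, 2.0])
anchor_values = np.array([-1.0, 1.0, -1.0])

# Label every EMG window centre on a 0.25 s grid.
sample_times = np.linspace(0.0, 2.0, 9)
labels = two_point_labels(sample_times, anchor_times, anchor_values)
```

The resulting `labels` could serve as regression targets for the EMG features in place of measurements from a kinematic sensor system, which is the substitution the abstract describes.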
-
Data-driven methods for flow and transport in porous media: a review
Authors:
Guang Yang,
Ran Xu,
Yusong Tian,
Songyuan Guo,
Jingyi Wu,
Xu Chu
Abstract:
This review examined the current advancements in data-driven methods for analyzing flow and transport in porous media, which have various applications in energy, chemical engineering, environmental science, and beyond. Although there has been progress in recent years, the challenges of current experimental and high-fidelity numerical simulations, such as high computational costs and difficulties in accurately representing complex, heterogeneous structures, can still potentially be addressed by state-of-the-art data-driven methods. We analyzed the synergistic potential of these methods, addressed their limitations, and suggested how they can be effectively integrated to improve both the fidelity and efficiency of current research. A discussion on future research directions in this field was conducted, emphasizing the need for collaborative efforts that combine domain expertise in physics and advanced computational and data-driven methodologies.
Submitted 28 June, 2024;
originally announced June 2024.
-
LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models
Authors:
Shouchang Guo,
Sonam Damani,
Keng-hao Chang
Abstract:
In prompt tuning, a prefix or suffix text is added to the prompt, and the embeddings (soft prompts) or token indices (hard prompts) of the prefix/suffix are optimized to gain more control over language models for specific tasks. This approach eliminates the need for hand-crafted prompt engineering or explicit model fine-tuning. Prompt tuning is significantly more parameter-efficient than model fine-tuning, as it involves optimizing partial inputs of language models to produce desired outputs.
In this work, we aim to further reduce the amount of trainable parameters required for a language model to perform well on specific tasks. We propose Low-rank Prompt Tuning (LoPT), a low-rank model for prompts that achieves efficient prompt optimization. The proposed method demonstrates similar outcomes to full parameter prompt tuning while reducing the number of trainable parameters by a factor of 5. It also provides promising results compared to the state-of-the-art methods that would require 10 to 20 times more parameters.
Submitted 27 June, 2024;
originally announced June 2024.
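The parameter saving from a low-rank model of soft prompts can be illustrated with a simple factorization. The sketch below is an assumption, not the paper's parameterization: it represents an `n_tokens` by `dim` soft-prompt matrix as the product of two thin factors `U` and `V`, and the sizes (20 tokens, 768 dimensions, rank 4) are hypothetical values chosen for illustration.

```python
import numpy as np


def low_rank_prompt(n_tokens, dim, rank, seed=0):
    """Return factors U (n_tokens x rank) and V (rank x dim) of a soft prompt."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.02, size=(n_tokens, rank))
    V = rng.normal(scale=0.02, size=(rank, dim))
    return U, V


def param_counts(n_tokens, dim, rank):
    """Trainable parameters: full soft prompt vs. its low-rank factors."""
    return n_tokens * dim, rank * (n_tokens + dim)


n_tokens, dim, rank = 20, 768, 4          # hypothetical sizes
U, V = low_rank_prompt(n_tokens, dim, rank)
prompt = U @ V                            # embeddings prepended to the input
full, low = param_counts(n_tokens, dim, rank)
```

Only `U` and `V` would be optimized, so the trainable parameter count scales with `rank * (n_tokens + dim)` instead of `n_tokens * dim`. The actual reduction factor depends on the chosen sizes and on the paper's real construction, which the abstract does not specify.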
-
An Interpretable and Efficient Sleep Staging Algorithm: DetectsleepNet
Authors:
Shengwei Guo
Abstract:
Sleep quality directly impacts human health and quality of life, so accurate sleep staging is essential for assessing sleep quality. However, most traditional methods are inefficient and time-consuming due to segmenting different sleep cycles by manual labeling. In contrast, automated sleep staging technology not only directly assesses sleep quality but also helps sleep specialists analyze sleep status, significantly improving efficiency and reducing the cost of sleep monitoring, especially for continuous sleep monitoring. Most of the existing models, however, are deficient in computational efficiency, lightweight design, and model interpretability. In this paper, we propose a neural network architecture based on the prior knowledge of sleep experts. Specifically, we 1) propose an end-to-end model named DetectsleepNet that uses single-channel EEG signals without additional data processing and achieves 80.9% accuracy on the SHHS dataset and 88.0% accuracy on the Physio2018 dataset; 2) construct an efficient lightweight sleep staging model named DetectsleepNet-tiny based on DetectsleepNet, which has just 6% of the parameter count of existing models, but its accuracy exceeds 99% of state-of-the-art models; and 3) introduce a specific inference header to assess the attention given to a specific EEG segment in each sleep frame, enhancing the transparency of the model's decisions. Our model comprises fewer parameters than existing ones and further explores model interpretability to facilitate its application in healthcare. The code is available at https://github.com/komdec/DetectSleepNet.git.
Submitted 27 June, 2024;
originally announced June 2024.
-
Continuous drive heterodyne microwave sensing with spin qubits in hexagonal boron nitride
Authors:
Charlie J. Patrickson,
Valentin Haemmerli,
Shi Guo,
Andrew J. Ramsay,
Isaac J. Luxmoore
Abstract:
Quantum sensors that use solid state spin defects have emerged as effective probes of weak alternating magnetic signals. By recording the phase of a signal relative to an external clock, these devices can resolve signal frequencies to a precision orders of magnitude longer than the spin state lifetime. However, these quantum heterodyne protocols suffer from sub-optimal sensitivity, as they are currently limited to pulsed spin control techniques, which are susceptible to cumulative pulse-area errors, or single continuous drives which offer no protection of the spin coherence. Here, we present a control scheme based on a continuous microwave drive that extends spin coherence towards the effective $T_2 \approx \frac{1}{2}T_1$ limit and can resolve the frequency, amplitude and phase of GHz magnetic fields. The scheme is demonstrated using an ensemble of boron vacancies in hexagonal boron nitride, and achieves an amplitude sensitivity of $η\approx 3-5 \:\mathrm{μT \sqrt{Hz}}$ and phase sensitivity of $η_φ \approx 0.076 \:\mathrm{rads \sqrt{Hz}}$. By repeatedly referencing the phase of a resonant signal against the coherent continuous microwave drive in a quantum heterodyne demonstration, we measure a GHz signal with a resolution $<$1 Hz over a 10 s measurement. Achieving this level of performance in a two-dimensional material platform could have broad applications, from probing nanoscale condensed matter systems to integration into heterostructures for quantum networking.
Submitted 24 June, 2024;
originally announced June 2024.
-
Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels
Authors:
Yanhu Wang,
Shuaishuai Guo,
Anming Dong,
Hui Zhao
Abstract:
Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we aim to investigate the validity of this claim in interference scenarios compared to baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference channels, where we propose an interference-robust semantic communication (IRSC) scheme. This scheme involves the development of transceivers based on neural networks (NNs), which integrate channel state information (CSI) either solely at the receiver or at both transmitter and receiver ends. Moreover, we establish a composite loss function for training IRSC transceivers, along with a dynamic mechanism for updating the weights of various components in the loss function to enhance system fairness among users. Experimental results demonstrate that the proposed IRSC scheme effectively learns to mitigate interference and outperforms baseline approaches, particularly in low signal-to-noise ratio (SNR) regimes.
Submitted 10 April, 2024;
originally announced June 2024.
-
This Looks Better than That: Better Interpretable Models with ProtoPNeXt
Authors:
Frank Willard,
Luke Moffett,
Emmanuel Mokel,
Jon Donnelly,
Stark Guo,
Julia Yang,
Giyoung Kim,
Alina Jade Barnett,
Cynthia Rudin
Abstract:
Prototypical-part models are a popular interpretable alternative to black-box deep learning models for computer vision. However, they are difficult to train, with high sensitivity to hyperparameter tuning, inhibiting their application to new datasets and our understanding of which methods truly improve their performance. To facilitate the careful study of prototypical-part networks (ProtoPNets), we create a new framework for integrating components of prototypical-part models -- ProtoPNeXt. Using ProtoPNeXt, we show that applying Bayesian hyperparameter tuning and an angular prototype similarity metric to the original ProtoPNet is sufficient to produce new state-of-the-art accuracy for prototypical-part models on CUB-200 across multiple backbones. We further deploy this framework to jointly optimize for accuracy and prototype interpretability as measured by metrics included in ProtoPNeXt. Using the same resources, this produces models with substantially superior semantics and changes in accuracy between +1.3% and -1.5%. The code and trained models will be made publicly available upon publication.
Submitted 20 June, 2024;
originally announced June 2024.
-
Non-abelian extensions of Lie triple systems and Wells exact sequences
Authors:
Qinxiu Sun,
Shuangjian Guo
Abstract:
In this paper, we investigate non-abelian extensions and inducibility of pairs of automorphisms of Lie triple systems.
First, we introduce non-abelian cohomology groups and classify the non-abelian extensions in terms of non-abelian cohomology groups.
Next, we characterize the non-abelian extensions using Maurer-Cartan elements. Furthermore, we explore the inducibility of pairs of automorphisms and derive the analogue of the Wells exact sequences in the setting of Lie triple systems. Finally, we state the previous results in the context of abelian extensions of Lie triple systems.
Submitted 17 June, 2024;
originally announced June 2024.
-
IRASim: Learning Interactive Real-Robot Action Simulators
Authors:
Fangqi Zhu,
Hongtao Wu,
Song Guo,
Yuxiao Liu,
Chilam Cheang,
Tao Kong
Abstract:
Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate extremely realistic videos of a robot arm that executes a given action trajectory, starting from an initial given frame. To validate the effectiveness of our method, we create a new benchmark, IRASim Benchmark, based on three real-robot datasets and perform extensive experiments on the benchmark. Results show that IRASim outperforms all the baseline methods and is preferred in human evaluations. We hope that IRASim can serve as an effective and scalable approach to enhance robot learning in the real world. To promote research on generative real-robot action simulators, we open-source the code, benchmark, and checkpoints at https://gen-irasim.github.io.
Submitted 20 June, 2024;
originally announced June 2024.
-
WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark
Authors:
Tao Han,
Song Guo,
Zhenghao Chen,
Wanghan Xu,
Lei Bai
Abstract:
Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from significant limitations, such as small sizes, limited temporal coverage, and a lack of comprehensive variables. These shortcomings prevent them from effectively reflecting the benchmarks of current forecasting methods and fail to support the real needs of operational weather forecasting. To address these challenges, we present the WEATHER-5K dataset. This dataset comprises a comprehensive collection of data from 5,672 weather stations worldwide, spanning a 10-year period with one-hour intervals. It includes multiple crucial weather elements, providing a more reliable and interpretable resource for forecasting. Furthermore, our WEATHER-5K dataset can serve as a benchmark for comprehensively evaluating existing well-known forecasting models, extending beyond GSWF methods to support future time-series research challenges and opportunities. The dataset and benchmark implementation are publicly available at: https://github.com/taohan10200/WEATHER-5K.
Submitted 20 June, 2024;
originally announced June 2024.
-
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Authors:
Patrik Reizinger,
Siyuan Guo,
Ferenc Huszár,
Bernhard Schölkopf,
Wieland Brendel
Abstract:
Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed) data. We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning under the lens of exchangeability. IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non--i.i.d. data. We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results. We hope this work will pave the way for further research in causal representation learning.
Submitted 20 June, 2024;
originally announced June 2024.
-
A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth
Authors:
Taixia Shi,
Dingding Liang,
Lu Wang,
Lin Li,
Shaogang Guo,
Jiawei Gao,
Xiaowei Li,
Chulun Lin,
Lei Shi,
Baogang Ding,
Shiyang Liu,
Fangyi Yang,
Chi Jiang,
Yang Chen
Abstract:
In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) to generate only two 2nd-order optical LFM sidebands. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz. The prototype is comprehensively studied and tested. It achieves a range resolution of 3.75 cm, a range error of less than $\pm$ 2 cm, and a radial velocity error within $\pm$ 1 cm/s, delivers clear imaging of multiple small targets, and maintains a frequency measurement error of less than $\pm$ 7 MHz and a frequency resolution of better than 20 MHz.
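The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the textbook range-resolution relation c / (2B) and that frequency quadrupling scales both the center frequency and the bandwidth by four; the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def quadrupled_band(if_center_ghz, if_bandwidth_ghz=1.0, factor=4):
    """Return the (low, high) band edges in GHz after frequency multiplication.

    Assumes both center frequency and bandwidth scale by `factor`.
    """
    center = factor * if_center_ghz
    half_width = factor * if_bandwidth_ghz / 2
    return center - half_width, center + half_width

def range_resolution_cm(bandwidth_ghz):
    """Textbook LFM range resolution c / (2B), returned in centimetres."""
    return C / (2 * bandwidth_ghz * 1e9) * 100

print(quadrupled_band(2.5))                # lowest IF setting: band starts at 8 GHz
print(quadrupled_band(9.5))                # highest IF setting: band ends at 40 GHz
print(round(range_resolution_cm(4.0), 2))  # 4 GHz bandwidth gives about 3.75 cm
```

Under these assumptions the 2.5 to 9.5 GHz IF tuning range maps onto the stated 8 to 40 GHz operating range, and the quadrupled 4 GHz bandwidth reproduces the reported 3.75 cm range resolution.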
Submitted 20 June, 2024;
originally announced June 2024.
-
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
Authors:
Ziyang Meng,
Yu Dai,
Zezheng Gong,
Shaoxiong Guo,
Minglong Tang,
Tongquan Wei
Abstract:
Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in hallucinations and incorrect responses in GUI comprehension. To address these issues, we introduce VGA, a fine-tuned model designed for comprehensive GUI understanding. Our model aims to enhance the interpretation of GUI visual data and reduce hallucinations. We first construct a Vision Question Answering (VQA) dataset of 63.8k high-quality examples with our proposed Referent Method, which ensures the model's responses are highly dependent on visual content within the image. We then design a two-stage fine-tuning method called Foundation and Advanced Comprehension (FAC) to enhance both the model's ability to extract information from image content and its alignment with human intent. Experiments show that our approach enhances the model's ability to extract information from images and achieves state-of-the-art results in GUI understanding tasks. Our dataset and fine-tuning script will be released soon.
Submitted 21 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Individually Addressed Entangling Gates in a Two-Dimensional Ion Crystal
Authors:
Y. -H. Hou,
Y. -J. Yi,
Y. -K. Wu,
Y. -Y. Chen,
L. Zhang,
Y. Wang,
Y. -L. Xu,
C. Zhang,
Q. -X. Mei,
H. -X. Yang,
J. -Y. Ma,
S. -A. Guo,
J. Ye,
B. -X. Qi,
Z. -C. Zhou,
P. -Y. Hou,
L. -M. Duan
Abstract:
Two-dimensional (2D) ion crystals have become a promising way to scale up qubit numbers for ion trap quantum information processing. However, to realize universal quantum computing in this system, individually addressed high-fidelity two-qubit entangling gates still remain challenging due to the inevitable micromotion of ions in a 2D crystal as well as the technical difficulty in 2D addressing. Here we demonstrate two-qubit entangling gates between any ion pairs in a 2D crystal of four ions. We use symmetrically placed crossed acousto-optic deflectors (AODs) to drive Raman transitions and achieve an addressing crosstalk error below 0.1%. We design and demonstrate a gate sequence by alternatingly addressing two target ions, making it compatible with any single-ion addressing techniques without crosstalk from multiple addressing beams. We further examine the gate performance versus the micromotion amplitude of the ions and show that its effect can be compensated by a recalibration of the laser intensity without degrading the gate fidelity. Our work paves the way for ion trap quantum computing with hundreds to thousands of qubits on a 2D ion crystal.
Submitted 20 June, 2024;
originally announced June 2024.
-
Novae: An Important Source of Lithium in the Galaxy
Authors:
Jun Gao,
Chunhua Zhu,
Guoliang Lü,
Jinlong Yu,
Lin Li,
Helei Liu,
Sufen Guo
Abstract:
The source of Galactic lithium (Li) has long been a puzzle. With the discovery of Li in novae, extensive research has been conducted. However, there still exists a significant disparity between the observed abundance of lithium in novae and the existing theoretical predictions. Using the Modules for Experiments in Stellar Astrophysics (MESA), we simulate the evolution of novae with element diffusion and appropriately increase the amount of $^{3}$He in the mixtures. Element diffusion enhances the transport efficiency between the nuclear reaction zone and the convective region on the surface of the white dwarf during nova eruptions, which results in more $^{7}$Be being transported to the white dwarf surface and ultimately ejected. Compared to previous predictions, the abundance of $^{7}$Be in novae simulated in our model increases significantly, and the result is able to explain almost all observed novae. Using the method of population synthesis, we calculate the Li yield in the Galaxy. We find that the Galactic occurrence rate of novae is about 130 yr$^{-1}$, and about 110M Li produced by nova eruptions is ejected into the interstellar medium (ISM). About 73\% of Li in the Galactic ISM originates from novae, and approximately 15\%-20\% of the entire Galaxy. This means that novae are an important source of Li in the Galaxy.
Submitted 20 June, 2024;
originally announced June 2024.
-
Pixel-scale NIR-VIS Spectral Routers Based on 2D Mie-type Metagratings
Authors:
Yifan Shao,
Shuhan Guo,
Rui Chen,
Yongdi Dang,
Yi Zhou,
Yubo Wang,
Junjie Zhan,
Jiaqi Yu,
Bing-Feng Ju,
Yungui Ma
Abstract:
The out-of-band energy loss caused by built-in color filters significantly degrades the signal-to-noise ratio and the dynamic range of conventional image sensors, which has hindered attempts to develop ultrahigh-density imaging devices by merely shrinking the pixel size. This issue is more serious for security cameras, which need to collect both visible (VIS) light and near-infrared (NIR) photons. Existing solutions mostly explore complex photonic nanostructures, which are often too complicated for production. In this work, we demonstrate a pixel-scale spectral router utilizing two-dimensional (2D) Si$_3$N$_4$ Mie scattering metagratings that can spatially divide NIR (850 nm) and VIS (400-700 nm) light to different pixels at high efficiencies. It has a minimum feature size larger than 360 nm, making it highly promising for mass production. Compared with the traditional filter design, our router achieves about 42% and 30% signal enhancement for the NIR and VIS bands, respectively. We show that it also has good polarization insensitivity and incident angle tolerance. NIR-VIS simultaneous imaging is demonstrated without any complex reconstruction algorithm. Mode analysis indicates that the multipolar scattering of our Mie-type metagratings provides the necessary degrees of freedom to spatially optimize the routing functions for broadband photons.
Submitted 24 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Valley polarization in twisted altermagnetism
Authors:
San-Dong Guo,
Yichen Liu,
Cheng-Cheng Liu
Abstract:
The combination of altermagnetism, twistronics and valleytronics is of great significance for potential applications in advanced electronic devices. Twisted magnetic van der Waals bilayers have been identified as an ideal platform for altermagnetism of any type, such as $d$-wave, $g$-wave, and $i$-wave, by choosing the constituent monolayer with specific symmetry [arXiv:2404.17146 (2024)]. Here, we propose a way of achieving valley polarization in twisted altermagnetism by applying an out-of-plane external electric field. Since the out-of-plane electric field creates a layer-dependent electrostatic potential, the valleys from different layers will stagger, producing valley polarization. We also demonstrate the effectiveness of our proposed approach using the twisted tight-binding model. It is found that the applied electric field can also induce a valley/spin-gapless semiconductor and a half metal besides valley polarization. Based on first-principles calculations, our proposed approach to achieving valley polarization can be verified in twisted bilayer VOBr and in monolayer $\mathrm{Ca(CoN)_2}$ as a special twisted altermagnet. These findings provide new opportunities for innovative spintronics, twistronics and valleytronics applications.
Submitted 19 June, 2024;
originally announced June 2024.
-
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
Authors:
Jie Feng,
Yuwei Du,
Tianhui Liu,
Siqi Guo,
Yuming Lin,
Yong Li
Abstract:
Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the capability of LLMs to understand urban space and solve related urban tasks by building a city-scale world model in the model. First, we construct a diverse instruction tuning dataset, CityInstruction, for injecting urban knowledge and enhancing spatial reasoning capability effectively. By using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B, Qwen1.5 and LLama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark, CityEval, to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve competitive performance with commercial LLMs in the comprehensive evaluation of CityEval. The source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityGPT.
Submitted 19 June, 2024;
originally announced June 2024.
-
CityBench: Evaluating the Capabilities of Large Language Model as World Model
Authors:
Jie Feng,
Jun Zhang,
Junbo Yan,
Xin Zhang,
Tianjian Ouyang,
Tianhui Liu,
Yuwei Du,
Siqi Guo,
Yong Li
Abstract:
Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations of the usability of LLMs, but a systematic and scalable evaluation benchmark is still lacking. The challenge in constructing a systematic evaluation benchmark for the urban domain lies in the diversity of data and scenarios, as well as the complex and dynamic nature of cities. In this paper, we propose CityBench, an interactive simulator-based evaluation platform, as the first systematic evaluation benchmark for the capability of LLMs in the urban domain. First, we build CitySim to integrate multi-source data and simulate fine-grained urban dynamics. Based on CitySim, we design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as city-scale world models for the urban domain. Due to the flexibility and ease of use of CitySim, our evaluation platform CityBench can be easily extended to any city in the world. We evaluate 13 well-known LLMs, including open-source and commercial LLMs, in 13 cities around the world. Extensive experiments demonstrate the scalability and effectiveness of the proposed CityBench and shed light on the future development of LLMs in the urban domain. The dataset, benchmark and source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityBench
Submitted 19 June, 2024;
originally announced June 2024.
-
Lewis Acidity and Basicity Diagnostics of Molten Salt for its Properties and Structure Online Monitoring
Authors:
Changzu Zhu,
Jia Song,
Xiaorui Xu,
Chengyu Wang,
Yang Tong,
Lve Lin,
Shaoqiang Guo,
Wentao Zhou,
Adrien Couet,
Yafei Wang
Abstract:
Analogous to aqueous solutions, where the pH of the solvent affects multiple behaviors, the Lewis acidity-basicity of molten salts greatly influences their thermophysical and thermochemical properties. In this study, we develop ion probes to quantitatively determine the acidity-basicity scale of molten NaCl-xAlCl3 (x = 1.5-2.1) salt using in-situ ultraviolet-visible (UV-Vis) spectroscopy. With the accumulation of acidity-basicity data of NaCl-AlCl3 molten salt for a variety of compositions, the correlation between the acidity-basicity of the salt and its measured fundamental properties is derived. To understand the physical and chemical features controlling the acidity-basicity variations, the structures of NaCl-xAlCl3 molten salts with different chemical compositions are investigated in terms of bonded complexes and coordination numbers. A comprehensive understanding of the correlation between composition, acidity-basicity, properties, and structures of molten salt can support the full screening and online monitoring of salt melts in extreme environments by simply measuring the salt acidity-basicity as developed in this study.
Submitted 19 June, 2024;
originally announced June 2024.
-
Cohomologies of Reynolds Lie-Yamaguti algebras of any weight and applications
Authors:
Wen Teng,
Shuangjian Guo
Abstract:
The purpose of the present paper is to investigate cohomologies of Reynolds Lie-Yamaguti algebras of any weight and provide some applications. First, we introduce the notion of Reynolds Lie-Yamaguti algebras and give some new examples. Moreover, cohomologies of Reynolds operators and Reynolds Lie-Yamaguti algebras with coefficients in a suitable representation are established. Finally, formal deformations and abelian extensions of Reynolds Lie-Yamaguti algebras are characterized in terms of lower degree cohomology groups.
Submitted 6 March, 2024;
originally announced June 2024.
-
Rastall gravity: accretion disk image in radiation fields context and visual transformations compared to Reissner-Nordstrom black holes
Authors:
Yu-Xiang Huang,
Sen Guo,
Yu Liang,
Yu-Hao Cui,
Qing-Quan Jiang,
Kai Lin
Abstract:
Our study investigates the astronomical implications of Rastall gravity, particularly its behavior amidst a radiation field compared to Reissner-Nordstrom (RN) black holes. Our research delineates a crucial correlation between the dynamics of the accretion disk and the parameters $Q$ and $N_{\rm r}$, which aptly reflect the influence of spacetime metrics on the disk's appearance. Elevated electric charge $Q$ prompts contraction in the disk's orbit due to enhanced gravitational effects, while higher $N_{\rm r}$ values lead to outward expansion, influenced by the radiation field's attributes. Interestingly, the charged black holes surrounded by radiation fields display distinct visual disparities from RN black holes. Brightness decreases and expansion occurs within the accretion disk's innermost stable circular orbit with rising $N_{\rm r}$ values. Our study also reveals the process by which the accretion disk transitions from a conventional disk-like structure to a hat-like form at different observation angles, with the redshift effect gradually intensifying. Moreover, the results of the Rastall gravity radiation field we consider are consistent with the constraints of the host galaxy's gravitational lensing on the Rastall gravity parameters, enhancing the consistency between theoretical predictions and actual observations.
Submitted 18 June, 2024;
originally announced June 2024.
-
COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities
Authors:
Zihao He,
Rebecca Dorn,
Siyi Guo,
Minh Duc Chu,
Kristina Lerman
Abstract:
Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct automatically generates instruction-output pairs by an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and fitness communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.
Submitted 17 June, 2024;
originally announced June 2024.
-
Validating an Instrument for Teachers' Acceptance of Artificial Intelligence in Education
Authors:
Shuchen Guo,
Lehong Shi,
Xiaoming Zhai
Abstract:
As artificial intelligence (AI) receives wider attention in education, examining teachers' acceptance of AI (TAAI) becomes essential. However, existing instruments measuring TAAI reported limited reliability and validity evidence and faced some design challenges, such as not giving participants an informed definition of AI. This study aimed to develop and validate a TAAI instrument, providing sufficient evidence of high psychometric quality. Based on the literature, we first identified five dimensions of TAAI, including perceived usefulness, perceived ease of use, behavioral intention, self-efficacy, and anxiety, and then developed items to assess each dimension. We examined the face and content validity using expert review and think-aloud with pre-service teachers. Using the revised instrument, we collected responses from 274 pre-service teachers and examined the item discriminations to identify outlier items. We employed confirmatory factor analysis and Cronbach's alpha to examine the construct validity, convergent validity, discriminant validity, and reliability. Results confirmed the dimensionality of the scale, resulting in 27 items distributed in five dimensions. The study exhibits robust validity and reliability evidence for TAAI, thus affirming its usefulness as a valid measurement instrument.
Submitted 15 June, 2024;
originally announced June 2024.
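The abstract above names Cronbach's alpha as its reliability statistic. As a minimal sketch of how that statistic is usually computed (the textbook formula, not the authors' analysis code), the function below takes a hypothetical list of respondent rows, one score per item. The function name and the toy data are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one score per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose rows so that each tuple holds all scores for one item.
    item_columns = list(zip(*responses))
    item_var_sum = sum(variance(col) for col in item_columns)
    # Variance of each respondent's summed score across items.
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)


# Toy data (hypothetical): three respondents answering three items identically,
# so the items are perfectly consistent and alpha reaches its maximum.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

Sample variances (n - 1 denominator) are used for both the items and the totals; any consistent choice gives the same ratio.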
-
A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras
Authors:
Chenyang Shi,
Shasha Guo,
Boyi Wei,
Hanxiao Liu,
Yibo Zhang,
Ningfang Song,
Jing Jin
Abstract:
Event cameras are renowned for their high efficiency due to outputting a sparse, asynchronous stream of events. However, they are plagued by noisy events, especially in low light conditions. Denoising is an essential task for event cameras, but evaluating denoising performance is challenging. Label-dependent denoising metrics involve artificially adding noise to clean sequences, complicating evaluations. Moreover, the majority of these metrics are monotonic, which can inflate scores by removing substantial noise and valid events. To overcome these limitations, we propose the first label-free and non-monotonic evaluation metric, the area of the continuous contrast curve (AOCC), which utilizes the area enclosed by event frame contrast curves across different time intervals. This metric is inspired by how events capture the edge contours of scenes or objects with high temporal resolution. An effective denoising method removes noise without eliminating these edge-contour events, thus preserving the contrast of event frames. Consequently, contrast across various time ranges serves as a metric to assess denoising effectiveness. As the time interval lengthens, the curve will initially rise and then fall. The proposed metric is validated through both theoretical and experimental evidence.
Submitted 13 June, 2024;
originally announced June 2024.
-
From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization
Authors:
Ziran Zhang,
Yongrui Ma,
Yueting Chen,
Feng Zhang,
Jinwei Gu,
Tianfan Xue,
Shi Guo
Abstract:
Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably trailing artifacts and signal latency, which hinder their direct applicability and generalization. Addressing these issues, we propose a novel per-scene optimization strategy tailored for low-light conditions. This approach utilizes the internal statistics of a sequence to handle degraded event data under low-light conditions, improving the generalizability to different lighting and camera settings. To evaluate its robustness in low-light conditions, we further introduce EVFI-LL, a unique RGB+Event dataset captured under low-light conditions. Our results demonstrate state-of-the-art performance in low-light environments. Both the dataset and the source code will be made publicly available upon publication. Project page: https://naturezhanghn.github.io/sim2real.
Submitted 12 June, 2024;
originally announced June 2024.
-
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Authors:
Zhengrui Ma,
Qingkai Fang,
Shaolei Zhang,
Shoutao Guo,
Yang Feng,
Min Zhang
Abstract:
Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. We develop a non-autoregressive decoder capable of concurrently generating multiple text or acoustic unit tokens upon receiving fixed-length speech chunks. The decoder can generate blank or repeated tokens and employ CTC decoding to dynamically adjust its latency. Experimental results show that NAST-S2X outperforms state-of-the-art models in both speech-to-text and speech-to-speech tasks. It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28 times decoding speedup in offline generation.
Submitted 11 June, 2024;
originally announced June 2024.
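The abstract above says the decoder may emit blank or repeated tokens that CTC decoding then resolves. The sketch below shows only the standard CTC collapse rule (merge adjacent repeats, then drop blanks) on per-frame outputs; it is not the NAST-S2X decoder, and the blank symbol "_" and the function name are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems use a reserved token id


def ctc_collapse(frame_tokens, blank=BLANK):
    """Apply the standard CTC collapse: merge adjacent repeats, then drop blanks."""
    # groupby yields one key per run of identical adjacent tokens.
    merged = [token for token, _run in groupby(frame_tokens)]
    return [token for token in merged if token != blank]


# A blank between two identical tokens keeps them distinct ("l", "_", "l").
word = "".join(ctc_collapse(list("hh_e_ll_lo")))

# A chunk whose frames are all blank emits nothing.
silent_chunk = ctc_collapse(["_", "_", "_"])
```

Under this rule a chunk made up entirely of blanks produces no output tokens, which is one way a fixed-chunk decoder can defer output until more speech arrives.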
-
Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models
Authors:
Shoutao Guo,
Shaolei Zhang,
Zhengrui Ma,
Min Zhang,
Yang Feng
Abstract:
Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies, their translation performance is suboptimal. Conversely, Large Language Models (LLMs), trained on extensive corpora, possess superior generation capabilities, but it is difficult for them to acquire translation policy through the training methods of SiMT. Therefore, we introduce Agent-SiMT, a framework combining the strengths of LLMs and traditional SiMT methods. Agent-SiMT contains the policy-decision agent and the translation agent. The policy-decision agent is managed by a SiMT model, which determines the translation policy using partial source sentence and translation. The translation agent, leveraging an LLM, generates translation based on the partial source sentence. The two agents collaborate to accomplish SiMT. Experiments demonstrate that Agent-SiMT attains state-of-the-art performance.
Submitted 12 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Anderson localization for CMV matrices with Verblunsky coefficients defined by the hyperbolic toral automorphism
Authors:
Yanxue Lin,
Shuzheng Guo,
Daxiong Piao
Abstract:
In this paper, we prove the large deviation estimates and Anderson localization for CMV matrices on $\ell^2(\mathbb{Z}_+)$ with Verblunsky coefficients defined dynamically by the hyperbolic toral automorphism. Part of positivity results on the Lyapunov exponents of Chulaevsky-Spencer and Anderson localization results of Bourgain-Schlag on Schrödinger operators with strongly mixing potentials are extended to CMV matrices.
Submitted 8 June, 2024;
originally announced June 2024.
-
Decoder-only Streaming Transformer for Simultaneous Translation
Authors:
Shoutao Guo,
Shaolei Zhang,
Yang Feng
Abstract:
Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix. To achieve good performance, it leverages the relationship between source and target prefixes to extract a policy to guide the generation of translations. Although existing SiMT methods primarily focus on the Encoder-Decoder architecture, we explore the potential of the Decoder-only architecture, owing to its superior performance in various tasks and its inherent compatibility with SiMT. However, directly applying the Decoder-only architecture to SiMT poses challenges in terms of training and inference. To alleviate the above problems, we propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST). Specifically, DST separately encodes the positions of the source and target prefixes, ensuring that the position of the target prefix remains unaffected by the expansion of the source prefix. Furthermore, we propose a Streaming Self-Attention (SSA) mechanism tailored for the Decoder-only architecture. It is capable of obtaining translation policy by assessing the sufficiency of input source information and integrating with the soft-attention mechanism to generate translations. Experiments demonstrate that our approach achieves state-of-the-art performance on three translation tasks.
Submitted 6 June, 2024;
originally announced June 2024.
-
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Authors:
Shaolei Zhang,
Qingkai Fang,
Shoutao Guo,
Zhengrui Ma,
Min Zhang,
Yang Feng
Abstract:
Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing a double challenge of translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. Adhering to a multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation and speech synthesis via an "All-in-One" seamless model. Experiments on the CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance in both offline S2ST and Simul-S2ST tasks. Besides, StreamSpeech is able to present high-quality intermediate results (i.e., ASR or translation results) during the simultaneous translation process, offering a more comprehensive real-time communication experience.
Submitted 5 June, 2024;
originally announced June 2024.
-
Open Grounded Planning: Challenges and Benchmark Construction
Authors:
Shiguang Guo,
Ziliang Deng,
Hongyu Lin,
Yaojie Lu,
Xianpei Han,
Le Sun
Abstract:
The emergence of large language models (LLMs) has increasingly drawn attention to the use of LLMs for human-like planning. Existing work on LLM-based planning either focuses on leveraging the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning approaches to learn decision-making for a limited set of actions within restricted environments. However, both approaches exhibit significant discrepancies from the open and executable requirements in real-world planning. In this paper, we propose a new planning task: open grounded planning. The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set, thereby ensuring the executability of the produced plan. To this end, we establish a benchmark for open grounded planning spanning a wide range of domains. Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains. The outcomes of this paper define and establish a foundational dataset for open grounded planning, and shed light on the potential challenges and future directions of LLM-based planning.
Submitted 4 June, 2024;
originally announced June 2024.
-
Near-Room-Temperature Field-Controllable Exchange Bias in 2D van der Waals Ferromagnet Fe3GaTe2
Authors:
Jifeng Shao,
Xiaolong Yin,
Chunhao Bao,
Sirong Lu,
Xiaoming Ma,
Shu Guo,
Le Wang,
Xi Zhang,
Zhiyue Li,
Longxiang Li,
Yue Zhao,
Tingyong Chen
Abstract:
Exchange bias (EB) is a cornerstone of modern magnetic memory and sensing technologies. Its extension to the realm of two-dimensional (2D) van der Waals (vdW) magnets holds promise for revolutionary advancements in miniaturized and efficient atomic spintronic devices. However, the blocking temperature of EB in 2D vdW magnets is currently well below room temperature (~130 K). This study reports a robust EB phenomenon in Fe3GaTe2 thin-layer devices, which significantly increases the blocking temperature to a near-room-temperature record of 280 K. Both the bias direction and magnitude can be isothermally tuned by adjusting the field sweep range, in striking contrast to the conventional EB in ferromagnetic/antiferromagnetic (FM/AFM) bilayers. We propose an exchange spring model in which crystal defects with higher coercivity act as the pivotal pinning source for the observed EB phenomenon, deviating from the conventional FM/AFM interface mechanism. Cumulative growth of minor loops and multiple magnetization reversal paths are observed in field cycles below the saturation field, consistent with the hard-FM-defect behavior of our exchange spring model. These findings provide insights into the complex magnetic order in 2D ferromagnets and open new avenues for developing practical ultrathin vdW spintronic devices with EB-like properties at room temperature.
Submitted 4 June, 2024;
originally announced June 2024.
-
#EpiTwitter: Public Health Messaging During the COVID-19 Pandemic
Authors:
Ashwin Rao,
Nazanin Sabri,
Siyi Guo,
Louiqa Raschid,
Kristina Lerman
Abstract:
Effective communication during health crises is critical, with social media serving as a key platform for public health experts (PHEs) to engage with the public. However, it also amplifies pseudo-experts promoting contrarian views. Despite its importance, the role of emotional and moral language in PHEs' communication during COVID-19 remains underexplored. This study examines how PHEs and pseudo-experts communicated on Twitter during the pandemic, focusing on emotional and moral language and their engagement with political elites. Analyzing tweets from 489 PHEs and 356 pseudo-experts from January 2020 to January 2021, alongside public responses, we identified key priorities and differences in messaging strategy. PHEs prioritize masking, healthcare, education, and vaccines, using positive emotional language like optimism. In contrast, pseudo-experts discuss therapeutics and lockdowns more frequently, employing negative emotions like pessimism and disgust. Negative emotional and moral language tends to drive engagement, but positive language from PHEs fosters positivity in public responses. PHEs exhibit liberal partisanship, expressing more positivity towards liberals and negativity towards conservative elites, while pseudo-experts show conservative partisanship. These findings shed light on the polarization of COVID-19 discourse and underscore the importance of strategic use of emotional and moral language by experts to mitigate polarization and enhance public trust.
Submitted 10 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Authors:
Yubo Wang,
Xueguang Ma,
Ge Zhang,
Yuansheng Ni,
Abhranil Chandra,
Shiguang Guo,
Weiming Ren,
Aaran Arulraj,
Xuan He,
Ziyan Jiang,
Tianle Li,
Max Ku,
Kai Wang,
Alex Zhuang,
Rongqi Fan,
Xiang Yue,
Wenhu Chen
Abstract:
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.
Submitted 23 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Pretrained Hybrids with MAD Skills
Authors:
Nicholas Roberts,
Samuel Guo,
Zhiqi Gao,
Satya Sai Srinath Namburi GNVV,
Sonia Cromp,
Chengjun Wu,
Chengyu Duan,
Frederic Sala
Abstract:
While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed $\textit{hybrid architectures}$ seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose $\textbf{Manticore}$, a framework that addresses these challenges. Manticore $\textit{automates the design of hybrid architectures}$ while reusing pretrained models to create $\textit{pretrained}$ hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to $\textit{program}$ pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.
Submitted 2 June, 2024;
originally announced June 2024.
-
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Authors:
Ge Zhang,
Scott Qu,
Jiaheng Liu,
Chenchen Zhang,
Chenghua Lin,
Chou Leuang Yu,
Danny Pan,
Esther Cheng,
Jie Liu,
Qunshu Lin,
Raven Yuan,
Tuney Zheng,
Wei Pang,
Xinrun Du,
Yiming Liang,
Yinghao Ma,
Yizhi Li,
Ziyang Ma,
Bill Lin,
Emmanouil Benetos,
Huan Yang,
Junting Zhou,
Kaijing Ma,
Minghao Liu,
Morry Niu
, et al. (20 additional authors not shown)
Abstract:
Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided, with most details (e.g., intermediate checkpoints, pre-training corpus, and training code) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs. Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovation and creativity to facilitate the further improvement of LLMs.
Submitted 10 July, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.