-
A Hybrid Training-time and Run-time Defense Against Adversarial Attacks in Modulation Classification
Authors:
Lu Zhang,
Sangarapillai Lambotharan,
Gan Zheng,
Guisheng Liao,
Ambra Demontis,
Fabio Roli
Abstract:
Motivated by the superior performance of deep learning in many applications including computer vision and natural language processing, several recent studies have focused on applying deep neural networks to the design of future generations of wireless networks. However, several recent works have pointed out that imperceptible and carefully designed adversarial examples (attacks) can significantly deteriorate the classification accuracy. In this paper, we investigate a defense mechanism based on both training-time and run-time defense techniques for protecting machine learning-based radio signal (modulation) classification against adversarial attacks. The training-time defense consists of adversarial training and label smoothing, while the run-time defense employs a support vector machine-based neural rejection (NR). Considering a white-box scenario and real datasets, we demonstrate that our proposed techniques outperform existing state-of-the-art technologies.
Submitted 9 July, 2024;
originally announced July 2024.
-
Countermeasures Against Adversarial Examples in Radio Signal Classification
Authors:
Lu Zhang,
Sangarapillai Lambotharan,
Gan Zheng,
Basil AsSadhan,
Fabio Roli
Abstract:
Deep learning algorithms have been shown to be powerful in many communication network design problems, including automatic modulation classification. However, they are vulnerable to carefully crafted attacks called adversarial examples. Hence, the reliance of wireless networks on deep learning algorithms poses a serious threat to the security and operation of wireless networks. In this letter, we propose for the first time a countermeasure against adversarial examples in modulation classification. Our countermeasure is based on a neural rejection technique, augmented by label smoothing and Gaussian noise injection, that allows adversarial examples to be detected and rejected with high accuracy. Our results demonstrate that the proposed countermeasure can protect deep-learning based modulation classification systems against adversarial examples.
Submitted 9 July, 2024;
originally announced July 2024.
-
Homogeneous Distributed Observers for Quasilinear Systems
Authors:
Min Li,
Andrey Polyakov,
Siyuan Wang,
Gang Zheng
Abstract:
The problem of finite/fixed-time cooperative state estimation is considered for a class of quasilinear systems with nonlinearities satisfying a Hölder condition. A strongly connected nonlinear distributed observer is designed under the assumption of global observability. By proper parameter tuning with linear matrix inequalities, the observer error equation possesses finite/fixed-time stability in the perturbation-free case and input-to-state stability with respect to bounded perturbations. Numerical simulations are performed to validate this design.
Submitted 8 July, 2024;
originally announced July 2024.
-
AgentInstruct: Toward Generative Teaching with Agentic Flows
Authors:
Arindam Mitra,
Luciano Del Corro,
Guoqing Zheng,
Shweti Mahajan,
Dany Rouhana,
Andres Codas,
Yadong Lu,
Wei-ge Chen,
Olga Vrousgos,
Corby Rosset,
Fillipe Silva,
Hamed Khanpour,
Yash Lara,
Ahmed Awadallah
Abstract:
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns around model collapse and the drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model; we refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documents and code files as seeds. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, reading comprehension, etc. The dataset can be used for instruction tuning of any base model. We post-train Mistral-7b with the data. When comparing the resulting model, Orca-3, to Mistral-7b-Instruct (which uses the same base model), we observe significant improvements across many benchmarks: for example, a 40% improvement on AGIEval, 19% on MMLU, 54% on GSM8K, 38% on BBH and 45% on AlpacaEval. Additionally, it consistently outperforms other models such as LLAMA-8B-instruct and GPT-3.5-turbo.
Submitted 3 July, 2024;
originally announced July 2024.
-
Multiphase buffering by ammonia sustains sulfate production in atmospheric aerosols
Authors:
Guangjie Zheng,
Hang Su,
Meinrat O. Andreae,
Ulrich Pöschl,
Yafang Cheng
Abstract:
Multiphase oxidation of sulfur dioxide (SO2) is an important source of sulfate in the atmosphere. There are, however, concerns that protons produced during SO2 oxidation may cause rapid acidification of aerosol water and thereby quickly shut down the fast reactions favored at high pH. Here, we show that the sustainability of sulfate production is controlled by the competing effects of multiphase buffering and acidification, which can be well described by a characteristic buffering time, τbuff. We find that globally, τbuff is long enough (days) to sustain sulfate production over most populated regions, where the acidification of aerosol water is counteracted by the strong buffering effect of NH4+/NH3. Our results highlight the importance of anthropogenic ammonia emissions and pervasive human influences in shaping the chemical environment of the atmosphere.
Submitted 27 June, 2024;
originally announced June 2024.
-
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs
Authors:
Wenqian Ye,
Guangtao Zheng,
Yunsheng Ma,
Xu Cao,
Bolin Lai,
James M. Rehg,
Aidong Zhang
Abstract:
Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single modality data. Multimodal Large Language Models (MLLMs), which integrate both vision and language models, have demonstrated strong capability in joint vision-language understanding. However, whether spurious biases are prevalent in MLLMs remains under-explored. We address this gap by analyzing the spurious biases in a multimodal setting, uncovering the specific test data patterns that can manifest this problem when biases in the vision model cascade into the alignment between visual and text tokens in MLLMs. To better understand this problem, we introduce MM-SpuBench, a comprehensive visual question-answering (VQA) benchmark designed to evaluate MLLMs' reliance on nine distinct categories of spurious correlations from five open-source image datasets. The VQA dataset is built from human-understandable concept information (attributes). Leveraging this benchmark, we conduct a thorough evaluation of current state-of-the-art MLLMs. Our findings show that these models persistently rely on spurious correlations and underscore the urgent need for new methodologies to mitigate spurious biases. To support MLLM robustness research, we release our VQA benchmark at https://huggingface.co/datasets/mmbench/MM-SpuBench.
Submitted 24 June, 2024;
originally announced June 2024.
-
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Authors:
Yongting Zhang,
Lu Chen,
Guodong Zheng,
Yifeng Gao,
Rui Zheng,
Jinlan Fu,
Zhenfei Yin,
Senjie Jin,
Yu Qiao,
Xuanjing Huang,
Feng Zhao,
Tao Gui,
Jing Shao
Abstract:
The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To address these limitations, we propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL. In terms of breadth, SPA-VL covers 6 harmfulness domains, 13 categories, and 53 subcategories, and contains 100,788 samples of the quadruple (question, image, chosen response, rejected response). In terms of depth, the responses are collected from 12 open-source (e.g., QwenVL) and closed-source (e.g., Gemini) VLMs to ensure diversity. The experimental results indicate that models trained with alignment techniques on the SPA-VL dataset exhibit substantial improvements in harmlessness and helpfulness while maintaining core capabilities. SPA-VL, as a large-scale, high-quality, and diverse dataset, represents a significant milestone in ensuring that VLMs achieve both harmlessness and helpfulness. We have made our code (https://github.com/EchoseChen/SPA-VL-RLHF) and the SPA-VL dataset (https://huggingface.co/datasets/sqrti/SPA-VL) publicly available.
Submitted 17 June, 2024;
originally announced June 2024.
-
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG
Authors:
Xueying Du,
Geng Zheng,
Kaixin Wang,
Jiayi Feng,
Wentai Deng,
Mingwei Liu,
Bihuan Chen,
Xin Peng,
Tao Ma,
Yiling Lou
Abstract:
Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique, Vul-RAG, which leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96%/110% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.
Submitted 19 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Catalytic evolution of cooperation in a population with behavioural bimodality
Authors:
Anhui Sheng,
Jing Zhang,
Guozhong Zheng,
Jiqiang Zhang,
Weiran Cai,
Li Chen
Abstract:
The remarkable adaptability of humans in response to complex environments is often demonstrated by the context-dependent adoption of different behavioral modes. However, the existing game-theoretic studies mostly focus on the single-mode assumption, and the impact of this behavioral multimodality on the evolution of cooperation remains largely unknown. Here, we study how cooperation evolves in a population with two behavioral modes. Specifically, we incorporate Q-learning and Tit-for-Tat (TFT) rules into our toy model, in which the prisoner's dilemma game is played, and we investigate the impact of the mode mixture on the evolution of cooperation. While players in Q-learning mode aim to maximize their accumulated payoffs, players within TFT mode repeat what their neighbors have done to them. In a structured mixing implementation where the updating rule is fixed for each individual, we find that the mode mixture greatly promotes the overall cooperation prevalence. The promotion is even more significant in the probabilistic mixing, where players randomly select one of the two rules at each step. Finally, this promotion is robust when players are allowed to adaptively choose between the two modes by real-time comparison. In all three scenarios, players within the Q-learning mode act as a catalyst that makes the TFT players more cooperative and, as a result, drives the whole population to be highly cooperative. The analysis of Q-tables explains the underlying mechanism of cooperation promotion, which captures the "psychologic evolution" in the players' minds. Our study indicates that the variety of behavioral modes is non-negligible, and could be crucial to clarifying the emergence of cooperation in the real world.
Submitted 16 June, 2024;
originally announced June 2024.
-
Spuriousness-Aware Meta-Learning for Learning Robust Classifiers
Authors:
Guangtao Zheng,
Wenqian Ye,
Aidong Zhang
Abstract:
Spurious correlations are brittle associations between certain attributes of inputs and target variables, such as the correlation between an image background and an object class. Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold. Mitigating the impact of spurious correlations is crucial for robust model generalization, but it often requires annotations of the spurious correlations in data -- a strong assumption in practice. In this paper, we propose a novel learning framework based on meta-learning, termed SPUME -- SPUriousness-aware MEta-learning, to train an image classifier to be robust to spurious correlations. We design the framework to iteratively detect and mitigate the spurious correlations that the classifier excessively relies on for predictions. To achieve this, we first propose to utilize a pre-trained vision-language model to extract text-format attributes from images. These attributes enable us to curate data with various class-attribute correlations, and we formulate a novel metric to measure the degree of these correlations' spuriousness. Then, to mitigate the reliance on spurious correlations, we propose a meta-learning strategy in which the support (training) sets and query (test) sets in tasks are curated with different spurious correlations that have high degrees of spuriousness. By meta-training the classifier on these spuriousness-aware meta-learning tasks, our classifier can learn to be invariant to the spurious correlations. We demonstrate that our method is robust to spurious correlations without knowing them a priori and achieves the best performance on five benchmark datasets with different robustness measures.
Submitted 15 June, 2024;
originally announced June 2024.
-
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection
Authors:
Wenjie Wang,
Yehao Lu,
Guangcong Zheng,
Shuigen Zhan,
Xiaoqing Ye,
Zichang Tan,
Jingdong Wang,
Gaoang Wang,
Xi Li
Abstract:
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it offers inherent advantages in reducing blind spots and expanding perception range. However, previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by customized CUDA parallel acceleration, BEVSpread achieves an inference time comparable to that of the original voxel pooling. Extensive experiments on two large-scale roadside benchmarks demonstrate that, as a plug-in, BEVSpread can improve the performance of existing frustum-based BEV methods by a large margin of (1.12, 5.26, 3.01) AP in vehicle, pedestrian and cyclist.
Submitted 12 June, 2024;
originally announced June 2024.
-
C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting
Authors:
Chaolv Zeng,
Zhanyu Liu,
Guanjie Zheng,
Linghe Kong
Abstract:
In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constraints impede their effectiveness in modeling complex time series, particularly those with numerous variables. Additionally, many models adopt the Channel-Independent (CI) strategy, treating multivariate time series as uncorrelated univariate series while ignoring their correlations. For models considering inter-channel relationships, whether through the self-attention mechanism, linear combination, or convolution, they all incur high computational costs and focus solely on weighted summation relationships, neglecting potential proportional relationships between channels. In this work, we address these issues by leveraging the newly introduced state space model and propose C-Mamba, a novel approach that captures cross-channel dependencies while maintaining linear complexity without losing the global receptive field. Our model consists of two key components: (i) channel mixup, where two channels are mixed to enhance the training sets; (ii) channel attention enhanced patch-wise Mamba encoder that leverages the ability of the state space models to capture cross-time dependencies and models correlations between channels by mining their weight relationships. Our model achieves state-of-the-art performance on seven real-world time series datasets. Moreover, the proposed mixup and attention strategy exhibits strong generalizability across other frameworks.
Submitted 7 June, 2024;
originally announced June 2024.
-
MagiNet: Mask-Aware Graph Imputation Network for Incomplete Traffic Data
Authors:
Jianping Zhou,
Bin Lu,
Zhanyu Liu,
Siyu Pan,
Xuejun Feng,
Hua Wei,
Guanjie Zheng,
Xinbing Wang,
Chenghu Zhou
Abstract:
Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation Systems (ITS). However, existing imputation methods generally use zero pre-filling to initialize missing values, which inevitably introduces noise. Moreover, we observe prevalent over-smoothing interpolations, which fall short in revealing the intrinsic spatio-temporal correlations of incomplete traffic data. To this end, we propose the Mask-Aware Graph imputation Network: MagiNet. Our method designs an adaptive mask spatio-temporal encoder to learn the latent representations of incomplete data, eliminating the reliance on pre-filling missing values. Furthermore, we devise a spatio-temporal decoder that stacks multiple blocks to capture the inherent spatial and temporal dependencies within incomplete traffic data, alleviating over-smoothing imputation. Extensive experiments demonstrate that our method outperforms state-of-the-art imputation methods on five real-world traffic datasets, yielding an average improvement of 4.31% in RMSE and 3.72% in MAPE.
Submitted 5 June, 2024;
originally announced June 2024.
-
A Data and Model-Driven Deep Learning Approach to Robust Downlink Beamforming Optimization
Authors:
Kai Liang,
Gan Zheng,
Zan Li,
Kai-Kit Wong,
Chan-Byoung Chae
Abstract:
This paper investigates the optimization of the long-standing probabilistically robust transmit beamforming problem with channel uncertainties in the multiuser multiple-input single-output (MISO) downlink transmission. This problem poses significant analytical and computational challenges. Currently, the state-of-the-art optimization method relies on convex restrictions as tractable approximations to ensure robustness against Gaussian channel uncertainties. However, this method not only exhibits high computational complexity and suffers from the rank relaxation issue but also yields conservative solutions. In this paper, we propose an unsupervised deep learning-based approach that incorporates the sampling of channel uncertainties in the training process to optimize the probabilistic system performance. We introduce a model-driven learning approach that defines a new beamforming structure with trainable parameters to account for channel uncertainties. Additionally, we employ a graph neural network to efficiently infer the key beamforming parameters. We successfully apply this approach to the minimum rate quantile maximization problem subject to outage and total power constraints. Furthermore, we propose a bisection search method to address the more challenging power minimization problem with probabilistic rate constraints by leveraging the aforementioned approach. Numerical results confirm that our approach achieves non-conservative robust performance, higher data rates, greater power efficiency, and faster execution compared to state-of-the-art optimization methods.
Submitted 5 June, 2024;
originally announced June 2024.
-
Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Authors:
Zhanyu Liu,
Jianrong Ding,
Guanjie Zheng
Abstract:
The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While previous cross-city few-shot forecasting methods ignore the frequency similarity between cities, we observe that traffic data from different cities is more similar in the frequency domain. Based on this observation, we propose a Frequency Enhanced Pre-training Framework for Cross-city Few-shot Forecasting (FEPCross). FEPCross has a pre-training stage and a fine-tuning stage. In the pre-training stage, we propose a novel Cross-Domain Spatial-Temporal Encoder that incorporates information from the time and frequency domains, and train it with self-supervised tasks encompassing reconstruction and contrastive objectives. In the fine-tuning stage, we design modules to enrich training samples and maintain a momentum-updated graph structure, thereby mitigating the risk of overfitting to the few-shot training data. Empirical evaluations performed on real-world traffic datasets validate the efficacy of FEPCross, which outperforms existing approaches of diverse categories and demonstrates characteristics that foster the progress of cross-city few-shot forecasting.
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
Authors:
Jianrong Ding,
Zhanyu Liu,
Guanjie Zheng,
Haiming Jin,
Linghe Kong
Abstract:
Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
Submitted 11 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
The Energy Budget in the Jet of High-frequency Peaked BL Lacertae Objects
Authors:
X. Z. Zhao,
H. Y. Yang,
Y. G. Zheng,
S. J. Kang
Abstract:
Energy equipartition and the energy budget in the jet are important issues for the radiation mechanism of blazars. Early work predominantly concentrated on flat-spectrum radio quasars and a limited number of BL Lacertae objects (BL Lacs). In this paper, we compile 348 high-frequency peaked BL Lac objects (HBLs) based on the catalog of active galactic nuclei (4LAC-DR3) from Fermi-LAT, and employ JetSet to fit the spectral energy distributions (SEDs) of these HBLs in the framework of the one-zone lepton model. We aim to determine whether the energy budget is reasonable and whether the energy equipartition is satisfied in HBLs. The results of the statistical analysis suggest that: (1) SEDs of HBLs can be reproduced well by using the one-zone lepton model; however, the model cannot achieve energy equipartition, and the relativistic electron energy density is far greater than the magnetic field energy density, $U_{e} \gtrsim 100 U_{B}$; (2) the majority of the HBLs are located in the $t_{cool} < t_{dyn}$ region (where the horizontal coordinate represents the jet power of electrons, while the ordinate indicates the ratio of the dynamical timescale to the cooling timescale), and the jet kinetic power of HBLs is greater than the jet power of radiation; given the very low radiation efficiency, we deduce that HBLs may have optically thin advection-dominated accretion flows; (3) the $\log ε_{B}$ of HBLs is less than zero, which indicates that the jet kinetic power of HBLs is not affected by Poynting flux; (4) the relationships $U_{e} > U_{Syn} \sim U_{B}$, $L_{e} \sim L_{p} > L_{B} \sim L_{rad}$ and $\log ε_{e} > 0.5$ are established. These relations indicate that most of the energy of HBLs is stored in the population of low-energy electrons.
Submitted 3 June, 2024;
originally announced June 2024.
-
Instruction Tuning with Retrieval-based Examples Ranking for Aspect-based Sentiment Analysis
Authors:
Guangmin Zheng,
Jin Wang,
Liang-Chih Yu,
Xuejie Zhang
Abstract:
Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects and provides deeper market insights to businesses and organizations. With the emergence of large language models (LMs), recent studies have proposed using fixed examples for instruction tuning to reformulate ABSA as a generation task. However, the performance is sensitive to the selection of in-context examples; several retrieval methods are based on surface similarity and are independent of the LM generative objective. This study proposes an instruction learning method with retrieval-based example ranking for ABSA tasks. For each target sample, an LM was applied as a scorer to estimate the likelihood of the output given the input and a candidate example as the prompt, and training examples were labeled as positive or negative by ranking the scores. An alternating training schema is proposed to train both the retriever and LM. Instructional prompts can be constructed using high-quality examples. The LM is used for both scoring and inference, improving the generation efficiency without incurring additional computational costs or training difficulties. Extensive experiments on three ABSA subtasks verified the effectiveness of the proposed method, demonstrating its superiority over various strong baseline models. Code and data are released at https://github.com/zgMin/IT-RER-ABSA.
Submitted 29 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding
Authors:
Yuhang Liu,
Boyi Sun,
Guixu Zheng,
Yishuo Wang,
Jing Wang,
Fei-Yue Wang
Abstract:
LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent LiDAR systems, which tightly integrate physical, digital, and social systems. To endow LiDAR systems with cognitive capabilities, we introduce the 3D visual grounding task into parallel LiDARs and present a novel human-computer interaction paradigm for LiDAR systems. We propose Talk2LiDAR, a large-scale benchmark dataset tailored for 3D visual grounding in autonomous driving. Additionally, we present a two-stage baseline approach and an efficient one-stage method named BEVGrounding, which significantly improves grounding accuracy by fusing coarse-grained sentence and fine-grained word embeddings with visual features. Our experiments on Talk2Car-3D and Talk2LiDAR datasets demonstrate the superior performance of BEVGrounding, laying a foundation for further research in this domain.
Submitted 24 May, 2024;
originally announced May 2024.
-
Real-time equilibrium reconstruction by neural network based on HL-3 tokamak
Authors:
Guohui Zheng,
Songfen Liu,
Zongyu Yang,
Rui Ma,
Xinwen Gong,
Ao Wang,
Shuo Wang,
Wulyu Zhong
Abstract:
A neural network model, EFITNN, capable of real-time magnetic equilibrium reconstruction based on HL-3 tokamak magnetic measurement signals has been developed. The model processes inputs from 68 channels of magnetic measurement data gathered from 1159 HL-3 experimental discharges, including plasma current, loop voltage, and the poloidal magnetic fields measured by equilibrium probes. The outputs of the model feature eight key plasma parameters, alongside high-resolution ($129\times129$) reconstructions of the toroidal current density $J_{\text P}$ and poloidal magnetic flux profiles $Ψ_{rz}$. Moreover, the network's architecture employs a multi-task learning structure, which enables the sharing of weights and mutual correction among different outputs and increases the model's accuracy by up to 32%. The performance of EFITNN demonstrates remarkable consistency with the offline EFIT, achieving average $R^2 = 0.941$, $0.997$ and $0.959$ for the eight plasma parameters, $Ψ_{rz}$ and $J_{\text P}$, respectively. The model's robust generalization capabilities are particularly evident in its successful predictions of quasi-snowflake (QSF) divertor configurations and its adept handling of data from shot numbers or plasma current intervals not previously encountered during training. Compared to numerical methods, EFITNN significantly enhances computational efficiency, with average computation time ranging from 0.08 ms to 0.45 ms, indicating its potential utility in real-time isoflux control and plasma profile management.
Submitted 18 May, 2024;
originally announced May 2024.
-
Ptychographic non-line-of-sight imaging for depth-resolved visualization of hidden objects
Authors:
Pengming Song,
Qianhao Zhao,
Ruihai Wang,
Ninghe Liu,
Yingqi Qiang,
Tianbo Wang,
Xincheng Zhang,
Yi Zhang,
Liangcai Cao,
Guoan Zheng
Abstract:
Non-line-of-sight (NLOS) imaging enables the visualization of objects hidden from direct view, with applications in surveillance, remote sensing, and light detection and ranging. Here, we introduce a NLOS imaging technique termed ptychographic NLOS (pNLOS), which leverages coded ptychography for depth-resolved imaging of obscured objects. Our approach involves scanning a laser spot on a wall to illuminate the hidden objects in an obscured region. The reflected wavefields from these objects then travel back to the wall, get modulated by the wall's complex-valued profile, and the resulting diffraction patterns are captured by a camera. By modulating the object wavefields, the wall surface serves the role of the coded layer as in coded ptychography. As we scan the laser spot to different positions, the reflected object wavefields on the wall translate accordingly, with the shifts varying for objects at different depths. This translational diversity enables the acquisition of a set of modulated diffraction patterns referred to as a ptychogram. By processing the ptychogram, we recover both the objects at different depths and the modulation profile of the wall surface. Experimental results demonstrate high-resolution, high-fidelity imaging of hidden objects, showcasing the potential of pNLOS for depth-aware vision beyond the direct line of sight.
Submitted 17 May, 2024;
originally announced May 2024.
-
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling
Authors:
Guangmin Zheng,
Jin Wang,
Xiaobing Zhou,
Xuejie Zhang
Abstract:
Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
Submitted 16 May, 2024;
originally announced May 2024.
-
Strain-induced long-range charge-density wave order in the optimally doped Bi$_2$Sr$_{2-x}$La$_x$CuO$_{6}$ superconductor
Authors:
Shinji Kawasaki,
Nao Tsukuda,
Chengtian Lin,
Guo-qing Zheng
Abstract:
The mechanism of high-temperature superconductivity in copper oxides (cuprates) remains elusive, with the pseudogap phase considered a potential factor. Recent attention has focused on a long-range symmetry-broken charge-density wave (CDW) order in the underdoped regime, induced by strong magnetic fields. Here, by $^{63,65}$Cu-nuclear magnetic resonance, we report the discovery of a long-range CDW order in the optimally doped Bi$_2$Sr$_{2-x}$La$_x$CuO$_6$ superconductor, induced by in-plane strain exceeding $|\varepsilon| = 0.15\%$, which deliberately breaks the crystal symmetry of the CuO$_2$ plane. We find that compressive/tensile strains reduce superconductivity but enhance CDW, leaving superconductivity to coexist with CDW. The findings show that a long-range CDW order is an underlying hidden order in the pseudogap state, not limited to the underdoped regime, becoming apparent under strain. Our result sheds light on the intertwining of various orders in the cuprates.
Submitted 14 May, 2024;
originally announced May 2024.
-
Quasiparticle and superfluid dynamics in Magic-Angle Graphene
Authors:
Elías Portolés,
Marta Perego,
Pavel A. Volkov,
Mathilde Toschini,
Yana Kemna,
Alexandra Mestre-Torà,
Giulia Zheng,
Artem O. Denisov,
Folkert K. de Vries,
Peter Rickhaus,
Takashi Taniguchi,
Kenji Watanabe,
J. H. Pixley,
Thomas Ihn,
Klaus Ensslin
Abstract:
Magic-Angle Twisted Bilayer Graphene shows a wide range of correlated phases which are electrostatically tunable. Despite a growing knowledge of the material, there is as yet no consensus on the microscopic mechanisms driving its superconducting phase. In particular, elucidating the symmetry and formation mechanism of the superconducting phase may provide key insights for the understanding of unconventional, strongly coupled and topological superconductivity. A major obstacle to progress in this direction is that key thermodynamic properties, such as specific heat, electron-phonon coupling and superfluid stiffness, are extremely challenging to measure due to the 2D nature of the material and its relatively low energy scales. Here, we use a gate-defined, radio frequency-biased Josephson junction to probe the electronic dynamics of magic-angle twisted bilayer graphene (MATBG). We reveal both the electronic quasiparticle dynamics, driven by their thermalization through phonon scattering, and the condensate dynamics, driven by the inertia of Cooper pairs. From these properties we recover the evolution of thermalization rates and the superfluid stiffness across the phase diagram. Our findings favor an anisotropic or nodal pairing state and allow us to estimate the strength of electron-phonon coupling. These results contribute to understanding the underlying mechanisms of superconductivity in MATBG while establishing an easy-to-implement method for characterizing thermal and superfluid properties of superconducting 2D materials.
Submitted 10 May, 2024;
originally announced May 2024.
-
GeoViz: A Multi-View Visualization Platform for Spatio-temporal Knowledge Graph
Authors:
Jianping Zhou,
Junhao Li,
Guanjie Zheng,
Yunqiang Zhu,
Xinbing Wang,
Chenghu Zhou
Abstract:
In this paper, we propose a multi-view visualization technology for spatio-temporal knowledge graph (STKG), which utilizes three distinct perspectives: knowledge tree, knowledge net, and knowledge map, to facilitate a comprehensive analysis of the STKG. The knowledge tree enables the visualization of hierarchical interrelation within the STKG, while the knowledge net elucidates semantic relationships among knowledge entities. Additionally, the knowledge map displays spatial and temporal distributions via spatial maps and time axes, respectively. Our visualization technology addresses the limitations inherent in single-view approaches and the deficiency of interaction in spatio-temporal perspectives evident in existing visualization methods. Moreover, we have encapsulated this technology within an integrated, open-source platform named GeoViz. A demo video of GeoViz can be accessed at https://anonymous.4open.science/r/GeoViz.
Submitted 29 April, 2024;
originally announced May 2024.
-
Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation
Authors:
Guangtao Zheng,
Wenqian Ye,
Aidong Zhang
Abstract:
Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.
Submitted 6 May, 2024;
originally announced May 2024.
-
Simultaneously Cloaking Electric and Hydrodynamic Fields via Electro-osmosis
Authors:
Hongyu Liu,
Zhi-Qiang Miao,
Guang-Hui Zheng
Abstract:
In this paper, we develop a general mathematical framework for the electro-osmosis problem to design simultaneous microscale electric and hydrodynamic cloaking in a Hele-Shaw configuration. A novel approach to simultaneously cloaking both the electric and flow fields through a combination of scattering-cancellation technology and an electro-osmosis effect is proposed. In the design, the electric field is manipulated with scattering-cancellation technology, while the pressure is manipulated with the electro-osmosis effect. As proof of this concept, the perfect electric and hydrodynamic cloaking conditions are derived, using layer potential techniques, for cloaks whose cross-sectional shape is an annulus or confocal ellipses. Furthermore, we propose an optimization scheme for the design of approximate cloaks within general geometries and prove the well-posedness of the optimization problem. In particular, the conditions that can ensure the simultaneous occurrence of approximate cloaks for general geometries are also established. Our theoretical findings are validated by a variety of numerical results and guide the efficient design of electric-related multiphysics cloaking.
Submitted 3 April, 2024;
originally announced April 2024.
-
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method
Authors:
Zhixin Guo,
Tao Wang,
Chaoyang Wang,
Jianping Zhou,
Guanjie Zheng,
Xinbing Wang,
Chenghu Zhou
Abstract:
The rare earth elements Sm and Nd are key to addressing fundamental questions about crustal growth, such as its spatiotemporal evolution and the interplay between orogenesis and crustal accretion. Their relative immobility during high-grade metamorphism makes the Sm-Nd isotopic system crucial for inferring crustal formation times. Historically, data have been disseminated sporadically in the scientific literature due to complicated and costly sampling procedures, resulting in a fragmented knowledge base. However, the scattering of critical geoscience data across multiple publications poses significant challenges regarding human capital and time. In response, we present an automated tabular extraction method for harvesting tabular geoscience data. We collect 10,624 Sm-Nd data entries from 9,138 tables in over 20,000 geoscience publications using this method. We manually selected 2,118 data points from them to supplement our previously constructed global Sm-Nd dataset, increasing its sample count by over 20%. Our automatic data collection methodology enhances the efficiency of data acquisition processes spanning various scientific domains. Furthermore, the constructed Sm-Nd isotopic dataset should motivate research on classifying global orogenic belts.
Submitted 27 March, 2024;
originally announced March 2024.
-
Robust Finite-time Stabilization of Linear Systems with Limited State Quantization
Authors:
Yu Zhou,
Andrey Polyakov,
Gang Zheng
Abstract:
This paper investigates the robust asymptotic stabilization of a linear time-invariant (LTI) system by a static feedback with a static state quantization. It is shown that the controllable LTI system can be stabilized to zero in a finite time by means of a nonlinear feedback with a quantizer having a limited (finite) number of values (quantization seeds) even when all parameters of the controller and the quantizer are time-invariant. The control design is based on generalized homogeneity. A homogeneous spherical quantizer is introduced. The static homogeneous feedback is shown to be a local (or global) finite-time stabilizer for the linear system (depending on the system matrix). The tuning rules for both the quantizer and the feedback law are obtained in the form of Linear Matrix Inequalities (LMIs). The closed-loop system is proven to be robust with respect to some bounded matched and vanishing mismatched perturbations. Theoretical results are supported by numerical simulations.
Submitted 25 March, 2024;
originally announced March 2024.
-
MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion
Authors:
Guiyong Zheng,
Jinqi Jiang,
Chen Feng,
Shaojie Shen,
Boyu Zhou
Abstract:
Surface prediction and completion have been widely studied in various applications. Recently, research in surface completion has evolved from small objects to complex large-scale scenes. As a result, researchers have begun increasing the volume of data and leveraging a greater variety of data modalities, including rendered RGB images, descriptive texts, depth images, etc., to enhance algorithm performance. However, existing datasets suffer from a deficiency in the number of scene-level models along with the corresponding multi-modal information. Therefore, a method to scale the datasets and generate multi-modal information in them efficiently is essential. To bridge this research gap, we propose MASSTAR: a Multi-modal lArge-scale Scene dataset with a verSatile Toolchain for surfAce pRediction and completion. We develop a versatile and efficient toolchain for processing the raw 3D data from the environments. It screens out a set of fine-grained scene models and generates the corresponding multi-modal data. Utilizing the toolchain, we then generate an example dataset composed of over a thousand scene-level models with partial real-world data added. We compare MASSTAR with the existing datasets, which validates its superiority: the ability to efficiently extract high-quality models from complex scenarios to expand the dataset. Additionally, several representative surface completion algorithms are benchmarked on MASSTAR, which reveals that existing algorithms can hardly deal with scene-level completion. We will release the source code of our toolchain and the dataset. For more details, please see our project page at https://sysu-star.github.io/MASSTAR.
Submitted 18 March, 2024;
originally announced March 2024.
-
Dual-Channel Multiplex Graph Neural Networks for Recommendation
Authors:
Xiang Li,
Chaofan Fu,
Zhongying Zhao,
Guanjie Zheng,
Chao Huang,
Junyu Dong,
Yanwei Yu
Abstract:
Efficient recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interaction relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant shortcomings: (1) insufficient modeling and exploitation of the impact of various behavior patterns formed by multiplex relations between users and items on representation learning, and (2) ignoring the effect of different relations in the behavior patterns on the target relation in recommender system scenarios. In this study, we introduce a novel recommendation framework, Dual-Channel Multiplex Graph Neural Network (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interaction relations, and includes a relation chain representation learning and a relation chain-aware encoder to discover the impact of various auxiliary relations on the target relation and the dependencies between different relations, and to mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our DCMGNN surpasses various state-of-the-art recommendation methods. It outperforms the best baselines by 10.06% and 12.15% on average across all datasets in terms of R@10 and N@10, respectively.
Submitted 29 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
NetTrack: Tracking Highly Dynamic Objects with a Net
Authors:
Guangze Zheng,
Shijie Lin,
Haobo Zuo,
Changhong Fu,
Jia Pan
Abstract:
The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects. To address this problem, this work proposes NetTrack, an efficient, generic, and affordable tracking framework to introduce fine-grained learning that is robust to dynamicity. Specifically, NetTrack constructs a dynamicity-aware association with a fine-grained Net, leveraging point-level visual cues. Correspondingly, a fine-grained sampler and matching method have been incorporated. Furthermore, NetTrack learns object-text correspondence for fine-grained localization. To evaluate MOT in extremely dynamic open-world scenarios, a bird flock tracking (BFT) dataset is constructed, which exhibits high dynamicity with diverse species and open-world scenarios. Comprehensive evaluation on BFT validates the effectiveness of fine-grained learning on object dynamicity, and thorough transfer experiments on challenging open-world benchmarks, i.e., TAO, TAO-OW, AnimalTrack, and GMOT-40, validate the strong generalization ability of NetTrack even without finetuning. Project page: https://george-zhuang.github.io/nettrack/.
Submitted 17 March, 2024;
originally announced March 2024.
-
Graph Data Condensation via Self-expressive Graph Structure Reconstruction
Authors:
Zhanyu Liu,
Chaolv Zeng,
Guanjie Zheng
Abstract:
With the increasing demands of training graph neural networks (GNNs) on large-scale graphs, graph data condensation has emerged as a critical technique to relieve the storage and time costs during the training phase. It aims to condense the original large-scale graph to a much smaller synthetic graph while preserving the essential information necessary for efficiently training a downstream GNN. However, existing methods either concentrate exclusively on optimizing node features or endeavor to learn node features and the graph structure generator independently. They cannot explicitly leverage the information of the original graph structure and fail to construct an interpretable graph structure for the synthetic dataset. To address these issues, we introduce a novel framework named Graph Data Condensation via Self-expressive Graph Structure Reconstruction (GCSR). Our method stands out by (1) explicitly incorporating the original graph structure into the condensing process and (2) capturing the nuanced interdependencies between the condensed nodes by reconstructing an interpretable self-expressive graph structure. Extensive experiments and comprehensive analysis validate the efficacy of the proposed method across diverse GNN models and datasets. Our code is available at https://github.com/zclzcl0223/GCSR.
Submitted 7 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Dataset Condensation for Time Series Classification via Dual Domain Matching
Authors:
Zhanyu Liu,
Ke Hao,
Guanjie Zheng,
Yanwei Yu
Abstract:
Time series data has been demonstrated to be crucial in various research fields. The management of large quantities of time series data presents challenges in terms of deep learning tasks, particularly for training a deep neural network. Recently, a technique named Dataset Condensation has emerged as a solution to this problem. This technique generates a smaller synthetic dataset that has comparable performance to the full real dataset in downstream tasks such as classification. However, previous methods are primarily designed for image and graph datasets, and directly adapting them to time series datasets leads to suboptimal performance due to their inability to effectively leverage the rich information inherent in time series data, particularly in the frequency domain. In this paper, we propose a novel framework named Dataset Condensation for Time Series Classification via Dual Domain Matching (CondTSC), which focuses on the time series classification dataset condensation task. Different from previous methods, our proposed framework aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains. Specifically, CondTSC incorporates multi-view data augmentation, dual domain training, and dual surrogate objectives to enhance the dataset condensation process in the time and frequency domains. Through extensive experiments, we demonstrate the effectiveness of our proposed framework, which outperforms other baselines and learns a condensed synthetic dataset that exhibits desirable characteristics such as conforming to the distribution of the original data.
Submitted 10 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
AceMap: Knowledge Discovery through Academic Graph
Authors:
Xinbing Wang,
Luoyi Fu,
Xiaoying Gan,
Ying Wen,
Guanjie Zheng,
Jiaxin Ding,
Liyao Xiang,
Nanyang Ye,
Meng Jin,
Shiyu Liang,
Bin Lu,
Haiwen Wang,
Yi Xu,
Cheng Deng,
Shao Zhang,
Huquan Kang,
Xingli Wang,
Qi Li,
Zhixin Guo,
Jiexing Qi,
Pan Liu,
Yuyang Ren,
Lyuwen Wu,
Jungang Yang,
Jianping Zhou
, et al. (1 additional author not shown)
Abstract:
The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through academic graphs. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic entities that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. In addition, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of large language models and knowledge graphs is a promising direction for future research in idea evolution. Please visit https://www.acemap.info for further exploration.
Submitted 14 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Bayesian inference via geometric optics approximation
Authors:
Zejun Sun,
Guang-Hui Zheng
Abstract:
Markov chain Monte Carlo (MCMC) simulations have been widely used to generate samples from the complex posterior distribution in Bayesian inference. However, these simulations often require multiple computations of the forward model in the likelihood function for each drawn sample. This computational burden renders MCMC sampling impractical when the forward model is computationally expensive, such as in the case of partial differential equation models. In this paper, we propose a novel sampling approach called the geometric optics approximation method (GOAM) for Bayesian inverse problems, which entirely circumvents the need for MCMC simulations. Our method is rooted in the problem of reflector shape design, which focuses on constructing a reflecting surface that redirects rays from a source, with a predetermined density, towards a target domain while achieving a desired density distribution. The key idea is to consider the unnormalized Bayesian posterior as the density on the target domain within the optical system and to define a geometric optics approximation measure with respect to the posterior by a reflecting surface. Consequently, once such a reflecting surface is obtained, we can utilize it to draw an arbitrary number of independent and uncorrelated samples from the posterior measure for Bayesian inverse problems. In theory, we have shown that the geometric optics approximation measure is well-posed. The efficiency and robustness of our proposed sampler, employing the geometric optics approximation method, are demonstrated through several numerical examples provided in this paper.
Submitted 3 March, 2024;
originally announced March 2024.
-
The Fermi-LAT view of the changing-look blazar OQ 334
Authors:
S. S. Ren,
R. X. Zhou,
Y. G. Zheng,
S. J. Kang,
Q. Wu
Abstract:
Context. Unusually, there are still certain characteristics of the changing-look (CL) active galactic nuclei (AGNs) that remain undetected. Consequently, the trigger mechanism behind the CL phenomenon observed in some AGNs remains unknown. Aims. We explore the light curve and spectral energy distribution (SED) of the CL blazar OQ 334 as obtained by Fermi-LAT. Methods. By examining the variability of the equivalent width (EW), we categorise the Fermi-LAT light curves of OQ 334 during the epoch of MJD 54628-58677 into seven distinct epochs, including the flat spectrum radio quasar (FSRQ) state, the transition state, and the BL Lac state. We obtained both a Fermi-LAT SED and a multi-wavelength SED for each of these distinct epochs. Results. The source exhibits a transformation from a quiescent state to a highly active state, as evidenced by the variability of the EW. The multi-wavelength SEDs display a prominent external Compton characteristic, even though the Fermi-LAT SED reveals both an FSRQ and a BL Lac state across the seven different epochs. To gain further insights, we employed a leptonic model that takes into account the soft photon fields originating from both synchrotron radiation and the external environment. By simulating the multi-wavelength SEDs for each epoch, we uncover the following results. Firstly, the energy density of the external photon fields evolves in an oscillatory manner over the seven different epochs. Also, the energy density of the external photon fields in the BL Lac state is lower than that in the FSRQ state. Conclusions. These findings suggest that the CL blazar represents a unique phase in the blazar sequence.
Submitted 26 February, 2024;
originally announced February 2024.
-
Spurious Correlations in Machine Learning: A Survey
Authors:
Wenqian Ye,
Guangtao Zheng,
Xu Cao,
Yunsheng Ma,
Aidong Zhang
Abstract:
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to aid future research. The paper concludes with a discussion of the recent advancements and future challenges in this field, aiming to provide valuable insights for researchers in the related domains.
Submitted 16 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Optimal Rejection of Bounded Perturbations in Linear Leader-Following Consensus Protocol: Method Invariant Ellipsoid
Authors:
Siyuan Wang,
Andrey Polyakov,
Min Li,
Gang Zheng,
Driss Boutat
Abstract:
The objective of the invariant ellipsoid method is to minimize the smallest invariant and attractive set of a linear control system operating under the influence of bounded external disturbances. In this paper, this method is extended to the leader-following consensus problem. Initially, a linear control protocol is designed for the multi-agent system (MAS) without disturbances. Subsequently, in the presence of bounded disturbances, by employing a similar linear control protocol, a necessary and sufficient condition is introduced to derive the optimal control parameters for the MAS such that the states of the followers converge to and remain in a minimal invariant ellipsoid around the state of the leader.
Submitted 19 February, 2024;
originally announced February 2024.
-
A Lattice-Reduction Aided Vector Perturbation Precoder Relying on Quantum Annealing
Authors:
Samuel Winter,
Yangyishi Zhang,
Gan Zheng,
Lajos Hanzo
Abstract:
Quantum annealing (QA) is proposed for vector perturbation precoding (VPP) in multiple-input multiple-output (MIMO) communications systems. The mathematical framework of VPP is presented, outlining the problem formulation and the benefits of lattice reduction algorithms. Lattice reduction aided quantum vector perturbation (LRAQVP) is designed by harnessing physical quantum hardware, and the optimization of hardware parameters is discussed. We observe a 5 dB gain over lattice reduction zero forcing precoding (LRZFP), which behaves similarly to a quantum annealing algorithm operating without a lattice reduction stage. The proposed algorithm is also shown to approach the performance of a sphere encoder, which exhibits an exponentially escalating complexity.
Submitted 12 February, 2024;
originally announced February 2024.
-
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
Authors:
Yixin Ou,
Ningyu Zhang,
Honghao Gui,
Ziwen Xu,
Shuofei Qiao,
Yida Xue,
Runnan Fang,
Kangwei Liu,
Lei Li,
Zhen Bi,
Guozhou Zheng,
Huajun Chen
Abstract:
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further development and advancement. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
Submitted 23 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
A Low-Cost Multi-Band Waveform Security Framework in Resource-Constrained Communications
Authors:
Tongyang Xu,
Zhongxiang Wei,
Tianhua Xu,
Gan Zheng
Abstract:
Traditional physical layer secure beamforming is achieved via precoding before signal transmission using channel state information (CSI). However, imperfect CSI will compromise the performance with imperfect beamforming and potential information leakage. In addition, multiple RF chains and antennas are needed to support the narrow beam generation, which complicates hardware implementation and is not suitable for resource-constrained Internet-of-Things (IoT) devices. Moreover, with the advancement of hardware and artificial intelligence (AI), low-cost and intelligent eavesdropping on wireless communications is becoming increasingly detrimental. In this paper, we propose a multi-carrier based multi-band waveform-defined security (WDS) framework, independent of CSI and RF chains, to defend against AI eavesdropping. Ideally, the continuous variations of sub-band structures lead to an infinite number of spectral features, which can potentially prevent brute-force eavesdropping. Sub-band spectral pattern information is efficiently constructed at legitimate users via a proposed chaotic sequence generator. A novel security metric, termed signal classification accuracy (SCA), is used to evaluate the security robustness under AI eavesdropping. Communication error probability and complexity are also investigated to show the reliability and practical capability of the proposed framework. Finally, compared to traditional secure beamforming techniques, the proposed multi-band WDS framework reduces power consumption by up to six times.
Submitted 1 February, 2024;
originally announced February 2024.
-
Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic Forecasting
Authors:
Zhanyu Liu,
Guanjie Zheng,
Yanwei Yu
Abstract:
Traffic forecasting is crucial for intelligent transportation systems (ITS), aiding in efficient resource allocation and effective traffic control. However, its effectiveness often relies heavily on abundant traffic data, while many cities lack sufficient data due to limited device support, posing a significant challenge for traffic forecasting. Recognizing this challenge, we have made a noteworthy observation: traffic patterns exhibit similarities across diverse cities. Building on this key insight, we propose a solution for the cross-city few-shot traffic forecasting problem called Multi-scale Traffic Pattern Bank (MTPB). First, MTPB initiates its learning process by leveraging data-rich source cities, effectively acquiring comprehensive traffic knowledge through a spatial-temporal-aware pre-training process. Subsequently, the framework employs advanced clustering techniques to systematically generate a multi-scale traffic pattern bank derived from the learned knowledge. Next, the traffic data of the data-scarce target city can query the traffic pattern bank, facilitating the aggregation of meta-knowledge. This meta-knowledge, in turn, assumes a pivotal role as a robust guide in subsequent processes involving graph reconstruction and forecasting. Empirical assessments conducted on real-world traffic datasets affirm the superior performance of MTPB, surpassing existing methods across various categories and exhibiting numerous attributes conducive to the advancement of cross-city few-shot forecasting methodologies. The code is available at https://github.com/zhyliu00/MTPB.
Submitted 26 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
MouSi: Poly-Visual-Expert Vision-Language Models
Authors:
Xiaoran Fan,
Tao Ji,
Changhao Jiang,
Shuo Li,
Senjie Jin,
Sirui Song,
Junke Wang,
Boyang Hong,
Lu Chen,
Guodong Zheng,
Ming Zhang,
Caishuang Huang,
Rui Zheng,
Zhiheng Xi,
Yuhao Zhou,
Shihan Dou,
Junjie Ye,
Hang Yan,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes the use of an ensemble-of-experts technique to synergize the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encoding caused by lengthy image feature sequences, effectively addressing the issue of position overflow and length limitations. For instance, in our implementation, this technique significantly reduces the positional occupancy in models like SAM, from a substantial 4096 to a more efficient and manageable 64 or even down to 1. Experimental results demonstrate that VLMs with multiple experts exhibit consistently superior performance over isolated visual encoders and mark a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
Submitted 30 January, 2024;
originally announced January 2024.
-
Ptycho-endoscopy on a lensless ultrathin fiber bundle tip
Authors:
Pengming Song,
Ruihai Wang,
Lars Loetgering,
Jia Liu,
Peter Vouras,
Yujin Lee,
Shaowei Jiang,
Bin Feng,
Andrew Maiden,
Changhuei Yang,
Guoan Zheng
Abstract:
Synthetic aperture radar (SAR) utilizes an aircraft-carried antenna to emit electromagnetic pulses and detect the returning echoes. As the aircraft travels across a designated area, it synthesizes a large virtual aperture to improve image resolution. Inspired by SAR, we introduce synthetic aperture ptycho-endoscopy (SAPE) for micro-endoscopic imaging beyond the diffraction limit. SAPE operates by hand-holding a lensless fiber bundle tip to record coherent diffraction patterns from specimens. The fiber cores at the distal tip modulate the diffracted wavefield within a confined area, emulating the role of the 'airborne antenna' in SAR. The handheld operation introduces positional shifts to the tip, analogous to the aircraft's movement. These shifts facilitate the acquisition of a ptychogram and synthesize a large virtual aperture extending beyond the bundle's physical limit. We mitigate the influences of hand motion and fiber bending through a low-rank spatiotemporal decomposition of the bundle's modulation profile. Our tests demonstrate the ability to resolve a 548-nm linewidth on a resolution target. The achieved space-bandwidth product is ~1.1 million effective pixels, representing a 36-fold increase compared to that of the original fiber bundle. Furthermore, SAPE's refocusing capability enables imaging over an extended depth of field exceeding 2 cm. The aperture synthesizing process in SAPE surpasses the diffraction limit set by the probe's maximum collection angle, opening new opportunities for both fiber-based and distal-chip endoscopy in applications such as medical diagnostics and industrial inspection.
Submitted 6 July, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Emergence of cooperation under punishment: A reinforcement learning perspective
Authors:
Chenyang Zhao,
Guozhong Zheng,
Chun Zhang,
Jiqiang Zhang,
Li Chen
Abstract:
Punishment is a common tactic to sustain cooperation and has been extensively studied for a long time. While most previous game-theoretic work adopts imitation learning, where players imitate the strategies of those who are better off, the learning logic in the real world is often much more complex. In this work, we turn to the reinforcement learning paradigm, where individuals make their decisions based upon their past experience and long-term returns. Specifically, we investigate the prisoner's dilemma game with the Q-learning algorithm, in which cooperators probabilistically impose punishment on defectors in their neighborhood. Interestingly, we find that punishment could lead to either continuous or discontinuous cooperation phase transitions, and the nucleation process of cooperation clusters is reminiscent of the liquid-gas transition. The uncovered first-order phase transition indicates that great care needs to be taken when implementing the punishment compared to the continuous scenario.
Submitted 29 January, 2024;
originally announced January 2024.
-
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Authors:
Chaochao Lu,
Chen Qian,
Guodong Zheng,
Hongxing Fan,
Hongzhi Gao,
Jie Zhang,
Jing Shao,
Jingyi Deng,
Jinlan Fu,
Kexin Huang,
Kunchang Li,
Lijun Li,
Limin Wang,
Lu Sheng,
Meiqi Chen,
Ming Zhang,
Qibing Ren,
Sirui Chen,
Tao Gui,
Wanli Ouyang,
Yali Wang,
Yan Teng,
Yaru Wang,
Yi Wang,
Yinan He
, et al. (11 additional authors not shown)
Abstract:
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (i.e., 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
Submitted 29 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Emergence of anti-coordinated patterns in snowdrift game by reinforcement learning
Authors:
Zhen-Wei Ding,
Ji-Qiang Zhang,
Guo-Zhong Zheng,
Wei-Ran Cai,
Chao-Ran Cai,
Li Chen,
Xu-Ming Wang
Abstract:
Patterns formed by self-organization in nature have garnered significant interest in a range of disciplines due to their intriguing structures. The snowdrift game (SDG) is considered an anti-coordination game, yet anti-coordination patterns are counterintuitively rare. In this work, we introduce a model called the Two-Agents, Two-Action Reinforcement Learning Evolutionary Game ($2\times 2$ RLEG), and apply it to the SDG on regular lattices. We uncover intriguing phenomena in the form of Anti-Coordinated domains (AC-domains), where different frustration regions are observed and continuous phase transitions at the boundaries are identified. To understand the underlying mechanism, we develop a perturbation theory to analyze the stability of different AC-domains. Our theory accurately partitions the parameter space into non-anti-coordinated, anti-coordinated, and mixed areas, and captures their dependence on the learning parameters. Lastly, abnormal scenarios with a large learning rate and a large discount factor that deviate from the theory are investigated by examining the growth and nucleation of AC-domains. Our work provides insights into the emergence of spatial patterns in nature, and contributes to the development of theory for analysing their structural complexities.
Submitted 24 January, 2024;
originally announced January 2024.
-
Optimal higher regularity for biharmonic maps via quantitative stratification
Authors:
Chang-Yu Guo,
Gui-Chun Jiang,
Chang-Lin Xiang,
Gao-Feng Zheng
Abstract:
This little note is devoted to refining the almost optimal regularity results of Breiner and Lamm \cite{Breiner-Lamm-2015} on minimizing and stationary biharmonic maps via the powerful quantitative stratification method introduced by Cheeger and Naber \cite{Cheeger-Naber-2013} and further developed by Naber and Valtorta \cite{Naber-V-2017,Naber-V-2018} for harmonic maps. In particular, we obtain an optimal regularity result for minimizing biharmonic maps.
Submitted 20 January, 2024;
originally announced January 2024.
-
Interferometric Single-Shot Parity Measurement in an InAs-Al Hybrid Device
Authors:
Morteza Aghaee,
Alejandro Alcaraz Ramirez,
Zulfi Alam,
Rizwan Ali,
Mariusz Andrzejczuk,
Andrey Antipov,
Mikhail Astafev,
Amin Barzegar,
Bela Bauer,
Jonathan Becker,
Umesh Kumar Bhaskar,
Alex Bocharov,
Srini Boddapati,
David Bohn,
Jouri Bommer,
Leo Bourdet,
Arnaud Bousquet,
Samuel Boutin,
Lucas Casparis,
Benjamin James Chapman,
Sohail Chatoor,
Anna Wulff Christensen,
Cassandra Chua,
Patrick Codd,
William Cole,
et al. (137 additional authors not shown)
Abstract:
The fusion of non-Abelian anyons or topological defects is a fundamental operation in measurement-only topological quantum computation. In topological superconductors, this operation amounts to a determination of the shared fermion parity of Majorana zero modes. As a step towards this, we implement a single-shot interferometric measurement of fermion parity in indium arsenide-aluminum heterostructures with a gate-defined nanowire. The interferometer is formed by tunnel-coupling the proximitized nanowire to quantum dots. The nanowire causes a state-dependent shift of these quantum dots' quantum capacitance of up to 1 fF. Our quantum capacitance measurements show flux h/2e-periodic bimodality with a signal-to-noise ratio of 1 in 3.7 $μ$s at optimal flux values. From the time traces of the quantum capacitance measurements, we extract a dwell time in the two associated states that is longer than 1 ms at in-plane magnetic fields of approximately 2 T. These results are consistent with a measurement of the fermion parity encoded in a pair of Majorana zero modes that are separated by approximately 3 $μ$m and subjected to a low rate of poisoning by non-equilibrium quasiparticles. The large capacitance shift and long poisoning time enable a parity measurement error probability of 1%.
Submitted 2 April, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.