-
FedsLLM: Federated Split Learning for Large Language Models over Communication Networks
Authors:
Kai Zhao,
Zhaohui Yang,
Chongwen Huang,
Xiaoming Chen,
Zhaoyang Zhang
Abstract:
Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into client subnetworks and server subnetworks. It leverages a federated server to aggregate and update client models. As the training data are transmitted through a wireless network between clients and both main and federated servers, the training delay is determined by the learning accuracy and the allocation of communication bandwidth. This paper models the minimization of the training delay by integrating computation and communication optimization, simplifying the optimization problem into a convex problem to find the optimal solution. Additionally, it presents a lemma that describes the precise solutions to this problem. Simulation results demonstrate that the proposed optimization algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.
Submitted 12 July, 2024;
originally announced July 2024.
-
Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection
Authors:
Zhiqiang Yang,
Qiu Guan,
Keer Zhao,
Jianmin Yang,
Xinli Xu,
Haixia Long,
Ying Tang
Abstract:
Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN). Within MAFPN, the Superficial Assisted Fusion (SAF) module is designed to combine the output of the backbone with the neck, preserving an optimal level of shallow information to facilitate subsequent learning. Meanwhile, the Advanced Assisted Fusion (AAF) module deeply embedded within the neck conveys a more diverse range of gradient information to the output layer.
Furthermore, our proposed Re-parameterized Heterogeneous Efficient Layer Aggregation Network (RepHELAN) module ensures that both the overall model architecture and the convolutional design use heterogeneous large convolution kernels. This guarantees the preservation of information related to small targets while simultaneously achieving a multi-scale receptive field. Finally, taking the nano version of MAF-YOLO as an example, it achieves 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, outperforming YOLOv8n by about 5.1%. The source code of this work is available at: https://github.com/yang-0201/MAF-YOLO.
Submitted 5 July, 2024;
originally announced July 2024.
-
A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization
Authors:
Daoce Wang,
Pascal Grosset,
Jesus Pulido,
Tushar M. Athawale,
Jiannan Tian,
Kai Zhao,
Zarija Lukić,
Axel Huebl,
Zhe Wang,
James Ahrens,
Dingwen Tao
Abstract:
Multi-resolution methods such as Adaptive Mesh Refinement (AMR) can enhance storage efficiency for HPC applications generating vast volumes of data. However, their applicability is limited, and they cannot be deployed universally across all applications. Furthermore, integrating lossy compression with multi-resolution techniques to further boost storage efficiency encounters significant barriers. To this end, we introduce an innovative workflow that facilitates high-quality multi-resolution data compression for both uniform and AMR simulations. First, to extend the usability of multi-resolution techniques, our workflow employs a compression-oriented Region of Interest (ROI) extraction method, transforming uniform data into a multi-resolution format. Second, to bridge the gap between multi-resolution techniques and lossy compressors, we optimize three distinct compressors, ensuring their optimal performance on multi-resolution data. Lastly, we incorporate an advanced uncertainty visualization method into our workflow to understand the potential impacts of lossy compression. Experimental evaluation demonstrates that our workflow achieves significant compression quality improvements.
Submitted 11 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Cross-Slice Attention and Evidential Critical Loss for Uncertainty-Aware Prostate Cancer Detection
Authors:
Alex Ling Yu Hung,
Haoxin Zheng,
Kai Zhao,
Kaifeng Pang,
Demetri Terzopoulos,
Kyunghyun Sung
Abstract:
Current deep learning-based models typically analyze medical images in either 2D or 3D albeit disregarding volumetric information or suffering sub-optimal performance due to the anisotropic resolution of MR data. Furthermore, providing an accurate uncertainty estimation is beneficial to clinicians, as it indicates how confident a model is about its prediction. We propose a novel 2.5D cross-slice attention model that utilizes both global and local information, along with an evidential critical loss, to perform evidential deep learning for the detection in MR images of prostate cancer, one of the most common cancers and a leading cause of cancer-related death in men. We perform extensive experiments with our model on two different datasets and achieve state-of-the-art performance in prostate cancer detection along with improved epistemic uncertainty estimation. The implementation of the model is available at https://github.com/aL3x-O-o-Hung/GLCSA_ECLoss.
Submitted 1 July, 2024;
originally announced July 2024.
-
BioMNER: A Dataset for Biomedical Method Entity Recognition
Authors:
Chen Tang,
Bohao Yang,
Kun Zhao,
Bo Lv,
Chenghao Xiao,
Frank Guerin,
Chenghua Lin
Abstract:
Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources, primarily attributed to the intricate nature of methodological concepts, which necessitate a profound understanding for precise delineation. In this study, we propose a novel dataset for biomedical method entity recognition, employing an automated BioMethod entity recognition and information retrieval system to assist human annotation. Furthermore, we comprehensively explore a range of conventional and contemporary open-domain NER methodologies, including the utilization of cutting-edge large-scale language models (LLMs) customised to our dataset. Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns pertaining to biomedical methods. Remarkably, the approach, leveraging the modestly sized ALBERT model (only 11MB), in conjunction with conditional random fields (CRF), achieves state-of-the-art (SOTA) performance.
Submitted 28 June, 2024;
originally announced June 2024.
-
SimsChat: A Customisable Persona-Driven Role-Playing Agent
Authors:
Bohao Yang,
Dong Liu,
Chen Tang,
Chenghao Xiao,
Kun Zhao,
Chao Li,
Lin Yuan,
Guang Yang,
Lanxiao Huang,
Chenghua Lin
Abstract:
Large Language Models (LLMs) possess the remarkable capability to understand human instructions and generate high-quality text, enabling them to act as agents that simulate human behaviours. This capability allows LLMs to emulate human beings in a more advanced manner, beyond merely replicating simple human behaviours. However, there has been little exploration of leveraging LLMs to craft characters from several aspects. In this work, we introduce the Customisable Conversation Agent Framework, which employs LLMs to simulate real-world characters that can be freely customised according to different user preferences. The framework is helpful for designing customisable characters and role-playing agents according to human preferences. We first propose the SimsConv dataset, which comprises 68 different customised characters and 1,360 multi-turn role-playing dialogues, and encompasses 13,971 interaction dialogues in total. The characters are created from several real-world elements, such as career, aspiration, trait, and skill. Building on these foundations, we present SimsChat, a freely customisable role-playing agent. It incorporates different real-world scenes and topic-specific character interaction dialogues, simulating characters' life experiences in various scenarios and topic-specific interactions with specific emotions. Experimental results show that our proposed framework achieves desirable performance and provides helpful guidelines for building better simulacra of human beings in the future. Our data and code are available at https://github.com/Bernard-Yang/SimsChat.
Submitted 30 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms
Authors:
Kun Zhao,
Chenghao Xiao,
Chen Tang,
Bohao Yang,
Kai Ye,
Noura Al Moubayed,
Liang Zhan,
Chenghua Lin
Abstract:
Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This has become an urgent problem for RRG due to the highly patternized nature of these reports. In this work, we counter-intuitively approach this problem by proposing the Layman's RRG framework, a layman's terms-based dataset, evaluation and training framework that systematically improves RRG with day-to-day language. We first contribute the translated Layman's terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is proved to mitigate the inflated numbers of BLEU and provides fairer evaluation. Last, we show that training on the layman's terms dataset encourages models to focus on the semantics of the reports, as opposed to overfitting to the report templates. We reveal a promising scaling law between the number of training examples and the semantics gain provided by our dataset, compared to the inverse pattern brought by the original formats. Our code is available at https://github.com/hegehongcha/LaymanRRG.
Submitted 30 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback
Authors:
Zhongtao Miao,
Kaiyan Zhao,
Yoshimasa Tsuruoka
Abstract:
Current representations used in reasoning steps of large language models can mostly be categorized into two main types: (1) natural language, which is difficult to verify; and (2) non-natural language, usually programming code, which is difficult for people who are unfamiliar with coding to read. In this paper, we propose to use a semi-structured form to represent reasoning steps of large language models. Specifically, we use relation tuples, which are not only human-readable but also machine-friendly and easier to verify than natural language. We implement a framework that includes three main components: (1) introducing relation tuples into the reasoning steps of large language models; (2) implementing an automatic verification process of reasoning steps with a local code interpreter based on relation tuples; and (3) integrating a simple and effective dynamic feedback mechanism, which we found helpful for self-improvement of large language models. The experimental results on various arithmetic datasets demonstrate the effectiveness of our method in improving the arithmetic reasoning ability of large language models. The source code is available at https://github.com/gpgg/art.
Submitted 25 June, 2024;
originally announced June 2024.
-
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Authors:
Zhongzhi Yu,
Zheng Wang,
Yuhan Li,
Haoran You,
Ruijie Gao,
Xiaoya Zhou,
Sreenidhi Reedy Bommu,
Yang Katie Zhao,
Yingyan Celine Lin
Abstract:
Efficient adaptation of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of their high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices. Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning, thereby achieving efficient computation and data movements. Extensive experiments demonstrate that Edge-LLM achieves a 2.92x speedup and a 4x memory overhead reduction compared to vanilla tuning methods with comparable task accuracy. Our code is available at https://github.com/GATECH-EIC/Edge-LLM
Submitted 22 June, 2024;
originally announced June 2024.
-
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
Authors:
Wenjing Zhang,
Xuejiao Lei,
Zhaoxiang Liu,
Meijuan An,
Bikun Yang,
KaiKai Zhao,
Kai Wang,
Shiguo Lian
Abstract:
With the profound development of large language models (LLMs), their safety concerns have garnered increasing attention. However, there is a scarcity of Chinese safety benchmarks for LLMs, and the existing safety taxonomies are inadequate, lacking comprehensive safety detection capabilities in authentic Chinese scenarios. In this work, we introduce CHiSafetyBench, a dedicated safety benchmark for evaluating LLMs' capabilities in identifying risky content and refusing to answer risky questions in Chinese contexts. CHiSafetyBench incorporates a dataset that covers a hierarchical Chinese safety taxonomy consisting of 5 risk areas and 31 categories. This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risk content identification and the ability to refuse to answer risky questions, respectively. Utilizing this benchmark, we validate the feasibility of automatic evaluation as a substitute for human evaluation and conduct comprehensive automatic safety assessments on mainstream Chinese LLMs. Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities. Our dataset is publicly available at https://github.com/UnicomAI/DataSet/tree/main/TestData/Safety.
Submitted 14 June, 2024;
originally announced June 2024.
-
What is the best model? Application-driven Evaluation for Large Language Models
Authors:
Shiguo Lian,
Kaikai Zhao,
Xinhui Liu,
Xuejiao Lei,
Bikun Yang,
Wenjing Zhang,
Kai Wang,
Zhaoxiang Liu
Abstract:
General large language models enhanced with supervised fine-tuning and reinforcement learning from human feedback are increasingly popular in academia and industry as they generalize foundation models to various practical tasks in a prompt manner. To assist users in selecting the best model in practical application scenarios, i.e., choosing the model that meets the application requirements while minimizing cost, we introduce A-Eval, an application-driven LLM evaluation benchmark for general large language models. First, we categorize evaluation tasks into five main categories and 27 sub-categories from a practical application perspective. Next, we construct a dataset comprising 678 question-and-answer pairs through a process of collecting, annotating, and reviewing. Then, we design an objective and effective evaluation method and evaluate a series of LLMs of different scales on A-Eval. Finally, we reveal interesting laws regarding model scale and task difficulty level and propose a feasible method for selecting the best model. Through A-Eval, we provide clear empirical and engineering guidance for selecting the best model, reducing barriers to selecting and using LLMs and promoting their application and development. Our benchmark is publicly available at https://github.com/UnicomAI/DataSet/tree/main/TestData/GeneralAbility.
Submitted 14 June, 2024;
originally announced June 2024.
-
Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition
Authors:
Eleni Triantafillou,
Peter Kairouz,
Fabian Pedregosa,
Jamie Hayes,
Meghdad Kurmanji,
Kairan Zhao,
Vincent Dumoulin,
Julio Jacques Junior,
Ioannis Mitliagkas,
Jun Wan,
Lisheng Sun Hosoya,
Sergio Escalera,
Gintare Karolina Dziugaite,
Peter Triantafillou,
Isabelle Guyon
Abstract:
We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In this paper, we analyze top solutions and delve into discussions on benchmarking unlearning, which itself is a research problem. The evaluation methodology we developed for the competition measures forgetting quality according to a formal notion of unlearning, while incorporating model utility for a holistic evaluation. We analyze the effectiveness of different instantiations of this evaluation framework vis-a-vis the associated compute cost, and discuss implications for standardizing evaluation. We find that the ranking of leading methods remains stable under several variations of this framework, pointing to avenues for reducing the cost of evaluation. Overall, our findings indicate progress in unlearning, with top-performing competition entries surpassing existing algorithms under our evaluation framework. We analyze trade-offs made by different algorithms and strengths or weaknesses in terms of generalizability to new datasets, paving the way for advancing both benchmarking and algorithm development in this important area.
Submitted 13 June, 2024;
originally announced June 2024.
-
HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation
Authors:
Yutao Sun,
Mingshuai Chen,
Kangjia Zhao,
He Li,
Jintao Chen,
Linyu Yang,
Zhongyi Wang,
Tiancheng Zhao,
Jianwei Yin
Abstract:
Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how HORAE facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named HORAE that automates the HORAE modeling process, thereby yielding an end-to-end framework for fully automated intelligent service regulation.
Submitted 6 June, 2024;
originally announced June 2024.
-
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Authors:
Quandong Wang,
Yuxuan Yuan,
Xiaoyu Yang,
Ruike Zhang,
Kang Zhao,
Wei Liu,
Jian Luan,
Daniel Povey,
Bin Wang
Abstract:
While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version.
Submitted 17 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Simple smooth modules over the Ramond algebra and applications to vertex operator superalgebras
Authors:
Yulu Chen,
Ran Shen,
Yufeng Yao,
Kaiming Zhao
Abstract:
Simple smooth modules over the Virasoro algebra and over one of the super-Virasoro algebras, the Neveu-Schwarz algebra, have been classified. This problem remained unsolved for the other super-Virasoro algebra, called the Ramond algebra. In this paper, all simple smooth modules over the Ramond algebra are classified. More precisely, a simple smooth module over the Ramond algebra is either a simple highest weight module or isomorphic to an induced module from a simple module over a finite dimensional solvable Lie superalgebra. As an application we obtain all simple weak $ψ$-twisted modules over some vertex operator superalgebras.
Submitted 10 June, 2024;
originally announced June 2024.
-
Biderivations of Lie algebras
Authors:
Qiufan Chen,
Yufeng Yao,
Kaiming Zhao
Abstract:
In this paper, we first introduce the concepts of symmetric biderivation radicals and characteristic subalgebras of Lie algebras, and study their properties. Based on these results, we precisely determine the biderivations of some Lie algebras, including finite-dimensional simple Lie algebras over arbitrary fields of characteristic not $2$ or $3$, and the Witt algebras $\mathcal{W}^+_n$ over fields of characteristic $0$. As an application, commutative post-Lie algebra structures on the aforementioned Lie algebras are shown to be trivial.
Submitted 10 June, 2024;
originally announced June 2024.
-
LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification
Authors:
Chun Liu,
Hongguang Zhang,
Kainan Zhao,
Xinghai Ju,
Lin Yang
Abstract:
With the boom in Large Language Models (LLMs), prompt-learning has become a promising method studied in various research areas. Recently, many attempts based on prompt-learning have been made to improve the performance of text classification. However, most of these methods are based on heuristic Chain-of-Thought (CoT), and tend to be more complex but less efficient. In this paper, we rethink the LLM-based text classification methodology and propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task. To illustrate, we first study how to properly extract and fuse the text embeddings via various lightweight LLMs at different network depths to improve their robustness and discrimination, then adapt such embeddings to train the classifier. We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead using lightweight LLM backbones compared to recent methods based on larger LLMs, i.e. GPT-3, and sophisticated prompt-based strategies. Our LLMEmbed achieves adequate accuracy on publicly available benchmarks without any fine-tuning while using merely 4% of the model parameters, 1.8% of the electricity consumption and 1.5% of the runtime compared to its counterparts. Code is available at: https://github.com/ChunLiu-cs/LLMEmbed-ACL2024.
Submitted 5 June, 2024;
originally announced June 2024.
-
A novel measurement method for SiPM external crosstalk probability at low temperature
Authors:
Guanda Li,
Lei Wang,
Xilei Sun,
Fang Liu,
Cong Guo,
Kangkang Zhao,
Lei Tian,
Zeyuan Yu,
Zhilong Hou,
Chi Li,
Yu Lei,
Bin Wang,
Rongbin Zhou
Abstract:
Silicon photomultipliers (SiPMs) are being considered as potential replacements for conventional photomultiplier tubes (PMTs). However, a significant disadvantage of SiPMs is crosstalk (CT), wherein photons propagate through other pixels, resulting in secondary avalanches. CT can be categorized into internal crosstalk and external crosstalk based on whether the secondary avalanche occurs within the same SiPM or a different one. Numerous methods exist for quantitatively estimating the percentage of internal crosstalk (iCT). However, external crosstalk (eCT) has not been extensively studied.
This article presents a novel measurement method for the probability of emitting an external crosstalk photon during a single pixel avalanche, using a setup involving two identical SiPMs facing each other, and without the need for complex optical designs. The entire apparatus is enclosed within a stainless steel chamber, functioning as a light-tight enclosure, and maintained at liquid nitrogen temperature. The experimental setup incorporates two Sensl J-60035 SiPM chips along with two 0.5-inch Hamamatsu Photonics (HPK) VUV4 S13370-6050CN SiPM arrays. The findings show a linear relationship between the probability of emitting an external crosstalk photon and the SiPM overvoltage for both SiPM samples. Surprisingly, this novel measurement method also provides measurements of the SiPM photon detection efficiency (PDE) for eCT photons at low temperature.
Submitted 4 June, 2024;
originally announced June 2024.
-
What makes unlearning hard and what to do about it
Authors:
Kairan Zhao,
Meghdad Kurmanji,
George-Octavian Bărbulescu,
Eleni Triantafillou,
Peter Triantafillou
Abstract:
Machine unlearning is the problem of removing the effect of a subset of training data (the "forget set") from a trained model without damaging the model's utility, e.g., to comply with users' requests to delete their data, or to remove mislabeled, poisoned or otherwise problematic data. With unlearning research still in its infancy, many fundamental open questions exist: Are there interpretable characteristics of forget sets that substantially affect the difficulty of the problem? How do these characteristics affect different state-of-the-art algorithms? With this paper, we present the first investigation aiming to answer these questions. We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms. Evaluation on forget sets that isolate these identified factors reveals previously-unknown behaviours of state-of-the-art algorithms that don't materialize on random forget sets. Based on our insights, we develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set. We find that RUM substantially improves top-performing unlearning algorithms. Overall, we view our work as an important step in (i) deepening our scientific understanding of unlearning and (ii) revealing new pathways to improving the state-of-the-art.
Submitted 3 June, 2024;
originally announced June 2024.
-
Report on Methods and Applications for Crafting 3D Humans
Authors:
Lei Liu,
Ke Zhao
Abstract:
This paper presents an in-depth exploration of 3D human model and avatar generation technology, propelled by the rapid advancements in large-scale models and artificial intelligence. The paper reviews the comprehensive process of 3D human model generation, from scanning to rendering, and highlights the pivotal role these models play in entertainment, VR, AR, healthcare, and education. We underscore the significance of diffusion models in generating high-fidelity images and videos. It emphasizes the indispensable nature of 3D human models in enhancing user experiences and functionalities across various fields. Furthermore, this paper anticipates the potential of integrating large-scale models with deep learning to revolutionize 3D content generation, offering insights into the future prospects of the technology. It concludes by emphasizing the importance of continuous innovation in the field, suggesting that ongoing advancements will significantly expand the capabilities and applications of 3D human models and avatars.
Submitted 3 June, 2024;
originally announced June 2024.
-
Correlated Electronic Structure and Density-Wave Gap in Trilayer Nickelate La4Ni3O10
Authors:
X. Du,
Y. D. Li,
Y. T. Cao,
C. Y. Pei,
M. X. Zhang,
W. X. Zhao,
K. Y. Zhai,
R. Z. Xu,
Z. K. Liu,
Z. W. Li,
J. K. Zhao,
G. Li,
Y. L. Chen,
Y. P. Qi,
H. J. Guo,
L. X. Yang
Abstract:
The discovery of pressurized superconductivity at 80 K in La3Ni2O7 officially brings nickelates into the family of high-temperature superconductors, which gives rise to not only new insights but also mysteries in the strongly correlated superconductivity. More recently, the sibling compound La4Ni3O10 was also shown to be superconducting below about 25 K under pressure, further boosting the popularity of nickelates in the Ruddlesden-Popper phase. In this study, combining high-resolution angle-resolved photoemission spectroscopy and ab initio calculation, we systematically investigate the electronic structures of La4Ni3O10 at ambient pressure. We reveal a high resemblance of La4Ni3O10 with La3Ni2O7 in the orbital-dependent fermiology and electronic structure, suggesting a similar electronic correlation between the two compounds. The temperature-dependent measurements imply an orbital-dependent energy gap related to the density-wave transition in La4Ni3O10. By comparing the theoretical pressure-dependent electronic structure, clues about the superconducting high-pressure phase can be deduced from the ambient measurements, providing crucial information for deciphering the unconventional superconductivity in nickelates.
Submitted 30 May, 2024;
originally announced May 2024.
-
ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning
Authors:
Yihang Wang,
Yuying Qiu,
Peng Chen,
Kai Zhao,
Yang Shu,
Zhongwen Rao,
Lujia Pan,
Bin Yang,
Chenjuan Guo
Abstract:
With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to capture domain-specific features from time series data across various domains for adaptive transfer in downstream tasks. To address these challenges, we propose a Register Assisted General Time Series Forecasting Model with Decomposed Frequency Learning (ROSE), a novel pre-trained model for time series forecasting. ROSE employs Decomposed Frequency Learning for the pre-training task, which decomposes coupled semantic and periodic information in time series with frequency-based masking and reconstruction to obtain unified representations across domains. We also equip ROSE with a Time Series Register, which learns to generate a register codebook to capture domain-specific representations during pre-training and enhances domain-adaptive transfer by selecting related register tokens on downstream tasks. After pre-training on large-scale time series data, ROSE achieves state-of-the-art forecasting performance on 8 real-world benchmarks. Remarkably, even in few-shot scenarios, it demonstrates competitive or superior performance compared to existing methods trained with full data.
Submitted 24 May, 2024;
originally announced May 2024.
-
Dominant Shuffle: A Simple Yet Powerful Data Augmentation for Time-series Prediction
Authors:
Kai Zhao,
Zuojie He,
Alex Hung,
Dan Zeng
Abstract:
Recent studies have suggested that frequency-domain data augmentation (DA) is effective for time series prediction. Existing frequency-domain augmentations disturb the original data with various full-spectrum noises, leading to an excessive domain gap between augmented and original data. Although impressive performance has been achieved in certain cases, frequency-domain DA has yet to be generalized to time series prediction datasets. In this paper, we found that frequency-domain augmentations can be significantly improved by two modifications that limit the perturbations. First, we found that limiting the perturbation to only dominant frequencies significantly outperforms full-spectrum perturbations. Dominant frequencies represent the main periodicity and trends of the signal and are more important than other frequencies. Second, we found that simply shuffling the dominant frequency components is superior to sophisticatedly designed random perturbations. Shuffle rearranges the original components (magnitudes and phases) and limits the external noise. With these two modifications, we proposed dominant shuffle, a simple yet effective data augmentation for time series prediction. Our method is very simple yet powerful and can be implemented with just a few lines of code. Extensive experiments with eight datasets and six popular time series models demonstrate that our method consistently improves the baseline performance under various settings and significantly outperforms other DA methods. Code can be accessed at https://kaizhao.net/time-series.
Submitted 26 May, 2024;
originally announced May 2024.
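The two modifications described in the Dominant Shuffle abstract above (perturb only the dominant frequencies, and perturb them by shuffling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rdft`, `irdft` and `dominant_shuffle`, the parameter `k`, and the choice to leave the DC and Nyquist bins untouched are all assumptions made for illustration, and a naive O(n²) transform from the standard library stands in for a real FFT.

```python
import cmath
import math
import random


def rdft(x):
    """Naive real DFT: returns bins 0..n//2 of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irdft(spec, n):
    """Inverse of rdft, assuming conjugate symmetry of the missing bins."""
    out = []
    for t in range(n):
        total = spec[0].real
        for k in range(1, len(spec)):
            term = (spec[k] * cmath.exp(2j * math.pi * k * t / n)).real
            # The Nyquist bin (even n only) appears once; other bins twice.
            is_nyquist = n % 2 == 0 and k == n // 2
            total += term if is_nyquist else 2 * term
        out.append(total / n)
    return out


def dominant_shuffle(x, k=3, rng=None):
    """Shuffle the k largest-magnitude frequency components among themselves.

    Hypothetical helper: DC and Nyquist bins are left in place (an assumption),
    so the mean of the series is unchanged.
    """
    rng = rng or random.Random()
    n = len(x)
    spec = rdft(x)
    end = len(spec) - 1 if n % 2 == 0 else len(spec)  # exclude Nyquist bin
    top = sorted(range(1, end), key=lambda i: abs(spec[i]), reverse=True)[:k]
    perm = top[:]
    rng.shuffle(perm)
    new_spec = spec[:]
    for dst, src in zip(top, perm):
        new_spec[dst] = spec[src]  # moves magnitude and phase together
    return irdft(new_spec, n)
```

Because the sketch only permutes existing components, no external noise is injected: the augmented series keeps the mean and total energy of the input, and only the assignment of the dominant components to frequencies changes.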
-
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Authors:
Kun Zhao,
Bohao Yang,
Chen Tang,
Chenghua Lin,
Liang Zhan
Abstract:
The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem, and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework SLIDE (Small and Large Integrated for Dialogue Evaluation), that leverages both a small, specialised model (SLM), and LLMs for the evaluation of open-domain dialogues. Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that combines embedding cosine distances with similarity learned through neural networks, and (3) a strategy for incorporating the evaluation results from both the SLM and LLMs. Our empirical results demonstrate that our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and additionally the SLIDE evaluator exhibits better correlation with human judgements. Our code is available at https://github.com/hegehongcha/SLIDE-ACL2024.
Submitted 29 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Challenges and Opportunities in 3D Content Generation
Authors:
Ke Zhao,
Andreas Larsen
Abstract:
This paper explores the burgeoning field of 3D content generation within the landscape of Artificial Intelligence Generated Content (AIGC) and large-scale models. It investigates innovative methods like Text-to-3D and Image-to-3D, which translate text or images into 3D objects, reshaping our understanding of virtual and real-world simulations. Despite significant advancements in text and image generation, automatic 3D content generation remains nascent. This paper emphasizes the urgency for further research in this area. By leveraging pre-trained diffusion models, which have demonstrated prowess in high-fidelity image generation, this paper aims to summarize 3D content creation, addressing challenges such as data scarcity and computational resource limitations. Additionally, this paper discusses the challenges and proposes solutions for using pre-trained diffusion models in 3D content generation. By synthesizing relevant research and outlining future directions, this study contributes to advancing the field of 3D content generation amidst the proliferation of large-scale AIGC models.
Submitted 24 May, 2024;
originally announced May 2024.
-
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Authors:
Qichao Shentu,
Beibu Li,
Kai Zhao,
Yang Shu,
Zhongwen Rao,
Lujia Pan,
Bin Yang,
Chenjuan Guo
Abstract:
Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. Aiming at this problem, we propose constructing a general time series anomaly detection model, which is pre-trained on extensive multi-domain datasets and can subsequently be applied to a multitude of downstream scenarios. The significant divergence of time series data across different domains presents two primary challenges in building such a general model: (1) meeting the diverse requirements of appropriate information bottlenecks tailored to different datasets in one unified model, and (2) enabling distinguishment between multiple normal and abnormal patterns, both of which are crucial for effective anomaly detection in various target scenarios. To tackle these two challenges, we propose a General time series anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders (DADA), which enables flexible selection of bottlenecks based on different data and explicitly enhances clear differentiation between normal and abnormal series. We conduct extensive experiments on nine target datasets from different domains. After pre-training on multi-domain data, DADA, serving as a zero-shot anomaly detector for these datasets, still achieves competitive or even superior results compared to those models tailored to each specific dataset.
Submitted 2 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Authors:
Sang Keun Choe,
Hwijeen Ahn,
Juhan Bae,
Kewen Zhao,
Minsoo Kang,
Youngseog Chung,
Adithya Pratapa,
Willie Neiswanger,
Emma Strubell,
Teruko Mitamura,
Jeff Schneider,
Eduard Hovy,
Roger Grosse,
Eric Xing
Abstract:
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast training datasets has been largely limited by prohibitive compute and memory costs. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability with an efficient gradient projection strategy called LoGra that leverages the gradient structure in backpropagation. We then provide a theoretical motivation of gradient projection approaches to influence functions to promote trust in the data valuation process. Lastly, we lower the barrier to implementing data valuation systems by introducing LogIX, a software package that can transform existing training code into data valuation code with minimal effort. In our data valuation experiments, LoGra achieves competitive accuracy against more expensive baselines while showing up to 6,500x improvement in throughput and 5x reduction in GPU memory usage when applied to Llama3-8B-Instruct and the 1B-token dataset.
Submitted 22 May, 2024;
originally announced May 2024.
-
Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation
Authors:
Haoteng Tang,
Guodong Liu,
Siyuan Dai,
Kai Ye,
Kun Zhao,
Wenlu Wang,
Carl Yang,
Lifang He,
Alex Leow,
Paul Thompson,
Heng Huang,
Liang Zhan
Abstract:
The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.
Submitted 21 May, 2024;
originally announced May 2024.
-
Uniqueness of tangent flows at infinity for finite-entropy shortening curves
Authors:
Kyeongsu Choi,
Dong-Hwi Seo,
Wei-Bo Su,
Kai-Wei Zhao
Abstract:
In this paper, we prove that an ancient smooth curve shortening flow with finite-entropy embedded in $\mathbb{R}^2$ has a unique tangent flow at infinity. To this end, we show that its rescaled flows backwardly converge to a line with multiplicity $m\geq 3$ exponentially fast in any compact region, unless the flow is a shrinking circle, a static line, a paper clip, or a translating grim reaper. In addition, we figure out the exact numbers of tips, vertices, and inflection points of the curves at negative enough time. Moreover, the exponential growth rate of graphical radius and the convergence of vertex regions to grim reaper curves will be shown.
Submitted 8 June, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Coded Event-triggered Control for Nonlinear Systems
Authors:
Ruihang Ji,
Shuzhi Sam Ge,
Kai Zhao
Abstract:
This paper studies a Coded Event-triggered Control (CEC) for a class of nonlinear systems under any initial condition. To reduce communication burden, the CEC is designed from the encoding-decoding viewpoint by which only an $m$-length string is transmitted for each communication between CEC and actuator. If a more general Entry Capture Problem is encountered, such control design will be rather complicated yet challenging where the performance constraints are satisfied some time after (rather than from the beginning of) system operation, rendering normally employed prescribed performance control invalid because they may not be defined in the initial interval. By introducing auxiliary functions, we develop a Self-adjustable Prescribed Performance (SPP) mechanism which can flexibly adjust the symmetric or asymmetric performance boundaries to accommodate different initial conditions, providing an effective solution for the underlying tracking problem. In this way, the resulting CEC can not only consume fewer communication resources but also regulate the tracking error under any initial condition into an allowable set before a given time in a bounded and customizable manner. Simulation results verify and clarify the theoretical findings.
Submitted 13 May, 2024;
originally announced May 2024.
-
Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment
Authors:
L. T. Yang,
S. K. Liu,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first limit on $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L.
Submitted 12 May, 2024;
originally announced May 2024.
-
Polarization Perspectives on Hercules X-1: Further Constraining the Geometry
Authors:
Qingchang Zhao,
Hancheng Li,
Lian Tao,
Hua Feng,
Shuangnan Zhang,
Roland Walter,
Mingyu Ge,
Hao Tong,
Long Ji,
Liang Zhang,
Jinlu Qu,
Yue Huang,
Xiang Ma,
Shu Zhang,
Qianqing Yin,
Hongxing Yin,
Ruican Ma,
Shujie Zhao,
Panping Li,
Zixu Yang,
Hexin Liu,
Wei Yu,
Yiming Huang,
Zexi Li,
Yajun Li
, et al. (2 additional authors not shown)
Abstract:
We conduct a comprehensive analysis of the accreting X-ray pulsar, Hercules X-1, utilizing data from IXPE and NuSTAR. IXPE performed five observations of Her X-1, consisting of three in the Main-on state and two in the Short-on state. Our time-resolved analysis uncovers the linear correlations between the flux and polarization degree as well as the pulse fraction and polarization degree. Geometry parameters are rigorously constrained by fitting the phase-resolved modulations of Cyclotron Resonance Scattering Feature and polarization angle with a simple dipole model and Rotating Vector Model respectively, yielding roughly consistent results. The changes of $χ_{\rm p}$ (the position angle of the pulsar's spin axis on the plane of the sky) between different Main-on observations suggest the possible forced precession of the neutron star crust. Furthermore, a linear association between the energy of Cyclotron Resonance Scattering Feature and polarization angle implies the prevalence of a dominant dipole magnetic field, and their phase-resolved modulations likely arise from viewing angle effects.
Submitted 1 May, 2024;
originally announced May 2024.
-
ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
Authors:
Zhongze Wang,
Haitao Zhao,
Jingchao Peng,
Lujian Yao,
Kaijie Zhao
Abstract:
Unpaired image dehazing (UID) holds significant research importance due to the challenges in acquiring haze/clear image pairs with identical backgrounds. This paper proposes a novel method for UID named Orthogonal Decoupling Contrastive Regularization (ODCR). Our method is grounded in the assumption that an image consists of both haze-related features, which influence the degree of haze, and haze-unrelated features, such as texture and semantic information. ODCR aims to ensure that the haze-related features of the dehazing result closely resemble those of the clear image, while the haze-unrelated features align with the input hazy image. To accomplish the motivation, Orthogonal MLPs optimized geometrically on the Stiefel manifold are proposed, which can project image features into an orthogonal space, thereby reducing the relevance between different features. Furthermore, a task-driven Depth-wise Feature Classifier (DWFC) is proposed, which assigns weights to the orthogonal features based on the contribution of each channel's feature in predicting whether the feature source is hazy or clear in a self-supervised fashion. Finally, a Weighted PatchNCE (WPNCE) loss is introduced to achieve the pulling of haze-related features in the output image toward those of clear images, while bringing haze-unrelated features close to those of the hazy input. Extensive experiments demonstrate the superior performance of our ODCR method on UID.
Submitted 27 April, 2024;
originally announced April 2024.
-
Unleashing the Potential of Fractional Calculus in Graph Neural Networks with FROND
Authors:
Qiyu Kang,
Kai Zhao,
Qinxu Ding,
Feng Ji,
Xuhao Li,
Wenfei Liang,
Yang Song,
Wee Peng Tay
Abstract:
We introduce the FRactional-Order graph Neural Dynamical network (FROND), a new continuous graph neural network (GNN) framework. Unlike traditional continuous GNNs that rely on integer-order differential equations, FROND employs the Caputo fractional derivative to leverage the non-local properties of fractional calculus. This approach enables the capture of long-term dependencies in feature update…
▽ More
We introduce the FRactional-Order graph Neural Dynamical network (FROND), a new continuous graph neural network (GNN) framework. Unlike traditional continuous GNNs that rely on integer-order differential equations, FROND employs the Caputo fractional derivative to leverage the non-local properties of fractional calculus. This approach enables the capture of long-term dependencies in feature updates, moving beyond the Markovian update mechanisms in conventional integer-order models and offering enhanced capabilities in graph representation learning. We offer an interpretation of the node feature updating process in FROND from a non-Markovian random walk perspective when the feature updating is particularly governed by a diffusion process. We demonstrate analytically that oversmoothing can be mitigated in this setting. Experimentally, we validate the FROND framework by comparing the fractional adaptations of various established integer-order continuous GNNs, demonstrating their consistently improved performance and underscoring the framework's potential as an effective extension to enhance traditional continuous GNNs. The code is available at \url{https://github.com/zknus/ICLR2024-FROND}.
Submitted 25 April, 2024;
originally announced April 2024.
-
Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph
Authors:
Xiaochen Kev Gao,
Feng Yao,
Kewen Zhao,
Beilei He,
Animesh Kumar,
Vish Krishnan,
Jingbo Shang
Abstract:
Model scaling is becoming the default choice for many language tasks due to the success of large language models (LLMs). However, it can fall short in specific scenarios where simple customized methods excel. In this paper, we delve into the patent approval prediction task and unveil that simple domain-specific graph methods outperform enlarging the model, using the intrinsic dependencies within the patent data. Specifically, we first extend the embedding-based state-of-the-art (SOTA) by scaling up its backbone model with various sizes of open-source LLMs, then explore prompt-based methods to harness proprietary LLMs' potential, but find the best results close to random guessing, underlining the ineffectiveness of model scaling-up. Hence, we propose a novel Fine-grained cLAim depeNdency (FLAN) Graph through meticulous patent data analyses, capturing the inherent dependencies across segments of the patent text. As it is model-agnostic, we apply cost-effective graph models to our FLAN Graph to obtain representations for approval prediction. Extensive experiments and detailed analyses prove that incorporating FLAN Graph via various graph models consistently outperforms all LLM baselines significantly. We hope that our observations and analyses in this paper can bring more attention to this challenging task and prompt further research into the limitations of LLMs. Our source code and dataset can be obtained from http://github.com/ShangDataLab/FLAN-Graph.
Submitted 22 April, 2024;
originally announced April 2024.
-
PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer
Authors:
Rui She,
Qiyu Kang,
Sijie Wang,
Wee Peng Tay,
Kai Zhao,
Yang Song,
Tianyu Geng,
Yi Xu,
Diego Navarro Navarro,
Andreas Hartmannsgruber
Abstract:
Point cloud registration is a fundamental technique in 3-D computer vision with applications in graphics, autonomous driving, and robotics. However, registration tasks under challenging conditions, under which noise or perturbations are prevalent, can be difficult. We propose a robust point cloud registration approach that leverages graph neural partial differential equations (PDEs) and heat kernel signatures. Our method first uses graph neural PDE modules to extract high dimensional features from point clouds by aggregating information from the 3-D point neighborhood, thereby enhancing the robustness of the feature representations. Then, we incorporate heat kernel signatures into an attention mechanism to efficiently obtain corresponding keypoints. Finally, a singular value decomposition (SVD) module with learnable weights is used to predict the transformation between two point clouds. Empirical experiments on a 3-D point cloud dataset demonstrate that our approach not only achieves state-of-the-art performance for point cloud registration but also exhibits better robustness to additive noise or 3-D shape perturbations.
Submitted 22 April, 2024;
originally announced April 2024.
-
Quantum simulation of honeycomb lattice model by high-order moiré pattern
Authors:
Qiang Wan,
Chunlong Wu,
Xun-Jiang Luo,
Shenghao Dai,
Cao Peng,
Renzhe Li,
Shangkun Mo,
Keming Zhao,
Wen-Xuan Qiu,
Hao Zhong,
Yiwei Li,
Chendong Zhang,
Fengcheng Wu,
Nan Xu
Abstract:
Moiré superlattices have become an emergent solid-state platform for simulating quantum lattice models. However, in a single moiré device, Hamiltonian parameters like lattice constant, hopping and interaction terms can hardly be manipulated, limiting the controllability and accessibility of moiré quantum simulators. Here, by combining angle-resolved photoemission spectroscopy and theoretical analysis, we demonstrate that high-order moiré patterns in graphene-monolayered xenon/krypton heterostructures can simulate the honeycomb model at the mesoscale, with in-situ tunable Hamiltonian parameters. The length scale of the simulated lattice constant can be tuned by annealing processes, which adjust in situ the intervalley interaction and hopping parameters in the simulated honeycomb lattice. The sign of the lattice constant can be switched by choosing a xenon or krypton monolayer deposited on graphene, which controls the sublattice degree of freedom and valley arrangement of Dirac fermions. Our work establishes a novel path for experimentally simulating the honeycomb model with tunable parameters by high-order moiré patterns.
Submitted 18 April, 2024;
originally announced April 2024.
-
First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment
Authors:
J. X. Liu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.
Submitted 15 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results
Authors:
Zheng Chen,
Zongwei Wu,
Eduard Zamfir,
Kai Zhang,
Yulun Zhang,
Radu Timofte,
Xiaokang Yang,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Zhijuan Huang,
Yajun Zou,
Yuan Huang,
Jiamin Lin,
Bingnan Han,
Xianyu Guan,
Yongsheng Yu,
Daoan Zhang,
Xuanwu Yin,
Kunlong Zuo,
Jinhua Hao,
Kai Zhao,
Kun Yuan,
Ming Sun,
Chao Zhou
, et al. (63 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
Submitted 15 April, 2024;
originally announced April 2024.
-
Two-Sided Flexibility in Platforms
Authors:
Daniel Freund,
Sébastien Martin,
Jiayu Kamessi Zhao
Abstract:
Flexibility is a cornerstone of operations management, crucial to hedge stochasticity in product demands, service requirements, and resource allocation. In two-sided platforms, flexibility is also two-sided and can be viewed as the compatibility of agents on one side with agents on the other side. Platform actions often influence the flexibility on either the demand or the supply side. But how should flexibility be jointly allocated across different sides? Whereas the literature has traditionally focused on only one side at a time, our work initiates the study of two-sided flexibility in matching platforms. We propose a parsimonious matching model in random graphs and identify the flexibility allocation that optimizes the expected size of a maximum matching. Our findings reveal that flexibility allocation is a first-order issue: for a given flexibility budget, the resulting matching size can vary greatly depending on how the budget is allocated. Moreover, even in the simple and symmetric settings we study, the quest for the optimal allocation is complicated. In particular, easy and costly mistakes can be made if the flexibility decisions on the demand and supply side are optimized independently (e.g., by two different teams in the company), rather than jointly. To guide the search for optimal flexibility allocation, we uncover two effects, flexibility cannibalization, and flexibility abundance, that govern when the optimal design places the flexibility budget only on one side or equally on both sides. In doing so we identify the study of two-sided flexibility as a significant aspect of platform efficiency.
Submitted 6 April, 2024;
originally announced April 2024.
-
Smooth representations of affine Kac-Moody algebras
Authors:
Vyacheslav Futorny,
Xiangquin Guo,
Yaohui Xue,
Kaiming Zhao
Abstract:
Smooth modules for affine Kac-Moody algebras are of prime importance for quantum field theory, as they correspond to the representations of the universal affine vertex algebras. However, very little is known about such modules beyond the category of positive energy representations. We construct a new class of smooth modules over affine Kac-Moody algebras. In a particular case, these modules are isomorphic to those induced from generalized Whittaker modules for Takiff Lie algebras. We establish the irreducibility criterion for the constructed modules in the case of the affine sl(2) Lie algebra.
Submitted 4 April, 2024;
originally announced April 2024.
-
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
Authors:
Ziru Liu,
Shuchang Liu,
Zijian Zhang,
Qingpeng Cai,
Xiangyu Zhao,
Kesen Zhao,
Lantao Hu,
Peng Jiang,
Kun Gai
Abstract:
In the landscape of Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily due to its proficiency in optimizing long-term rewards. Nevertheless, it suffers from instability in the learning process, stemming from the intricate interactions among bootstrapping, off-policy training, and function approximation. Moreover, in multi-reward recommendation scenarios, designing a proper reward setting that reconciles the inner dynamics of various tasks is quite intricate. In response to these challenges, we introduce DT4IER, an advanced decision transformer-based recommendation model that is engineered to not only elevate the effectiveness of recommendations but also to achieve a harmonious balance between immediate user engagement and long-term retention. DT4IER applies an innovative multi-reward design that adeptly balances short- and long-term rewards with user-specific attributes, which serve to enhance the contextual richness of the reward sequence, ensuring a more informed and personalized recommendation process. To enhance its predictive capabilities, DT4IER incorporates a high-dimensional encoder, skillfully designed to identify and leverage the intricate interrelations across diverse tasks. Furthermore, we integrate a contrastive learning approach within the action embedding predictions, a strategy that significantly boosts the model's overall performance. Experiments on three real-world datasets demonstrate the effectiveness of DT4IER against state-of-the-art Sequential Recommender Systems (SRSs) and Multi-Task Learning (MTL) models in terms of both prediction accuracy and effectiveness in specific tasks. The source code is accessible online to facilitate replication.
Submitted 10 June, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
A Survey on Error-Bounded Lossy Compression for Scientific Datasets
Authors:
Sheng Di,
Jinyang Liu,
Kai Zhao,
Xin Liang,
Robert Underwood,
Zhaorui Zhang,
Milan Shah,
Yafan Huang,
Jiajun Huang,
Xiaodong Yu,
Congrong Ren,
Hanqi Guo,
Grant Wilkins,
Dingwen Tao,
Jiannan Tian,
Sian Jin,
Zizhe Jian,
Daoce Wang,
MD Hasanur Rahman,
Boyuan Zhang,
Jon C. Calhoun,
Guanpeng Li,
Kazutomo Yoshii,
Khalid Ayed Alharthi,
Franck Cappello
Abstract:
Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each of them features particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases each involving big data to process. The key contribution is fourfold. (1) We summarize an insightful taxonomy of lossy compression into 6 classic compression models. (2) We provide a comprehensive survey of 10+ commonly used compression components/modules used in error-bounded lossy compressors. (3) We provide a comprehensive survey of 10+ state-of-the-art error-bounded lossy compressors as well as how they combine the various compression modules in their designs. (4) We provide a comprehensive survey of the lossy compression for 10+ modern scientific applications and use-cases. We believe this survey is useful to multiple communities including scientific applications, high-performance computing, lossy compression, and big data.
Submitted 3 April, 2024;
originally announced April 2024.
-
An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data
Authors:
Congrong Ren,
Sheng Di,
Longtao Zhang,
Kai Zhao,
Hanqi Guo
Abstract:
This paper presents error-bounded lossy compression tailored for particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As today's high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression makes it possible to represent floating-point values with strict pointwise accuracy guarantees, the lack of correlations in particle data's storage ordering often limits the compression ratio. Inspired by quantization-encoding schemes in SZ lossy compressors, we dynamically determine the number of bits to encode particles of the dataset to increase the compression ratio. Specifically, we utilize a k-d tree to partition particles into subregions and generate "bit boxes" centered at particles for each subregion to encode their positions. These bit boxes ensure error control while reducing the bit count used for compression. We comprehensively evaluate our method against state-of-the-art compressors on cosmology, fluid dynamics, and fusion plasma datasets.
Submitted 4 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Enhancing Cross-lingual Sentence Embedding for Low-resource Languages with Word Alignment
Authors:
Zhongtao Miao,
Qiyu Wu,
Kaiyan Zhao,
Zilong Wu,
Yoshimasa Tsuruoka
Abstract:
The field of cross-lingual sentence embeddings has recently experienced significant advancements, but research concerning low-resource languages has lagged due to the scarcity of parallel corpora. This paper shows that cross-lingual word representation in low-resource languages is notably under-aligned with that in high-resource languages in current models. To address this, we introduce a novel framework that explicitly aligns words between English and eight low-resource languages, utilizing off-the-shelf word alignment models. This framework incorporates three primary training objectives: aligned word prediction and word translation ranking, along with the widely used translation ranking. We evaluate our approach through experiments on the bitext retrieval task, which demonstrate substantial improvements on sentence embeddings in low-resource languages. In addition, the competitive performance of the proposed model across a broader range of tasks in high-resource languages underscores its practicality.
Submitted 3 April, 2024;
originally announced April 2024.
-
Accelerating Transformer Pre-training with 2:4 Sparsity
Authors:
Yuezhou Hu,
Kang Zhao,
Weiyu Huang,
Jianfei Chen,
Jun Zhu
Abstract:
Training large transformers is slow, but recent innovations on GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In the light of this property, we comprehensively investigate the feasibility of accelerating feed-forward networks (FFNs) of transformers in pre-training. First, we define a "flip rate" to monitor the stability of a 2:4 training process. Utilizing this metric, we propose three techniques to preserve accuracy: to modify the sparse-refined straight-through estimator by applying the masked decay term on gradients, to determine a feasible decay factor in the warm-up stage, and to enhance the model's quality by a dense fine-tuning procedure near the end of pre-training. Besides, we devise two techniques to practically accelerate training: to calculate transposable 2:4 masks by convolution, and to accelerate gated activation functions by reducing GPU L2 cache misses. Experiments show that our 2:4 sparse training algorithm achieves similar convergence to dense training algorithms on several transformer pre-training tasks, while actual acceleration is observed across different transformer block shapes. Our toolkit is available at https://github.com/huyz2023/2by4-pretrain.
Submitted 27 May, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Class-Incremental Few-Shot Event Detection
Authors:
Kailin Zhao,
Xiaolong Jin,
Long Bai,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called class-incremental few-shot event detection. Nevertheless, this task faces two problems, i.e., old knowledge forgetting and new class overfitting. To solve these problems, this paper further presents a novel knowledge distillation and prompt learning based method, called Prompt-KD. Specifically, to handle the forgetting problem about old knowledge, Prompt-KD develops an attention based multi-teacher knowledge distillation framework, where the ancestor teacher model pre-trained on base classes is reused in all learning sessions, and the father teacher model derives the current student model via adaptation. On the other hand, in order to cope with the few-shot learning scenario and alleviate the corresponding new class overfitting problem, Prompt-KD is also equipped with a prompt learning mechanism. Extensive experiments on two benchmark datasets, i.e., FewEvent and MAVEN, demonstrate the superior performance of Prompt-KD.
Submitted 2 April, 2024;
originally announced April 2024.
-
Structured Information Matters: Incorporating Abstract Meaning Representation into LLMs for Improved Open-Domain Dialogue Evaluation
Authors:
Bohao Yang,
Kun Zhao,
Chen Tang,
Liang Zhan,
Chenghua Lin
Abstract:
Automatic open-domain dialogue evaluation has attracted increasing attention. Trainable evaluation metrics are commonly trained with true positive and randomly selected negative responses, resulting in a tendency for them to assign a higher score to the responses that share higher content similarity with a given context. However, adversarial negative responses possess high content similarity with the contexts whilst being semantically different. Therefore, existing evaluation metrics are not robust enough to evaluate such responses, resulting in low correlations with human judgments. While recent studies have shown some efficacy in utilizing Large Language Models (LLMs) for open-domain dialogue evaluation, they still encounter challenges in effectively handling adversarial negative examples. In this paper, we propose a simple yet effective framework for open-domain dialogue evaluation, which combines domain-specific language models (SLMs) with LLMs. The SLMs can explicitly incorporate Abstract Meaning Representation (AMR) graph information of the dialogue through a gating mechanism for enhanced semantic representation learning. The evaluation result of SLMs and AMR graph information are plugged into the prompt of LLM, for the enhanced in-context learning performance. Experimental results on open-domain dialogue evaluation tasks demonstrate the superiority of our method compared to a wide range of state-of-the-art baselines, especially in discriminating adversarial negative responses. Our code is available at https://github.com/Bernard-Yang/SIMAMR.
Submitted 6 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Localized Version of Hypergraph Erdos-Gallai Theorem
Authors:
Kai Zhao,
Xiao-Dong Zhang
Abstract:
This paper focuses on extensions of the classic Erdős-Gallai Theorem for the set of weighted functions of each edge in a graph. The weighted function of an edge $e$ of an $n$-vertex uniform hypergraph $\mathcal{H}$ is defined to be a special function with respect to the number of edges of the longest Berge path containing $e$. We prove that the summation of the weighted function of all edges is at most $n$ for an $n$-vertex uniform hypergraph $\mathcal{H}$ and characterize all extremal hypergraphs that attain the value, which strengthens and extends the hypergraph version of the classic Erdős-Gallai Theorem.
Submitted 31 March, 2024;
originally announced April 2024.
-
Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae, are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.
Submitted 29 March, 2024;
originally announced March 2024.