-
Asymmetric Kinematics in Young Clusters: The λ Ori Cluster
Authors:
Joseph J. Armstrong,
Jonathan C. Tan
Abstract:
Context. Most stars form in clusters or associations, but only a small number of these groups are expected to remain bound for longer than a few Myr. Once star formation has ended and the molecular gas around young stellar objects has been expelled via feedback processes, most initially bound young clusters lose the majority of their binding mass and begin to disperse into the Galactic field. Aims. This process can be investigated by analysing the structure and kinematic trends in nearby young clusters, particularly expansion, the tell-tale sign that a cluster is no longer gravitationally bound but is dispersing into the field. Methods. We combine Gaia DR3 5-parameter astrometry with calibrated radial velocities for members of the nearby young cluster λ Ori (Collinder 69). Results. We characterise the plane-of-sky substructure of the cluster using the Q-parameter and the Angular Dispersion parameter. We find evidence that the cluster contains significant substructure, but that this is preferentially located away from the central cluster core, which is smooth and likely remains bound. We find strong evidence for plane-of-sky expansion in λ Ori using a number of metrics, but also that the trends are asymmetric at the 5σ significance level, with the maximum rate of expansion directed nearly parallel to the Galactic plane. We then invert the maximum rate of expansion, $0.144^{+0.003}_{-0.003}$ km s$^{-1}$ pc$^{-1}$, to give an expansion timescale of $6.944^{+0.148}_{-0.142}$ Myr, which is slightly larger than typical literature age estimates for the cluster. We also find asymmetry in the velocity dispersion and potential signatures of cluster rotation, and we calculate kinematic ages for individual cluster members by tracing their motion back in time to their closest approach to the cluster centre.
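Two of the quantities above can be illustrated with a short sketch. The Q-parameter below follows the commonly used Cartwright & Whitworth (2004) definition, Q = m̄/s̄, where m̄ is the minimum-spanning-tree edge length normalised by $\sqrt{NA}$ (A = πR², R = distance of the furthest star from the mean position) and s̄ is the mean pairwise separation in units of R; values below about 0.8 are usually read as substructure and values above as central concentration. The exact normalisation used in the paper is an assumption here, and the function names and toy coordinates are hypothetical. The second function inverts the expansion rate: inverting the central value and the 1σ bounds reproduces the quoted $6.944^{+0.148}_{-0.142}$ Myr when 1 km s$^{-1}$ pc$^{-1}$ is treated as 1 Myr$^{-1}$, while the exact conversion (1 km s$^{-1}$ ≈ 1.0227 pc Myr$^{-1}$) gives about 6.79 Myr.

```python
import math
from itertools import combinations

# 1 km/s expressed in pc/Myr (Julian year of 3.15576e7 s, IAU parsec in km): ~1.0227
KM_S_IN_PC_PER_MYR = 3.15576e13 / 3.0856775814913673e13


def mst_total_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm, O(N^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from each star to the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def q_parameter(points):
    """Q = m_bar / s_bar for projected (x, y) positions."""
    n = len(points)
    centre = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    radius = max(math.dist(centre, p) for p in points)  # cluster radius R
    area = math.pi * radius ** 2
    m_bar = mst_total_length(points) / math.sqrt(n * area)  # normalised MST edge length
    seps = [math.dist(a, b) for a, b in combinations(points, 2)]
    s_bar = (sum(seps) / len(seps)) / radius  # mean separation in units of R
    return m_bar / s_bar


def expansion_timescale_myr(rate, err_minus, err_plus, exact_units=False):
    """Invert an expansion rate in km/s/pc into a timescale in Myr with asymmetric errors.

    exact_units=False treats 1 km/s/pc as 1/Myr; True applies the exact unit conversion.
    """
    factor = KM_S_IN_PC_PER_MYR if exact_units else 1.0
    tau = 1.0 / (rate * factor)
    tau_upper = 1.0 / ((rate - err_minus) * factor)  # slower expansion -> longer timescale
    tau_lower = 1.0 / ((rate + err_plus) * factor)
    return tau, tau_upper - tau, tau - tau_lower


tau, up, down = expansion_timescale_myr(0.144, 0.003, 0.003)
print(f"{tau:.3f} +{up:.3f} -{down:.3f} Myr")  # 6.944 +0.148 -0.142 Myr
print(round(q_parameter([(-1, -1), (-1, 1), (1, -1), (1, 1)]), 3))  # toy 4-star square: 0.744
```

The O(N²) Prim implementation is chosen for readability; for catalogues of thousands of members a library MST routine would be the practical choice.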
Submitted 16 July, 2024;
originally announced July 2024.
-
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Authors:
Zhenhua Liu,
Tong Zhu,
Chuanyuan Tan,
Wenliang Chen
Abstract:
Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions about other, unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals' personal data while maintaining the model's general capabilities.
Submitted 13 July, 2024;
originally announced July 2024.
-
The formation of supermassive black holes from Population III.1 seeds. III. Galaxy evolution and black hole growth from semi-analytic modelling
Authors:
Vieri Cammelli,
Pierluigi Monaco,
Jonathan C. Tan,
Jasbir Singh,
Fabio Fontanot,
Gabriella De Lucia,
Michaela Hirschmann,
Lizhi Xie
Abstract:
We present an implementation of Pop III.1 seeding of supermassive black holes (SMBHs) in a theoretical model of galaxy formation and evolution to assess the growth of the SMBH population and the properties of the host galaxies. The model of Pop III.1 seeding involves SMBH formation at redshifts $z\gtrsim 20$ in dark matter minihalos that are isolated from external radiative feedback, parameterized by the isolation distance $d_{\rm iso}$. Within a standard $\Lambda$CDM cosmology, we generate dark matter halos using the code PINOCCHIO and seed them according to the Pop III.1 scenario, exploring values of $d_{\rm iso}$ from 50 to 100 kpc (proper distance). We consider two alternative cases of SMBH seeding: a Halo Mass Threshold (HMT) model, in which all halos $>7\times10^{10}\,M_\odot$ are seeded with $\sim 10^5\,M_\odot$ black holes, and an All Light Seed (ALS) model, in which all halos are seeded with low, stellar-mass black holes. We follow the redshift evolution of the halos, populating them with galaxies using the GAlaxy Evolution and Assembly theoretical model of galaxy formation, including accretion onto SMBHs and related feedback processes. Here we present predictions for the properties of galaxy populations, focusing on stellar masses, star formation rates, and black hole masses. The local ($z\sim0$) metrics of occupation fraction as a function of galaxy stellar mass, the galaxy stellar mass function (GSMF), and the black hole mass function (BHMF) all suggest a constraint of $d_{\rm iso}<75$ kpc. We discuss the implications of this result for the Pop III.1 seeding mechanism.
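To make the role of $d_{\rm iso}$ concrete, here is a toy sketch of the two seeding rules contrasted above. The simplifications are ours, not the paper's: candidate minihalos are processed in order of formation redshift (earliest first), a candidate is seeded only if no already-seeded halo lies within $d_{\rm iso}$, positions are treated as proper kpc at a single epoch (ignoring cosmic expansion), and the HMT rule is a plain mass cut. The `Halo` type, function names and halo values are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Halo(NamedTuple):
    name: str
    z_form: float                          # formation redshift (higher = earlier)
    pos_kpc: Tuple[float, float, float]    # position in proper kpc (single-epoch simplification)
    mass_msun: float                       # halo mass in solar masses


def pop3_1_seeds(halos: List[Halo], d_iso_kpc: float) -> List[str]:
    """Toy isolation rule: seed a halo only if no earlier-seeded halo lies within d_iso."""
    seeded: List[Halo] = []
    for h in sorted(halos, key=lambda halo: halo.z_form, reverse=True):  # earliest first
        if all(math.dist(h.pos_kpc, s.pos_kpc) > d_iso_kpc for s in seeded):
            seeded.append(h)
    return [h.name for h in seeded]


def hmt_seeds(halos: List[Halo], threshold_msun: float = 7e10) -> List[str]:
    """Halo Mass Threshold comparison model: seed every halo above the mass cut."""
    return [h.name for h in halos if h.mass_msun > threshold_msun]


# Hypothetical halos on a line, 60 kpc apart.
halos = [
    Halo("A", 30.0, (0.0, 0.0, 0.0), 1e6),
    Halo("B", 25.0, (60.0, 0.0, 0.0), 1e6),
    Halo("C", 22.0, (120.0, 0.0, 0.0), 8e10),
]
print(pop3_1_seeds(halos, d_iso_kpc=50.0))  # ['A', 'B', 'C']
print(pop3_1_seeds(halos, d_iso_kpc=75.0))  # ['A', 'C']
print(hmt_seeds(halos))                     # ['C']
```

The toy example shows the qualitative trend the abstract relies on: a larger isolation distance suppresses seeding of nearby later-forming halos and so lowers the number density of seeds.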
Submitted 13 July, 2024;
originally announced July 2024.
-
Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification
Authors:
Chen Mao,
Chong Tan,
Jingqi Hu,
Min Zheng
Abstract:
Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.
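The reported mAP and Rank-1 figures are standard retrieval metrics. The sketch below computes both from a query-by-gallery distance matrix; it is a generic illustration, not the authors' evaluation code. It omits the same-camera filtering that some ReID protocols apply, and skipping queries with no matching gallery identity is an assumption. The function name and toy distances are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(dist: Sequence[Sequence[float]],
                  query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Rank-1 accuracy and mean average precision from a query x gallery distance matrix."""
    hits_at_1 = 0
    average_precisions: List[float] = []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])  # nearest gallery item first
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # assumption: queries without a true match are ignored
        hits_at_1 += matches[0]
        found, precisions = 0, []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / k)  # precision at each correct retrieval
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        return 0.0, 0.0
    return hits_at_1 / n, sum(average_precisions) / n


dist = [[0.1, 0.5, 0.3],   # query 0 (identity 0)
        [0.2, 0.4, 0.9]]   # query 1 (identity 1)
print(rank1_and_map(dist, query_ids=[0, 1], gallery_ids=[0, 1, 0]))  # (0.5, 0.75)
```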
Submitted 12 July, 2024;
originally announced July 2024.
-
The discovery of a nearby 421 s transient with CHIME/FRB/Pulsar
Authors:
Fengqiu Adam Dong,
Tracy Clarke,
Alice P. Curtin,
Ajay Kumar,
Ingrid Stairs,
Shami Chatterjee,
Amanda M. Cook,
Emmanuel Fonseca,
B. M. Gaensler,
Jason W. T. Hessels,
Victoria M. Kaspi,
Mattias Lazda,
Kiyoshi W. Masui,
James W. McKee,
Bradley W. Meyers,
Aaron B. Pearlman,
Scott M. Ransom,
Paul Scholz,
Kaitlyn Shin,
Kendrick M. Smith,
Chia Min Tan
Abstract:
Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been observed to exhibit coherent, pulsed radio emission related to their rotational period. Recently, a new type of radio long-period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio pulsars and magnetars; however, they pulse on timescales (minutes) much longer than previously seen. While minute timescales are common rotation periods for white dwarfs, LPTs are much brighter than the known pulsating white dwarfs, and dipolar radiation from isolated (as opposed to binary) magnetic white dwarfs has yet to be observed. Here, we report the discovery of a new $\sim$421 s LPT, CHIME J0630+25, using the CHIME/FRB and CHIME/Pulsar instruments. We used standard pulsar timing techniques and obtained a phase-coherent timing solution, which yielded limits on the inferred magnetic field and characteristic age. CHIME J0630+25 is remarkably nearby ($170 \pm 80$ pc), making it the closest LPT discovered to date.
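Inferred magnetic fields and characteristic ages of this kind usually come from the standard vacuum-dipole spin-down estimates $B \approx 3.2\times10^{19}\sqrt{P\dot{P}}$ G (a neutron-star scaling with canonical radius and moment of inertia) and $\tau_c = P/(2\dot{P})$. The sketch below evaluates them; whether the paper uses exactly these prefactors is an assumption, and the $\dot{P}$ in the example is a hypothetical placeholder, not the CHIME J0630+25 value. Because $B$ grows and $\tau_c$ shrinks with $\dot{P}$, an upper limit on $\dot{P}$ translates into an upper limit on $B$ and a lower limit on $\tau_c$.

```python
import math

SECONDS_PER_YEAR = 3.15576e7  # Julian year


def dipole_field_gauss(period_s: float, pdot: float) -> float:
    """Surface dipole field estimate B ~ 3.2e19 * sqrt(P * Pdot) G (neutron-star scaling)."""
    return 3.2e19 * math.sqrt(period_s * pdot)


def characteristic_age_yr(period_s: float, pdot: float) -> float:
    """Characteristic (spin-down) age tau_c = P / (2 Pdot), converted to years."""
    return period_s / (2.0 * pdot) / SECONDS_PER_YEAR


period = 421.0              # s, the approximate period quoted above
pdot_hypothetical = 1e-13   # s/s, HYPOTHETICAL placeholder, not a measured value
print(f"B   <~ {dipole_field_gauss(period, pdot_hypothetical):.2e} G")
print(f"tau >~ {characteristic_age_yr(period, pdot_hypothetical):.2e} yr")
```

The `<~` and `>~` in the printout reflect the limit interpretation above, i.e. treating the placeholder $\dot{P}$ as an upper bound.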
Submitted 10 July, 2024;
originally announced July 2024.
-
Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper
Authors:
Razieh Behjati,
Erik Arisholm,
Chao Tan,
Margrethe M. Bedregal
Abstract:
Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning such test environments for information-intensive services involves setting up databases that are rich enough to simulate a wide variety of user scenarios. While production data is perhaps the gold standard here, many organizations, particularly in the public sector, are not allowed to use production data for testing purposes due to privacy concerns. The alternatives are to use anonymized data or synthetically generated data. In this paper, we elaborate on these alternatives and compare them in an industrial context. We then focus on synthetic data generation and investigate the use of recurrent neural networks for this purpose. In our preliminary experiments, we were able to generate representative and highly accurate data using a recurrent neural network. These results open new research questions, which we discuss here and plan to investigate in future research.
Submitted 7 July, 2024;
originally announced July 2024.
-
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Authors:
Md Tahmid Rahman Laskar,
Sawsan Alqahtani,
M Saiful Bari,
Mizanur Rahman,
Mohammad Abdullah Matin Khan,
Haidar Khan,
Israt Jahan,
Amran Bhuiyan,
Chee Wei Tan,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty,
Jimmy Huang
Abstract:
Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations that cause these inconsistencies and unreliable evaluations at various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.
Submitted 4 July, 2024;
originally announced July 2024.
-
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing
Authors:
Bo Ai,
Stephen Tian,
Haochen Shi,
Yixuan Wang,
Cheston Tan,
Yunzhu Li,
Jiajun Wu
Abstract:
Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states, including particles and object-level latent physics information, from historical visuo-tactile observations and to perform future state predictions. Our tactile-informed dynamics model, learned from real-world data, can solve downstream robotics tasks with model-predictive control. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks, where the robot must infer the physics properties of objects from direct and indirect interactions. Trained on only an average of 30 minutes of real-world interaction data per task, our model can perform online adaptation and make touch-informed predictions. Through extensive evaluations in both long-horizon dynamics prediction and real-world manipulation, our method demonstrates superior effectiveness compared to previous learning-based and physics-based simulation systems.
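RoboPack uses its learned dynamics model inside model-predictive control. As a rough, generic illustration only, the sketch below shows random-shooting MPC, one of the simplest sampling-based MPC planners, wrapped around a stand-in dynamics function. The 1D toy model, the squared-distance cost, the sample counts and all function names are assumptions and do not reflect RoboPack's actual model, cost or planner.

```python
import random
from typing import Callable, List

Dynamics = Callable[[float, float], float]  # (state, action) -> predicted next state


def rollout_cost(model: Dynamics, state: float, actions: List[float], goal: float) -> float:
    """Sum of squared distances to the goal along the model-predicted trajectory."""
    cost = 0.0
    for a in actions:
        state = model(state, a)
        cost += (state - goal) ** 2
    return cost


def random_shooting_mpc(model: Dynamics, state: float, goal: float, rng: random.Random,
                        horizon: int = 5, samples: int = 256, a_max: float = 1.0) -> float:
    """Sample action sequences, score them with the model, return the best first action."""
    best_seq, best_cost = None, float("inf")
    for _ in range(samples):
        seq = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        cost = rollout_cost(model, state, seq, goal)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq[0]  # execute only the first action, then re-plan (receding horizon)


def toy_model(x: float, a: float) -> float:
    """Stand-in for a learned dynamics model: the state moves by the action."""
    return x + a


rng = random.Random(0)
x = 3.0
for _ in range(10):
    a = random_shooting_mpc(toy_model, x, goal=0.0, rng=rng)
    x = toy_model(x, a)  # on a real robot the next state would be observed, not predicted
print(f"final state: {x:.3f}")
```

Re-planning after every executed action is what lets such a controller absorb model errors; in a learned-model setting, online adaptation of the model's latent physics estimates would feed into the same loop.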
Submitted 1 July, 2024;
originally announced July 2024.
-
FoldToken2: Learning compact, invariant and generative protein structure language
Authors:
Zhangyang Gao,
Cheng Tan,
Stan Z. Li
Abstract:
The equivariant nature of 3D coordinates has posed long-term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to convert equivariant structures into discrete tokens while maintaining the recoverability of the original structures. From FoldToken1 to FoldToken2, we improve three key components: (1) the invariant structure encoder, (2) the vector-quantized compressor, and (3) the equivariant structure decoder. We evaluate FoldToken2 on the protein structure reconstruction task and show that it outperforms the previous FoldToken1 by 20% in TM-score and 81% in RMSD. FoldToken2 is probably the first method that works well for quantizing both single-chain and multi-chain protein structures. We believe that FoldToken2 will inspire further improvements in protein structure representation learning, structure alignment, and structure generation tasks.
Submitted 11 June, 2024;
originally announced July 2024.
-
Bipolarized Weyl semimetals and quantum crystal valley Hall effect in two-dimensional altermagnetic materials
Authors:
Chao-Yang Tan,
Ze-Feng Gao,
Huan-Cheng Yang,
Kai Liu,
Peng-Jie Guo,
Zhong-Yi Lu
Abstract:
Magnetism and topology are two major areas of condensed matter physics. The combination of magnetism and topology gives rise to novel physical effects, which have attracted strong theoretical and experimental attention. Recently, the concept of altermagnetism has been introduced, characterized by a dual nature: real-space antiferromagnetism and reciprocal-space anisotropic spin polarization. The combination of altermagnetism with topology may lead to previously unobserved topological phases and associated physical effects. In this study, using a four-band lattice model that incorporates altermagnetism and spin group symmetry, we demonstrate that type-I, type-II, and type-III bipolarized Weyl semimetals can exist in altermagnetic systems. Through first-principles electronic structure calculations, we predict four ideal two-dimensional type-I altermagnetic bipolarized Weyl semimetals: Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ (Z = S, Se, Te). More significantly, we introduce the quantum crystal valley Hall effect, a phenomenon achievable in three of these materials, namely Fe$_2$WTe$_4$, Fe$_2$MoS$_4$, and Fe$_2$MoTe$_4$, when spin-orbit coupling is considered. Furthermore, these materials have the potential to transition from a quantum crystal valley Hall phase to a Chern insulator phase under strain. In contrast, Fe$_2$MoSe$_4$ remains a Weyl semimetal under spin-orbit coupling but is distinguished by possessing only a single pair of Weyl points. Additionally, the position, polarization, and number of Weyl points in Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ can be manipulated by adjusting the direction of the Néel vector. Consequently, Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ emerge as promising experimental platforms for investigating the distinctive physical properties of various altermagnetic topological phases.
Submitted 24 June, 2024;
originally announced June 2024.
-
Fermilab Booster Beam Emittances from Quadrupole Modes Measured by BPMs
Authors:
C. Y. Tan,
M. Balcewicz
Abstract:
The measurement of beam emittances by extracting the quadrupole mode signal from a 4-plate beam position monitor (BPM) was published at least 40 years ago. Unfortunately, in practice, this method suffers from a poor signal-to-noise ratio and requires extensive tuning to extract the emittances. In this paper, an improved method in which multiple BPMs are used together with better mathematical analysis is described. The BPM-derived emittances are then compared with those measured by the Ion Profile Monitor (IPM). Surprisingly, the BPM-measured emittances behave very well and are more realistic than those measured by the IPM.
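The idea of combining several BPMs can be sketched with the textbook relation that a 4-plate BPM's quadrupole signal is, to first order, proportional to $\sigma_x^2 - \sigma_y^2 + \bar{x}^2 - \bar{y}^2$. Assuming $\sigma_{x,y}^2 = \beta_{x,y}\,\epsilon_{x,y}$ (dispersion neglected), each BPM gives one linear equation in $\epsilon_x$ and $\epsilon_y$, and BPMs with different $\beta_x/\beta_y$ ratios can be combined by least squares. This is a generic sketch, not the paper's analysis: it assumes the quadrupole moments are already calibrated into beam-size units, and the lattice functions and moments in the example are synthetic.

```python
from typing import Sequence, Tuple


def fit_emittances(beta_x: Sequence[float], beta_y: Sequence[float],
                   quad_moments: Sequence[float],
                   x_means: Sequence[float], y_means: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (eps_x, eps_y) from Q_i = beta_x,i*eps_x - beta_y,i*eps_y + x_i^2 - y_i^2."""
    # Subtract the beam-centroid contribution so only the beam-size term remains.
    rhs = [q - (x * x - y * y) for q, x, y in zip(quad_moments, x_means, y_means)]
    rows = [(bx, -by) for bx, by in zip(beta_x, beta_y)]
    # 2x2 normal equations (A^T A) eps = A^T rhs, solved by Cramer's rule.
    a11 = sum(r0 * r0 for r0, _ in rows)
    a12 = sum(r0 * r1 for r0, r1 in rows)
    a22 = sum(r1 * r1 for _, r1 in rows)
    b1 = sum(r0 * v for (r0, _), v in zip(rows, rhs))
    b2 = sum(r1 * v for (_, r1), v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("need BPMs with different beta_x/beta_y ratios")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Synthetic check: three BPMs, true emittances (2.0, 1.0) in consistent units.
beta_x, beta_y = [10.0, 20.0, 5.0], [15.0, 8.0, 25.0]
x_means, y_means = [0.5, -0.2, 0.0], [0.1, 0.3, -0.4]
quad = [bx * 2.0 - by * 1.0 + xm ** 2 - ym ** 2
        for bx, by, xm, ym in zip(beta_x, beta_y, x_means, y_means)]
print(fit_emittances(beta_x, beta_y, quad, x_means, y_means))  # approximately (2.0, 1.0)
```

With a single BPM the system is underdetermined, which is one reason a multi-BPM fit is more robust than the single-monitor approach; with noisy data the over-determined least-squares solution also averages down noise.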
Submitted 21 June, 2024;
originally announced June 2024.
-
Learning to Transfer for Evolutionary Multitasking
Authors:
Sheng-Hao Wu,
Yuxiao Huang,
Xingyu Wu,
Liang Feng,
Zhi-Hui Zhan,
Kay Chen Tan
Abstract:
Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. Implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. This learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs. Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions, are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs.
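The agent is trained with proximal policy optimization (PPO). For reference, the standard PPO clipped surrogate objective is $L^{\rm CLIP} = \mathbb{E}_t[\min(r_t A_t, \mathrm{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t)]$, with probability ratio $r_t = \pi_\theta(a_t|s_t)/\pi_{\theta_{\rm old}}(a_t|s_t)$ and advantage estimate $A_t$. The sketch below evaluates it on plain lists; it is the generic textbook form rather than the L2T training code, and ε = 0.2 is a common default, not necessarily the paper's setting.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate; a PPO learner maximizes this (i.e., minimizes its negative)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                       # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)        # pessimistic of the two estimates
    return total / len(advantages)


# Ratio 1.5 with positive advantage is capped at 1.2; ratio 0.5 with negative
# advantage is bounded by 0.8, so the mean is (1.2 - 0.8) / 2.
print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))  # approximately 0.2
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which keeps updates stable across EMT episodes.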
Submitted 22 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Connected Vehicle Data-driven Robust Optimization for Traffic Signal Timing: Modeling Traffic Flow Variability and Errors
Authors:
Chaopeng Tan,
Yue Ding,
Kaidi Yang,
Hong Zhu,
Keshuang Tang
Abstract:
Recent advancements in Connected Vehicle (CV) technology have prompted research on leveraging CV data for more effective traffic management. Despite low penetration rates, such detailed CV data has demonstrated great potential for improving traffic signal performance. However, existing studies share a common shortcoming: they ignore traffic flow estimation errors in their modeling process, even though such errors are inevitable because CVs provide only a sampled observation of traffic. This study proposes a CV data-driven robust optimization framework for traffic signal timing that accounts for both traffic flow variability and estimation errors. First, we propose a general CV data-driven optimization model that can be applied to various signalized intersection scenarios, including under- and over-saturated conditions and fixed-time and real-time control. Then, we propose a novel data-driven uncertainty set of arrival rates based on bounds information derived from CVs, which circumvents the error-prone arrival rate estimation process. Finally, a CV data-driven robust optimization model (CV-RO) is formulated to explicitly handle arrival rate uncertainties. Using the robust counterpart approach, this robust optimization problem can be reformulated as an equivalent deterministic mixed-integer linear programming problem that can be solved exactly. The evaluation results highlight the superior performance of the CV-RO model compared to the deterministic model and traditional methods across various scenarios with different penetration rates, traffic demands, and control types. Notably, the CV-RO model performs particularly well at lower CV penetration rates and under different levels of traffic flow fluctuation, affirming its effectiveness and robustness.
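The robust counterpart idea can be illustrated for the simplest case. If, purely as an assumption for illustration, the uncertainty set is a box $\lambda_i \in [\underline{\lambda}_i, \overline{\lambda}_i]$ built from CV-derived bounds, then a linear constraint $\sum_i a_i \lambda_i \le b$ holds for every admissible $\lambda$ exactly when $\sum_i \max(a_i \underline{\lambda}_i, a_i \overline{\lambda}_i) \le b$, because each term can be maximized independently. This turns a constraint over infinitely many scenarios into a single deterministic one. The paper's actual uncertainty set and reformulation may be more elaborate; the coefficients below are made-up numbers and the function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def worst_case_lhs(a: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """max of sum_i a_i * lam_i over the box lower <= lam <= upper (terms separate)."""
    return sum(max(ai * lo, ai * hi) for ai, lo, hi in zip(a, lower, upper))


def robustly_feasible(a: Sequence[float], b: float,
                      lower: Sequence[float], upper: Sequence[float]) -> bool:
    """True if a . lam <= b holds for every arrival-rate vector lam in the box."""
    return worst_case_lhs(a, lower, upper) <= b


def brute_force_max(a: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Check: a linear function attains its box maximum at a vertex, so enumerate them."""
    return max(sum(ai * li for ai, li in zip(a, corner))
               for corner in product(*zip(lower, upper)))


a, lower, upper = [1.0, -2.0], [1.0, 1.0], [3.0, 2.0]   # made-up coefficients and bounds
print(worst_case_lhs(a, lower, upper), brute_force_max(a, lower, upper))  # 1.0 1.0
print(robustly_feasible(a, 1.0, lower, upper), robustly_feasible(a, 0.5, lower, upper))  # True False
```

When the coefficients $a_i$ themselves depend on timing decisions, the max terms are typically linearized with auxiliary variables, which is how a mixed-integer linear deterministic counterpart can arise.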
Submitted 20 June, 2024;
originally announced June 2024.
-
Tactile Aware Dynamic Obstacle Avoidance in Crowded Environment with Deep Reinforcement Learning
Authors:
Yung Chuen Ng,
Qi Wen Lim,
Chun Ye Tan,
Zhen Hao Gan,
Meng Yee Chuah
Abstract:
Mobile robots operating in crowded environments require the ability to navigate among humans and surrounding obstacles efficiently while adhering to safety standards and socially compliant mannerisms. At this scale, the robot navigation problem may be classified as both a local path planning and a trajectory optimization problem. This work presents an array of force sensors that act as a tactile layer complementing a LiDAR, making the robot aware of contact with surrounding objects in its immediate vicinity that the LiDAR does not detect. With the tactile layer, the robot can take more risks in its movements, possibly going right up to an obstacle or wall and gently squeezing past it. In addition, we built a simulation platform in PyBullet that integrates the Robot Operating System (ROS) with reinforcement learning (RL). A touch-aware neural network model was trained on this platform to create an RL-based local path planner for dynamic obstacle avoidance. Our proposed method was successfully demonstrated on an omnidirectional mobile robot that was able to navigate a crowded environment with high agility and versatility of movement, while not being overly sensitive to nearby obstacles that are not in contact.
Submitted 19 June, 2024;
originally announced June 2024.
-
Towards a Client-Centered Assessment of LLM Therapists by Client Simulation
Authors:
Jiashuo Wang,
Yang Xiao,
Yanran Li,
Changhe Song,
Chunpu Xu,
Chenhao Tan,
Wenjie Li
Abstract:
Although there is a growing belief that LLMs can be used as therapists, research exploring LLMs' capabilities and inefficacy, particularly from the client's perspective, remains limited. This work focuses on a client-centered assessment of LLM therapists with the involvement of simulated clients, a standard approach in clinical medical education. However, there are two challenges in applying this approach to assess LLM therapists at scale. Ethically, asking humans to frequently mimic clients and exposing them to potentially harmful LLM outputs can be risky and unsafe. Technically, it can be difficult to consistently compare the performance of different LLM therapists interacting with the same client. To this end, we adopt LLMs to simulate clients and propose ClientCAST, a client-centered approach to assessing LLM therapists by client simulation. Specifically, the simulated client interacts with LLM therapists and completes questionnaires related to the interaction. Based on the questionnaire results, we assess LLM therapists from three client-centered aspects: session outcome, therapeutic alliance, and self-reported feelings. We conduct experiments to examine the reliability of ClientCAST and use it to evaluate LLM therapists implemented by Claude-3, GPT-3.5, LLaMA3-70B, and Mixtral 8x7B. Code is released at https://github.com/wangjs9/ClientCAST.
Submitted 20 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Authors:
Haitao Lin,
Guojiang Zhao,
Odin Zhang,
Yufei Huang,
Lirong Wu,
Zicheng Liu,
Siyuan Li,
Cheng Tan,
Zhifeng Gao,
Stan Z. Li
Abstract:
Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and has been greatly accelerated by AI-based generative models. However, a systematic understanding is still lacking because of diverse settings, complex implementations, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair comparisons and inconclusive insights. To address this dilemma, we propose CBGBench, a comprehensive benchmark for SBDD that unifies the task as generative heterogeneous graph completion, analogous to filling in the blanks of the 3D complex binding graph. By categorizing existing methods based on their attributes, CBGBench provides a modular and extensible framework that implements various cutting-edge methods. Secondly, a single task of de novo molecule generation can hardly reflect the methods' capabilities. To broaden the scope, we have adapted these models to a range of tasks essential in drug design, which are considered sub-tasks within the graph fill-in-the-blank tasks. These tasks include the generative design of de novo molecules, linkers, fragments, scaffolds, and sidechains, all conditioned on the structures of protein pockets. Our evaluations are conducted fairly and cover comprehensive perspectives on interaction, chemical properties, geometric authenticity, and substructure validity. We further provide pre-trained versions of the state-of-the-art models and in-depth insights from empirical analysis. The codebase for CBGBench is publicly accessible at https://github.com/Edapinenut/CBGBench.
Submitted 16 June, 2024;
originally announced June 2024.
-
Massive Dirac Fermions and Strong Shubnikov-de Haas Oscillations in Topological Insulator Sm,Fe:Bi2Se3 Single Crystals
Authors:
Weiyao Zhao,
Chi Xuan Trang,
Qile Li,
Lei Chen,
Zengji Yue,
Abdulhakim Bake,
Cheng Tan,
Lan Wang,
Mitchell Nancarrow,
Mark Edmonds,
David Cortie,
Xiaolin Wang
Abstract:
Topological insulators (TIs) are emergent materials with a unique band structure, which allows the study of quantum effects in solids and can contribute to high-performance quantum devices. To achieve better TI performance, here we present a co-doping strategy using synergistic rare-earth Sm and transition-metal Fe dopants in Bi2Se3 single crystals, which combines the advantages of both transition-metal-doped TIs (high ferromagnetic ordering temperature and an observed quantum anomalous Hall effect, QAHE) and rare-earth-doped TIs (large magnetic moments and significant spin-orbit coupling). In the as-grown single crystals, clear evidence of ferromagnetic ordering was observed. Angle-resolved photoemission spectroscopy indicates that the ferromagnetism opens a 44 meV band gap at the surface Dirac point. Moreover, the carrier mobility at 3 K is ~7400 cm2/Vs, and we thus observed an ultra-strong Shubnikov-de Haas oscillation in the longitudinal resistivity, as well as Hall steps in the transverse resistivity below 14 T. Our transport and angle-resolved photoemission spectroscopy results suggest that rare-earth and transition-metal co-doping in the Bi2Se3 system is a promising avenue to implement the quantum anomalous Hall effect, as well as to harness massive Dirac fermions in electrical devices.
Submitted 13 June, 2024;
originally announced June 2024.
-
Towards Next Era of Multi-objective Optimization: Large Language Models as Architects of Evolutionary Operators
Authors:
Yuxiao Huang,
Shenghao Wu,
Wenjie Zhang,
Jibin Wu,
Liang Feng,
Kay Chen Tan
Abstract:
Multi-objective optimization problems (MOPs) are prevalent in various real-world applications, necessitating sophisticated solutions that balance conflicting objectives. Traditional evolutionary algorithms (EAs), while effective, often rely on domain-specific expert knowledge and iterative tuning, which can impede innovation when encountering novel MOPs. Very recently, the emergence of Large Language Models (LLMs) has revolutionized software engineering by enabling the autonomous development and refinement of programs. Capitalizing on this advancement, we propose a new LLM-based framework for evolving EA operators, designed to address a wide array of MOPs. This framework facilitates the production of EA operators without the extensive demands for expert intervention, thereby streamlining the design process. To validate the efficacy of our approach, we have conducted extensive empirical studies across various categories of MOPs. The results demonstrate the robustness and superior performance of our LLM-evolved operators.
Submitted 13 June, 2024;
originally announced June 2024.
-
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
Authors:
Cheng Tan,
Dongxin Lyu,
Siyuan Li,
Zhangyang Gao,
Jingxuan Wei,
Siqi Ma,
Zicheng Liu,
Stan Z. Li
Abstract:
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer review. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources, including a top-tier conference and a prestigious journal. This dataset is meticulously designed to facilitate the applications of LLMs for multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
Submitted 9 June, 2024;
originally announced June 2024.
-
The Impossibility of Fair LLMs
Authors:
Jacy Anthis,
Kristian Lum,
Michael Ekstrand,
Avi Feller,
Alexander D'Amour,
Chenhao Tan
Abstract:
The need for fair AI is increasingly clear in the era of general-purpose systems such as ChatGPT, Gemini, and other large language models (LLMs). However, the increasing complexity of human-AI interaction and its social impacts have raised questions of how fairness standards could be applied. Here, we review the technical frameworks that machine learning researchers have used to evaluate fairness, such as group fairness and fair representations, and find that their application to LLMs faces inherent limitations. We show that each framework either does not logically extend to LLMs or presents a notion of fairness that is intractable for LLMs, primarily due to the multitudes of populations affected, sensitive attributes, and use cases. To address these challenges, we develop guidelines for the more realistic goal of achieving fairness in particular use cases: the criticality of context, the responsibility of LLM developers, and the need for stakeholder participation in an iterative process of design and evaluation. Moreover, it may eventually be possible and even necessary to use the general-purpose capabilities of AI systems to address fairness challenges as a form of scalable AI-assisted alignment.
Submitted 28 May, 2024;
originally announced June 2024.
-
On the Limitations of Fractal Dimension as a Measure of Generalization
Authors:
Charlie Tan,
Inés García-Redondo,
Qiquan Wang,
Michael M. Bronstein,
Anthea Monod
Abstract:
Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. Neural network optimization trajectories have been proposed to possess fractal structure, leading to bounds and generalization measures based on notions of fractal dimension on these trajectories. Prominently, both the Hausdorff dimension and the persistent homology dimension have been proposed to correlate with generalization gap, thus serving as a measure of generalization. This work performs an extended evaluation of these topological generalization measures. We demonstrate that fractal dimension fails to predict generalization of models trained from poor initializations. We further identify that the $\ell^2$ norm of the final parameter iterate, one of the simplest complexity measures in learning theory, correlates more strongly with the generalization gap than these notions of fractal dimension. Finally, our study reveals the intriguing manifestation of model-wise double descent in persistent homology-based generalization measures. This work lays the ground for a deeper investigation of the causal relationships between fractal geometry, topological data analysis, and neural network optimization.
Submitted 4 June, 2024;
originally announced June 2024.
-
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models
Authors:
Zicheng Liu,
Jiahui Li,
Siyuan Li,
Zelin Zang,
Cheng Tan,
Yufei Huang,
Yajing Bai,
Stan Z. Li
Abstract:
The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their application across a spectrum of downstream tasks. Despite advancements, the lack of an evaluation framework makes equitable assessment difficult, owing to differences in experimental settings, model intricacy, benchmark datasets, and reproducibility challenges. In the absence of standardization, comparative analyses risk becoming biased and unreliable. To surmount this impasse, we introduce GenBench, a comprehensive benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We perform systematic evaluations on datasets spanning diverse biological domains, with a particular emphasis on both short-range and long-range genomic tasks, including the three most important DNA task categories covering Coding Region, Non-Coding Region, Genome Structure, etc. Moreover, we provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance. Our findings reveal an interesting observation: independent of the number of parameters, the discernible difference in preference between attention-based and convolution-based models on short- and long-range tasks may provide insights into the future design of GFMs.
Submitted 5 June, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Probing Language Models for Pre-training Data Detection
Authors:
Zhenhua Liu,
Tong Zhu,
Chuanyuan Tan,
Haonan Lu,
Bing Liu,
Wenliang Chen
Abstract:
Large Language Models (LLMs) have shown impressive capabilities, while also raising concerns about data contamination arising from privacy issues and the leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arXiv abstracts from the Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy. Our code and dataset are available at https://github.com/zhliu0106/probing-lm-data.
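The probing idea can be illustrated with a minimal sketch, assuming the model's internal activations for each text have already been extracted into fixed-length feature vectors. The linear logistic-regression probe, the function names `train_probe` and `predict`, and the toy 2-D "activations" below are hypothetical stand-ins, not the authors' implementation; a real probe would be trained on high-dimensional hidden states taken from a chosen layer.

```python
import math


def train_probe(features, labels, lr=0.5, epochs=500):
    """Fit a linear logistic-regression probe (w, b) by batch gradient descent."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "member"
            err = p - y  # gradient of the log loss with respect to z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 ("seen in pre-training") if the probe score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical toy "activations": label 1 = member text, 0 = non-member text.
feats = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9],
         [-2.0, -1.0], [-1.5, -2.2], [-2.3, -1.7]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_probe(feats, labels)
preds = [predict(w, b, x) for x in feats]
print(preds)
```

A linear probe is the simplest choice because it only tests whether membership is linearly readable from the activations; whether the paper uses this exact probe family is not stated here.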
Submitted 3 June, 2024;
originally announced June 2024.
-
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Authors:
Cheng Tan,
Jingxuan Wei,
Linzhuang Sun,
Zhangyang Gao,
Siyuan Li,
Bihui Yu,
Ruifeng Guo,
Stan Z. Li
Abstract:
Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of multimodal RAG is to cultivate the models' ability to reason in response to relevant queries. To this end, we introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning). The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs, which then serve as scaffolds for the multimodal reasoning process. This training-free approach not only encourages the model to engage deeply with the reasoning processes inherent in the retrieved content but also facilitates the generation of answers that are precise and richly interpretable. Surprisingly, utilizing solely the ScienceQA dataset, collected from elementary and high school science curricula, RMR significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets, including A-OKVQA, MMBench, and SEED. These outcomes highlight the substantial potential of our multimodal retrieval and reasoning mechanism to improve the reasoning capabilities of vision-language models.
Submitted 31 May, 2024;
originally announced May 2024.
-
UniIF: Unified Molecule Inverse Folding
Authors:
Zhangyang Gao,
Jue Wang,
Cheng Tan,
Lirong Wu,
Yufei Huang,
Siyuan Li,
Zhirui Ye,
Stan Z. Li
Abstract:
Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and material science. Although specialized models have been proposed for different small molecules and macromolecules, few have attempted to unify the learning process, resulting in redundant efforts. Complementary to recent advancements in molecular structure prediction, such as RoseTTAFold All-Atom and AlphaFold3, we propose the unified model UniIF for the inverse folding of all molecules. We achieve this unification at two levels: 1) Data-Level: We propose a unified block graph data form for all molecules, including the local frame building and geometric feature initialization. 2) Model-Level: We introduce a geometric block attention network, comprising geometric interaction, interactive attention, and virtual long-term dependency modules, to capture the 3D interactions of all molecules. Through comprehensive evaluations across various tasks such as protein design, RNA design, and material design, we demonstrate that our proposed method surpasses state-of-the-art methods on all tasks. UniIF offers a versatile and effective solution for general molecule inverse folding.
Submitted 29 May, 2024;
originally announced May 2024.
-
Discovery and follow-up of a quasiperiodically nulling and sub-pulse drifting pulsar with the Murchison Widefield Array
Authors:
G. Grover,
N. D. R. Bhat,
S. McSweeney,
C. P. Lee,
B. W. Meyers,
C. M. Tan,
S. S. Kudale
Abstract:
The phenomenon of pulsar nulling, where pulsars temporarily and stochastically cease their radio emission, is thought to be indicative of a `dying' pulsar, where radio emission ceases entirely. Here we report the discovery of a long-period pulsar, PSR J0452-3418, from the ongoing Southern-sky MWA Rapid Two-meter (SMART) pulsar survey. The pulsar has a rotation period of ${\sim}$1.67\,s and a dispersion measure of 19.8\,pc\,cm$^{-3}$, and it exhibits both quasi-periodic nulling and sub-pulse drifting. Periodic nulling is uncommon, reported in only $<1$\% of the pulsar population, with an even smaller fraction showing both periodic nulling and sub-pulse drifting. We describe the discovery and follow-up of the pulsar, including a positional determination using high-resolution imaging with the upgraded Giant Metrewave Radio Telescope (uGMRT), initial timing analysis using the combination of MWA and uGMRT data, and detailed characterisation of the nulling and drifting properties in the MWA's frequency band (140-170\,MHz). Our analysis suggests a nulling fraction of $34\pm6$\% and a nulling periodicity of $42^{+1.5}_{-1.3}$ pulses. We measure the phase ($P_2$) and time modulation ($P_3$) caused by the sub-pulse drifting, with an average $P_2$ of $7.1^{+26.3}_{-3.1}$ degrees and a $P_3$ of $4.8^{+1.5}_{-0.9}$ pulses. We compare and contrast the observed properties with those of other pulsars that exhibit sub-pulse drifting and quasi-periodic nulling phenomena, and find that the majority of these objects tend to lie in the `death valley' of the period-period derivative ($P$-$\dot{P}$) diagram. We also discuss some broader implications for pulsar emission physics and the detectability of similar objects using next-generation pulsar surveys.
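The two nulling statistics quoted above can be illustrated on an idealised on/off pulse sequence. This is a minimal sketch, assuming each pulse has already been classified as a null (0) or a burst (1); the function names and the synthetic 42-pulse pattern are hypothetical, and a real analysis works on noisy single-pulse data rather than a perfect square wave. The nulling fraction is the share of null pulses, and the nulling periodicity is read off the strongest non-zero frequency of the sequence's discrete Fourier transform.

```python
import cmath


def nulling_fraction(on_off):
    """Fraction of pulses flagged as nulls (0 = null, 1 = emission)."""
    return on_off.count(0) / len(on_off)


def dominant_period(on_off):
    """Period (in pulses) of the strongest non-zero-frequency DFT component."""
    n = len(on_off)
    mean = sum(on_off) / n
    x = [v - mean for v in on_off]  # remove the DC term
    best_k, best_amp = 1, -1.0
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        if abs(s) > best_amp:
            best_k, best_amp = k, abs(s)
    return n / best_k


# Hypothetical idealised sequence: 28 bursts then 14 nulls, repeated 10 times.
sequence = ([1] * 28 + [0] * 14) * 10

print(nulling_fraction(sequence))  # 140 nulls out of 420 pulses
print(dominant_period(sequence))
```

The direct O(n^2) DFT keeps the sketch dependency-free; for long observations an FFT would be the practical choice.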
Submitted 28 May, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models
Authors:
Zhenzhong Wang,
Zehui Lin,
Wanyu Lin,
Ming Yang,
Minggang Zeng,
Kay Chen Tan
Abstract:
Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop a new framework for explainable molecular property prediction based on language models, dubbed Lamole, which can provide chemical-concept-aligned explanations. We first leverage a designated molecular representation, the Group SELFIES, as it can provide chemically meaningful semantics. Because attention mechanisms in Transformers can inherently capture relationships within the input, we further combine the attention weights and gradients to generate explanations that capture functional group interactions. We then carefully craft a marginal loss to explicitly optimize the explanations to align with chemists' annotations. We bridge the manifold hypothesis with the elaborated marginal loss to prove that the loss can align the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results on six mutagenicity datasets and one hepatotoxicity dataset demonstrate that Lamole achieves comparable classification accuracy and boosts explanation accuracy by up to 14.8%, establishing the state of the art in explainable molecular property prediction.
Submitted 31 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Fast 3D Molecule Generation via Unified Geometric Optimal Transport
Authors:
Haokai Hong,
Wanyu Lin,
Kay Chen Tan
Abstract:
This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. Our formula is solved within a unified, equivalent, and smooth representation space. This is achieved by transforming the multi-modal features into a continuous latent space with equivalent networks. In addition, we find that identifying an optimal distributional coupling is necessary for fast and effective transport between any two distributions. We further propose a flow refinement and purification mechanism for optimal coupling identification. By doing so, GOAT can turn arbitrary distribution couplings into new deterministic couplings, leading to a unified optimal transport path for fast 3D molecule generation. The purification step filters out subpar molecules to ensure the final generation performance. We theoretically prove that the proposed method indeed reduces the transport cost. Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a twofold speedup compared to the sub-optimal method while achieving the best generation quality in terms of validity, uniqueness, and novelty.
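Why the choice of coupling matters can be illustrated with the generic minibatch optimal-transport pairing used in OT-based flow matching. This is a swapped-in, standard technique and not GOAT's flow refinement and purification mechanism. The sketch assumes equal-size batches of base and data samples in a plain 2-D space (rather than a learned latent space), uses squared Euclidean cost, and brute-forces the assignment, which is only practical for a handful of points; all names and points are hypothetical.

```python
import itertools


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def coupling_cost(source, target, perm):
    """Mean squared cost when source[i] is transported to target[perm[i]]."""
    return sum(sq_dist(source[i], target[j]) for i, j in enumerate(perm)) / len(source)


def optimal_coupling(source, target):
    """Brute-force the minimum-cost one-to-one pairing (tiny batches only)."""
    indices = range(len(source))
    return min(itertools.permutations(indices),
               key=lambda perm: coupling_cost(source, target, perm))


# Hypothetical minibatch: base-distribution samples and "data" samples in 2-D.
base = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
data = [(1.1, 1.0), (0.1, 0.0), (1.0, 0.1), (0.0, 1.1)]

naive = tuple(range(len(base)))  # pair samples in the order they were drawn
best = optimal_coupling(base, data)
print(best, coupling_cost(base, data, naive), coupling_cost(base, data, best))
```

With the naive pairing the straight-line paths cross and the mean cost is large; the optimal pairing yields short, non-crossing paths, which is the intuition behind straighter flows and fewer integration steps.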
Submitted 24 May, 2024;
originally announced May 2024.
-
Aya 23: Open Weight Releases to Further Multilingual Progress
Authors:
Viraat Aryabumi,
John Dang,
Dwarak Talupuru,
Saurabh Dash,
David Cairuz,
Hangyu Lin,
Bharat Venkitesh,
Madeline Smith,
Jon Ander Campos,
Yi Chern Tan,
Kelly Marchisio,
Max Bartolo,
Sebastian Ruder,
Acyr Locatelli,
Julia Kreutzer,
Nick Frosst,
Aidan Gomez,
Phil Blunsom,
Marzieh Fadaee,
Ahmet Üstün,
Sara Hooker
Abstract:
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth vs. breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
Submitted 31 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Super Tiny Language Models
Authors:
Dylan Hillier,
Leon Guertler,
Cheston Tan,
Palaash Agrawal,
Chen Ruirui,
Bobby Cheng
Abstract:
The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies. These methods aim to significantly reduce the parameter count compared to traditional models; in future work, we aim to build on these in a way that maintains and improves upon the performance of base transformer models. This series of papers will explore various subproblems, including tokenizer-free models, self-play-based training, and alternative training objectives. We will target models with 10M, 50M, and 100M parameters. Our ultimate goal is to make high-performance language models more accessible and practical for a wide range of applications.
Submitted 26 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection
Authors:
Xingyu Wu,
Yan Zhong,
Jibin Wu,
Yuxiao Huang,
Sheng-hao Wu,
Kay Chen Tan
Abstract:
In algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios remain unclear. In this paper, we address this gap by proposing the first provable guarantee for algorithm selection based on algorithm features, taking a generalization perspective. We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors. Specifically, we examine adaptive and predefined algorithm features under transductive and inductive learning paradigms, respectively, and derive upper bounds for the generalization error based on the models' Rademacher complexity. Our theoretical findings not only provide tight upper bounds, but also offer analytical insights into the impact of various factors, such as the training scale of problem instances and candidate algorithms, model parameters, feature values, and distributional differences between the training and test data. Notably, we demonstrate how models benefit from algorithm features in complex scenarios involving many algorithms, and prove the positive correlation between the generalization error bound and the $χ^2$-divergence of distributions.
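For reference, the $χ^2$-divergence between a training distribution $P$ and a test distribution $Q$ with densities $p$ and $q$ (assuming $q(x)>0$ wherever $p(x)>0$) has the standard definition below; the paper's specific generalization bound in terms of this quantity is not reproduced here. The second form follows by expanding the square and using $\int p\,dx = \int q\,dx = 1$.

```latex
\chi^2(P \,\|\, Q) = \int \frac{\bigl(p(x) - q(x)\bigr)^2}{q(x)}\,dx
                   = \int \frac{p(x)^2}{q(x)}\,dx - 1
```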
Submitted 3 June, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Authors:
Siyuan Li,
Zedong Wang,
Zicheng Liu,
Di Wu,
Cheng Tan,
Jiangbin Zheng,
Yufei Huang,
Stan Z. Li
Abstract:
Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the hand-crafted tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of genomic data. In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings in an end-to-end manner. To further push its limits, we propose Hierarchical Residual Quantization (HRQ), where varying scales of codebooks are designed in a hierarchy to enrich the genome vocabulary in a coarse-to-fine manner. Extensive experiments on 32 genome datasets demonstrate VQDNA's superiority and favorable parameter efficiency compared to existing genome language models. Notably, empirical analysis of SARS-CoV-2 mutations reveals the fine-grained pattern awareness and biological significance of learned HRQ vocabulary, highlighting its untapped potential for broader applications in genomics.
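The coarse-to-fine idea behind a hierarchy of codebooks can be illustrated with generic residual vector quantization. This is a minimal sketch, assuming fixed toy codebooks and 2-D embeddings; in VQDNA the codebooks are learned end-to-end and vary in scale across levels, so the names, codebook values, and input vector below are hypothetical and this is not the authors' HRQ implementation. Each level picks the codeword nearest to whatever the previous levels failed to reconstruct.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))


def residual_quantize(vec, codebooks):
    """Encode vec with one code per level; each level quantizes the remaining residual.

    Returns the code indices, the reconstruction, and the squared error after each level.
    """
    recon = [0.0] * len(vec)
    residual = list(vec)
    codes, errors = [], []
    for book in codebooks:
        idx = nearest(book, residual)
        codes.append(idx)
        recon = [r + c for r, c in zip(recon, book[idx])]
        residual = [v - r for v, r in zip(vec, recon)]
        errors.append(sum(e * e for e in residual))
    return codes, recon, errors


# Hypothetical coarse-to-fine codebooks; including a zero codeword at each level
# guarantees the error never increases from one level to the next.
coarse = [[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]]
fine = [[0.0, 0.0], [0.5, 0.5], [-0.5, 0.5], [0.5, -0.5]]

codes, recon, errors = residual_quantize([4.4, 3.6], [coarse, fine])
print(codes, recon, errors)
```

The resulting code tuple (one index per level) plays the role of a token, so the effective vocabulary grows multiplicatively with the number of levels while each codebook stays small.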
Submitted 2 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Entanglement parity effects in the Kane-Fisher problem
Authors:
Chunyu Tan,
Yuxiao Hang,
Stephan Haas,
Hubert Saleur
Abstract:
We study the entanglement of a segment of length $\ell$ in an XXZ chain with one free extremity and the other connected to the rest of the system by a weak bond. We find that the von Neumann entropy exhibits terms of order $O(1)$ with strong parity effects, which probe the physics associated with the weakened bond and its behavior under the RG (the Kane-Fisher problem). In contrast with the previously studied XX case, the entropy difference $\delta S \equiv S^e - S^o$ now gives rise to a "resonance" curve that depends on the product $\ell T_B$, with $1/T_B$ a characteristic length scale akin to the Kondo length in Kondo problems. The problem is studied both numerically using DMRG and analytically near the healed and split fixed points. Interestingly, and in contrast with what happens in other impurity problems, $\delta S$ can, at least at lowest order, be tackled by conformal perturbation theory.
Submitted 14 May, 2024;
originally announced May 2024.
-
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark
Authors:
Mengsong Wu,
Tong Zhu,
Han Han,
Chuanyuan Tan,
Xiang Zhang,
Wenliang Chen
Abstract:
This paper presents a new tool learning dataset, Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances that demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, Seal-Tools contains hard instances that call multiple tools to complete the job, some of which involve nested tool calls. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Seal-Tools can therefore serve as a new benchmark for evaluating the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://github.com/fairyshine/Seal-Tools .
Submitted 14 May, 2024;
originally announced May 2024.
-
Singular Integrals associated with Reflection Groups on Euclidean Space
Authors:
Yongsheng Han,
Ji Li,
Chaoqiang Tan,
Zipeng Wang,
Xinfeng Wu
Abstract:
In the field of harmonic analysis, geometric considerations are frequently crucial. In particular, group actions such as translations, dilations and rotations on Euclidean space are instrumental. The objective of this paper is to extend the study of singular integrals to include the effects of reflection groups acting on Euclidean space, and to establish the T1 theorem for these singular integrals.
Submitted 12 May, 2024;
originally announced May 2024.
-
Van der Waals Magnetic Electrode Transfer for Two-Dimensional Spintronic Devices
Authors:
Zhongzhong Luo,
Zhihao Yu,
Xiangqian Lu,
Wei Niu,
Yao Yu,
Yu Yao,
Fuguo Tian,
Chee Leong Tan,
Huabin Sun,
Li Gao,
Wei Qin,
Yong Xu,
Qiang Zhao,
Xiang-Xiang Song
Abstract:
Two-dimensional (2D) materials are promising candidates for spintronic applications. Maintaining their atomically smooth interfaces during integration of ferromagnetic (FM) electrodes is crucial, since conventional metal deposition tends to induce defects at the interfaces. Meanwhile, the difficulties in picking up FM metals with strong adhesion and in achieving conductance matching between FM electrodes and spin transport channels make it challenging to fabricate high-quality 2D spintronic devices using metal transfer techniques. Here, we report a solvent-free magnetic electrode transfer technique that employs a graphene layer to assist in the transfer of FM metals. After transfer, the graphene also serves as part of the FM electrode to optimize spin injection, which enables the realization of spin valves with excellent performance based on various 2D materials. In addition to two-terminal devices, we demonstrate that the technique is applicable to four-terminal spin valves with nonlocal geometry. Our results point to a promising route toward realizing 2D spintronic applications using the developed magnetic electrode transfer technique.
Submitted 11 May, 2024;
originally announced May 2024.
-
Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization
Authors:
Zeyi Wang,
Songbai Liu,
Jianyong Chen,
Kay Chen Tan
Abstract:
Evolutionary algorithms excel at solving complex optimization problems, especially those with multiple objectives. However, their stochastic nature can sometimes hinder rapid convergence to the global optima, particularly in scenarios involving constraints. In this study, we employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization problems. Our aim is to speed up the convergence of the evolutionary population. To achieve this, we finetune the LLM through tailored prompt engineering, integrating information concerning both the objective values and the constraint violations of solutions. This process enables the LLM to grasp the relationship between well-performing and poorly performing solutions based on the provided input data. The quality of each solution is assessed based on its constraint violations and objective-based performance. The refined LLM can then be used as a search operator to generate higher-quality solutions. Experimental evaluations across various test benchmarks show that LLM-aided evolutionary search can significantly accelerate the population's convergence and performs competitively against cutting-edge evolutionary algorithms.
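One common way to rank solutions "based on their constraint violations and objective-based performance" is the constrained-domination rule: feasibility first, then smaller total violation, then Pareto dominance. The sketch below uses that rule to split a population into well-performing and poorly performing groups of the kind that could be serialized into a prompt. This is an assumption for illustration; the abstract does not say which ranking the authors use, and `Solution`, `constrained_dominates` and `split_by_quality` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Solution:
    objectives: Sequence[float]  # all objectives are minimized
    violation: float             # total constraint violation; 0.0 means feasible


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_dominates(a: Solution, b: Solution) -> bool:
    """Feasible beats infeasible; among infeasible, smaller violation wins; else Pareto."""
    if a.violation == 0.0 and b.violation > 0.0:
        return True
    if a.violation > 0.0 and b.violation > 0.0:
        return a.violation < b.violation
    if a.violation == 0.0 and b.violation == 0.0:
        return dominates(a.objectives, b.objectives)
    return False  # a infeasible, b feasible


def split_by_quality(pop: List[Solution]) -> Tuple[List[Solution], List[Solution]]:
    """'Good' = not constrained-dominated by any other member; everything else is 'poor'."""
    good, poor = [], []
    for s in pop:
        if any(constrained_dominates(o, s) for o in pop if o is not s):
            poor.append(s)
        else:
            good.append(s)
    return good, poor


# Hypothetical population with two objectives.
pop = [
    Solution([1.0, 2.0], 0.0),
    Solution([2.0, 3.0], 0.0),  # dominated by the first solution
    Solution([0.5, 0.5], 0.3),  # best objectives, but infeasible
    Solution([2.0, 1.0], 0.0),
]
good, poor = split_by_quality(pop)
```

Note that the infeasible solution lands in the "poor" group despite having the best objective values, which is the behaviour constraint-aware ranking is meant to enforce.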
Submitted 9 May, 2024;
originally announced May 2024.
-
"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies
Authors:
Brennan Schaffner,
Arjun Nitin Bhagoji,
Siyuan Cheng,
Jacqueline Mei,
Jay L. Shen,
Grace Wang,
Marshini Chetty,
Nick Feamster,
Genevieve Lakier,
Chenhao Tan
Abstract:
Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web-scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.
Submitted 8 May, 2024;
originally announced May 2024.
-
Graph-Based Multivariate Multiscale Dispersion Entropy: Efficient Implementation and Applications to Real-World Network Data
Authors:
John Stewart Fabila-Carrasco,
Chao Tan,
Javier Escudero
Abstract:
We introduce Multivariate Multiscale Graph-based Dispersion Entropy (mvDEG), a novel, computationally efficient method for analyzing multivariate time series data in graph and complex network frameworks, and demonstrate its application to real-world data. mvDEG effectively combines temporal dynamics with topological relationships, offering enhanced analysis compared to traditional nonlinear entropy methods. Its efficacy is established through testing on synthetic signals, such as uncorrelated and correlated noise, showcasing its ability to discern various levels of dependency and complexity.
The robustness of mvDEG is further validated with real-world datasets, effectively differentiating various two-phase flow regimes and capturing distinct dynamics in weather data analysis. An important advancement of mvDEG is its computational efficiency. Our optimized algorithm displays a computational time that grows linearly with the number of vertices or nodes, in contrast to the exponential growth observed in classical methods. This efficiency is achieved through refined matrix power calculations that exploit matrix and Kronecker product properties, making our method faster than the state of the art. The significant acceleration in computational time positions mvDEG as a transformative tool for extensive and real-time applications, setting a new benchmark in the analysis of time series recorded at distributed locations and opening avenues for innovative applications.
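The matrix-power speed-up rests on a standard Kronecker identity: by the mixed-product property $(A \otimes B)(C \otimes D) = AC \otimes BD$, so $(A \otimes B)^k = A^k \otimes B^k$. Powering a small spatial adjacency matrix and a small temporal one separately is far cheaper than powering their much larger product. The pure-Python sketch below checks the identity on a 3-vertex path graph and a 2-node toy matrix. It is a generic illustration only: which graph product mvDEG builds and exactly how it applies the identity are not specified in the abstract, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matpow(a: Matrix, k: int) -> Matrix:
    """a**k by repeated multiplication, starting from the identity."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, a)
    return result


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry (i, j) is a[i // rb][j // cb] * b[i % rb][j % cb]."""
    rb, cb = len(b), len(b[0])
    return [
        [a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
        for i in range(len(a) * rb)
    ]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph on 3 vertices (toy "spatial" graph)
B = [[0, 1], [1, 0]]                   # toy 2-node "temporal" graph

big = matpow(kron(A, B), 3)                  # powers a 6 x 6 matrix
cheap = kron(matpow(A, 3), matpow(B, 3))     # powers a 3 x 3 and a 2 x 2, then combines
```

Both routes give the same 6 x 6 matrix, but the second only ever multiplies the small factors.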
Submitted 1 May, 2024;
originally announced May 2024.
-
Towards Generalist Robot Learning from Internet Video: A Survey
Authors:
Robert McCarthy,
Daniel C. H. Tan,
Dominik Schmidt,
Fernando Acero,
Nathan Herr,
Yilun Du,
Thomas G. Thuruthel,
Zhibin Li
Abstract:
This survey presents an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large internet video datasets and, in the process, extracting foundational knowledge about the world's dynamics and physical human behaviour. Such methods hold great promise for developing general-purpose robots.
We open with an overview of fundamental concepts relevant to the LfV-for-robotics setting. This includes a discussion of the exciting benefits LfV methods can offer (e.g., improved generalization beyond the available robot data) and commentary on key LfV challenges (e.g., missing information in video and LfV distribution shifts). Our literature review begins with an analysis of video foundation model techniques that can extract knowledge from large, heterogeneous video datasets. Next, we review methods that specifically leverage video data for robot learning. Here, we categorise work according to which RL knowledge modality (KM) benefits from the use of video data. We additionally highlight techniques for mitigating LfV challenges, including reviewing action representations that address missing action labels in video.
Finally, we examine LfV datasets and benchmarks, before concluding with a discussion of challenges and opportunities in LfV. Here, we advocate for scalable foundation model approaches that can leverage the full range of internet video data, and that target the learning of the most promising RL KMs: the policy and dynamics model. Overall, we hope this survey will serve as a comprehensive reference for the emerging field of LfV, catalysing further research in the area and facilitating progress towards the development of general-purpose robots.
Submitted 7 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation
Authors:
Lingyi Hong,
Zhongying Liu,
Wenchao Chen,
Chenzhi Tan,
Yuang Feng,
Xinyu Zhou,
Pinxue Guo,
Jinglun Li,
Zhaoyu Chen,
Shuyong Gao,
Wei Zhang,
Wenqiang Zhang
Abstract:
Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. Thus, we propose a novel benchmark named LVOS, comprising 720 videos with 296,401 frames and 407,945 high-quality annotations. Videos in LVOS last 1.14 minutes on average, approximately 5 times longer than videos in existing datasets. Each video includes various attributes, especially challenges arising in the wild, such as long-term reappearance and cross-temporal similar objects. Compared to previous benchmarks, LVOS better reflects VOS models' performance in real scenarios. Based on LVOS, we evaluate 20 existing VOS models under 4 different settings and conduct a comprehensive analysis. On LVOS, these models suffer a large performance drop, highlighting the challenge of achieving precise tracking and segmentation in real-world scenarios. Attribute-based analysis indicates that the key factor in the accuracy decline is the increased video length, emphasizing LVOS's crucial role. We hope LVOS can advance the development of VOS in real scenes. Data and code are available at https://lingyihongfd.github.io/lvos.github.io/.
Submitted 30 April, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Enhancing High-Speed Cruising Performance of Autonomous Vehicles through Integrated Deep Reinforcement Learning Framework
Authors:
Jinhao Liang,
Kaidi Yang,
Chaopeng Tan,
Jinxiang Wang,
Guodong Yin
Abstract:
High-speed cruising scenarios with mixed traffic greatly challenge the road safety of autonomous vehicles (AVs). Unlike existing works that only look at fundamental modules in isolation, this work enhances AV safety in mixed-traffic high-speed cruising scenarios by proposing an integrated framework that synthesizes three fundamental modules, i.e., the behavioral decision-making, path-planning, and motion-control modules. Since the integrated framework increases system complexity, a bootstrapped deep Q-Network (DQN) is employed to enhance the deep exploration of the reinforcement learning method and achieve adaptive decision-making for AVs. Moreover, to make AV behavior understandable to surrounding human-driven vehicles (HDVs) and thereby prevent unexpected operations caused by misinterpretation, we derive an inverse reinforcement learning (IRL) approach that learns the reward function of skilled drivers for planning lane-changing paths. Such a design enables AVs to achieve a human-like tradeoff among multiple performance requirements. Simulations demonstrate that the proposed integrated framework can guide AVs to take safe actions while guaranteeing high-speed cruising performance.
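The core mechanism of bootstrapped DQN is to keep several Q-value "heads", commit to one randomly chosen head for a whole episode, and train each head on a random (bootstrap-masked) subset of transitions. The sketch below shows that mechanism in tabular form, with dictionaries standing in for the neural network heads. It is an illustrative simplification, not the paper's network; the class name, parameters and the lane-change action labels are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


class BootstrappedQ:
    """Tabular stand-in for bootstrapped DQN: K independent Q-estimates ("heads")."""

    def __init__(self, n_heads: int, actions: Sequence[str], alpha: float = 0.1,
                 gamma: float = 0.95, mask_prob: float = 0.5, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.actions = list(actions)
        self.alpha, self.gamma, self.mask_prob = alpha, gamma, mask_prob
        self.q: List[Dict[Tuple[Hashable, str], float]] = [dict() for _ in range(n_heads)]
        self.active = 0

    def begin_episode(self) -> None:
        # Committing to one sampled head per episode gives temporally consistent
        # ("deep") exploration instead of per-step random actions.
        self.active = self.rng.randrange(len(self.q))

    def act(self, state: Hashable) -> str:
        # Greedy action under the active head; unseen pairs default to 0.0.
        head = self.q[self.active]
        return max(self.actions, key=lambda a: head.get((state, a), 0.0))

    def update(self, state: Hashable, action: str, reward: float,
               next_state: Hashable, done: bool) -> None:
        for head in self.q:
            # Bernoulli bootstrap mask: each head learns from a random subset of data.
            if self.rng.random() >= self.mask_prob:
                continue
            best_next = 0.0 if done else max(
                head.get((next_state, a), 0.0) for a in self.actions)
            old = head.get((state, action), 0.0)
            head[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)


# Hypothetical discrete behaviour decisions for a cruising AV.
ACTIONS = ["keep_lane", "change_left", "change_right"]
agent = BootstrappedQ(n_heads=3, actions=ACTIONS, mask_prob=1.0)
agent.update("s0", "keep_lane", 1.0, "s1", done=True)
agent.begin_episode()
choice = agent.act("s0")
```

With `mask_prob=1.0` every head sees every transition, so the heads stay identical; values below 1.0 are what make them diverge and drive exploration.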
Submitted 22 April, 2024;
originally announced April 2024.
-
In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review
Authors:
Lequn Chen,
Guijun Bi,
Xiling Yao,
Jinlong Su,
Chaolin Tan,
Wenhe Feng,
Michalis Benakis,
Youxiang Chew,
Seung Ki Moon
Abstract:
Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including optical-based monitoring, acoustic-based sensing, laser line scanning, and operando X-ray monitoring. These techniques are evaluated for their capabilities and limitations in detecting defects within Laser Powder Bed Fusion (LPBF) and Laser Directed Energy Deposition (LDED) processes. Furthermore, the review discusses emerging multisensor monitoring and machine learning (ML)-assisted defect detection methods, benchmarking ML models tailored for in-situ defect detection. The paper also discusses in-situ adaptive defect remediation strategies that advance LAM towards zero-defect autonomous operations, focusing on real-time closed-loop feedback control and defect correction methods. Research gaps such as the need for standardization, improved reliability and sensitivity, and decision-making strategies beyond early stopping are highlighted. Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes. This paper aims to equip researchers and industry professionals with a holistic understanding of the current capabilities, limitations, and future directions in in-situ process monitoring and adaptive quality enhancement in LAM.
Submitted 21 April, 2024;
originally announced April 2024.
-
Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data
Authors:
Zhenzhong Wang,
Qingyuan Zeng,
Wanyu Lin,
Min Jiang,
Kay Chen Tan
Abstract:
While graph neural networks (GNNs) have become the de facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs in many real-world applications that suffer from low-data regimes. Specifically, features extracted from scarce labeled nodes cannot provide sufficient supervision for the unlabeled samples, leading to severe over-fitting. In this work, we point out that leveraging subgraphs to capture long-range dependencies can augment the representation of a node with homophily properties, thus alleviating the low-data regime. However, prior works leveraging subgraphs fail to capture the long-range dependencies among nodes. To this end, we present a novel self-supervised learning framework, called multi-view subgraph neural networks (Muse), for handling long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former captures the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness for downstream node classification tasks. Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.
Submitted 18 April, 2024;
originally announced April 2024.
-
CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models
Authors:
Yu Zhou,
Xingyu Wu,
Beicheng Huang,
Jibin Wu,
Liang Feng,
Kay Chen Tan
Abstract:
Causality reveals fundamental principles behind data distributions in real-world scenarios, and the capability of large language models (LLMs) to understand causality directly impacts their efficacy in explaining outputs, adapting to new evidence, and generating counterfactuals. With the proliferation of LLMs, the evaluation of this capacity is increasingly garnering attention. However, the absence of a comprehensive benchmark has left existing evaluation studies simplistic, undiversified, and homogeneous. To address these challenges, this paper proposes a comprehensive benchmark, namely CausalBench, to evaluate the causality understanding capabilities of LLMs. Originating from the causal research community, CausalBench encompasses three causal learning-related tasks, which facilitate a convenient comparison of LLMs' performance with classic causal learning algorithms. Meanwhile, causal networks of varying scales and densities are integrated into CausalBench to explore the upper limits of LLMs' capabilities across task scenarios of varying difficulty. Notably, background knowledge and structured data are also incorporated into CausalBench to thoroughly unlock the underlying potential of LLMs for long-text comprehension and prior information utilization. Based on CausalBench, this paper evaluates nineteen leading LLMs and draws conclusions in diverse aspects. First, we present the strengths and weaknesses of LLMs and quantitatively explore the upper limits of their capabilities across various scenarios. We further discern the adaptability and abilities of LLMs with respect to specific structural networks and complex chain-of-thought structures. Moreover, this paper quantitatively presents the differences across diverse information sources and uncovers the gap between LLMs' capabilities in causal understanding within textual contexts and numerical domains.
Submitted 9 April, 2024;
originally announced April 2024.
-
Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models
Authors:
Beichen Huang,
Xingyu Wu,
Yu Zhou,
Jibin Wu,
Liang Feng,
Ran Cheng,
Kay Chen Tan
Abstract:
Large language models (LLMs) have demonstrated exceptional performance not only in natural language processing tasks but also in a great variety of non-linguistic domains. In diverse optimization scenarios, there is also a rising trend of applying LLMs. However, whether applying LLMs to black-box optimization problems is genuinely beneficial remains unexplored. This paper endeavors to offer deep insights into the potential of LLMs in optimization through a comprehensive investigation, which covers both discrete and continuous optimization problems to assess the efficacy and distinctive characteristics that LLMs bring to this field. Our findings reveal both the limitations and advantages of LLMs in optimization. Specifically, on the one hand, despite the significant power consumed in running the models, LLMs exhibit subpar performance in pure numerical tasks, primarily due to a mismatch between the problem domain and their processing capabilities; on the other hand, although LLMs may not be ideal for traditional numerical optimization, their potential in broader optimization contexts remains promising, as LLMs can solve problems in non-numerical domains and can leverage heuristics from the prompt to enhance their performance. To the best of our knowledge, this work presents the first systematic evaluation of LLMs for numerical optimization. Our findings pave the way for a deeper understanding of LLMs' role in optimization and guide future application of LLMs in a wide range of scenarios.
Submitted 6 July, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Authors:
Tianyu Cao,
Natraj Raman,
Danial Dervovic,
Chenhao Tan
Abstract:
As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports not only are long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command. We find that GPT-3.5 and Command fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude has the ability to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4.
Submitted 8 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Wi-Fi-based Personnel Identity Recognition: Addressing Dataset Imbalance with C-DDPMs
Authors:
Jichen Bian,
Chong Tan,
Peiyao Tang,
Min Zheng
Abstract:
Wireless sensing technologies have become increasingly prevalent due to the ubiquity of wireless signals and their inherent privacy-friendly characteristics. Device-free personnel identity recognition, a prevalent application of wireless sensing, is particularly vulnerable to imbalanced channel state information (CSI) datasets. This letter proposes a novel method for CSI dataset augmentation that employs Conditional Denoising Diffusion Probabilistic Models (C-DDPMs) to generate additional samples that address class imbalance. The augmentation markedly improves classification accuracy on our homemade dataset, raising the accuracy of every class above 94%.
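Two generic ingredients of diffusion-based class balancing can be sketched without a trained model. The first is the closed-form DDPM forward (noising) step $x_t \sim \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t)I)$ under a linear $\beta$ schedule. The second is a count of how many synthetic samples each minority class needs to match the largest class. In a C-DDPM the class label conditions the learned denoiser, which is not shown here. The schedule constants are the common DDPM defaults, and the function names and person labels are hypothetical; the letter's actual configuration is not given in the abstract.

```python
import math
import random
from typing import Dict, List, Sequence


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
             rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) in a single step."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


def samples_needed(class_counts: Dict[str, int]) -> Dict[str, int]:
    """Number of synthetic samples per class required to match the largest class."""
    target = max(class_counts.values())
    return {label: target - n for label, n in class_counts.items()}


alpha_bars = linear_alpha_bars()
rng = random.Random(0)
noisy = q_sample([0.8, 0.6, 0.9], t=500, alpha_bars=alpha_bars, rng=rng)  # toy CSI amplitudes
plan = samples_needed({"person_a": 120, "person_b": 80, "person_c": 120})
```

For these toy counts, `plan` asks for 40 extra samples of `person_b` and none for the other two classes.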
Submitted 7 April, 2024;
originally announced April 2024.
-
Hypothesis Generation with Large Language Models
Authors:
Yangqiaoyu Zhou,
Haokun Liu,
Tejes Srivastava,
Hongyuan Mei,
Chenhao Tan
Abstract:
Effective generation of novel hypotheses is instrumental to scientific progress. So far, researchers have been the main powerhouse behind hypothesis generation through painstaking data analysis and thinking (also known as the Eureka moment). In this paper, we examine the potential of large language models (LLMs) to generate hypotheses. We focus on hypothesis generation based on data (i.e., labeled examples). To enable LLMs to handle arbitrarily long contexts, we generate initial hypotheses from a small number of examples and then update them iteratively to improve the quality of hypotheses. Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks, improving accuracy by 31.7% on a synthetic dataset and by 13.9%, 3.3%, and 24.9% on three real-world datasets. We also outperform supervised learning by 12.8% and 11.2% on two challenging real-world datasets. Furthermore, we find that the generated hypotheses not only corroborate human-verified theories but also uncover new insights for the tasks.
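A familiar way to turn the bandit intuition into a reward is an upper-confidence-bound (UCB) score: a hypothesis's empirical accuracy on the examples it has been tested against, plus an exploration bonus that shrinks as it is tested more often. The sketch below ranks a hypothesis bank this way. It assumes a UCB1-style formula for illustration; the paper's exact reward function is not given in the abstract, and `Hypothesis`, `ucb_reward`, `top_k` and the constant `c` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    correct: int = 0  # labeled examples this hypothesis predicted correctly
    tried: int = 0    # labeled examples it has been evaluated on


def ucb_reward(h: Hypothesis, total_tried: int, c: float = 0.5) -> float:
    """Empirical accuracy (exploitation) plus a bonus for rarely tested hypotheses (exploration)."""
    if h.tried == 0:
        return float("inf")  # always try an untested hypothesis at least once
    accuracy = h.correct / h.tried
    return accuracy + c * math.sqrt(math.log(total_tried) / h.tried)


def top_k(hypotheses: List[Hypothesis], k: int, c: float = 0.5) -> List[Hypothesis]:
    """Return the k hypotheses with the highest UCB-style reward."""
    total = max(1, sum(h.tried for h in hypotheses))
    return sorted(hypotheses, key=lambda h: ucb_reward(h, total, c), reverse=True)[:k]


bank = [
    Hypothesis("A", correct=8, tried=10),
    Hypothesis("B", correct=3, tried=3),
    Hypothesis("C"),  # freshly generated, never evaluated
]
chosen = top_k(bank, k=2)
```

The untested hypothesis "C" is selected first, and "B" outranks "A" because its perfect but small-sample record earns a larger exploration bonus.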
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
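The abstract says only that the reward is "inspired by multi-armed bandits". A minimal sketch, assuming a UCB-style form (empirical accuracy plus an exploration bonus that shrinks as a hypothesis is evaluated on more examples); `ucb_reward`, `select_top_k`, and the `alpha` default are hypothetical and not taken from the paper.

```python
import math

def ucb_reward(accuracy, n_evaluated, t, alpha=0.5):
    """UCB-style score for one hypothesis (assumed form, not the paper's exact reward).

    accuracy:    fraction of the n_evaluated examples predicted correctly.
    n_evaluated: number of examples this hypothesis has been scored on.
    t:           total number of training examples seen so far (t >= 1).
    alpha:       exploration weight (hypothetical default).
    """
    if n_evaluated == 0:
        return math.inf  # an untested hypothesis is always worth trying once
    # Exploitation term (accuracy) plus exploration bonus sqrt(log t / n).
    return accuracy + alpha * math.sqrt(math.log(t) / n_evaluated)

def select_top_k(stats, t, k, alpha=0.5):
    """Rank hypotheses by reward; stats maps name -> (accuracy, n_evaluated)."""
    ranked = sorted(
        stats,
        key=lambda h: ucb_reward(stats[h][0], stats[h][1], t, alpha),
        reverse=True,
    )
    return ranked[:k]

stats = {"h1": (0.9, 50), "h2": (0.6, 50), "h3": (0.7, 0)}
print(select_top_k(stats, t=100, k=2))  # ['h3', 'h1']
```

With equal accuracy, a hypothesis evaluated on fewer examples receives a larger bonus, which is the exploration side of the tradeoff; as evidence accumulates, accuracy dominates the ranking.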
-
Diffusion-Driven Domain Adaptation for Generating 3D Molecules
Authors:
Haokai Hong,
Wanyu Lin,
Kay Chen Tan
Abstract:
Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as domain-adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that shifts a generative model to desired new domains without the need to collect even a single molecule. As the domain shift is typically caused by structural variations of molecules, e.g., scaffold variations, we leverage a designated equivariant masked autoencoder (MAE) along with various masking strategies to capture the structural-grained representations of the in-domain varieties. In particular, with an asymmetric encoder-decoder module, the MAE can generalize to unseen structural variations from the target domains. These structural variations are encoded with an equivariant encoder and treated as domain supervisors to control denoising. We show that, with these encoded structural-grained domain supervisors, GADM can generate effective molecules within the desired new domains. We conduct extensive experiments across various domain adaptation tasks on benchmark datasets, and show that our approach improves the success rate, defined on the basis of molecular validity, uniqueness, and novelty, by up to 65.6% compared with alternative baselines.
Submitted 1 April, 2024;
originally announced April 2024.
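The abstract mentions "various masking strategies" for the equivariant MAE but does not define them. A minimal sketch of the simplest one, plain random atom masking as used in generic masked autoencoders; `random_atom_mask`, `split_visible`, and the default `mask_ratio` are hypothetical and should not be read as GADM's actual strategy.

```python
import random

def random_atom_mask(num_atoms, mask_ratio=0.25, seed=None):
    """Return a boolean mask over atom indices (True = hidden from the encoder).

    Generic MAE-style random masking; mask_ratio is a hypothetical default.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    n_mask = round(mask_ratio * num_atoms)
    masked = set(rng.sample(range(num_atoms), n_mask))
    return [i in masked for i in range(num_atoms)]

def split_visible(coords, mask):
    """Split 3D coordinates into encoder inputs (visible) and reconstruction targets."""
    visible = [c for c, m in zip(coords, mask) if not m]
    targets = [c for c, m in zip(coords, mask) if m]
    return visible, targets

coords = [(float(i), 0.0, 0.0) for i in range(20)]
mask = random_atom_mask(len(coords), mask_ratio=0.25, seed=0)
visible, targets = split_visible(coords, mask)
print(len(visible), len(targets))  # 15 5
```

Because the mask is chosen from atom indices alone and never from positions, rotating or translating the molecule transforms the visible and target coordinates identically, so this masking step does not by itself break the equivariance the encoder relies on.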