arXiv:2407.04782 [pdf, other]

Metals in Star-Forming Galaxies with KCWI. I. Methodology and First Results on the Abundances of Iron, Magnesium, and Oxygen

Authors: Zhuyun Zhuang, Evan N. Kirby, Charles C. Steidel, Mithi A. C. de los Reyes, Nikolaus Z. Prusinski, N. Leethochawalit, Minjung Park, Charlie Conroy, Evan H. Nuñez

Abstract: Understanding the chemical enrichment of different elements is crucial to gaining a complete picture of galaxy chemical evolution. In this study, we present a new sample of 46 low-redshift, low-mass star-forming galaxies at $M_*\sim 10^{8-10}M_{\odot}$ along with two quiescent galaxies at $M_*\sim 10^{8.8}M_{\odot}$ observed with the Keck Cosmic Web Imager (KCWI), aiming to investigate the chemical evolution of galaxies in the transition zone between Local Group satellites and massive field galaxies. We develop a novel method to simultaneously determine stellar abundances of iron and magnesium in star-forming galaxies. With the gas-phase oxygen abundance (O/H)$_{\rm g}$ measured using the strong line method, we are able to make the first-ever apples-to-apples comparison of $α$ elements in the stars and the ISM. We find that the [Mg/H]$_*$-[O/H]$_{\rm g}$ relation is much tighter than the [Fe/H]$_*$-[O/H]$_{\rm g}$ relation, which can be explained by the similar production processes of $α$ elements. Most galaxies in our sample exhibit higher [O/H]$_{\rm g}$ than [Fe/H]$_*$ and [Mg/H]$_*$. In addition, we construct mass-metallicity relations (MZRs) measured as three different elements (Fe$_*$, Mg$_*$, O$_{\rm g}$). Compared to the gas O-MZR, the stellar Fe- and Mg-MZRs show larger scatter driven by variations in specific star formation rates (sSFR), with star-forming galaxies exhibiting higher sSFR and lower stellar abundances at fixed mass. The excess of [O/H]$_{\rm g}$ compared to stellar abundances as well as the anti-correlation between sSFR and stellar abundance suggests that galaxy quenching of intermediate-mass galaxies at $M_*\sim 10^{8-10}M_{\odot}$ is primarily driven by starvation.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 36 pages, 16 figures, 4 tables; accepted for publication in ApJ. Main results in Figure 7, 10 and 11

arXiv:2406.18832 [pdf, other]

OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

Authors: Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

Abstract: Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ) method for the activations of LLMs. OutlierTune consists of two components: pre-execution of dequantization and symmetrization. The pre-execution of dequantization updates the model weights by the activation scaling factors, avoiding the internal scaling and costly additional computational overheads brought by the per-channel activation quantization. The symmetrization further reduces the quantization differences arising from the weight updates by ensuring the balanced numerical ranges across different activation channels. OutlierTune is easy to implement and hardware-efficient, introducing almost no additional computational overheads during the inference. Extensive experiments show that the proposed framework outperforms existing methods across multiple different tasks. Demonstrating better generalization, this framework improves the Int6 quantization of the instruction-tuning LLMs, such as OPT-IML, to the same level as half-precision (FP16). Moreover, we have shown that the proposed framework is 1.48x faster than the FP16 implementation while reducing approximately 2x memory usage.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.12242 [pdf, other]

doi 10.1609/aaai.v38i8.28795

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Authors: Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue

Abstract: Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performances on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10759 [pdf, other]

Humanoid Parkour Learning

Authors: Ziwen Zhuang, Shenzhe Yao, Hang Zhao

Abstract: Parkour is a grand challenge for legged locomotion, even for quadruped robots, requiring active perception and various maneuvers to overcome multiple challenging obstacles. Existing methods for humanoid locomotion either optimize a trajectory for a single parkour track or train a reinforcement learning policy only to walk with a significant amount of motion references. In this work, we propose a framework for learning an end-to-end vision-based whole-body-control parkour policy for humanoid robots that overcomes multiple parkour skills without any motion prior. Using the parkour policy, the humanoid robot can jump on a 0.42m platform, leap over hurdles, 0.8m gaps, and much more. It can also run at 1.8m/s in the wild and walk robustly on different terrains. We test our policy in indoor and outdoor environments to demonstrate that it can autonomously select parkour skills while following the rotation command of the joystick. We override the arm actions and show that this framework can easily transfer to humanoid mobile manipulation tasks. Videos can be found at https://humanoid4parkour.github.io

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08835 [pdf, other]

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08563 [pdf, other]

Field-sensitive dislocation bound states in two-dimensional $d$-wave altermagnets

Authors: Di Zhu, Dongling Liu, Zheng-Yang Zhuang, Zhigang Wu, Zhongbo Yan

Abstract: When a two-dimensional $d$-wave altermagnet is grown on a substrate, the interplay of momentum-dependent spin splittings arising from altermagnetism and Rashba spin-orbit coupling gives rise to a nodal band structure with band degeneracies enforced by a $C_{4z}\mathcal{T}$ symmetry. If we break the $C_{4z}\mathcal{T}$ symmetry by an exchange field, the band degeneracies are found to be immediately lifted, leading to a topological band structure characterized by nontrivial strong and weak topological indices. Remarkably, both the strong topological index and the $Z_{2}$-valued weak topological indices depend sensitively on the direction of the exchange field. As a consequence of the bulk-defect correspondence, we find that the unique dependence of weak topological indices on the exchange field in this system dictates that the presence or absence of topological bound states at lattice dislocations also depends sensitively on the direction of the exchange field. When the substrate is an $s$-wave superconductor, we find that a similar dependence of band topology on the exchange field gives rise to field-sensitive dislocation Majorana zero modes. As topological dislocation bound states are easily detectable by scanning tunneling microscopy, our findings unveil a promising experimental diagnosis of altermagnetic materials among an ever growing list of candidates.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures

arXiv:2406.02226 [pdf, ps, other]

Stability for a family of planar systems with nilpotent critical points

Authors: Ziwei Zhuang, Changjian Liu

Abstract: Consider a family of planar polynomial systems $\dot x = y^{2l-1} - x^{2k+1}, \dot y =-x +m y^{2s+1},$ where $l,k,s\in\mathbb{N^*},$ $2\le l \le 2s$ and $m\in\mathbb{R}.$ We study the center-focus problem on its origin which is a monodromic nilpotent critical point. By directly calculating the generalized Lyapunov constants, we find that the origin is always a focus and we complete the classification of its stability. This includes the most difficult case: $s=kl$ and $m=(2k+1)!!/(2kl+1)!_{(2l)}.$ In this case, we prove that the origin is always unstable. Our result extends and completes a previous one.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01250 [pdf, other]

DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree

Authors: Zhutao Zhuang, Xinqi Zeng, Zhiguang Chen

Abstract: Key-value separation is used in LSM-tree to store large values in separate log files to reduce write amplification, but requires garbage collection to garbage collect invalid values. Existing garbage collection techniques in LSM-tree typically adopt static parameter based garbage collection to garbage collect obsolete values which struggles to achieve low write amplification and it's challenging to find proper parameter for garbage collection triggering. In this work we introduce DumpKV, which introduces learning based lifetime aware garbage collection with dynamic lifetime adjustment to do efficient garbage collection to achieve lower write amplification. DumpKV manages large values using trained lightweight model with features suitable for various application based on past write access information of keys to give lifetime prediction for each individual key to enable efficient garbage collection. To reduce interference to write throughput DumpKV conducts feature collection during L0-L1 compaction leveraging the fact that LSM-tree is small under KV separation. Experimental results show that DumpKV achieves lower write amplification by 38%-73% compared to existing key-value separation garbage collection LSM-tree stores with small feature storage overhead.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Hi

arXiv:2405.20351 [pdf, other]

ADR-BC: Adversarial Density Weighted Regression Behavior Cloning

Authors: Ziqi Zhang, Zifeng Zhuang, Donglin Wang, Jingzehua Xu, Miao Liu, Shuai Zhang

Abstract: Typically, traditional Imitation Learning (IL) methods first shape a reward or Q function and then use this shaped function within a reinforcement learning (RL) framework to optimize the empirical policy. However, if the shaped reward/Q function does not adequately represent the ground truth reward/Q function, updating the policy within a multi-step RL framework may result in cumulative bias, further impacting policy learning. Although utilizing behavior cloning (BC) to learn a policy by directly mimicking a few demonstrations in a single-step updating manner can avoid cumulative bias, BC tends to greedily imitate demonstrated actions, limiting its capacity to generalize to unseen state action pairs. To address these challenges, we propose ADR-BC, which aims to enhance behavior cloning through augmented density-based action support, optimizing the policy with this augmented support. Specifically, the objective of ADR-BC shares the similar physical meanings that matching expert distribution while diverging the sub-optimal distribution. Therefore, ADR-BC can achieve more robust expert distribution matching. Meanwhile, as a one-step behavior cloning framework, ADR-BC avoids the cumulative bias associated with multi-step RL frameworks. To validate the performance of ADR-BC, we conduct extensive experiments. Specifically, ADR-BC showcases a 10.5% improvement over the previous state-of-the-art (SOTA) generalized IL baseline, CEIL, across all tasks in the Gym-Mujoco domain. Additionally, it achieves an 89.5% improvement over Implicit Q Learning (IQL) using real rewards across all tasks in the Adroit and Kitchen domains. On the other hand, we conduct extensive ablations to further demonstrate the effectiveness of ADR-BC.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.14790 [pdf, other]

DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation

Authors: Jinxin Liu, Xinghong Guo, Zifeng Zhuang, Donglin Wang

Abstract: In this paper, we propose a novel approach called DIffusion-guided DIversity (DIDI) for offline behavioral generation. The goal of DIDI is to learn a diverse set of skills from a mixture of label-free offline data. We achieve this by leveraging diffusion probabilistic models as priors to guide the learning process and regularize the policy. By optimizing a joint objective that incorporates diversity and diffusion-guided regularization, we encourage the emergence of diverse behaviors while maintaining the similarity to the offline data. Experimental results in four decision-making domains (Push, Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering diverse and discriminative skills. We also introduce skill stitching and skill interpolation, which highlight the generalist nature of the learned skill space. Further, by incorporating an extrinsic reward function, DIDI enables reward-guided behavior generation, facilitating the learning of diverse and optimal behaviors from sub-optimal data.

Submitted 23 May, 2024; originally announced May 2024.

Comments: ICML2024

arXiv:2405.12214 [pdf, other]

Tunable Spatiotemporal Orders in Driven Insulators

Authors: Daniel Kaplan, Pavel A. Volkov, Ahana Chakraborty, Zekun Zhuang, Premala Chandra

Abstract: We show that driving optical phonons above a threshold fluence induces spatiotemporal orders, where material properties oscillate at an incommensurate wavevector $q_0$ in space and at half the drive frequency in time. The order is robust against temperature on timescales much larger than the lifetime of the excited modes and can be accompanied by a static $2q_0$ modulation. We make predictions for time-resolved diffraction and provide estimates for candidate materials. Our results show the possibility of using THz waves in solids to realize tunable incommensurate order on the nanoscale.

Submitted 24 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures + supplementary information

arXiv:2405.08740 [pdf, other]

Reinformer: Max-Return Sequence Modeling for Offline RL

Authors: Zifeng Zhuang, Dengyun Peng, Jinxin Liu, Ziqi Zhang, Donglin Wang

Abstract: As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as sequence modeling that conditions on the hindsight information including returns, goal or future trajectory. Although promising, this supervised paradigm overlooks the core objective of RL that maximizes the return. This overlook directly leads to the lack of trajectory stitching capability that affects the sequence model learning from sub-optimal data. In this work, we introduce the concept of max-return sequence modeling which integrates the goal of maximizing returns into existing sequence models. We propose Reinforced Transformer (Reinformer), indicating the sequence model is reinforced by the RL objective. Reinformer additionally incorporates the objective of maximizing returns in the training phase, aiming to predict the maximum future return within the distribution. During inference, this in-distribution maximum return will guide the selection of optimal actions. Empirically, Reinformer is competitive with classical RL methods on the D4RL benchmark and outperforms state-of-the-art sequence model particularly in trajectory stitching ability. Code is public at https://github.com/Dragon-Zhuang/Reinformer.

Submitted 2 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.07686 [pdf, other]

Pole trajectories of the $Λ(1405)$ helps establish its dynamical nature

Authors: Zejian Zhuang, Raquel Molina, Jun-xu Lu, Li-Sheng Geng

Abstract: Chiral trajectories of dynamically generated resonances are connected to the SU(3) breaking pattern and their nature. From an analysis of a recent LQCD simulation on the $πΣ-\bar{K}N$ scattering for $I=0$, and the study of the quark mass dependence of the octet baryons, we determine for the first time the trajectory of the two poles associated to the $Λ(1405)$ towards the symmetric point $(\mathrm{Tr}[M]=\mathrm{cte})$ accurately. Our result at unphysical pion mass is consistent with the lattice simulation at $m_π\simeq 200$ MeV and the extrapolation to the physical point, based on the NLO chiral lagrangian, agrees perfectly well with previous analyses of experimental data. Contrary to other works, we predict qualitatively similar trajectories at LO and up to NLO, being consistent with the dominance of the LO interaction. At the SU(3) symmetric point up to NLO, we obtain that the lower pole is located at $E^{(1)}=1595\pm8$ MeV, being a singlet representation, while the higher pole belongs to the octet with a mass $E^{(8)}=1600\pm4$ MeV. This can be tested in the future LQCD simulations.

Submitted 18 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Comments are welcome

arXiv:2405.07004 [pdf, other]

Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Authors: Zhixiong Zhuang, Maria-Irina Nicolae, Mario Fritz

Abstract: Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim's input states distribution, Stealthy Imitation fits a reward model that allows to approximate it. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Accepted at ICML 2024. Project page: https://zhixiongzh.github.io/stealthy-imitation

arXiv:2404.19721 [pdf]

PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games

Authors: Steph Buongiorno, Lawrence Jake Klinkert, Tanishq Chawla, Zixin Zhuang, Corey Clark

Abstract: This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs used for video game design, PANGeA innovates by not only generating game level data (which includes, but is not limited to, setting, key items, and non-playable characters (NPCs)), but by also fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses challenges behind ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. Making these interactions possible, PANGeA is supported by a server that hosts a custom memory system that supplies context for augmenting generated responses thus aligning them with the procedural narrative. For its broad application, the server has a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable with local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and ablation test of two versions of a demo game. These are, a custom, browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content even when provided varied and unpredictable, free-form text input.

Submitted 9 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.17134 [pdf, ps, other]

Boundedness of log Fano cone singularities and discreteness of local volumes

Authors: Chenyang Xu, Ziquan Zhuang

Abstract: We prove that in any fixed dimension, K-semistable log Fano cone singularities whose volumes are bounded from below by a fixed positive number form a bounded set. As a consequence, we show that the set of local volumes of klt singularities of a fixed dimension has zero as the only accumulation point.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 20 pages. Comments are welcome!

arXiv:2404.10049 [pdf, other]

Berry-dipole Semimetals

Authors: Zheng-Yang Zhuang, Chaoyi Zhang, Xiao-Jiao Wang, Zhongbo Yan

Abstract: We introduce "Berry-dipole semimetals", whose band degeneracies are characterized by quantized Berry dipoles. Through a two-band model constructed by Hopf map, we reveal that the Berry-dipole semimetals display a multitude of salient properties distinct from other topological semimetals. On the boundary, we find that the first-order Berry-dipole semimetal harbors anomalous paired Fermi arcs with the same spin polarization, even though the layer Chern number is zero, and the second-order Berry-dipole semimetal hosts dispersionless hinge arcs. In the bulk, we find that the low-energy Berry-dipole Hamiltonian near the band node has a quadratic energy dispersion and peculiar Berry curvature, which give rise to rather unique characteristics in the intrinsic anomalous Hall effect, orbital magnetization and Landau levels. Our study shows that Berry-dipole semimetals are a class of topological gapless phases supporting rich intriguing physics.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 7+12 pages, 7 figures

arXiv:2404.08938 [pdf, other]

Enforcing Paraphrase Generation via Controllable Latent Diffusion

Authors: Wei Zou, Ziyuan Zhuang, Shujian Huang, Jia Liu, Jiajun Chen

Abstract: Paraphrase generation aims to produce high-quality and diverse utterances of a given text. Though state-of-the-art generation via the diffusion model reconciles generation quality and diversity, textual diffusion suffers from a truncation issue that hinders efficiency and quality control. In this work, we propose Latent Diffusion Paraphraser (LDP), a novel paraphrase generation method that models a controllable diffusion process given a learned latent space. LDP achieves superior generation efficiency compared to its diffusion counterparts. It facilitates only input segments to enforce paraphrase semantics, which further improves the results without external features. Experiments show that LDP achieves improved and diverse paraphrase generation compared to baselines. Further analysis shows that our method is also helpful to other similar text generations and domain adaptations. Our code and data are available at https://github.com/NIL-zhuang/ld4pg.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2403.15566 [pdf, ps, other]

Non-existence of Ulrich modules over Cohen-Macaulay local rings

Authors: Srikanth B. Iyengar, Linquan Ma, Mark E. Walker, Ziquan Zhuang

Abstract: Over a Cohen-Macaulay local ring, the minimal number of generators of a maximal Cohen-Macaulay module is bounded above by its multiplicity. In 1984 Ulrich asked whether there always exist modules for which equality holds; such modules are known nowadays as Ulrich modules. We answer this question in the negative by constructing families of two-dimensional Cohen-Macaulay local rings that have no Ulrich modules. Some of these examples are Gorenstein normal domains; others are even complete intersection domains, though not normal.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 12 pages

MSC Class: 13C13 (primary); 13H10; 13C14; 14F06 (secondary)

arXiv:2403.15448 [pdf, other]

What is Wrong with End-to-End Learning for Phase Retrieval?

Authors: Wenjie Zhang, Yuxiang Wan, Zhong Zhuang, Ju Sun

Abstract: For nonlinear inverse problems that are prevalent in imaging science, symmetries in the forward model are common. When data-driven deep learning approaches are used to solve such problems, these intrinsic symmetries can cause substantial learning difficulties. In this paper, we explain how such difficulties arise and, more importantly, how to overcome them by preprocessing the training set before any learning, i.e., symmetry breaking. We take far-field phase retrieval (FFPR), which is central to many areas of scientific imaging, as an example and show that symmetry breaking can substantially improve data-driven learning. We also formulate the mathematical principle of symmetry breaking.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.14783 [pdf, other]

Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

Authors: Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Dan Roth, Camillo J. Taylor

Abstract: This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitations of foundation models in object detection and counting by using specialized agents as tools. Unlike existing approaches, our study focuses on the system's performance without fine-tuning it on specific VQA datasets, making it more practical and robust in the open world. We present preliminary experimental results under zero-shot scenarios and highlight some failure cases, offering new directions for future research.

Submitted 21 March, 2024; originally announced March 2024.

Comments: A full version of the paper will be released soon. The codes are available at https://github.com/bowen-upenn/Multi-Agent-VQA

arXiv:2403.11334 [pdf, other]

Bridging the Gap between Discrete Agent Strategies in Game Theory and Continuous Motion Planning in Dynamic Environments

Authors: Hongrui Zheng, Zhijun Zhuang, Stephanie Wu, Shuo Yang, Rahul Mangharam

Abstract: Generating competitive strategies and performing continuous motion planning simultaneously in an adversarial setting is a challenging problem. In addition, understanding the intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments. Existing approaches either discretize agent action by grouping similar control inputs, sacrificing performance in motion planning, or plan in uninterpretable latent spaces, producing hard-to-understand agent behaviors. This paper proposes an agent strategy representation via Policy Characteristic Space that maps the agent policies to a pre-specified low-dimensional space. Policy Characteristic Space enables the discretization of agent policy switchings while preserving continuity in control. Also, it provides interpretability of agent policies and clear intentions of policy switchings. Then, regret-based game-theoretic approaches can be applied in the Policy Characteristic Space to obtain high performance in adversarial environments. Our proposed method is assessed by conducting experiments in an autonomous racing scenario using scaled vehicles. Statistical evidence shows that our method significantly improves the win rate of the ego agent and the method also generalizes well to unseen environments.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Submitted to RA-L

arXiv:2403.08593 [pdf, other]

Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments

Authors: Sitao Cheng, Ziyuan Zhuang, Yong Xu, Fangkai Yang, Chaoyun Zhang, Xiaoting Qin, Xiang Huang, Ling Chen, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

Abstract: Large Language Models (LLMs) have shown potential in reasoning over structured environments, e.g., knowledge graph and table. Such tasks typically require multi-hop reasoning, i.e., match natural language utterance with instances in the environment. Previous methods leverage LLMs to incrementally build a reasoning path, where the LLMs either invoke tools or pick up schemas by step-by-step interacting with the environment. We propose Reasoning-Path-Editing (Readi), a novel framework where LLMs can efficiently and faithfully reason over structured environments. In Readi, LLMs initially generate a reasoning path given a query, and edit the path only when necessary. We instantiate the path on structured environments and provide feedback to edit the path if anything goes wrong. Experimental results on three KGQA and two TableQA datasets show the effectiveness of Readi, significantly surpassing previous LLM-based methods (by 9.1% Hit@1 on WebQSP, 12.4% on MQA-3H and 9.5% on WTQ), comparable with state-of-the-art fine-tuned methods (67% on CWQ and 74.7% on WebQSP) and substantially boosting the vanilla LLMs (by 14.9% on CWQ). Our code will be available on https://aka.ms/readi.

Submitted 3 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted by ACL 2024 Findings. 21 pages, 7 figures, 17 tables

arXiv:2403.08008 [pdf, other]

Distribution and Properties of Molecular Gas Toward the Monoceros OB1 Region

Authors: Zi Zhuang, Yang Su, Shiyu Zhang, Xuepeng Chen, Qing-Zeng Yan, Haoran Feng, Li Sun, Xiaoyun Xu, Yan Sun, Xin Zhou, Hongchi Wang, Ji Yang

Abstract: We perform a comprehensive CO study toward the Monoceros OB1 (Mon OB1) region based on the MWISP survey at an angular resolution of about $50''$. The high-sensitivity data, together with the high dynamic range, shows that molecular gas in the $\rm 8^{\circ}\times4^{\circ}$ region displays complicated hierarchical structures and various morphology (e.g., filamentary, cavity-like, shell-like, and other irregular structures). Based on Gaussian decomposition and clustering for $\mathrm{^{13}CO}$ data, a total of 263 $\mathrm{^{13}CO}$ structures are identified in the whole region, and 88% of raw data flux is recovered. The dense gas with relatively high column density from the integrated CO emission is mainly concentrated in the region where multiple $\rm ^{13}CO$ structures overlap. Combining the results of 32 large $\mathrm{^{13}CO}$ structures with distances from Gaia DR3, we estimate an average distance of $\rm 729^{+45}_{-45}~pc$ for the GMC complex. The total masses of the GMC complex traced by $\mathrm{^{12}CO}$, $\mathrm{^{13}CO}$, and $\mathrm{C^{18}O}$ are $1.1\times10^5~M_\odot$, $4.3\times10^4~M_\odot$, and $8.4\times10^3~M_\odot$, respectively. The dense gas fraction shows a clear difference between Mon OB1 GMC East (12.4%) and Mon OB1 GMC West (3.3%). Our results show that the dense gas environment is closely linked to the nearby star-forming regions. On the other hand, star-forming activities have a great influence on the physical properties of the surrounding molecular gas (e.g., greater velocity dispersion, higher temperatures, and more complex velocity structures). We also discuss the distribution/kinematics of molecular gas associated with nearby star-forming activities.

Submitted 14 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 28 pages, 17 figures, match to the version of APJ, 966, 202. The dataset has been released on https://doi.org/10.57760/sciencedb.17451

arXiv:2403.05055 [pdf, other]

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

Authors: Yitao Zhu, Sheng Wang, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang

Abstract: Multiple cameras can provide multi-view video coverage of a person. It is necessary to fuse multi-view data, e.g., for subsequent behavioral analysis, while such fusion often relies on calibration of cameras in traditional solutions. However, it is non-trivial to calibrate multiple cameras. In this work, we propose a method to reconstruct 3D human body from multiple uncalibrated camera views. First, we adopt a pre-trained human body encoder to process each individual camera view, such that human body models and parameters can be reconstructed for each view. Next, instead of simply averaging models across views, we train a network to determine the weights of individual views for their fusion, based on the parameters estimated for joints and hands of human body as well as camera positions. Further, we turn to the mesh surface of human body for dynamic fusion, such that facial expression can be seamlessly integrated into the model of human body. Our method has demonstrated superior performance in reconstructing human body upon two public datasets. More importantly, our method can flexibly support ad-hoc deployment of an arbitrary number of cameras, which has significant potential in related applications. We will release source code upon acceptance of the paper.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.00061 [pdf, other]

The Multilayer Nature of Molecular Gas toward the Cygnus Region

Authors: Shiyu Zhang, Yang Su, Xuepeng Chen, Min Fang, Qingzeng Yan, Shaobo Zhang, Yan Sun, Xiaolong Wang, Haoran Feng, Yuehui Ma, Miaomiao Zhang, Zi Zhuang, Xin Zhou, Zhiwei Chen, Ji Yang

Abstract: We study the physical properties and 3D distribution of molecular clouds (MCs) toward the Cygnus region using the MWISP CO survey and Gaia DR3 data. Based on Gaussian decomposition and clustering for $\rm ^{13}CO$ lines, over 70% of the fluxes are recovered. With the identification result of $\rm ^{13}CO$ structures, two models are designed to measure the distances of the molecular gas in velocity crowding regions. The distances of more than 200 large $\rm ^{13}CO$ structures are obtained toward the 150 square degree region. Additionally, tens of the identified MC structures coincide well with masers and/or intense mid-IR emission. We find multiple gas layers toward the region: (1) the extensive gas structures composing the Cygnus Rift from 700 pc to 1 kpc across the whole region; (2) the $\sim$ 1.3 kpc gas layer mainly in the Cygnus X South region; and (3) the 1.5 kpc dense filament at the Cygnus X North region and many cometary clouds shaped by Cygnus OB2. We also note that the spatial distribution of YSO candidates is generally consistent with the molecular gas structures. The total molecular mass of the Cygnus region is estimated to be $\sim 2.7\times10^{6}~M_{\odot}$ assuming an X-factor ratio $X_{\rm CO} = 2 \times 10^{20} \rm cm^{-2} (K\cdot km\cdot s^{-1})^{-1}$. The foreground Cygnus Rift contributes $\sim$25% of the molecular mass in the whole region. Our work presents a new 3D view of the MCs' distribution toward the Cygnus X region, as well as the exact molecular gas mass distribution in the foreground Cygnus Rift.

Submitted 23 April, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: 51 pages, 26 figures, 4 tables, to match the AJ version (2024 AJ 167 220Z). The data can be found at doi: 10.57760/sciencedb.16716

arXiv:2402.16348 [pdf, other]

doi 10.1109/LRA.2024.3379840

Star-Searcher: A Complete and Efficient Aerial System for Autonomous Target Search in Complex Unknown Environments

Authors: Yiming Luo, Zixuan Zhuang, Neng Pan, Chen Feng, Shaojie Shen, Fei Gao, Hui Cheng, Boyu Zhou

Abstract: This paper tackles the challenge of autonomous target search using unmanned aerial vehicles (UAVs) in complex unknown environments. To fill the gap in systematic approaches for this task, we introduce Star-Searcher, an aerial system featuring specialized sensor suites, mapping, and planning modules to optimize searching. Path planning challenges due to increased inspection requirements are addressed through a hierarchical planner with a visibility-based viewpoint clustering method. This simplifies planning by breaking it into global and local sub-problems, ensuring efficient global and local path coverage in real-time. Furthermore, our global path planning employs a history-aware mechanism to reduce motion inconsistency from frequent map changes, significantly enhancing search efficiency. We conduct comparisons with state-of-the-art methods in both simulation and the real world, demonstrating shorter flight paths, reduced time, and higher target search completeness. Our approach will be open-sourced for community benefit at https://github.com/SYSU-STAR/STAR-Searcher.

Submitted 21 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to IEEE RA-L. Code: https://github.com/SYSU-STAR/STAR-Searcher. Video: https://www.youtube.com/watch?v=08ll_oo_DtU

arXiv:2402.15362 [pdf, ps, other]

Essential dimension of isogenies

Authors: János Kollár, Ziquan Zhuang

Abstract: We give a lower bound for the essential dimension of isogenies of complex abelian varieties. The bound is sharp in many cases. In particular, the multiplication-by-$m$ map is incompressible for every $m\geq 2$, confirming a conjecture of Brosnan.

Submitted 8 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 12 pages. v3: Added a new author. Sections 3 and 5 are new. Main results are strengthened to cover the case of small primes. Comments are welcome!

arXiv:2402.11282 [pdf]

Grammaticality illusion or ambiguous interpretation? Event-related potentials reveal the nature of the missing-NP effect in Mandarin centre-embedded structures

Authors: Qihang Yang, Caimei Yang, Yu Liao, Ziman Zhuang

Abstract: In several languages, omitting a verb phrase (VP) in double centre-embedded structures creates a grammaticality illusion. A similar illusion is also exhibited in Mandarin missing-NP double centre-embedded structures. However, there is no consensus on its very nature. Instead of treating it as a grammaticality illusion, we argue that ambiguous interpretations of verbs can best account for this phenomenon in Mandarin. To further support this hypothesis, we conducted two electroencephalography (EEG) experiments on quasi double centre-embedded structures whose complexity is reduced by placing the self-embedding relative clauses into the sentence's subject position. Experiment 1 showed that a similar phenomenon is exhibited even in this structure, evidenced by the absence of a P600 effect and the presence of an N400 effect. In Experiment 2, providing semantic cues to reduce ambiguity dispelled this illusion, as evidenced by a P600 effect. We interpret the results under garden-path theory and propose that word-order difference may account for this cross-linguistic variation.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10487 [pdf, other]

RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data

Authors: Chin-Chia Michael Yeh, Yujie Fan, Xin Dai, Uday Singh Saini, Vivian Lai, Prince Osei Aboagye, Junpeng Wang, Huiyuan Chen, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang

Abstract: Spatial-temporal forecasting systems play a crucial role in addressing numerous real-world challenges. In this paper, we investigate the potential of addressing spatial-temporal forecasting problems using general time series forecasting models, i.e., models that do not leverage the spatial relationships among the nodes. We propose an all-Multi-Layer Perceptron (all-MLP) time series forecasting architecture called RPMixer. The all-MLP architecture was chosen due to its recent success in time series forecasting benchmarks. Furthermore, our method capitalizes on the ensemble-like behavior of deep neural networks, where each individual block within the network behaves like a base learner in an ensemble model, particularly when identity mapping residual connections are incorporated. By integrating random projection layers into our model, we increase the diversity among the blocks' outputs, thereby improving the overall performance of the network. Extensive experiments conducted on the largest spatial-temporal forecasting benchmark datasets demonstrate that the proposed method outperforms alternative methods, including both spatial-temporal graph models and general forecasting models.

Submitted 12 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.09205 [pdf, other]

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

Authors: Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Abstract: Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle with seeking clarification and grasping precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose the incorporation of model experts as the upstream in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before starting downstream agent task execution. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system regarding user instruction understanding and execution, revealing that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All the data and codes are released.

Submitted 15 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 26 pages, 5 tables, 6 figures

arXiv:2402.03772 [pdf, other]

Fundamental Limits of Two-Hop MIMO Channels: An Asymptotic Approach

Authors: Zeyan Zhuang, Xin Zhang, Dongfang Xu, Shenghui Song

Abstract: Multi-antenna relays and intelligent reflecting surfaces (IRSs) have been utilized to construct favorable channels to improve the performance of wireless systems. A common feature between relay systems and IRS-aided systems is the two-hop multiple-input multiple-output (MIMO) channel. As a result, the mutual information (MI) of two-hop MIMO channels has been widely investigated with very engaging results. However, a rigorous investigation on the fundamental limits of two-hop MIMO channels, i.e., the first and second-order analysis, is not yet available in the literature, due to the difficulties caused by the two-hop (product) channel and the noise introduced by the relay (active IRS). In this paper, we employ large-scale random matrix theory (RMT), specifically Gaussian tools, to derive the closed-form deterministic approximation for the mean and variance of the MI. Additionally, we determine the convergence rate for the mean, variance and the characteristic function of the MI, and prove the asymptotic Gaussianity. Furthermore, we also investigate the analytical properties of the fundamental equations that describe the closed-form approximation and prove the existence and uniqueness of the solution. An iterative algorithm is then proposed to obtain the solution for the fundamental equations. Numerical results validate the accuracy of the theoretical analysis.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03025 [pdf, other]

Understanding and Guiding Weakly Supervised Entity Alignment with Potential Isomorphism Propagation

Authors: Yuanyi Wang, Wei Tang, Haifeng Sun, Zirui Zhuang, Xiaoyuan Fu, Jingyu Wang, Qi Qi, Jianxin Liao

Abstract: Weakly Supervised Entity Alignment (EA) is the task of identifying equivalent entities across diverse knowledge graphs (KGs) using only a limited number of seed alignments. Despite substantial advances in aggregation-based weakly supervised EA, the underlying mechanisms in this setting remain unexplored. In this paper, we present a propagation perspective to analyze weakly supervised EA and explain the existing aggregation-based EA models. Our theoretical analysis reveals that these models essentially seek propagation operators for pairwise entity similarities. We further prove that, despite the structural heterogeneity of different KGs, the potentially aligned entities within aggregation-based EA models have isomorphic subgraphs, which is the core premise of EA but has not been investigated. Leveraging this insight, we introduce a potential isomorphism propagation operator to enhance the propagation of neighborhood information across KGs. We develop a general EA framework, PipEA, incorporating this operator to improve the accuracy of every type of aggregation-based model without altering the learning process. Extensive experiments substantiate our theoretical findings and demonstrate PipEA's significant performance gains over state-of-the-art weakly supervised EA methods. Our work not only advances the field but also enhances our comprehension of aggregation-based weakly supervised EA.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02685 [pdf, other]

Intrinsic nonlinear Hall effect in two-dimensional honeycomb topological antiferromagnets

Authors: Zheng-Yang Zhuang, Zhongbo Yan

Abstract: Two-dimensional systems with honeycomb lattice are known to be a paradigmatic platform to explore the various types of Hall effects, because the interplay of lattice geometry, spin-orbit coupling and magnetism can give rise to very rich features in the quantum geometry of wave functions. In this work, we consider honeycomb topological antiferromagnets that are effectively described by a $\mathcal{PT}$-symmetric antiferromagnetic Kane-Mele model, and explore the evolution of its nonlinear Hall response with respect to the change of lattice anisotropy, chemical potential, and the direction of the Néel vector. Due to the $\mathcal{PT}$-symmetry, the leading-order Hall effect of quantum geometric origin is the intrinsic nonlinear Hall effect, which is a second-order effect of electric fields and is independent of the scattering time. We investigate the behavior of the intrinsic nonlinear Hall conductivity tensor across topological phase transitions driven by antiferromagnetic exchange field and lattice anisotropy and find that its components do not change sign, which is different from the extrinsic nonlinear Hall effect. In the weakly doped regime, we find that the intrinsic nonlinear Hall effect is valley-polarized. By varying the chemical potential, we find that the nonlinear Hall conductivity tensors exhibit kinks when the Fermi surface undergoes Lifshitz transitions. Furthermore, we find that the existence of spin-orbit coupling to lift the spin-rotation symmetry is decisive for the use of intrinsic nonlinear Hall effect to detect the direction of the Néel vector. Our work shows that the two-dimensional honeycomb topological antiferromagnets are an ideal class of material systems with rich properties for the study of intrinsic nonlinear Hall effect.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 9 pages, 7 figures

arXiv:2401.16452 [pdf, other]

Context-Former: Stitching via Latent Conditioned Sequence Modeling

Authors: Ziqi Zhang, Jingzehua Xu, Jinxin Liu, Zifeng Zhuang, Donglin Wang, Miao Liu, Shuai Zhang

Abstract: Offline reinforcement learning (RL) algorithms can learn better decision-making compared to behavior policies by stitching the suboptimal trajectories to derive more optimal ones. Meanwhile, Decision Transformer (DT) abstracts the RL as sequence modeling, showcasing competitive performance on offline RL benchmarks. However, recent studies demonstrate that DT lacks stitching capacity, so exploiting stitching capability for DT is vital to further improve its performance. In order to endow stitching capability to DT, we abstract trajectory stitching as expert matching and introduce our approach, ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. To validate our approach, we conduct experiments from two perspectives: 1) We conduct extensive experiments on D4RL benchmarks under the settings of IL, and experimental results demonstrate ContextFormer can achieve competitive performance in multiple IL settings. 2) More importantly, we conduct a comparison of ContextFormer with various competitive DT variants using identical training datasets. The experimental results unveiled ContextFormer's superiority, as it outperformed all other variants, showcasing its remarkable performance.

Submitted 27 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15904 [pdf, other]

Boundary touching probability and nested-path exponent for non-simple CLE

Authors: Morris Ang, Xin Sun, Pu Yu, Zijie Zhuang

Abstract: The conformal loop ensemble (CLE) has two phases: for $κ\in (8/3, 4]$, the loops are simple and do not touch each other or the boundary; for $κ\in (4,8)$, the loops are non-simple and may touch each other and the boundary. We derive the probability that the loop surrounding a given point touches the domain boundary. We also obtain the law of the conformal radius of this loop seen from the given point conditioned on the loop touching the boundary or not, refining a result of Schramm-Sheffield-Wilson (2009). As an application, we exactly evaluate the CLE counterpart of the nested-path exponent for the Fortuin-Kasteleyn (FK) random cluster model recently introduced by Song-Tan-Zhang-Jacobsen-Nienhuis-Deng (2022). This exponent describes the asymptotic behavior of the number of nested open paths in the open cluster containing the origin when the cluster is large. For Bernoulli percolation, which corresponds to $κ=6$, the exponent was derived recently in Song-Jacobsen-Nienhuis-Sportiello-Deng (2023) by a color switching argument. For $κ\neq 6$, and in particular for the FK-Ising case, our formula appears to be new. Our derivation begins with Sheffield's construction of CLE from which the quantities of interest can be expressed by radial SLE. We solve the radial SLE problem using the coupling between SLE and Liouville quantum gravity, along with the exact solvability of Liouville conformal field theory.

Submitted 14 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 42 pages, 14 figures; minor revision, added discussion on the FK Ising case

arXiv:2401.12030 [pdf]

Topological Nature of Radiation Asymmetry in Bilayer Metagratings

Authors: Ze-Peng Zhuang, Hao-Long Zeng, Xiao-Dong Chen, Xin-Tao He, Jian-Wen Dong

Abstract: Manipulating radiation asymmetry of photonic structures is of particular interest in many photonic applications such as directional optical antennas, high-efficiency on-chip lasers, and coherent light control. Here, we propose the term pseudo-polarization to reveal the topological nature of radiation asymmetry in bilayer metagratings. A robust pseudo-polarization vortex with an integer topological charge exists in a P-symmetric metagrating, allowing for tunable directionality ranging from -1 to 1 in synthetic parameter space. When P-symmetry is broken, such a vortex becomes pairs of C points due to the conservation law of charge, leading to the phase difference of radiation asymmetry from π/2 to 3π/2. Furthermore, topologically enabled coherent perfect absorption is robust with customized phase difference at will between two counter-propagating external light sources. This work can not only enrich the understanding of two particular topological photonic behaviors, i.e., bound state in the continuum and unidirectional guided resonance, but also provide a topological view on radiation asymmetry, opening an unexplored avenue for asymmetric light manipulation in on-chip lasers, light-light switches and quantum emitters.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11834 [pdf, other]

End-to-end Multi-Instance Robotic Reaching from Monocular Vision

Authors: Zheyu Zhuang, Xin Yu, Robert Mahony

Abstract: Multi-instance scenes are especially challenging for end-to-end visuomotor (image-to-control) learning algorithms. "Pipeline" visual servo control algorithms use separate detection, selection and servo stages, allowing algorithms to focus on a single object instance during servo control. End-to-end systems do not have separate detection and selection stages and need to address the visual ambiguities introduced by the presence of an arbitrary number of visually identical or similar objects during servo control. However, end-to-end schemes avoid embedding errors from detection and selection stages in the servo control behaviour, are more dynamically robust to changing scenes, and are algorithmically simpler. In this paper, we present a real-time end-to-end visuomotor learning algorithm for multi-instance reaching. The proposed algorithm uses a monocular RGB image and the manipulator's joint angles as the input to a light-weight fully-convolutional network (FCN) to generate control candidates. A key innovation of the proposed method is identifying the optimal control candidate by regressing a control-Lyapunov function (cLf) value. The multi-instance capability emerges naturally from the stability analysis associated with the cLf formulation. We demonstrate the proposed algorithm effectively reaching and grasping objects from different categories on a table-top amid other instances and distractors from an over-the-shoulder monocular RGB camera. The network is able to run up to approximately 160 fps during inference on one GTX 1080 Ti GPU.

Submitted 22 January, 2024; originally announced January 2024.

Comments: This manuscript was published in ICRA21, not a new paper

arXiv:2401.09489 [pdf]

PUPAE: Intuitive and Actionable Explanations for Time Series Anomalies

Authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn J. Keogh

Abstract: In recent years there has been significant progress in time series anomaly detection. However, after detecting a (perhaps tentative) anomaly, can we explain it? Such explanations would be useful to triage anomalies. For example, in an oil refinery, should we respond to an anomaly by dispatching a hydraulic engineer, or an intern to replace the battery on a sensor? There have been some parallel efforts to explain anomalies; however, many proposed techniques produce explanations that are indirect, and often seem more complex than the anomaly they seek to explain. Our review of the literature/checklists/user-manuals used by frontline practitioners in various domains reveals an interesting near-universal commonality. Most practitioners discuss, explain and report anomalies in the following format: The anomaly would be like normal data A, if not for the corruption B. The reader will appreciate that this is a type of counterfactual explanation. In this work we introduce a domain agnostic counterfactual explanation technique to produce explanations for time series anomalies. As we will show, our method can produce both visual and text-based explanations that are objectively correct, intuitive and in many circumstances, directly actionable.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 9 Page Manuscript, 1 Page Supplementary (Supplement not published in conference proceedings.)

Journal ref: SIAM SDM 2024

arXiv:2401.02987 [pdf, other]

Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach

Authors: Prince Aboagye, Yan Zheng, Junpeng Wang, Uday Singh Saini, Xin Dai, Michael Yeh, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Liang Wang, Wei Zhang

Abstract: The emergence of pre-trained models has significantly impacted fields ranging from Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and more effectively. In this study, we explore a novel approach where we leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.

Submitted 14 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.15721 [pdf, ps, other]

UAV Trajectory Tracking via RNN-enhanced IMM-KF with ADS-B Data

Authors: Yian Zhu, Ziye Jia, Qihui Wu, Chao Dong, Zirui Zhuang, Huiling Hu, Qi Cai

Abstract: With the increasing use of autonomous unmanned aerial vehicles (UAVs), it is critical to ensure that they are continuously tracked and controlled, especially when UAVs operate beyond the communication range of ground stations (GSs). Conventional surveillance methods for UAVs, such as satellite communications, ground mobile networks and radars are subject to high costs and latency. The automatic dependent surveillance-broadcast (ADS-B) emerges as a promising method to monitor UAVs, due to the advantages of real-time capabilities, easy deployment and affordable cost. Therefore, we employ the ADS-B for UAV trajectory tracking in this work. However, the inherent noise in the transmitted data poses an obstacle for precisely tracking UAVs. Hence, we propose the algorithm of recurrent neural network-enhanced interacting multiple model-Kalman filter (RNN-enhanced IMM-KF) for UAV trajectory filtering. Specifically, the algorithm utilizes the RNN to capture the maneuvering behavior of UAVs and the noise level in the ADS-B data. Moreover, accurate UAV tracking is achieved by adaptively adjusting the process noise matrix and observation noise matrix of IMM-KF with the assistance of the RNN. The proposed algorithm can facilitate GSs to make timely decisions during trajectory deviations of UAVs and improve the airspace safety. Finally, via comprehensive simulations, the total root mean square error of the proposed algorithm decreases by 28.56%, compared to the traditional IMM-KF.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.07624 [pdf, other]

A dynamical clipping approach with task feedback for Proximal Policy Optimization

Authors: Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang, Shuai Zhang

Abstract: Proximal Policy Optimization (PPO) has been broadly applied to various domains, including Large Language Model (LLM) optimization and Robotics learning, etc. However, PPO is limited by a fixed setting for the clipping bound. Specifically, there is no theoretical proof that the optimal clipping bound remains consistent throughout the entire training process. Truncating the ratio of the new and old policies with a unique clipping bound ensures stable training and can achieve the best training performance. Additionally, previous research suggests that a fixed clipping bound limits the agent's exploration. Therefore, researching a dynamical clipping bound to enhance PPO's performance can be highly beneficial. Different from previous clipping approaches, we consider increasing the maximum cumulative Return in reinforcement learning (RL) tasks as the preference of the RL task, and propose a bi-level proximal policy optimization paradigm, which involves not only optimizing the policy but also dynamically adjusting the clipping bound to reflect the preference of the RL tasks to further elevate the training outcomes and stability of PPO. Based on this bi-level proximal policy optimization paradigm, we introduce a new algorithm named Preference based Proximal Policy Optimization (Pb-PPO). This algorithm utilizes a multi-armed bandit algorithm to reflect RL preferences (we also validate that such approach can be utilized to reflect human preference), recommending the optimal clipping bound for PPO in each epoch, thereby achieving more stable and better training outcomes.

Submitted 7 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.00951 [pdf, other]

AV4EV: Open-Source Modular Autonomous Electric Vehicle Platform for Making Mobility Research Accessible

Authors: Zhijie Qiao, Mingyan Zhou, Zhijun Zhuang, Tejas Agarwal, Felix Jahncke, Po-Jen Wang, Jason Friedman, Hongyi Lai, Divyanshu Sahu, Tomáš Nagy, Martin Endler, Jason Schlessman, Rahul Mangharam

Abstract: When academic researchers develop and validate autonomous driving algorithms, there is a challenge in balancing high-performance capabilities with the cost and complexity of the vehicle platform. Much of today's research on autonomous vehicles (AV) is limited to experimentation on expensive commercial vehicles that require large skilled teams to retrofit the vehicles and test them in dedicated facilities. On the other hand, 1/10th-1/16th scaled-down vehicle platforms are more affordable but have limited similitude in performance and drivability. To address this issue, we present the design of a one-third-scale autonomous electric go-kart platform with open-source mechatronics design along with fully functional autonomous driving software. The platform's multi-modal driving system is capable of manual, autonomous, and teleoperation driving modes. It also features a flexible sensing suite for the algorithm deployment across perception, localization, planning, and control. This development serves as a bridge between full-scale vehicles and reduced-scale cars while accelerating cost-effective algorithmic advancements. Our experimental results demonstrate the AV4EV platform's capabilities and ease of use for developing new AV algorithms. All materials are available at AV4EV.org to stimulate collaborative efforts within the AV and electric vehicle (EV) communities.

Submitted 12 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 6 pages, 5 figures

arXiv:2311.17945 [pdf, other]

Contrastive Vision-Language Alignment Makes Efficient Instruction Learner

Authors: Lizhao Liu, Xinyu Sun, Tianhang Xiang, Zhuangwei Zhuang, Liuren Yin, Mingkui Tan

Abstract: We study the task of extending the large language model (LLM) into a vision-language instruction-following model. This task is crucial but challenging since the LLM is trained on text modality only, making it hard to effectively digest the visual modality. To address this, existing methods typically train a visual adapter to align the representation between a pre-trained vision transformer (ViT) and the LLM by a generative image captioning loss. However, we find that the generative objective can only produce weak alignment for vision and language, making the aligned vision-language model very hungry for the instruction fine-tuning data. In this paper, we propose CG-VLM that applies both Contrastive and Generative alignment objectives to effectively align the representation of ViT and LLM. Different from image level and sentence level alignment in common contrastive learning settings, CG-VLM aligns the image-patch level features and text-token level embeddings, which, however, is very hard to achieve as no explicit patch-token grounding relation is provided in standard image captioning datasets. To address this issue, we propose to maximize the averaged similarity between pooled image-patch features and text-token embeddings. Extensive experiments demonstrate that the proposed CG-VLM produces strong vision-language alignment and is an efficient instruction learner. For example, using only 10% instruction tuning data, we reach 95% performance of state-of-the-art method LLaVA [29] on the zero-shot ScienceQA-Image benchmark.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 17 pages, 10 pages for main paper, 7 pages for supplementary

arXiv:2311.12889 [pdf, other]

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Authors: Bowen Jiang, Zhijun Zhuang, Camillo Jose Taylor

Abstract: This work presents an enhanced approach to generating scene graphs by incorporating a relationship hierarchy and commonsense knowledge. Specifically, we propose a Bayesian classification head that exploits an informative hierarchical structure. It jointly predicts the super-category or type of relationship between the two objects, along with the detailed relationship under each super-category. We design a commonsense validation pipeline that uses a large language model to critique the results from the scene graph prediction system and then use that feedback to enhance the model performance. The system requires no external large language model assistance at test time, making it more convenient for practical applications. Experiments on the Visual Genome and the OpenImage V6 datasets demonstrate that harnessing hierarchical relationships enhances the model performance by a large margin. The proposed Bayesian head can also be incorporated as a portable module in existing scene graph generation algorithms to improve their results. In addition, the commonsense validation enables the model to generate an extensive set of reasonable predictions beyond dataset annotations.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.06015 [pdf]

RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

Authors: Hongyin Zhang, Diyuan Shi, Zifeng Zhuang, Han Zhao, Zhenyu Wei, Feng Zhao, Sibo Gai, Shangke Lyu, Donglin Wang

Abstract: Developing robotic intelligent systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although some impressive progress has been made in walking stability and skill learning in the field of legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with massive skills needed to survive, and can quickly acquire new ones, by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG), for organizing massive fundamental skills of robots and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of massive dynamic behavioral skills instead of static knowledge in KG and enables discovering implicit relations that exist between the learning context and acquired skills of robots, serving as a starting point for understanding subtle patterns existing in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference upon new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.03393 [pdf, other]

Sketching Multidimensional Time Series for Fast Discord Mining

Authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh

Abstract: Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discord effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multi-dimensional time series. After an initial pre-processing of the sketch as fast as reading the data, the discord mining has runtime independent of the dimensionality of the original data. On several real world examples from water treatment and transportation, the proposed algorithm improves the throughput by at least an order of magnitude (50X) and only has minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.

Submitted 7 December, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.02563 [pdf, other]

Time Series Synthesis Using the Matrix Profile for Anonymization

Authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

Abstract: Publishing and sharing data is crucial for the data mining community, allowing collaboration and driving open innovation. However, many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information. To alleviate such issues, we propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data. The TSSUMP method synthesizes time series by preserving similarity join information (i.e., Matrix Profile) while reducing the correlation between the synthesized and the original time series. As a result, neither the values for the individual time steps nor the local patterns (or shapes) from the original data can be recovered, yet the resulting data can be used for downstream tasks that data analysts are interested in. We concentrate on similarity joins because they are one of the most widely applied time series data mining routines across different data mining tasks. We test our method on a case study of ECG and gender masking prediction. In this case study, the gender information is not only removed from the synthesized time series, but the synthesized time series also preserves enough information from the original time series. As a result, unmodified data mining tools can obtain near-identical performance on the synthesized time series as on the original time series.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.02561 [pdf, other]

Ego-Network Transformer for Subsequence Classification in Time Series Data

Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Yujie Fan, Xin Dai, Yan Zheng, Vivian Lai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

Abstract: Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequences. Successfully classifying these relevant subsequences requires not only distinguishing between different classes but also accurately identifying the foreground subsequences amidst the background. To address this challenge, we propose a novel subsequence classification method that represents each subsequence as an ego-network, providing crucial nearest neighbor information to the model. The ego-networks of all subsequences collectively form a time series subsequence graph, and we introduce an algorithm to efficiently construct this graph. Furthermore, we have demonstrated the significance of enforcing temporal consistency in the prediction of adjacent subsequences for the subsequence classification problem. To evaluate the effectiveness of our approach, we conducted experiments using 128 univariate and 30 multivariate time series datasets. The experimental results demonstrate the superior performance of our method compared to alternative approaches. Specifically, our method outperforms the baseline on 104 out of 158 datasets.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.02560 [pdf, other]

Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights

Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Yujie Fan, Vivian Lai, Junpeng Wang, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang

Abstract: Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing, but their properties can vary significantly depending on the domain they originate from. The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR works typically focus on retrieving time series from a single domain database, which can be inadequate if the user does not know the source of the query time series. This limitation motivates us to investigate the CTSR problem in a scenario where the database contains time series from multiple domains. To facilitate this investigation, we introduce a CTSR benchmark dataset that comprises time series data from a variety of domains, such as motion, power demand, and traffic. This dataset is sourced from a publicly available time series classification dataset archive, making it easily accessible to researchers in the field. We compare several popular methods for modeling and retrieving time series data using this benchmark dataset. Additionally, we propose a novel distance learning model that outperforms the existing methods. Overall, our study highlights the importance of addressing the CTSR problem across multiple domains and provides a useful benchmark dataset for future research.

Submitted 5 November, 2023; originally announced November 2023.

Showing 1–50 of 229 results for author: Zhuang, Z