
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by prepending a segment of harmful response to a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends against recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07441 [pdf, other]

doi 10.1109/TIP.2024.3425048

HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

Authors: Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, Guangwei Gao

Abstract: Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During the global perception modeling, we devise an Efficient Transformer (ET) module streamlining the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and compact model size, achieving 74.2% mIoU on Cityscapes and 71.1% mIoU on CamVid test datasets, with frame rates of 105FPS and 118FPS on a single 2080Ti GPU. The source code is available at https://github.com/XU-GITHUB-curry/HAFormer.

Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 13 pages, 10 figures, 8 tables, IEEE Transactions on Image Processing

arXiv:2407.02695 [pdf, other]

Topics in Weyl Geometry and Quantum Anomalies

Authors: Weizhen Jia

Abstract: The first part of this thesis focuses on the Weyl-covariant nature of holography. We generalize the Fefferman-Graham (FG) ambient construction for conformal geometry to a corresponding construction for Weyl geometry. Through the Weyl-ambient construction, we investigate Weyl-covariant quantities on the Weyl manifold and define Weyl-obstruction tensors. We show that Weyl-obstruction tensors appear as poles in the Fefferman-Graham expansion of the ALAdS bulk metric for even boundary dimensions. Under holographic renormalization in the Weyl-Fefferman-Graham gauge, we compute the Weyl anomaly of the boundary theory in multiple dimensions and demonstrate that Weyl-obstruction tensors can be used as the building blocks for the Weyl anomaly of the dual quantum field theory (QFT). The holographic calculation with a background Weyl geometry also suggests an underlying geometric interpretation of the Weyl anomaly. The second part of this thesis is devoted to understanding the geometric nature of the BRST formalism and quantum anomalies. Using the language of Lie algebroids, the BRST complex can be encoded in the exterior algebra of an Atiyah Lie algebroid derived from the principal bundle of the gauge theory. We show that the cohomology of an Atiyah Lie algebroid in a trivialization gives rise to the BRST cohomology. We then apply the Lie algebroid cohomology in studying quantum anomalies and demonstrate the computation for chiral and Lorentz-Weyl anomalies. In particular, we pay close attention to the fact that the geometric intuition afforded by the Lie algebroid (which was absent in the traditional BRST complex) provides hints of a deeper picture that simultaneously geometrizes the consistent and covariant forms of the anomaly. In the algebroid construction, the difference between the consistent and covariant anomalies is simply a different choice of basis.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 178 pages, 3 figures; Ph.D. dissertation

arXiv:2407.01026 [pdf, other]

Augmenting Document-level Relation Extraction with Efficient Multi-Supervision

Authors: Xiangyu Lin, Weijia Jia, Zhiguo Gong

Abstract: Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pretraining, which is of low time efficiency. To fill in the gap of efficient and robust utilization of distantly supervised training data, we propose Efficient Multi-Supervision for document-level relation extraction, in which we first select a subset of informative documents from the massive dataset by combining distant supervision with expert supervision, then train the model with Multi-Supervision Ranking Loss that integrates the knowledge from multiple sources of supervision to alleviate the effects of noise. The experiments demonstrate the effectiveness of our method in improving the model performance with higher time efficiency than existing baselines.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.14663 [pdf, other]

MagMar III -- Resisting the Pressure, Is the Magnetic Field Overwhelmed in NGC6334I?

Authors: Paulo C. Cortes, Josep M. Girart, Patricio Sanhueza, Junhao Liu, Sergio Martin, Ian W. Stephens, Henrik Beuther, Patrick M. Koch, M. Fernandez-Lopez, Alvaro Sanchez-Monge, Jia-Wei Wang, Kaho Morii, Shanghuo Li, Piyali Saha, Qizhou Zhang, David Rebolledo, Luis A. Zapata, Ji-hyun Kang, Wenyu Jiao, Jongsoo Kim, Yu Cheng, Jihye Hwang, Eun Jung Chung, Spandan Choudhury, A-Ran Lyo, et al. (1 additional author not shown)

Abstract: We report on ALMA observations of polarized dust emission at 1.2 mm from NGC6334I, a source known for its significant flux outbursts. Over five months, our data show no substantial change in total intensity and a modest 8% variation in linear polarization, suggesting a phase of stability or the conclusion of the outburst. The magnetic field, inferred from this polarized emission, displays a predominantly radial pattern from North-West to South-East with intricate disturbances across major cores, hinting at spiral structures. Energy analysis of CS$(J=5 \rightarrow 4)$ emission yields an outflow energy of approximately $3.5\times10^{45}$ ergs, aligning with previous interferometric studies. Utilizing the Davis-Chandrasekhar-Fermi method, we determined magnetic field strengths ranging from 1 to 11 mG, averaging at 1.9 mG. This average increases to 4 $\pm 1$ mG when incorporating Zeeman measurements. Comparative analyses using gravitational, thermal, and kinetic energy maps reveal that magnetic energy is significantly weaker, possibly explaining the observed field morphology. We also find that the energy in the outflows and the expanding cometary H II region is larger than the magnetic energy, suggesting that protostellar feedback may be the dominant driver behind the injection of turbulence in NGC6334I at the scales sampled by our data. The gas in NGC6334I predominantly exhibits supersonic and trans-Alfvenic conditions, transitioning towards a super-Alfvenic regime, underscoring a diminished influence of the magnetic field with increasing gas density. These observations are in agreement with prior polarization studies at 220 GHz, enriching our understanding of the dynamic processes in high-mass star-forming regions.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for Publication at the Astrophysical Journal

arXiv:2406.13404 [pdf, other]

Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing

Authors: Mengjie Liu, Yihua Li, Fangyi Mou, Zhiqing Tang, Jiong Lou, Jianxiong Guo, Weijia Jia

Abstract: Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers. The dynamic and resource-constrained nature of meta computing environments requires an optimal container migration strategy for mobile users to minimize latency. However, the problem of container migration in meta computing has not been thoroughly explored. To address this gap, we present low-latency, layer-aware container migration strategies that consider both proactive and passive migration. Specifically: 1) We formulate the container migration problem in meta computing, taking into account layer dependencies to reduce migration costs and overall task duration by considering four delays. 2) We introduce a reinforcement learning algorithm based on policy gradients to minimize total latency by identifying layer dependencies for action selection, making decisions for both proactive and passive migration. Expert demonstrations are introduced to enhance exploitation. 3) Experiments using real data trajectories show that the algorithm outperforms baseline algorithms, achieving lower total latency.

Submitted 19 June, 2024; originally announced June 2024.

Comments: to be published in IEEE ICMC 2024

arXiv:2406.13399 [pdf, other]

VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Service (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substantially mitigate response delays and cost associated with similar requests, which has been overlooked by previous research. Addressing these gaps, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. Firstly, we propose the VELO framework, which ingeniously employs a vector database to cache the results of some LLM requests at the edge to reduce the response time of subsequent similar requests. Diverging from direct optimization of the LLM, our VELO framework does not necessitate altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Subsequently, building upon the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm grounded in Multi-Agent Reinforcement Learning (MARL) to decide whether to request the LLM in the cloud or directly return the results from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the policy network of MARL and integrate expert demonstrations. Finally, we implement the proposed algorithm within a real edge system. Experimental findings confirm that our VELO framework substantially enhances user satisfaction by concurrently diminishing delay and resource consumption for edge users utilizing LLMs.

Submitted 19 June, 2024; originally announced June 2024.

Comments: to be published in IEEE ICWS 2024

arXiv:2406.13381 [pdf, other]

CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

Authors: Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao

Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even when equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning agent, to comprehend the problem scope, formulate macro-level plans and provide detailed sub-task descriptions to local execution agents, which serves as the initial rendition of a global plan. (2) A local execution agent, to operate within the multi-tier task execution structure, focusing on detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.08308 [pdf, other]

FSH: 3D Representation via Fibonacci Spherical Harmonics

Authors: Zikuan Li, Anyi Huang, Wenru Jia, Qiaoyun Wu, Mingqiang Wei, Jun Wang

Abstract: Spherical harmonics are a favorable technique for 3D representation, employing a frequency-based approach through the spherical harmonic transform (SHT). Typically, SHT is performed using equiangular sampling grids. However, these grids are non-uniform on spherical surfaces and exhibit local anisotropy, a common limitation in existing spherical harmonic decomposition methods. This paper proposes a 3D representation method using Fibonacci Spherical Harmonics (FSH). We introduce a spherical Fibonacci grid (SFG), which is more uniform than equiangular grids for SHT in the frequency domain. Our method employs analytical weights for SHT on SFG, effectively assigning sampling errors to spherical harmonic degrees higher than the recovered band-limited function. This provides a novel solution for spherical harmonic transformation on non-equiangular grids. The key advantages of our FSH method include: 1) With the same number of sampling points, SFG captures more features without bias compared to equiangular grids; 2) The root mean square error of 32-degree spherical harmonic coefficients is reduced by approximately 34.6% for SFG compared to equiangular grids; and 3) FSH offers more stable frequency domain representations, especially for rotating functions. FSH enhances the stability of frequency domain representations under rotational transformations. Its application in 3D shape reconstruction and 3D shape classification results in more accurate and robust representations.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06561 [pdf, other]

Brainstorming Brings Power to Large Language Models of Knowledge Reasoning

Authors: Zining Qin, Chenhao Wang, Huiling Qin, Weijia Jia

Abstract: Large Language Models (LLMs) have demonstrated amazing capabilities in language generation, text comprehension, and knowledge reasoning. While a single powerful model can already handle multiple tasks, relying on a single perspective can lead to biased and unstable results. Recent studies have further improved the model's reasoning ability on a wide range of tasks by introducing multi-model collaboration. However, models with different capabilities may produce conflicting answers on the same problem, and how to reasonably obtain the correct answer from multiple candidate models has become a challenging problem. In this paper, we propose prompt-based multi-model brainstorming. It incorporates different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, a consensus answer is reached within the group. We conducted experiments on three different types of datasets and demonstrate that brainstorming can significantly improve effectiveness in logical reasoning and fact extraction. Furthermore, we find that two small-parameter models can achieve accuracy approximating that of larger-parameter models through brainstorming, which provides a new solution for distributed deployment of LLMs.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.03246 [pdf, other]

Intrinsic permeability of heterogeneous porous media

Authors: Wenqiao Jiao, David Scheidweiler, Nolwenn Delouche, Alberto Guadagnini, Pietro de Anna

Abstract: Providing a sound appraisal of the nature of the relationship between flow $(Q)$ and pressure drop $(ΔP)$ for porous media is a long-standing fundamental research challenge. A wide variety of environmental, societal and industrial issues, ranging, e.g., from water-soil system remediation to subsurface energy optimization, is affected by this critical issue. While such dependence is well represented by the Kozeny-Carman formulation for homogeneous media, the fundamental nature of such a relationship ($Q$ vs $ΔP$) within heterogeneous porous systems characterized by a broad range of pore sizes is still not fully understood. We design a set of controlled and complex porous structures and quantify their intrinsic permeability through detailed high quality microfluidics experiments. We synthesize the results upon deriving an original analytical formulation relating the overall intrinsic permeability of the porous structure and their key features. Our formulation explicitly embeds the spatial variability of pore sizes into the medium permeability through a conceptualization of the system as a collection of smaller scale porous media arranged in series. The resulting analytical formulation yields permeability values matching their experimentally-based counterparts without the need of additional tunable parameters. Our study then documents and supports the strong role played by the micro-structure on the overall medium permeability.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures

arXiv:2405.14312 [pdf, other]

Improving Gloss-free Sign Language Translation by Reducing Representation Density

Authors: Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong

Abstract: Gloss-free sign language translation (SLT) aims to develop well-performing SLT systems with no requirement for the costly gloss annotations, but currently still lags behind gloss-based approaches significantly. In this paper, we identify a representation density problem that could be a bottleneck in restricting the performance of gloss-free SLT. Specifically, the representation density problem describes that the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, which makes gloss-free methods struggle with distinguishing different sign gestures and suffer from a sharp performance drop. To address the representation density problem, we introduce a simple but effective contrastive learning strategy, namely SignCL, which encourages gloss-free models to learn more discriminative feature representation in a self-supervised manner. Our experiments demonstrate that the proposed SignCL can significantly reduce the representation density and improve performance across various translation frameworks. Specifically, SignCL achieves a significant improvement in BLEU score for the Sign Language Transformer and GFSLT-VLP on the CSL-Daily dataset by 39% and 46%, respectively, without any increase of model parameters. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters. Implementation and Checkpoints are available at https://github.com/JinhuiYE/SignCL.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Representation Density and Performance Drop

arXiv:2405.12834 [pdf, other]

Effect of Synthetic Jets Actuator Parameters on Deep Reinforcement Learning-Based Flow Control Performance in a Square Cylinder

Authors: Wang Jia, Hang Xu

Abstract: We utilize deep reinforcement learning (DRL) algorithms to precisely control the mass flow rates of synthetic jets located on the upper and lower surfaces of a square cylinder for active flow control. Through DRL-based active flow control (AFC) technology, we significantly reduce the lift and drag coefficients of the square cylinder at Reynolds number (Re) = 100 and Re=500, while completely suppressing vortex shedding in the wake flow field. Additionally, we conduct a sensitivity analysis of the position and width parameters of the synthetic jets regarding flow control performance. Our observations indicate that positioning the synthetic jets near the trailing edge corners of the square cylinder, rather than the leading edge corners, can completely suppress vortex shedding, resulting in more stable lift and drag coefficients in the controlled flow. When the synthetic jets are positioned at the trailing edge corners, flow control reduces the mean drag coefficient by 14.4% and the standard deviation of the lift coefficient by 86.1% for the baseline flow at Re=100. For the baseline flow at Re=500, flow control reduces the mean drag coefficient by 51.4% and the standard deviation of the lift coefficient by 90.5%. At both Reynolds numbers, vortex shedding in the wake flow field is completely suppressed. Furthermore, using narrower synthetic jets results in a lower reduction rate of the standard deviations of the lift and drag coefficients, while increasing the mean and standard deviation of the mass flow rate of the jets used for flow control. This study provides guidance on optimizing the width and position of synthetic jets for DRL-based active flow control.

Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.11758 [pdf, other]

Fed-Credit: Robust Federated Learning with Credibility Management

Authors: Jiayan Chen, Zhirong Qian, Tianhui Meng, Xitong Gao, Tian Wang, Weijia Jia

Abstract: Aiming at privacy preservation, Federated Learning (FL) is an emerging machine learning approach enabling model training on decentralized devices or data sources. The learning mechanism of FL relies on aggregating parameter updates from individual clients. However, this process may pose a potential security risk due to the presence of malicious devices. Existing solutions are either costly due to the use of compute-intensive technology, or restrictive for reasons of strong assumptions such as the prior knowledge of the number of attackers and how they attack. Few methods consider both privacy constraints and uncertain attack scenarios. In this paper, we propose a robust FL approach based on the credibility management scheme, called Fed-Credit. Unlike previous studies, our approach does not require prior knowledge of the nodes and the data distribution. It maintains and employs a credibility set, which weighs the historical clients' contributions based on the similarity between the local models and global model, to adjust the global model update. The subtlety of Fed-Credit is that the time decay and attitudinal value factor are incorporated into the dynamic adjustment of the reputation weights and it boasts a computational complexity of O(n) (n is the number of the clients). We conducted extensive experiments on the MNIST and CIFAR-10 datasets under 5 types of attacks. The results exhibit superior accuracy and resilience against adversarial attacks, all while maintaining comparatively low computational complexity. Among these, on the Non-IID CIFAR-10 dataset, our algorithm exhibited performance enhancements of 19.5% and 14.5%, respectively, in comparison to the state-of-the-art algorithm when dealing with two types of data poisoning attacks.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.11586 [pdf]

The Bragg Diffraction Experiment Based on Ultrasonic Wave and Artificial Crystal Lattice

Authors: Qiusong Chen, Wei Hou, Song Lin, GaoFu Liu, Weiyao Jia

Abstract: The traditional Bragg crystal diffraction experiments use X-rays, harming the participants' bodies. Therefore, many universities have not offered this basic experiment. Although microwave simulation Bragg experiments can reduce harm, there are still some potential dangers. To solve this dilemma, this article takes ultrasound as the experimental object and uses an artificial simulation of crystals to successfully achieve the Bragg crystal diffraction effect, which is in good agreement with the theoretical predictions. This experiment is expected to be widely deployed in physics, chemistry, materials, and other science and engineering majors as a basic teaching experiment.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.10987 [pdf, other]

Manifold-based Incomplete Multi-view Clustering via Bi-Consistency Guidance

Authors: Huibing Wang, Mingze Yao, Yawei Chen, Yunqiu Xu, Haipeng Liu, Wei Jia, Xianping Fu, Yang Wang

Abstract: Incomplete multi-view clustering primarily focuses on dividing unlabeled data into corresponding categories with missing instances, and has received intensive attention due to its superiority in real applications. Considering the influence of incomplete data, the existing methods mostly attempt to recover data by adding extra terms. However, for the unsupervised methods, a simple recovery strategy will cause errors and outlying value accumulations, which will affect the performance of the methods. Broadly, the previous methods have not taken the effectiveness of recovered instances into consideration, or cannot flexibly balance the discrepancies between recovered data and original data. To address these problems, we propose a novel method termed Manifold-based Incomplete Multi-view clustering via Bi-consistency guidance (MIMB), which flexibly recovers incomplete data among various views, and attempts to achieve biconsistency guidance via reverse regularization. In particular, MIMB adds reconstruction terms to representation learning by recovering missing instances, which dynamically examines the latent consensus representation. Moreover, to preserve the consistency information among multiple views, MIMB implements a biconsistency guidance strategy with reverse regularization of the consensus representation and proposes a manifold embedding measure for exploring the hidden structure of the recovered data. Notably, MIMB aims to balance the importance of different views, and introduces an adaptive weight term for each view. Finally, an optimization algorithm with an alternating iteration optimization strategy is designed for final clustering. Extensive experimental results on 6 benchmark datasets are provided to confirm that MIMB can significantly obtain superior results as compared with several state-of-the-art baselines.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08633 [pdf, other]

On the superconducting gap structure of the miassite Rh17S15: Nodal or nodeless?

Authors: J. Y. Nie, C. C. Zhao, C. Q. Xu, B. Li, C. P. Tu, X. Zhang, D. Z. Dai, H. R. Wang, S. Xu, Wenhe Jiao, B. M. Wang, Zhu'an Xu, Xiaofeng Xu, S. Y. Li

Abstract: Recent penetration depth measurement claimed the observation of unconventional superconductivity in the miassite Rh$_{17}$S$_{15}$ single crystals, evidenced by the linear-in-temperature penetration depth at low temperatures, thereby arguing for the presence of the lines of node in its superconducting gap structure. Here we measure the thermal conductivity of Rh$_{17}$S$_{15}$ single crystals down to 110 mK and up to a field of 8 T ($\simeq 0.4H_{\rm c2}$). In marked contrast to the penetration depth measurement, we observe a negligible residual linear term $\kappa_0/T$ in zero field, in line with the nodeless gap structure. The field dependence of $\kappa_0(H)/T$ shows a profile that is more consistent with either a highly anisotropic gap structure or multiple nodeless gaps with significantly different magnitudes. Moreover, first-principles calculations give two electronic bands with complex shape of Fermi surfaces. These results suggest multigap nodeless superconductivity in this multiband Rh$_{17}$S$_{15}$ superconductor.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures

arXiv:2405.05505 [pdf, other]

Unveiling Higher-Order Topology via Polarized Topological Charges

Authors: Wei Jia, Bao-Zong Wang, Ming-Jian Gao, Jun-Hong An

Abstract: Real-space topological invariants were widely used to characterize chiral-symmetric higher-order topological phases (HOTPs). However, a momentum-space characterization to these HOTPs, which essentially reveals their intrinsic bulk-boundary correspondence and facilitates their detection in quantum simulation systems, is still lacking. Here, we propose an experimentally observable momentum-space characterization to the chiral-symmetric HOTPs by the concept of polarized topological charges. It provides a unified description to topological phase transitions caused by the closing and reopening of band gap not only of the bulk states but also the edge states. Remarkably, these polarized topological charges can be identified by measuring the pseudospin structures. A feasible scheme to detect the HOTPs in the $^{87}$Rb cold atomic system is given. Our work opens an avenue for characterization and experimental detection of the chiral-symmetric HOTPs in momentum space.

Submitted 20 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 8+8 pages, 4+3 figures. References are updated. Typos are corrected.

arXiv:2405.03493 [pdf]

Polarization-entangled photon pair generation from an epsilon-near-zero metasurface

Authors: Wenhe Jia, Grégoire Saerens, Ülle-Linda Talts, Helena Weigand, Robert J. Chapman, Liu Li, Rachel Grange, Yuanmu Yang

Abstract: Polarization-entangled photon pair sources are essential for diverse quantum technologies, such as quantum communication, computation, and imaging. However, the generation of complex polarization-entangled quantum states has long been constrained by the available nonlinear susceptibility tensor of natural nonlinear crystals, necessitating a cumbersome and intricate setup for additional coherent superposition or post-selection. In this study, we introduce and experimentally demonstrate a nanoscale polarization-entangled photon pair source utilizing an artificially-engineered metamaterial platform. This platform is based on a plasmonic metasurface that is strongly coupled to an epsilon-near-zero (ENZ) material. By precisely engineering resonances at both pump and signal/idler wavelengths, and leveraging the field enhancement provided by the ENZ effect, the photon pair generation efficiency of the 68-nm-thick metasurface is significantly boosted. More notably, the ENZ metasurface platform facilitates versatile manipulation of the system's anisotropic second-order nonlinear susceptibility tensor, enabling direct control over the polarization states of the photon pairs, which leads to the generation of a polarization-entangled Bell state without the need for additional components. Our approach opens a new avenue for the simultaneous photon pair generation and quantum state engineering in a compact platform.

Submitted 13 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01882 [pdf, other]

Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.17151 [pdf, other]

MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

Authors: Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a novel approach, named "MorphText", to capture the regularity of texts by embedding deep morphology for arbitrary-shape text detection. Towards this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that our proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by Transaction on Multimedia

arXiv:2404.16322 [pdf, other]

Bridging Speed and Accuracy to Approximate $K$-Nearest Neighbor Search

Authors: Mingyu Yang, Jiabao Jin, Xiangyu Wang, Zhitao Shen, Wei Jia, Wentao Li, Wei Wang

Abstract: Approximate K-Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. The efficiency of AKNN search largely depends on the computation of distances, a process that significantly affects the runtime. To improve computational efficiency, existing work often opts for estimating approximate distances rather than computing exact distances, at the cost of reduced AKNN search accuracy. The recent method of ADSampling has attempted to mitigate this problem by using random projection for distance approximations and adjusting these approximations based on error bounds to improve accuracy. However, ADSampling faces limitations in effectiveness and generality, mainly due to the suboptimality of its distance approximations and its heavy reliance on random projection matrices to obtain error bounds. In this study, we propose a new method that uses an optimal orthogonal projection instead of random projection, thereby providing improved distance approximations. Moreover, our method uses error quantiles instead of error bounds for approximation adjustment, and the derivation of error quantiles can be made independent of the projection matrix, thus extending the generality of our approach. Extensive experiments confirm the superior efficiency and effectiveness of the proposed method. In particular, compared to the state-of-the-art method of ADSampling, our method achieves a speedup of 1.6 to 2.1 times on real datasets with almost no loss of accuracy.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 13 pages

arXiv:2404.14569 [pdf, other]

LIGO operates with quantum noise below the Standard Quantum Limit

Authors: Wenxuan Jia, Victoria Xu, Kevin Kuns, Masayuki Nakano, Lisa Barsotti, Matthew Evans, Nergis Mavalvala, Rich Abbott, Ibrahim Abouelfettouh, Rana Adhikari, Alena Ananyeva, Stephen Appert, Koji Arai, Naoki Aritomi, Stuart Aston, Matthew Ball, Stefan Ballmer, David Barker, Beverly Berger, Joseph Betzwieser, Dripta Bhattacharjee, Garilynn Billingsley, Nina Bode, Edgard Bonilla, Vladimir Bossilkov , et al. (146 additional authors not shown)

Abstract: Precision measurements of space and time, like those made by the detectors of the Laser Interferometer Gravitational-wave Observatory (LIGO), are often confronted with fundamental limitations imposed by quantum mechanics. The Heisenberg uncertainty principle dictates that the position and momentum of an object cannot both be precisely measured, giving rise to an apparent limitation called the Standard Quantum Limit (SQL). Reducing quantum noise below the SQL in gravitational-wave detectors, where photons are used to continuously measure the positions of freely falling mirrors, has been an active area of research for decades. Here we show how the LIGO A+ upgrade reduced the detectors' quantum noise below the SQL by up to 3 dB while achieving a broadband sensitivity improvement, more than two decades after this possibility was first presented.

Submitted 22 April, 2024; originally announced April 2024.

Report number: LIGO-P2400059

arXiv:2404.13470 [pdf, other]

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin

Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.13003 [pdf, other]

Deep reinforcement learning-based active flow control of an elliptical cylinder: transitioning from an elliptical cylinder to a circular cylinder and a flat plate

Authors: Wang Jia, Hang Xu

Abstract: We study the adaptability of deep reinforcement learning (DRL)-based active flow control (AFC) technology for bluff body flows with complex geometries. It is extended from a cylinder with Ar=1 to a flat elliptical cylinder with Ar=2, slender elliptical cylinders with Ar less than 1, and a flat plate with Ar=0. The robustness and adaptability of DRL-based control technology will be assessed under varying levels of flow instability around bluff bodies. We utilize the Proximal Policy Optimization (PPO) algorithm to precisely control the mass flow rates of synthetic jets located on the upper and lower surfaces of a cylinder to achieve reduction in drag, minimization of lift, and suppression of vortex shedding. Our research findings indicate that, for elliptical cylinders with Ar between 1.75 and 0.75, the reduction in drag coefficient ranges from 0.9% to 15.7%, and the reduction in lift coefficient ranges from 95.2% to 99.7%. The DRL-based control strategy not only significantly reduces lift and drag, but also completely suppresses vortex shedding while using less than 1% of external excitation energy, demonstrating its efficiency and energy-saving capabilities. Additionally, for Ar from 0.5 to 0, the reduction in drag coefficient ranges from 26.9% to 43.6%, and the reduction in lift coefficient from 50.2% to 68.0%. This reflects the control strategy's significant reduction in both drag and lift coefficients, while also alleviating vortex shedding. The interaction and nonlinear development of vortices in the wake of elliptical cylinders lead to complex flow instability, and DRL-based AFC technology shows adaptability and potential in addressing flow control problems for this type of bluff body flow.

Submitted 23 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12553 [pdf, other]

Assessing the Longitudinal Impact of Environmental Chemical Mixtures on Children's Neurodevelopment: A Bayesian Approach

Authors: Wei Jia, Roman Jandarov

Abstract: This manuscript presents a novel Bayesian varying coefficient quantile regression (BVCQR) model designed to assess the longitudinal effects of chemical exposure mixtures on children's neurodevelopment. Recognizing the complexity and high-dimensionality of environmental exposures, the proposed approach addresses critical gaps in existing research by offering a method that can manage the sparsity of data and provide interpretable results. The proposed BVCQR model estimates the effects of mixtures on neurodevelopmental outcomes at specific ages, leveraging a horseshoe prior for sparsity and utilizing a Bayesian method for uncertainty quantification. Our simulations demonstrate the model's robustness and effectiveness in handling high-dimensional data, offering significant improvements over traditional models. The model's application to the Health Outcomes and Measures of the Environment (HOME) Study further illustrates its utility in identifying significant chemical exposures affecting children's growth and development. The findings underscore the potential of BVCQR in environmental health research, providing a sophisticated tool for analyzing the longitudinal impact of complex chemical mixtures, with implications for future studies aimed at understanding and mitigating environmental risks to child health.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12123 [pdf, other]

Robust and Adaptive Deep Reinforcement Learning for Enhancing Flow Control around a Square Cylinder with Varying Reynolds Numbers

Authors: Wang Jia, Hang Xu

Abstract: The present study applies a Deep Reinforcement Learning (DRL) algorithm to Active Flow Control (AFC) of a two-dimensional flow around a confined square cylinder. Specifically, the Soft Actor-Critic (SAC) algorithm is employed to modulate the flow of a pair of synthetic jets placed on the upper and lower surfaces of the confined square cylinder in flow configurations characterized by Re of 100, 200, 300, and 400. The investigation starts with an analysis of the baseline flow in the absence of active control. It is observed that at Re = 100 and Re = 200, the vortex shedding exhibits mono-frequency characteristics. Conversely, at Re = 300 and Re = 400, the vortex shedding is dominated by multiple frequencies, which is indicative of more complex flow features. With the application of the SAC algorithm, we demonstrate the capability of DRL-based control in effectively suppressing vortex shedding, while significantly diminishing drag and fluctuations in lift. Quantitatively, the data-driven active control strategy results in a drag reduction of approximately 14.4%, 26.4%, 38.9%, and 47.0% for Re = 100, 200, 300, and 400, respectively. To understand the underlying control mechanism, we also present detailed flow field comparisons, which showcase the adaptability of DRL in devising distinct control strategies tailored to the dynamic conditions at varying Re. These findings substantiate the proficiency of DRL in controlling chaotic, multi-frequency dominated vortex shedding phenomena, underscoring the robustness of DRL in complex AFC problems.

Submitted 29 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
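
The PSNR threshold mentioned in this abstract refers to the standard peak signal-to-noise ratio. As a rough illustration only, the sketch below computes PSNR for two flat sequences of pixel values. The function name `psnr`, the `max_value` default of 255, and the flat-list input are assumptions made here; the abstract does not state the challenge's evaluation protocol (color channel, border cropping), so this is not the organizers' scoring code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Return the peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    Illustrative helper only; the names and the 8-bit default are assumptions.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error between the two images.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the super-resolved image is closer to the ground truth, which is why the challenge can fix a PSNR floor and rank entries on efficiency alone.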

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.09790 [pdf, other]

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.

Submitted 15 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

arXiv:2404.08965 [pdf, other]

Seeing Text in the Dark: Algorithm and Benchmark

Authors: Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal light datasets. The code and dataset will be released.

Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2403.19125 [pdf, other]

Generic reduction theory for Fermi sea topology in metallic systems

Authors: Wei Jia

Abstract: Fermi sea in a metal can host exotic quantum topology, which determines its conductance quantization and is characterized by Euler characteristic $χ_F$. Unlike gapped band topology described by the global feature of wave function, this topology of gapless system is associated with the geometry of Fermi sea, and thus probing and identifying $χ_F$ are inherently difficult in higher-dimensional systems. Here, we propose a dimensional reduction theory for Fermi sea topology in $d$-dimensional metallic systems, showing that $χ_F$ can be determined by the feature of so-called reduced critical points on Fermi surfaces, with theoretical simplicity and observational intuitiveness. We also reveal a nontrivial correspondence between the Fermi sea topology and the gapped band topology by using an ingenious mapping, of which $χ_F$ exactly equals the topological invariant of gapped topological phases. This provides a potential way to capture $χ_F$ through the topological superconductors. Our work opens an avenue to characterize and detect the Fermi sea topology using low-dimensional momentum information.

Submitted 27 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures, 1 table

arXiv:2403.11807 [pdf, other]

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

Authors: Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

Abstract: Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce our framework, GAMA-Bench, including eight classical multi-agent games. We design a scoring scheme to assess a model's performance in these games quantitatively. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfying robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on GAMA-Bench, achieving a score of 60.5. Moreover, Gemini-1.0-Pro and GPT-3.5 (0613, 1106, 0125) demonstrate similar intelligence on GAMA-Bench. The code and experimental results are made publicly available via https://github.com/CUHK-ARISE/GAMABench.

Submitted 25 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 16 pages of main text. 11 pages of appendices. 15 figures, 9 tables. Updated scoring scheme

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05227 [pdf, ps, other]

doi 10.1088/1674-1056/ad1c5e

Superconductivity in kagome metal ThRu3Si2

Authors: Yi Liu, Jing Li, Wu-Zhang Yang, Jia-Yi Lu, Bo-Ya Cao, Hua-Xun Li, Wan-Li Chai, Si-Qi Wu, Bai-Zhuo Li, Yun-Lei Sun, Wen-He Jiao, Wang Cao, Xiao-Feng Xu, Ren Zhi, Guang-Han Cao

Abstract: We report the physical properties of ThRu$_3$Si$_2$ featured with distorted Ru kagome lattice. The combined experiments of resistivity, magnetization and specific heat reveal bulk superconductivity with $T_{\rm{c}}$ = 3.8 K. The specific heat jump and calculated electron-phonon coupling indicate a moderately coupled BCS superconductor. In comparison with LaRu$_3$Si$_2$, the calculated electronic structure in ThRu$_3$Si$_2$ shows an electron-doping effect with electron filling lifted from 100 meV below flat bands to 300 meV above them. This explains the lower superconducting transition temperature and weaker electron correlations observed in ThRu$_3$Si$_2$. Our work suggests the $T_{\rm{c}}$ and electronic correlations in kagome superconductor could have intimate connection with the flat bands.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures

Journal ref: Chinese Physics B (2024)

arXiv:2403.04274 [pdf, other]

doi 10.1051/0004-6361/202449182

Relative alignment between gas structures and magnetic field in Orion A at different scales using different molecular gas tracers

Authors: Wenyu Jiao, Ke Wang, Fengwei Xu, Chao Wang, Henrik Beuther

Abstract: Context: Magnetic fields can play crucial roles in high-mass star formation. Nonetheless, the significance of magnetic fields at various scales and their relationship with gas structures is largely overlooked. Aims: Our goal is to examine the relationship between the magnetic field and molecular gas structures within the Orion A giant molecular cloud at different scales and density regimes. Methods: We assess the gas intensity structures and column densities in Orion A by utilizing $^{12}$CO, $^{13}$CO, and C$^{18}$O from Nobeyama observations. Through comparing Nobeyama observations with Planck polarization observations on large scales ($\sim0.6$ pc) and JCMT polarization observations on small scales ($\sim0.04$ pc), we investigate how the role of magnetic fields changes with scale and density. Results: We find a similar trend from parallel to perpendicular alignment with increasing column densities in Orion A at both large and small spatial scales. Besides, when changing from low-density to high-density tracers, the relative orientation preference changes from random to perpendicular. The self-similar results at different scales indicate that magnetic fields are dynamically important in both cloud formation and filament formation. However, magnetic field properties at small scales are relatively complicated, and the interplay between magnetic field and star-forming activities needs to be discussed case-by-case.

Submitted 19 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: 12 pages, 8 figures, published in A&A

arXiv:2403.03692 [pdf, ps, other]

Vertex-disjoint cycles of different lengths in tournaments

Authors: Yandong Bai, Wenpei Jia

Abstract: Bermond and Thomassen conjectured in 1981 that every digraph with minimum outdegree at least $2k-1$ contains $k$ vertex-disjoint cycles, where $k$ is a positive integer. Lichiardopol conjectured in 2014 that for every positive integer $k$ there exists an integer $g(k)$ such that every digraph with minimum outdegree at least $g(k)$ contains $k$ vertex-disjoint cycles of different lengths. Recently, Chen and Chang proved in [J. Graph Theory 105 (2) (2024) 297-314] that for $k\geqslant 3$ every tournament with minimum outdegree at least $2k-1$ contains $k$ vertex-disjoint cycles in which two of them have different lengths. Motivated by the above two conjectures and related results, we investigate vertex-disjoint cycles of different lengths in tournaments, and show that when $k\geqslant 5$ every tournament with minimum outdegree at least $2k-1$ contains $k$ vertex-disjoint cycles in which three of them have different lengths. In addition, we show that every tournament with minimum outdegree at least $6$ contains three vertex-disjoint cycles of different lengths and the minimum outdegree condition is sharp. This answers a question proposed by Chen and Chang.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.01126 [pdf, other]

doi 10.1103/PhysRevA.108.043709

Single photon scattering from a chain of giant atoms coupled to a one-dimensional waveguide

Authors: Y. P. Peng, W. Z. Jia

Abstract: We investigate coherent single-photon transport in a waveguide quantum electrodynamics structure containing multiple giant atoms. The single-photon scattering amplitudes are solved using a real-space method. The results give rise to a clear picture of the multi-channel scattering process. In the case of identical and equally-spaced giant atoms in a separate configuration, we also use the transfer-matrix method to express the scattering amplitudes in terms of compact analytical expressions, which allow us to conveniently analyze the properties of the scattering spectra. Based on these theoretical results, we find that the non-dipole effects of giant atoms, which are relevant to the design of the setup, can strongly manipulate several types of collective properties of the output fields, including the superradiant phenomenon, the multiple Fano interference, and the photonic band gap. This makes it possible to manipulate the photon transport in a more versatile way than with small atoms. We also make a proposal to probe the topological states of a chain of braided giant atoms by using photon scattering spectra, showing that waveguide quantum electrodynamics systems with giant atoms are ideal platforms to merge topological physics and on-chip quantum optics.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 19 pages, 11 figures

Journal ref: Phys. Rev. A 108, 043709 (2023)

arXiv:2403.00316 [pdf, other]

Surface Chern-Simons theory for third-order topological insulators and superconductors

Authors: Zhi-Hao Huang, Yi Tan, Wei Jia, Long Zhang, Xiong-Jun Liu

Abstract: Three-dimensional 3rd-order topological insulators (TOTIs) and superconductors (TOTSCs), as the highest-order topological phases hosting zero corner modes in physical dimension, have sparked extensive research interest. However, such topological states have not been discovered in reality due to the lack of experimental schemes of realization. Here, we propose a novel surface Chern-Simons (CS) theory for 3rd-order topological phases, and show that the theory enables a feasible and systematic design of TOTIs and TOTSCs. We show that the emergence of zero Dirac (Majorana) corner modes is entirely captured by an emergent $\mathbb{Z}_{2}$ CS term that can be further characterized by a novel two-particle Wess-Zumino (WZ) term uncovered here in the surfaces of three-dimensional topological materials. Importantly, our proposed CS term characterization and two-particle WZ term mechanism provide a unique perspective to design TOTIs (TOTSCs) in terms of minimal ingredients, feasibly guiding the search for underlying materials, with promising candidates being discussed. This work shall advance both the theoretical and experimental research for highest-order topological matters.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 5+11 pages, 4+5 figures

arXiv:2402.11515 [pdf, other]

Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics

Authors: Wang Jia, Hang Xu

Abstract: Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to tackle critical overhead associated with data movement. Finally, we demonstrate the optimized framework for a typical AFC problem where near-linear scaling can be obtained for the overall framework. We achieve a significant boost in parallel efficiency from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 CPU cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies.

Submitted 29 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.11111 [pdf, other]

Language Models as Science Tutors

Authors: Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Jia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen

Abstract: NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering benchmark consisting of questions about long chapters from STEM textbooks, written by experts. TutorEval helps measure real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multi-disciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. Therefore, we create TutorChat, a dataset of 80,000 long synthetic dialogues about textbooks. We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH. Our datasets build on open-source materials, and we release our models, data, and evaluations.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 8 pages without bibliography and appendix, 26 pages total

arXiv:2402.08982 [pdf, other]

MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Authors: Xubin Wang, Haojiong Shangguan, Fengyi Huang, Shangrui Wu, Weijia Jia

Abstract: Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computational (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, often underutilizing and not sharing information effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Comparing against 24 EC approaches, our method exhibits strong competitiveness. Additionally, we have open-sourced our code on GitHub at https://github.com/wangxb96/MEL.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.07726 [pdf, other]

Unsupervised Sign Language Translation and Generation

Authors: Zhengsheng Guo, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Kehai Chen, Zhaopeng Tu, Yong Xu, Min Zhang

Abstract: Motivated by the success of unsupervised neural machine translation (UNMT), we introduce an unsupervised sign language translation and generation network (USLNet), which learns from abundant single-modality (text and video) data without parallel sign language data. USLNet comprises two main components: single-modality reconstruction modules (text and video) that rebuild the input from its noisy version in the same modality and cross-modality back-translation modules (text-video-text and video-text-video) that reconstruct the input from its noisy version in the different modality using a back-translation procedure. Unlike the single-modality back-translation procedure in text-based UNMT, USLNet faces the cross-modality discrepancy in feature representation, in which the length and the feature dimension mismatch between text and video sequences. We propose a sliding window method to address the issues of aligning variable-length text with video sequences. To our knowledge, USLNet is the first unsupervised sign language translation and generation model capable of generating both natural language text and sign language video in a unified manner. Experimental results on the BBC-Oxford Sign Language dataset (BOBSL) and Open-Domain American Sign Language dataset (OpenASL) reveal that USLNet achieves competitive results compared to supervised baseline models, indicating its effectiveness in sign language translation and generation.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.13270 [pdf, other]

Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics

Authors: Pengcheng Zhao, Yanxiang Chen, Yang Zhao, Wei Jia, Zhao Zhang, Ronggang Wang, Richang Hong

Abstract: Automatic image colorization is inherently an ill-posed problem with uncertainty, which requires an accurate semantic understanding of scenes to estimate reasonable colors for grayscale images. Although recent interaction-based methods have achieved impressive performance, it is still a very difficult task to infer realistic and accurate colors for automatic colorization. To reduce the difficulty of semantic understanding of grayscale scenes, this paper tries to utilize corresponding audio, which naturally contains extra semantic information about the same scene. Specifically, a novel audio-infused automatic image colorization (AIAIC) network is proposed, which consists of three stages. First, we take color image semantics as a bridge and pretrain a colorization network guided by color image semantics. Second, the natural co-occurrence of audio and video is utilized to learn the color semantic correlations between audio and visual scenes. Third, the implicit audio semantic representation is fed into the pretrained network to finally realize the audio-guided colorization. The whole process is trained in a self-supervised manner without human annotation. In addition, an audiovisual colorization dataset is established for training and testing. Experiments demonstrate that audio guidance can effectively improve the performance of automatic colorization, especially for some scenes that are difficult to understand only from visual modality.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.12873 [pdf, other]

Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model

Authors: Zhiwei He, Xing Wang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang, Shuming Shi, Zhaopeng Tu

Abstract: Insufficient modeling of human preferences within the reward model is a major obstacle for leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect the incorrect translations and assigns a penalty term to the reward scores of them. Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: it outperforms systems using larger parallel corpora by a small amount of monolingual data. Our code is available at: https://github.com/zwhe99/FeedbackMT

Submitted 18 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: NAACL 2024

arXiv:2401.05695 [pdf, other]

Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback

Authors: Chengfeng Dou, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhenwei Tao

Abstract: The use of large language models in medical dialogue generation has garnered significant attention, with a focus on improving response quality and fluency. While previous studies have made progress in optimizing model performance for single-round medical Q&A tasks, there is a need to enhance the model's capability for multi-round conversations to avoid logical inconsistencies. To address this, we propose an approach called preference learning from process feedback (PLPF), which integrates the doctor's diagnostic logic into LLMs. PLPF involves rule modeling, preference data generation, and preference alignment to train the model to adhere to the diagnostic process. Experimental results using Standardized Patient Testing show that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%, outperforming traditional reinforcement learning from human feedback. Additionally, PLPF demonstrates effectiveness in both multi-round and single-round dialogue tasks, showcasing its potential for improving medical dialogue generation.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.04322

The ALMA-QUARKS survey: Detection of two extremely dense substructures in a massive prestellar core

Authors: Xiaofeng Mai, Tie Liu, Xunchuan Liu, Lei Zhu, Guido Garay, Paul F. Goldsmith, Mika Juvela, Hongli Liu, Emma Mannfors, Anandmayee Tej, Patricio Sanhueza, Shanghuo Li, Fengwei Xu, Enrique Vazquez Semadeni, Wenyu Jiao, Yaping Peng, T. Baug, Aiyuan Yang, Lokesh Dewangan, Leonardo Bronfman, Gilberto C. Gómez, Aina Palau, Chang Won Lee, Sheng-Li Qin, et al. (11 additional authors not shown)

Abstract: Only a handful of massive starless core candidates have been discovered so far, but none of them have been fully confirmed. Within the MM1 clump in the filamentary infrared dark cloud G34.43+0.24 that was covered by the ALMA-ATOMS survey at Band 3 ($\sim 2''$, 6000 au) and the ALMA-QUARKS survey at Band 6 ($\sim 0.3''$, 900 au), two prestellar core candidates MM1-C and E1 with masses of 71 and 20 $M_\odot$ and radii of 2100--4400 au were discovered. The two cores show no obvious sign of star-formation activities. In particular, MM1-C is a very promising massive prestellar core candidate with a total gas mass of 71 $M_\odot$. Within MM1-C, we detected two extremely dense substructures, C1 and C2, as characterized by their high densities of $n_{\rm H_2} \sim 10^{8-9}\,{\rm cm^{-3}}$. Moreover, evidence of further fragmentation in C2 was also revealed. We have detected the primordial fragmentation in the earliest stage of massive star formation, and we speculate that MM1-C would be the birthplace of a massive multiple system. However, we cannot fully rule out the possibility that the massive prestellar core MM1-C will just form a cluster of low-mass stars if it undergoes further fragmentation.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 12 pages, 6 figures

arXiv:2401.00761

The Earth is Flat? Unveiling Factual Errors in Large Language Models

Authors: Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Abstract: Large Language Models (LLMs) like ChatGPT are foundational in various applications due to their extensive knowledge from pre-training and fine-tuning. Despite this, they are prone to generating factual and commonsense errors, raising concerns that they may mislead users in critical areas like healthcare, journalism, and education. Current methods for evaluating LLMs' veracity are limited by test data leakage or the need for extensive human labor, hindering efficient and accurate error detection. To tackle this problem, we introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs. This framework involves three main steps: First, it constructs a factual knowledge graph by retrieving fact triplets from a large-scale knowledge database. Then, leveraging the knowledge graph, FactChecker employs a rule-based approach to generate three types of questions (Yes-No, Multiple-Choice, and WH questions) that involve single-hop and multi-hop relations, along with correct answers. Lastly, it assesses the LLMs' responses for accuracy using tailored matching strategies for each question type. Our extensive tests on six prominent LLMs, including text-davinci-002, text-davinci-003, ChatGPT (gpt-3.5-turbo, gpt-4), Vicuna, and LLaMA-2, reveal that FactChecker can trigger factual errors in up to 45% of questions in these models. Moreover, we demonstrate that FactChecker's test cases can improve LLMs' factual accuracy through in-context learning and fine-tuning (e.g., llama-2-13b-chat's accuracy increases from 35.3% to 68.5%). We are making all code, data, and results available for future research endeavors.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00757

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models

Authors: Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

Abstract: Recent advancements in large language models (LLMs) have propelled Artificial Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing assistance, code generation, and machine translation. A significant distinction of advanced LLMs, such as ChatGPT, is their demonstrated ability to "reason." However, evaluating the reasoning ability of LLMs remains a challenge as most existing evaluations focus on their accuracy on the downstream tasks rather than directly assessing their reasoning processes. Efforts have been made to develop benchmarks and metrics to assess reasoning in LLMs, but they suffer from data leakage or limited scope. In this paper, we introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs under a set of atomic reasoning skills based on propositional and predicate logic. The results provide insights into LLMs' reasoning abilities and reveal the logical rules the LLMs did not learn well. We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco. The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs with a rate of 25%-94%. In addition, the test cases of LogicAsker can be further used to design demonstration examples for in-context learning, which effectively improves the logical reasoning ability of LLMs, e.g., 10% for GPT-4. As far as we know, our work is the first to create prompts based on testing results to improve LLMs' formal reasoning ability effectively. All the code, data, and results will be released for reproduction and future research.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.15492

DPA-2: Towards a universal large atomic model for molecular and material simulation

Authors: Duo Zhang, Xinzijian Liu, Xiangyu Zhang, Chengqian Zhang, Chun Cai, Hangrui Bi, Yiming Du, Xuejian Qin, Jiameng Huang, Bowen Li, Yifan Shan, Jinzhe Zeng, Yuzhi Zhang, Siyuan Liu, Yifan Li, Junhan Chang, Xinyan Wang, Shuo Zhou, Jianchuan Liu, Xiaoshan Luo, Zhenyu Wang, Wanrun Jiang, Jing Wu, Yudi Yang, Jiyuan Yang , et al. (17 additional authors not shown)

Abstract: The rapid development of artificial intelligence (AI) is driving significant changes in the field of atomic modeling, simulation, and design. AI-based potential energy models have been successfully used to perform large-scale and long-time simulations with the accuracy of ab initio electronic structure methods. However, the model generation process still hinders applications at scale. We envision that the next stage would be a model-centric ecosystem, in which a large atomic model (LAM), pre-trained with as many atomic datasets as possible and efficiently fine-tuned and distilled to downstream tasks, would serve as the new infrastructure of the field of molecular modeling. We propose DPA-2, a novel architecture for a LAM, and develop a comprehensive pipeline for model fine-tuning, distillation, and application, associated with automatic workflows. We show that DPA-2 can accurately represent a diverse range of chemical systems and materials, enabling high-quality simulations and predictions with significantly reduced efforts compared to traditional methods. Our approach paves the way for a universal large atomic model that can be widely applied in molecular and material simulation research, opening new opportunities for scientific discoveries and industrial applications.

Submitted 24 December, 2023; originally announced December 2023.
