OffsetBias: Leveraging Debiased Data for Tuning Evaluators

Authors: Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi

Abstract: Employing Large Language Models (LLMs) to assess the quality of generated responses, such as prompting instruct-tuned models or fine-tuning judge models, has become a widely adopted evaluation method. It is also known that such evaluators are vulnerable to biases, such as favoring longer responses. While it is important to overcome this problem, the specifics of these biases remain under-explored. In this work, we qualitatively identify six types of biases inherent in various judge models. We propose EvalBiasBench as a meta-evaluation collection of hand-crafted test cases for each bias type. Additionally, we present de-biasing dataset construction methods and the associated preference dataset OffsetBias. Experimental results demonstrate that fine-tuning on our dataset significantly enhances the robustness of judge models against biases and improves performance across most evaluation scenarios. We release our datasets and the fine-tuned judge model to the public.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Work in Progress

arXiv:2407.05567 [pdf, other]

Multiple scattering and diffusion of scalar coherent waves in a group of small spheroidal particles with random orientations

Authors: Mingyuan Ren, Yajing Qiao, Ning Zhou, Jianrui Gong, Yang Zhou, Yu Zhang

Abstract: In this manuscript we study multiple scattering and diffusion of scalar waves in a group of monodisperse spheroidal particles with random orientations. We begin by fixing a spheroid in a prolate spheroidal coordinate system and obtain the expansion of the scalar Green's function in this space. The expansion is first based on spheroidal wave functions, and we then transform it into an expansion in spherical wave functions. Next, we average the Green's function over the orientations of the spheroid to get the averaged transition operator. Finally, we calculate the transport mean free path and anisotropy factor for the group of spheroidal particles, based on the irreducible vertex in the Bethe-Salpeter equation. The approaches used in this manuscript to obtain the averaged transition operator and the mean free paths will benefit research on multiple scattering by non-spherical particles.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 18 pages, 3 figures

arXiv:2407.00450 [pdf, other]

Hybrid Quantum-Classical Clustering for Preparing a Prior Distribution of Eigenspectrum

Authors: Mengzhen Ren, Yu-Cheng Chen, Ching-Jui Lai, Min-Hsiu Hsieh, Alice Hu

Abstract: Determining the energy gap in a quantum many-body system is critical to understanding its behavior and is important in quantum chemistry and condensed matter physics. The challenge of determining the energy gap requires identifying both the excited and ground states of a system. In this work, we consider preparing the prior distribution and circuits for the eigenspectrum of time-independent Hamiltonians, which can benefit both classical and quantum algorithms for solving eigenvalue problems. The proposed algorithm unfolds in three strategic steps: Hamiltonian transformation, parameter representation, and classical clustering. These steps are underpinned by two key insights: the use of quantum circuits to approximate the ground state of transformed Hamiltonians and the analysis of parameter representation to distinguish between eigenvectors. The algorithm is showcased through applications to the 1D Heisenberg system and the LiH molecular system, highlighting its potential for both near-term quantum devices and fault-tolerant quantum devices. The paper also explores the scalability of the method and its performance across various settings, setting the stage for more resource-efficient quantum computations that are both accurate and fast. The findings presented here mark a new insight into hybrid algorithms, offering a pathway to overcoming current computational challenges.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00322 [pdf]

LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods

Authors: Zhenhua Wang, Guang Xu, Ming Ren

Abstract: With the ascent of large language models (LLM), natural language processing has witnessed enhancements, such as LLM-based data augmentation. Nonetheless, prior research harbors two primary concerns: firstly, a lack of contemplation regarding whether the natural language generated by LLM (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; secondly, an oversight that augmented data is randomly generated by LLM, implying that not all data may possess equal training value, which could impede the performance of classifiers. To address these challenges, we introduce the scaling laws to intrinsically calculate LLMNL and HNL. Through extensive experiments, we reveal slight deviations (approximately 0.2 Mandelbrot exponent) from Mandelbrot's law in LLMNL, underscore a complexity advantage in HNL, and supplement an interpretive discussion on language style. This establishes a solid foundation for LLM's expansion. Further, we introduce a novel data augmentation method for few-shot text classification, termed ZGPTDA, which leverages fuzzy computing mechanisms driven by the conformity to scaling laws to make decisions about GPT-4 augmented data. Extensive experiments, conducted in real-world scenarios, confirm the effectiveness (improving F1 of Bert and RoBerta by 7-10%) and competitiveness (surpassing recent AugGPT and GENCO methods by about 2% accuracy on DeBerta) of ZGPTDA. In addition, we reveal some interesting insights, e.g., Hilberg's law and Taylor's law can impart more benefits to text classification.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.18144 [pdf, other]

doi 10.1007/s11263-024-02153-0

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored and underutilised in insect farming. Here we present a comprehensive review of the selective breeding framework in the context of insect production. We systematically evaluate adjustments of selective breeding techniques to the realm of insects and highlight the essential components integral to the breeding process. The discussion covers every step of a conventional breeding scheme, such as formulation of breeding objectives, phenotyping, estimation of genetic parameters and breeding values, selection of appropriate breeding strategies, and mitigation of issues associated with genetic diversity depletion and inbreeding. This review combines knowledge from diverse disciplines, bridging the gap between animal breeding, quantitative genetics, evolutionary biology, and entomology, offering an integrated view of the insect breeding research area and uniting knowledge which has previously remained scattered across diverse fields of expertise.

Submitted 26 June, 2024; originally announced June 2024.

Journal ref: International Journal of Computer Vision (IJCV), 2024

arXiv:2406.14133 [pdf, other]

Beam shaping by nonlinear moiré metasurfaces

Authors: Lun Qu, Wei Wu, Di Zhang, Chenxiong Wang, Lu Bai, Chenyang Li, Wei Cai, Mengxin Ren, Andrea Alù, Jingjun Xu

Abstract: This paper explores the interplay of momentum transfer and nonlinear optical processes through moiré phenomena. Momentum transfer plays a crucial role in the interaction between photons and matter. Here, we study stacked metasurfaces with tailored dispersion and rotated against each other with varying twisted angles. The stacking introduces interlayer interactions, which can be controlled by the relative angle between metasurfaces, significantly enriching the resulting response compared to the single layer counterpart. By focusing on second-harmonic generation (SHG) from these twisted metasurfaces, we delve into the realm of nonlinear moiré photonics. Through experimental observations, we unveil the emergence of intricate far-field SHG radiation patterns, showing their effective tuning by varying the twisted angles. These findings offer a fresh perspective to explore nonlinear wavefront shaping through moiré phenomena, opening new avenues for nonlinear information processing, optical steering, and nonlinear optical switching.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures

arXiv:2406.08980 [pdf, other]

From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.05613 [pdf, other]

Distributed Motion Control of Multiple Mobile Manipulator System with Disturbance and Communication Delay

Authors: Wenhang Liu, Meng Ren, Kun Song, Michael Yu Wang, Zhenhua Xiong

Abstract: In real-world object manipulation scenarios, multiple mobile manipulator systems may suffer from disturbances and asynchrony, leading to excessive interaction forces and causing object damage or emergency stops. This paper presents a novel distributed motion control approach aimed at reducing these unnecessary interaction forces. The control strategy only utilizes force information without the need for global position and velocity information. Disturbances are corrected through compensatory movements of the manipulators. Besides, the asymmetric, non-uniform, and time-varying communication delays between robots are also considered. The stability of the control law is rigorously proven by the Lyapunov theorem. Subsequently, the efficacy of the proposed control law is validated through simulations and experiments of collaborative object transportation by two robots. Experimental results demonstrate the effectiveness of the proposed control law in reducing interaction forces during object manipulation.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.01252 [pdf, other]

Towards Scalable Automated Alignment of LLMs: A Survey

Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00593 [pdf]

Low threshold optical bistability based on MoS2 in asymmetric Fabry-Perot cavity structure in visible light band

Authors: Songqing Tang, Mengjiao Ren, Zhiheng Li, Zhiwei Zheng, Leyong Jiang

Abstract: This article theoretically proposes a multi-layer Fabry-Perot cavity structure based on nonlinear MoS2, whose cavity is composed of asymmetric photonic crystals. In this structure, we observed a low threshold optical bistability phenomenon on the order of a in the visible light band, which is caused by the large third-order nonlinear conductivity of the bilayer MoS2 and the Fabry-Perot cavity resonance. We find that when light is incident from two different directions in an asymmetric Fabry-Perot cavity, the optical bistability does not behave identically. In addition, we find that the optical bistability behavior in this simple multi-layer structure is closely related to parameters such as incident wavelength, Fabry-Perot cavity length, and refractive index of the photonic crystal dielectric. This work provides a new approach for the implementation of low threshold optical bistable devices in the visible light band, which is expected to be applied in nonlinear optical fields such as all-optical switches and all-optical logic devices.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 22 pages, 6 figures

arXiv:2406.00590 [pdf]

MoS2-based optical bistability in silver-Bragg reflector multilayer structure at visible light band

Authors: Songqing Tang, Mengjiao Ren, Zhiheng Li, Zhiwei Zheng, Leyong Jiang

Abstract: In this paper, we present a theoretical analysis of the optical bistability in a metallic silver-Bragg reflector structure by embedding bilayer MoS2 at the visible band. The nonlinear OB is achieved due to the nonlinear conductivity of the bilayer MoS2 and the excitation of the optical Tamm state at the interface between the silver and the Bragg reflector. It is found that the hysteresis behaviour and the threshold width of the OB can be effectively tuned by varying the incident light wavelength. In addition, the optical bistable behaviour of the structure can be adjusted by varying the position of the MoS2 inset in the defect layer, incident angle and the structural parameters of the spacer layer. Although the current threshold cannot be commercialized, we believe that this solution will provide a meaningful path reference for low threshold bistability in the visible light band.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2405.03178 [pdf, other]

POPDG: Popular 3D Dance Generation with PopDanceSet

Authors: Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao

Abstract: Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It also surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. Furthermore, the paper expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19132 [pdf, other]

Integrating Present and Past in Unsupervised Continual Learning

Authors: Yipeng Zhang, Laurent Charlin, Richard Zemel, Mengye Ren

Abstract: We formulate a unifying framework for unsupervised continual learning (UCL), which disentangles learning objectives that are specific to the present and the past data, encompassing stability, plasticity, and cross-task consolidation. The framework reveals that many existing UCL approaches overlook cross-task consolidation and try to balance plasticity and stability in a shared embedding space. This results in worse performance due to a lack of within-task data diversity and reduced effectiveness in learning the current task. Our method, Osiris, which explicitly optimizes all three objectives on separate embedding spaces, achieves state-of-the-art performance on all benchmarks, including two novel benchmarks proposed in this paper featuring semantically structured task sequences. Compared to standard benchmarks, these two structured benchmarks more closely resemble visual signals received by humans and animals when navigating real-world environments. Finally, we show some preliminary evidence that continual models can benefit from such realistic learning scenarios.

Submitted 29 April, 2024; originally announced April 2024.

Comments: CoLLAs 2024

arXiv:2404.07598 [pdf, other]

Electro-optically Modulated Nonlinear Metasurfaces

Authors: Zhengqing He, Lun Qu, Wei Wu, Jikun Liu, Jingfei You, Weiye Liu, Lu Bai, Chunyan Jin, Chenxiong Wang, Zhidong Gu, Wei Cai, Mengxin Ren, Jingjun Xu

Abstract: Tunable nonlinearity facilitates the creation of reconfigurable nonlinear metasurfaces, enabling innovative applications in signal processing, light switching, and sensing. This paper presents a novel approach to electrically modulate SHG from a lithium niobate (LN) metasurface, exploiting the electro-optical (EO) effect. By fabricating a nanohole array metasurface on a thin LN film and applying an electric field, we demonstrate the alteration of the material's refractive index, resulting in resonance shifts and modulation of SHG intensity at specific wavelengths. Our findings provide valuable insights for the development of electrically tunable nonlinear light sources, quantum optics, dynamic nonlinear holography, and nonlinear information processing.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 4 pages, 4 figures

arXiv:2404.04904 [pdf, other]

Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5% respectively. Additionally, we demonstrate our models' outstanding few-shot ADD ability by fine-tuning with just one minute of target-domain data. Nonetheless, neural codec compressors greatly affect the detection accuracy, necessitating further research.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.15362 [pdf, other]

CoLLEGe: Concept Embedding Generation for Large Language Models

Authors: Ryan Teehan, Brenden Lake, Mengye Ren

Abstract: Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions, and often fails to confer much information about the new concepts. Classic methods for few-shot word learning in NLP, relying on global word vectors, are less applicable to large language models. In this paper, we introduce a novel approach named CoLLEGe (Concept Learning with Language Embedding Generation) to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts using a small number of example sentences or definitions. Our primary meta-learning objective is simply to facilitate a language model to make next word predictions in forthcoming sentences, making it compatible with language model pretraining. We design a series of tasks to test new concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and demonstrate that our method succeeds in each setting without task-specific training.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12987 [pdf, other]

Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

Authors: Bowen Gao, Minsi Ren, Yuyan Ni, Yanwen Huang, Bo Qiang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score. However, further study shows that the existing molecular generative methods and docking scores both have lacked consideration in terms of specificity, which means that generated molecules bind to almost every protein pocket with high affinity. To address this, we introduce the Delta Score, a new metric for evaluating the specificity of molecular binding. To further incorporate this insight for generation, we develop an innovative energy-guided approach using contrastive learning, with active compounds as decoys, to direct generative models toward creating molecules with high specificity. Our empirical results show that this method not only enhances the delta score but also maintains or improves traditional docking scores, successfully bridging the gap between SBDD and real-world needs.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.09613 [pdf, other]

Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training

Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren

Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before encountering them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 19 pages, 18 figures

arXiv:2402.19138 [pdf, other]

Generalized Pentagon Equations

Authors: Anton Alekseev, Florian Naef, Muze Ren

Abstract: Drinfeld defined the Knizhnik--Zamolodchikov (KZ) associator $\Phi_{\rm KZ}$ by considering the regularized holonomy of the KZ connection along the {\em droit chemin} $[0,1]$. The KZ associator is a group-like element of the free associative algebra with two generators, and it satisfies the pentagon equation. In this paper, we consider paths on $\mathbb{C}\backslash \{ z_1, \dots, z_n\}$ which start and end at tangential base points. These paths are not necessarily straight, and they may have a finite number of transversal self-intersections. We show that the regularized holonomy $H$ of the KZ connection associated to such a path satisfies a generalization of Drinfeld's pentagon equation. In this equation, we encounter $H$, $\Phi_{\rm KZ}$, and new factors associated to self-intersections, to tangential base points, and to the rotation number of the path.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 19 pages, 3 figures

arXiv:2402.18243 [pdf, other]

Learning or Self-aligning? Rethinking Instruction Fine-tuning

Authors: Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun

Abstract: Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs). Previous works mainly focus on the IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of different factors. Surprisingly, our experiments reveal that attempting to learn additional world knowledge through IFT often struggles to yield positive impacts and can even lead to markedly negative effects. Further, we discover that maintaining internal knowledge consistency before and after IFT is a critical factor for achieving successful IFT. Our findings reveal the underlying mechanisms of IFT and provide robust support for some very recent and potential future works.

Submitted 2 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.00300 [pdf, other]

Self-supervised learning of video representations from a child's perspective

Authors: A. Emin Orhan, Wentao Wang, Alex N. Wang, Mengye Ren, Brenden M. Lake

Abstract: Children learn powerful internal models of the world around them from a few years of egocentric visual experience. Can such internal models be learned from a child's visual experience with highly generic learning algorithms or do they require strong inductive biases? Recent advances in collecting large-scale, longitudinal, developmentally realistic video datasets and generic self-supervised learning (SSL) algorithms are allowing us to begin to tackle this nature vs. nurture question. However, existing work typically focuses on image-based SSL algorithms and visual capabilities that can be learned from static images (e.g. object recognition), thus ignoring temporal aspects of the world. To close this gap, here we train self-supervised video models on longitudinal, egocentric headcam recordings collected from a child over a two year period in their early development (6-31 months). The resulting models are highly effective at facilitating the learning of action concepts from a small number of labeled examples; they have favorable data size scaling properties; and they display emergent video interpolation capabilities. Video models also learn more robust object representations than image-based models trained with the exact same data. These results suggest that important temporal aspects of a child's internal model of the world may be learnable from their visual experience using highly generic learning algorithms and without strong inductive biases.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures; code & models available from https://github.com/eminorhan/video-models

arXiv:2401.11839 [pdf, other]

AI for social science and social science of AI: A Survey

Authors: Ruoxi Xu, Yingfei Sun, Mengjie Ren, Shiguang Guo, Ruotong Pan, Hongyu Lin, Le Sun, Xianpei Han

Abstract: Recent advancements in artificial intelligence, particularly with the emergence of large language models (LLMs), have sparked a rethinking of artificial general intelligence possibilities. The increasing human-like capabilities of AI are also attracting attention in social science research, leading to various studies exploring the combination of these two fields. In this survey, we systematically categorize previous explorations in the combination of AI and social science into two directions that share common technical approaches but differ in their research objectives. The first direction is focused on AI for social science, where AI is utilized as a powerful tool to enhance various stages of social science research. The second direction is the social science of AI, which examines AI agents as social entities with their human-like cognitive and linguistic capabilities. By conducting a thorough review, particularly on the substantial progress facilitated by recent advancements in large language models, this paper introduces a fresh perspective to reassess the relationship between AI and social science, provides a cohesive framework that allows researchers to understand the distinctions and connections between AI for social science and social science of AI, and also summarizes state-of-the-art experiment simulation platforms to facilitate research in these two directions. We believe that as AI technology continues to advance and intelligent agents find increasing applications in our daily lives, the significance of the combination of AI and social science will become even more prominent.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted by Information Processing and Management (IP&M)

arXiv:2401.11382 [pdf, other]

Using Large Language Model for End-to-End Chinese ASR and NER

Authors: Yuang Li, Jiawei Yu, Min Zhang, Mengxin Ren, Yanqing Zhao, Xiaofeng Zhao, Shimin Tao, Jinsong Su, Hao Yang

Abstract: Mapping speech tokens to the same feature space as text tokens has become the paradigm for the integration of speech modality into decoder-only large language models (LLMs). An alternative approach is to use an encoder-decoder architecture that incorporates speech features through cross-attention. This approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks. We evaluate them not only by conventional metrics like the F1 score but also by a novel fine-grained taxonomy of ASR-NER errors. Our experiments reveal that encoder-decoder architecture outperforms decoder-only architecture with a short context, while decoder-only architecture benefits from a long context as it fully exploits all layers of the LLM. By using LLM, we significantly reduced the entity omission errors and improved the entity ASR accuracy compared to the Conformer baseline. Additionally, we obtained a state-of-the-art (SOTA) F1 score of 0.805 on the AISHELL-NER test set by using chain-of-thought (CoT) NER which first infers long-form ASR transcriptions and then predicts NER labels.

Submitted 6 June, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024

arXiv:2312.12736 [pdf, other]

Learning and Forgetting Unsafe Examples in Large Language Models

Authors: Jiachen Zhao, Zhun Deng, David Madras, James Zou, Mengye Ren

Abstract: As the number of large language models (LLMs) released to the public grows, there is a pressing need to understand the safety implications associated with these models learning from third-party custom finetuning data. We explore the behavior of LLMs finetuned on noisy custom data containing unsafe content, represented by datasets that contain biases, toxicity, and harmfulness, finding that while aligned LLMs can readily learn this unsafe content, they also tend to forget it more significantly than other examples when subsequently finetuned on safer content. Drawing inspiration from the discrepancies in forgetting, we introduce the "ForgetFilter" algorithm, which filters unsafe data based on how strong the model's forgetting signal is for that data. We demonstrate that the ForgetFilter algorithm ensures safety in customized finetuning without compromising downstream task performance, unlike sequential safety finetuning. ForgetFilter outperforms alternative strategies like replay and moral self-correction in curbing LLMs' ability to assimilate unsafe content during custom finetuning, e.g. 75% lower than not applying any safety measures and 62% lower than using self-correction in toxicity score.

Submitted 3 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: accepted by ICML 24

arXiv:2312.06886 [pdf, other]

Relightful Harmonization: Lighting-aware Portrait Background Replacement

Authors: Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang

Abstract: Portrait harmonization aims to composite a subject into a new background, adjusting its lighting and color to ensure harmony with the background scene. Existing harmonization techniques often only focus on adjusting the global color and brightness of the foreground and ignore crucial illumination cues from the background such as apparent lighting direction, leading to unrealistic compositions. We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effect for the foreground portrait using any background image. Our approach unfolds in three stages. First, we introduce a lighting representation module that allows our diffusion model to encode lighting information from target image background. Second, we introduce an alignment network that aligns lighting features learned from image background with lighting features learned from panorama environment maps, which is a complete representation for scene illumination. Last, to further boost the photorealism of the proposed method, we introduce a novel data simulation pipeline that generates synthetic training pairs from a diverse range of natural images, which are used to refine the model. Our method outperforms existing benchmarks in visual fidelity and lighting coherence, showing superior generalization in real-world testing scenarios, highlighting its versatility and practicality.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 camera ready

arXiv:2312.06168 [pdf, other]

Motion Planning for Multiple Mobile Manipulator System in Complex Flipping Manipulation

Authors: Wenhang Liu, Kun Song, Meng Ren, Jiawei Hu, Michael Yu Wang, Zhenhua Xiong

Abstract: Multiple robot systems are favored for object manipulation and transportation, especially for large objects. However, in more complex manipulation such as flipping, these systems encounter a new challenge, configuration disconnectivity of manipulators. Grasping objects by manipulators will impose closed-chain constraints on the system, which in turn limits the feasible motions of manipulators and further compromises the configuration connectivity. Multiple mobile manipulator systems show much more flexibility in object manipulation with the mobility of the mobile platform and have the potential to address the above problem. In this paper, a novel planning framework is proposed for complex flipping manipulation by incorporating platform motions and regrasping. Firstly, two types of trajectories, mobile manipulator planning and regrasping planning, are classified and can be assigned different priorities for different tasks. Secondly, corresponding planning methods are designed for each type of trajectory. Specifically, in mobile manipulator planning, the configuration of the platform is determined through optimization to ensure connectivity when the manipulator approaches configuration boundaries. In regrasping planning, closed-chain constraints are temporarily disregarded and the manipulation capabilities are prioritized to facilitate subsequent planning. Finally, the structure of the overall planning framework is provided. Experimental results demonstrate that the proposed planner efficiently plans the motions of the system to accomplish flipping manipulation. Additionally, a comprehensive experiment emphasizes the significance of our planner in extending the capabilities of multiple mobile manipulator systems in complex tasks.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05269 [pdf, other]

LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

Authors: Ying Wang, Yanlai Yang, Mengye Ren

Abstract: In this paper we introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise video activity descriptions of the camera wearer and leverages the zero-shot capabilities of pretrained large language models to perform reasoning over long-form video context. Furthermore, LifelongMemory uses a confidence and explanation module to produce confident, high-quality, and interpretable answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D. Code is available at https://github.com/Agentic-Learning-AI-Lab/lifelong-memory.

Submitted 29 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.03214 [pdf, other]

Optimistic Global Function Merger

Authors: Kyungwoo Lee, Manman Ren, Ellis Hoag

Abstract: Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function outlining and linker's identical code folding, despite substantial common ground. The traditional approaches necessitate the complete intermediate representation to compare functions. Consequently, none of these approaches offer a scalable solution compatible with separate compilations while achieving global function merging, which is critical for large app development. In this paper, we introduce our global function merger, leveraging global merge information from previous code generation runs to optimistically create merging instances within each module context independently. Notably, our approach remains sound even when intermediate representations change, making it well-suited for distributed build environments. We present a comprehensive code generation framework that can run both the state-of-the-art global function outliner and our global function merger. These components complement each other, resulting in a positive impact on code size reduction. Our evaluation demonstrates that when integrating the global function merger with a state-of-the-art global function outliner that is fully optimized with ThinLTO, a further reduction of up to 3.5% in code size can be attained. This is in addition to the initial average reduction of 17.3% achieved through global function outlining for real-world iOS apps, all with minimal extra build time.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.17218 [pdf, other]

BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling

Authors: Yixuan Luo, Mengye Ren, Sai Qian Zhang

Abstract: Like masked language modeling (MLM) in natural language processing, masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN). Contrasted with other training paradigms like supervised learning and unsupervised contrastive learning, masked image modeling (MIM) pretraining typically demands significant computational resources in order to manage large training data batches (e.g., 4096). The significant memory and computation requirements pose a considerable challenge to its broad adoption. To mitigate this, we introduce a novel learning framework, termed Block-Wise Masked Image Modeling (BIM). This framework involves decomposing the MIM tasks into several sub-tasks with independent computation patterns, resulting in block-wise back-propagation operations instead of the traditional end-to-end approach. Our proposed BIM maintains superior performance compared to conventional MIM while greatly reducing peak memory consumption. Moreover, BIM naturally enables the concurrent training of numerous DNN backbones of varying depths. This leads to the creation of multiple trained DNN backbones, each tailored to different hardware platforms with distinct computing capabilities. This approach significantly reduces computational costs in comparison with training each DNN backbone individually. Our framework offers a promising solution for resource-constrained training of MIM.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.12035 [pdf, other]

Delta Score: Improving the Binding Assessment of Structure-Based Drug Design Methods

Authors: Minsi Ren, Bowen Gao, Bo Qiang, Yanyan Lan

Abstract: Structure-based drug design (SBDD) stands at the forefront of drug discovery, emphasizing the creation of molecules that target specific binding pockets. Recent advances in this area have witnessed the adoption of deep generative models and geometric deep learning techniques, modeling SBDD as a conditional generation task where the target structure serves as context. Historically, evaluation of these models centered on docking scores, which quantitatively depict the predicted binding affinity between a molecule and its target pocket. Though state-of-the-art models purport that a majority of their generated ligands exceed the docking score of ground truth ligands in test sets, it begs the question: Do these scores align with real-world biological needs? In this paper, we introduce the delta score, a novel evaluation metric grounded in tangible pharmaceutical requisites. Our experiments reveal that molecules produced by current deep generative models significantly lag behind ground truth reference ligands when assessed with the delta score. This novel metric not only complements existing benchmarks but also provides a pivotal direction for subsequent research in the domain.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.10504 [pdf, ps, other]

Groupoid intertwiner and twist for dynamical Yang--Baxter equation: part I

Authors: Muze Ren

Abstract: An intertwiner is a homomorphism between two existing dynamical R matrices, first introduced by Baxter in the eight vertex-SOS correspondence; we develop certain equivalence relations among R matrices using intertwiners. A twist is a homomorphism that twists a dynamical R matrix to give a new dynamical R matrix; we introduce a notion of twist that generalizes the classical Drinfeld twist in quasi-triangular Hopf algebras and some dynamical twists. As applications, we obtain some examples of twists from Ocneanu cell calculus and Fendley--Ginsparg orbifold constructions. The relations between intertwiners and twists are also discussed, and the groupoid structures are emphasized.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 39 pages, comments are welcome!

MSC Class: 17B37

arXiv:2311.02007 [pdf, other]

Towards Unsupervised Object Detection From LiDAR Point Clouds

Authors: Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, Raquel Urtasun

Abstract: In this paper, we study the problem of unsupervised object detection from 3D point clouds in self-driving scenes. We present a simple yet effective method that exploits (i) point clustering in near-range areas where the point clouds are dense, (ii) temporal consistency to filter out noisy unsupervised detections, (iii) translation equivariance of CNNs to extend the auto-labels to long range, and (iv) self-supervision for improving on its own. Our approach, OYSTER (Object Discovery via Spatio-Temporal Refinement), does not impose constraints on data collection (such as repeated traversals of the same location), is able to detect objects in a zero-shot manner without supervised finetuning (even in sparse, distant regions), and continues to self-improve given more rounds of iterative self-training. To better measure model performance in self-driving scenarios, we propose a new planning-centric perception metric based on distance-to-collision. We demonstrate that our unsupervised object detector significantly outperforms unsupervised baselines on PandaSet and Argoverse 2 Sensor dataset, showing promise that self-supervision combined with object priors can enable object discovery in the wild. For more information, visit the project website: https://waabi.ai/research/oyster

Submitted 3 November, 2023; originally announced November 2023.

Comments: CVPR 2023

arXiv:2310.20373 [pdf]

doi 10.1021/acs.nanolett.3c03692

Chiral charge density wave and backscattering-immune orbital texture in monolayer 1T-TiTe2

Authors: Mingqiang Ren, Fangjun Cheng, Yufei Zhao, Mingqiang Gu, Qiangjun Cheng, Binghai Yan, Qihang Liu, Xucun Ma, Qikun Xue, Can-Li Song

Abstract: Non-trivial electronic states are attracting intense attention in low-dimensional physics. Though chirality has been identified in charge states with a scalar order parameter, its intertwining with charge density waves (CDW), film thickness and the impact on the electronic behaviors remain less well understood. Here, using scanning tunneling microscopy, we report a 2 x 2 chiral CDW as well as a strong suppression of the Te-5p hole-band backscattering in monolayer 1T-TiTe2. These exotic characters vanish in bilayer TiTe2 with a non-CDW state. Theoretical calculations confirm that chirality comes from a helical stacking of the triple-q CDW components and therefore can persist at the two-dimensional limit. Furthermore, the chirality renders the Te-5p bands an unconventional orbital texture that prohibits electron backscattering. Our study establishes TiTe2 as a promising playground for manipulating the chiral ground states at the monolayer limit and provides a novel path to engineer electronic properties from an orbital degree.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 21 pages, 5 figures

Journal ref: Nano Letters (2023)

arXiv:2310.19849 [pdf, other]

Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model

Authors: Shiwei Liu, Tian Zhu, Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang

Abstract: Many crucial biological processes rely on networks of protein-protein interactions. Predicting the effect of amino acid mutations on protein-protein binding is vital in protein engineering and therapeutic discovery. However, the scarcity of annotated experimental data on binding energy poses a significant challenge for developing computational approaches, particularly deep learning-based methods. In this work, we propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures. SidechainDiff utilizes a Riemannian diffusion model to learn the generative process of side-chain conformations and can also give the structural context representations of mutations on the protein-protein interface. Leveraging the learned representations, we achieve state-of-the-art performance in predicting the mutational effects on protein-protein binding. Furthermore, SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.10239 [pdf, other]

Structural transfer learning of non-Gaussian DAG

Authors: Mingyang Ren, Xin He, Junhui Wang

Abstract: Directed acyclic graph (DAG) has been widely employed to represent directional relationships among a set of collected nodes. Yet, the available data in one single study is often limited for accurate DAG reconstruction, whereas heterogeneous data may be collected from multiple relevant studies. It remains an open question how to pool the heterogeneous data together for better DAG structure reconstruction in the target study. In this paper, we first introduce a novel set of structural similarity measures for DAG and then present a transfer DAG learning framework by effectively leveraging information from auxiliary DAGs of different levels of similarities. Our theoretical analysis shows substantial improvement in terms of DAG reconstruction in the target study, even when no auxiliary DAG is overall similar to the target DAG, which is in sharp contrast to most existing transfer learning methods. The advantage of the proposed transfer DAG learning is also supported by extensive numerical experiments on both synthetic data and multi-site brain functional connectivity network data.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 35 pages, 3 figures, 3 tables

arXiv:2310.07846 [pdf, other]

DAG-aware Synthesis Orchestration

Authors: Yingjie Li, Mingju Liu, Mark Ren, Alan Mishchenko, Cunxi Yu

Abstract: The key methodologies of modern logic synthesis techniques are conducted on multi-level technology-independent representations such as And-Inverter-Graphs (AIGs) of the digital logic via directed-acyclic-graph (DAGs) traversal based structural rewriting, resubstitution, and refactoring. Existing state-of-the-art DAG-aware logic synthesis algorithms are all designed to perform stand-alone optimizations during a single DAG traversal. However, we empirically identify and demonstrate that these algorithms are limited in quality-of-results and runtime complexity due to this design concept. This work proposes Synthesis Orchestration, which orchestrates stand-alone operations within the single traversal of AIG. Thus, orchestration method explores more optimization opportunities and results in better performance. Our experimental results are comprehensively conducted on all 104 designs collected from ISCAS'85/89/99, VTR, and EPFL benchmark suites, with consistent logic minimization improvements over rewriting, resubstitution, refactoring, leading to an average of 4% more node reduction with improved runtime efficiency for the single optimization. Moreover, we evaluate orchestration as a plug-in algorithm in resyn and resyn3 flows in ABC, which demonstrates consistent logic minimization improvements (3.8% and 10.9% more node reduction on average). The runtime analysis demonstrates the orchestration outperforms stand-alone algorithms in both AIG minimization and runtime efficiency.
Finally, we integrate the orchestration into OpenROAD for end-to-end performance evaluation. Our results demonstrate the advantages of the orchestration optimization technique, even after technology mapping and post-routing in the design flow have been conducted.

Submitted 14 July, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Journal ref: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024

arXiv:2310.06367 [pdf, other]

DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

Authors: Bowen Gao, Bo Qiang, Haichuan Tan, Minsi Ren, Yinjun Jia, Minsi Lu, Jingjing Liu, Weiying Ma, Yanyan Lan

Abstract: Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery. Traditional docking methods are highly time-consuming, and can only work with a restricted search library in real-life applications. Recent supervised learning approaches using scoring functions for binding-affinity prediction, although promising, have not yet surpassed docking methods due to their strong dependency on limited data with reliable binding-affinity labels. In this paper, we propose a novel contrastive learning framework, DrugCLIP, by reformulating virtual screening as a dense retrieval task and employing contrastive learning to align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. We also introduce a biological-knowledge inspired data augmentation strategy to learn better protein-molecule representations. Extensive experiments show that DrugCLIP significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks with highly reduced computation time, especially in zero-shot setting.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06283 [pdf, other]

Towards More Efficient Depression Risk Recognition via Gait

Authors: Min Ren, Muchan Tao, Xuecai Hu, Xiaotong Liu, Qiong Li, Yongzhen Huang

Abstract: Depression, a highly prevalent mental illness, affects over 280 million individuals worldwide. Early detection and timely intervention are crucial for promoting remission, preventing relapse, and alleviating the emotional and financial burdens associated with depression. However, patients with depression often go undiagnosed in the primary care setting. Unlike many physiological illnesses, depression lacks objective indicators for recognizing depression risk, and existing methods for depression risk recognition are time-consuming and often encounter a shortage of trained medical professionals. The correlation between gait and depression risk has been empirically established. Gait can serve as a promising objective biomarker, offering the advantage of efficient and convenient data collection. However, current methods for recognizing depression risk based on gait have only been validated on small, private datasets, lacking large-scale publicly available datasets for research purposes. Additionally, these methods are primarily limited to hand-crafted approaches. Gait is a complex form of motion, and hand-crafted gait features often only capture a fraction of the intricate associations between gait and depression risk. Therefore, this study first constructs a large-scale gait database, encompassing over 1,200 individuals, 40,000 gait sequences, and covering six perspectives and three types of attire. Two commonly used psychological scales are provided as depression risk annotations.
Subsequently, a deep learning-based depression risk recognition model is proposed, overcoming the limitations of hand-crafted approaches. Through experiments conducted on the constructed large-scale database, the effectiveness of the proposed method is validated, and numerous instructive insights are presented in the paper, highlighting the significant potential of gait-based depression risk recognition.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.04728 [pdf, ps, other]

Baxterization for the dynamical Yang-Baxter equation

Authors: Muze Ren

Abstract: The Baxterization process for the dynamical Yang-Baxter equation is studied. We introduce the local dynamical Hecke, Temperley-Lieb and Birman-Murakami-Wenzl operators, then by inserting spectral parameters, from each representation of these operators, we get dynamical R matrix under some conditions. As applications, we reformulate trigonometric degeneration of elliptic quantum group representations and also get dynamical R matrix for critical ADE integrable lattice models. Through Baxterization, we construct some one dimensional integrable systems that are dynamical version of the Heisenberg spin chain.

Submitted 22 January, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Corrected many typos, added the hyperbolic and affine ADE cases

MSC Class: 17B37

arXiv:2310.02499 [pdf, ps, other]

Stanley-Wilf Limits for Patterns in Rooted Labeled Forests

Authors: Michael Ren

Abstract: Building off recent work of Garg and Peng, we continue the investigation into classical and consecutive pattern avoidance in rooted forests. We prove a forest analogue of the Stanley-Wilf conjecture for avoiding a single pattern as well as certain other sets of patterns. Our techniques are analytic, easily generalizing to different types of pattern avoidance and allowing for computations of convergent lower bounds of the forest Stanley-Wilf limit in the cases covered by our result. We end with several open questions and directions for future research, including some on the limit distributions of certain statistics of pattern-avoiding forests.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 16 pages, 4 figures. This article used to be contained in arXiv:2007.12690, but that article has now been split into two separate papers. This is the second of the two. Comments welcome!

MSC Class: 05A05; 05A16; 05C30

arXiv:2310.01680 [pdf, other]

Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation

Authors: Zhangsihao Yang, Mengwei Ren, Kaize Ding, Guido Gerig, Yalin Wang

Abstract: Pretraining CNN models (i.e., UNet) through self-supervision has become a powerful approach to facilitate medical image segmentation under low annotation regimes. Recent contrastive learning methods encourage similar global representations when the same image undergoes different transformations, or enforce invariance across different image/patch features that are intrinsically correlated. However, CNN-extracted global and local features are limited in capturing long-range spatial dependencies that are essential in biological anatomy. To this end, we present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention. In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention among localized keypoint features. Further, we introduce both global and local self-supervised pretraining for the framework. At the global scale, we obtain global representations from both the bottleneck of the UNet, and by aggregating multiscale keypoint features. These global features are subsequently regularized through image-level contrastive objectives. At the local scale, we define a distance-based criterion to first establish correspondences among keypoints and encourage similarity between their features.
Through extensive experiments on both MRI and CT segmentation tasks, we demonstrate the architectural advantages of our proposed method in comparison to both CNN and Transformer-based UNets, when all architectures are trained with randomly initialized weights. With our proposed pretraining strategy, our method further outperforms existing SSL methods by producing more robust self-attention and achieving state-of-the-art segmentation results. The code is available at https://github.com/zshyang/kaf.git.

Submitted 18 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Camera ready for NeurIPS 2023. Code available at https://github.com/zshyang/kaf.git

arXiv:2309.09552 [pdf, other]

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Authors: Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang

Abstract: The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.

Submitted 6 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024

arXiv:2309.00180 [pdf]

Exploring the law of text geographic information

Authors: Zhenhua Wang, Daiyu Zhang, Ming Ren, Guang Xu

Abstract: Textual geographic information is indispensable and heavily relied upon in practical applications. The absence of clear distribution poses challenges in effectively harnessing geographic information, thereby driving our quest for exploration. We contend that geographic information is influenced by human behavior, cognition, expression, and thought processes, and given our intuitive understanding of natural systems, we hypothesize its conformity to the Gamma distribution. Through rigorous experiments on a diverse range of 24 datasets encompassing different languages and types, we have substantiated this hypothesis, unearthing the underlying regularities governing the dimensions of quantity, length, and distance in geographic information. Furthermore, theoretical analyses and comparisons with Gaussian distributions and Zipf's law have refuted the contingency of these laws. Significantly, we have estimated the upper bounds of human utilization of geographic information, pointing towards the existence of uncharted territories. Also, we provide guidance in geographic information extraction. We hope this work helps lift the veil on geographic information and reveal its true nature.

Submitted 31 August, 2023; originally announced September 2023.

Comments: IPM

arXiv:2309.00178 [pdf]

Will Sentiment Analysis Need Subculture? A New Data Augmentation Approach

Authors: Zhenhua Wang, Simin He, Guang Xu, Ming Ren

Abstract: The renowned proverb that "The pen is mightier than the sword" underscores the formidable influence wielded by text expressions in shaping sentiments. Indeed, well-crafted writing can deeply resonate within cultures, conveying profound sentiments. Nowadays, the omnipresence of the Internet has fostered a subculture that congregates around the contemporary milieu. The subculture artfully articulates the intricacies of human feelings by ardently pursuing the allure of novelty, a fact that cannot be disregarded in the sentiment analysis. This paper strives to enrich data through the lens of subculture, to address the insufficient training data faced by sentiment analysis. To this end, a new approach of subculture-based data augmentation (SCDA) is proposed, which engenders six enhanced texts for each training text by leveraging the creation of six diverse subculture expression generators. The extensive experiments attest to the effectiveness and potential of SCDA. The results also shed light on the phenomenon that disparate subculture expressions elicit varying degrees of sentiment stimulation. Moreover, an intriguing conjecture arises, suggesting the linear reversibility of certain subculture expressions. It is our fervent aspiration that this study serves as a catalyst in fostering heightened perceptiveness towards the tapestry of information, sentiment and culture, thereby enriching our collective understanding.

Submitted 31 August, 2023; originally announced September 2023.

Comments: JASIST

arXiv:2308.14024 [pdf, other]

Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition

Authors: Hongda Liu, Yunlong Wang, Min Ren, Junxing Hu, Zhengquan Luo, Guangqi Hou, Zhenan Sun

Abstract: Skeleton-based action recognition has recently made significant progress. However, data imbalance is still a great challenge in real-world scenarios. The performance of current action recognition algorithms declines sharply when training data suffers from heavy class imbalance. The imbalanced data actually degrades the representations learned by these methods and becomes the bottleneck for action recognition. How to learn unbiased representations from imbalanced action data is the key to long-tailed action recognition. In this paper, we propose a novel balanced representation learning method to address the long-tailed problem in action recognition. Firstly, a spatial-temporal action exploration strategy is presented to expand the sample space effectively, generating more valuable samples in a rebalanced manner. Secondly, we design a detached action-aware learning schedule to further mitigate the bias in the representation space. The schedule detaches the representation learning of tail classes from training and proposes an action-aware loss to impose more effective constraints. Additionally, a skip-modal representation is proposed to provide complementary structural information. The proposed method is validated on four skeleton datasets, NTU RGB+D 60, NTU RGB+D 120, NW-UCLA, and Kinetics. It not only achieves consistently large improvement compared to the state-of-the-art (SOTA) methods, but also demonstrates a superior generalization capacity through extensive experiments. Our code is available at https://github.com/firework8/BRL.

Submitted 27 August, 2023; originally announced August 2023.

arXiv:2308.09835 [pdf, other]

Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis

Authors: Shijie Li, Mengwei Ren, Thomas Ach, Guido Gerig

Abstract: Current deep learning-based approaches for the segmentation of microscopy images heavily rely on large amounts of training data with dense annotation, which is highly costly and laborious in practice. Compared to full annotation where the complete contour of objects is depicted, point annotations, specifically object centroids, are much easier to acquire and still provide crucial information about the objects for subsequent segmentation. In this paper, we assume access to point annotations only during training and develop a unified pipeline for microscopy image segmentation using synthetically generated training data. Our framework includes three stages: (1) it takes point annotations and samples a pseudo dense segmentation mask constrained with shape priors; (2) with an image generative model trained in an unpaired manner, it translates the mask to a realistic microscopy image regularized by object level consistency; (3) the pseudo masks along with the synthetic images then constitute a pairwise dataset for training an ad-hoc segmentation model. On the public MoNuSeg dataset, our synthesis pipeline produces more diverse and realistic images than baseline models while maintaining high coherence between input masks and generated images. When using the identical segmentation backbones, the models trained on our synthetic dataset significantly outperform those trained with pseudo-labels or baseline-generated images.
Moreover, our framework achieves comparable results to models trained on authentic microscopy images with dense labels, demonstrating its potential as a reliable and highly efficient alternative to labor-intensive manual pixel-wise annotations in microscopy image segmentation. The code is available.

Submitted 8 December, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted by The 3rd MICCAI Workshop on Data Augmentation, Labeling, and Imperfections

arXiv:2308.03101 [pdf, ps, other]

The additively idempotent semiring $S_7^0$ is nonfinitely based

Authors: Yanan Wu, Miaomiao Ren, Xianzhong Zhao

Abstract: We show that the additively idempotent semiring $S_7^0$ has no finite basis for its equational theory. This answers an open problem posed by Jackson et al. (J. Algebra 611 (2022), 211--245).

Submitted 6 August, 2023; originally announced August 2023.

MSC Class: 16Y60; 03C05; 08B05

arXiv:2307.15901 [pdf, other]

Bright Second Harmonic Emission from Photonic Crystal Vertical Cavity

Authors: Lun Qu, Zhidong Gu, Chenyang Li, Yuan Qin, Yiting Zhang, Di Zhang, Jiaxian Zhao, Qiang Liu, Chunyan Jin, Lishuan Wang, Wei Wu, Wei Cai, Huasong Liu, Mengxin Ren, Jingjun Xu

Abstract: We present a study on photonic vertical cavities consisting of nonlinear materials embedded in photonic crystals (PhCs) for resonantly enhancing second harmonic generation (SHG). Previous attempts at SHG in such structures have been limited to efficiencies of 10$^{-7}$ to 10$^{-5}$, but we demonstrate here a high SHG efficiency of 0.28% by constructing a vertical cavity with a lithium niobate membrane placed between two PhCs, which exhibits high quality resonances. Our results open up new possibilities for compact laser frequency converters that could have a revolutionary impact on the fields of nonlinear optics and photonics.

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.14617 [pdf, other]

Multiscale Dynamic Graph Representation for Biometric Recognition with Occlusions

Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Kunbo Zhang, Zhenan Sun

Abstract: Occlusion is a common problem with biometric recognition in the wild. The generalization ability of CNNs greatly decreases due to the adverse effects of various occlusions. To this end, we propose a novel unified framework integrating the merits of both CNNs and graph models to overcome occlusion problems in biometric recognition, called multiscale dynamic graph representation (MS-DGR). More specifically, a group of deep features reflected on certain subregions is recrafted into a feature graph (FG). Each node inside the FG is deemed to characterize a specific local region of the input sample, and the edges imply the co-occurrence of non-occluded regions. By analyzing the similarities of the node representations and measuring the topological structures stored in the adjacent matrix, the proposed framework leverages dynamic graph matching to judiciously discard the nodes corresponding to the occluded parts. The multiscale strategy is further incorporated to attain more diverse nodes representing regions of various sizes. Furthermore, the proposed framework exhibits a more illustrative and reasonable inference by showing the paired nodes. Extensive experiments demonstrate the superiority of the proposed framework, which boosts the accuracy in both natural and occlusion-simulated cases by a large margin compared with that of baseline methods.

Submitted 27 July, 2023; originally announced July 2023.

Comments: Accepted by TPAMI

arXiv:2306.15713 [pdf, other]

doi 10.1007/978-3-031-19842-7_16

Rethinking Closed-loop Training for Autonomous Driving

Authors: Chris Zhang, Runsheng Guo, Wenyuan Zeng, Yuwen Xiong, Binbin Dai, Rui Hu, Mengye Ren, Raquel Urtasun

Abstract: Recent advances in high-fidelity simulators have enabled closed-loop training of autonomous driving agents, potentially solving the distribution shift in training vs. deployment and allowing training to be scaled both safely and cheaply. However, there is a lack of understanding of how to build effective training benchmarks for closed-loop training. In this work, we present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents, such as how to design traffic scenarios and scale training environments. Furthermore, we show that many popular RL algorithms cannot achieve satisfactory performance in the context of autonomous driving, as they lack long-term planning and take an extremely long time to train. To address these issues, we propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead and exploits cheaply generated imagined data for efficient learning. Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines. For more information, visit the project website: https://waabi.ai/research/travl

Submitted 27 June, 2023; originally announced June 2023.

Comments: ECCV 2022
