
Restoring Images in Adverse Weather Conditions via Histogram Transformer

Authors: Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, Xiaochun Cao

Abstract: Transformer-based image restoration methods in adverse weather have achieved significant progress. Most of them use self-attention along the channel dimension or within spatially fixed-range blocks to reduce computational load. However, such a compromise results in limitations in capturing long-range spatial features. Inspired by the observation that the weather-induced degradation factors mainly cause similar occlusion and brightness, in this work, we propose an efficient Histogram Transformer (Histoformer) for restoring images affected by adverse weather. It is powered by a mechanism dubbed histogram self-attention, which sorts and segments spatial features into intensity-based bins. Self-attention is then applied across bins or within each bin to selectively focus on spatial features of dynamic range and process similar degraded pixels of the long range together. To boost histogram self-attention, we present a dynamic-range convolution enabling conventional convolution to operate over similar pixels rather than neighboring pixels. We also observe that the common pixel-wise losses neglect linear association and correlation between output and ground truth. Thus, we propose to leverage the Pearson correlation coefficient as a loss function to enforce that the recovered pixels follow the same order as the ground truth. Extensive experiments demonstrate the efficacy and superiority of our proposed method. We have released the code on GitHub.
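The correlation loss mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the form `1 - r`, the function name `pearson_loss`, and the `eps` stabilizer are all assumptions made here for illustration; images are treated as flat lists of pixel intensities.

```python
import math


def pearson_loss(pred, target, eps=1e-8):
    """Return 1 - r, where r is the Pearson correlation of two flat sequences.

    Hypothetical helper: the form 1 - r and the eps term are assumptions,
    not taken from the paper.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    r = cov / (math.sqrt(var_p * var_t) + eps)
    return 1.0 - r


# Pixels in the same order as the ground truth give a loss near 0;
# pixels in the reversed order give a loss near 2.
same_order = pearson_loss([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
reversed_order = pearson_loss([0.1, 0.2, 0.3], [0.6, 0.4, 0.2])
```

Because r is unchanged by a constant shift or positive rescaling of the output, a term like this would presumably complement a pixel-wise loss rather than replace it.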

Submitted 14 July, 2024; originally announced July 2024.

Comments: 19 pages, 7 figures, 10MB

arXiv:2407.02047 [pdf, other]

CountFormer: Multi-View Crowd Counting Transformer

Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module that capitalizes on the attention mechanism to transform image-level features into a 3D volume representation for each camera view. Subsequently, the multi-view volume aggregation module attentively aggregates various multi-view volumes to create a comprehensive scene-level volume representation, allowing CountFormer to handle images captured by arbitrary dynamic camera layouts. The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets, demonstrating its greater suitability for real-world deployment compared to conventional MVC frameworks.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.00707 [pdf, other]

Deep learning quantum Monte Carlo for solids

Authors: Yubing Qian, Xiang Li, Zhe Li, Weiluo Ren, Ji Chen

Abstract: Deep learning has deeply changed the paradigms of many research fields. At the heart of chemical and physical sciences is the accurate ab initio calculation of the many-body wavefunction, which has become one of the most notable examples to demonstrate the power of deep learning in science. In particular, the introduction of deep learning into quantum Monte Carlo (QMC) has significantly advanced the frontier of ab initio calculation, offering a universal tool to solve the electronic structure of materials and molecules. Deep learning QMC architectures were initially designed and tested on small molecules, focusing on comparisons with other state-of-the-art ab initio methods. Methodological developments, including extensions to real solids and periodic models, have been rapidly progressing, and reported applications are fast expanding. This review covers the theoretical foundation of deep learning QMC for solids, the neural network wavefunction ansatz, and various other methodological developments. Applications on computing energy, electron density, electric polarization, force and stress of real solids are also reviewed. The methods have also been extended to other periodic systems and finite temperature calculations. The review highlights the potentials and existing challenges of deep learning QMC in materials chemistry and condensed matter physics.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18013 [pdf, other]

Effects of model size in density-functional-theory study of alloys: A case study of CsPbBr$_2$Cl

Authors: Fang Pan, Lin Yang, Zhuangde Jiang, Wei Ren, Zuo-Guang Ye, Jingrui Li

Abstract: The primary challenge of density-functional-theory exploration of alloy systems concerns the size of the computational model. Small alloy models can hardly exhibit the chemical disorder properly, while large models induce difficulty in sampling the alignments within the massive material space. We study this problem with the γ phase of the mixed halide inorganic perovskite alloy CsPbBr$_2$Cl. The distribution of alloy formation energy becomes narrower when the size of the model system increases along $\sqrt{2}\times\sqrt{2}\times2$, $2\times2\times2$, and $2\sqrt{2}\times2\sqrt{2}\times2$ models. This is primarily because the distribution of Br distribution parameters, which plays a leading role in determining the formation energy range, is narrower for larger models. As a result, a larger entropy stability effect can be observed with larger models, especially at high temperatures, for which the approximation using mixing entropy based on the ideal solution model becomes better.
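The ideal-solution mixing entropy referred to in the abstract above can be worked through numerically. This is a sketch of the textbook formula only, not the authors' workflow; the function names are hypothetical, and it assumes mixing occurs solely on the halide sublattice, where CsPbBr$_2$Cl has a Cl fraction of 1/3 and three halide sites per formula unit.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def ideal_mixing_entropy_per_site(x):
    """Ideal-solution mixing entropy per site, in units of k_B.

    S / k_B = -[x ln x + (1 - x) ln(1 - x)] for a binary mixture.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    entropy = 0.0
    for fraction in (x, 1.0 - x):
        if fraction > 0.0:  # the limit of f ln f as f -> 0 is 0
            entropy -= fraction * math.log(fraction)
    return entropy


def mixing_free_energy_ev(x, temperature_k, sites_per_formula_unit=3):
    """Entropic contribution -T * S_mix per formula unit, in eV."""
    s_per_site = ideal_mixing_entropy_per_site(x)
    return -temperature_k * K_B_EV_PER_K * sites_per_formula_unit * s_per_site


# Cl fraction 1/3 on the halide sublattice (assumption stated above).
s_site = ideal_mixing_entropy_per_site(1.0 / 3.0)  # about 0.6365 k_B
f_300k = mixing_free_energy_ev(1.0 / 3.0, 300.0)   # about -0.049 eV
```

The stabilizing term grows linearly with temperature, which is why the entropy effect matters most at high temperatures.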

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17598 [pdf, other]

Prior-Informed AGN-Host Spectral Decomposition Using PyQSOFit

Authors: Wenke Ren, Hengxiao Guo, Yue Shen, John D. Silverman, Colin J. Burke, Shu Wang, Junxian Wang

Abstract: We introduce an improved method for decomposing the emission of active galactic nuclei (AGN) and their host galaxies using templates from principal component analysis (PCA). This approach integrates prior information from PCA with a penalized pixel fitting mechanism which improves the precision and effectiveness of the decomposition process. Specifically, we have reduced the degeneracy and over-fitting in AGN-host decomposition, particularly for those with low signal-to-noise ratios (SNR), where traditional methods tend to fail. By applying our method to 76,565 SDSS Data Release 16 quasars with $z<0.8$, we achieve a success rate of $\approx$ 94%, thus establishing the largest host-decomposed spectral catalog of quasars to date. Our fitting results consider the impact of the host galaxy on the overestimation of the AGN luminosity and black hole mass ($M_{\rm BH}$). Furthermore, we obtained stellar velocity dispersion ($σ_*$) measurements for 4,137 quasars. The slope of the $M_{\rm BH}-σ_*$ relation in this subsample is generally consistent with previous quasar studies beyond the local universe. Our method provides a robust and efficient approach to disentangle the AGN and host galaxy components across a wide range of SNRs and redshifts.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 21 pages, 9 figures, 2 tables. Submitted to ApJ, comments welcome

arXiv:2406.13169 [pdf, other]

A surprising excess of radio emission in extremely stable quasars: a unique clue to jet launching?

Authors: Wen-Yong Kang, Jun-Xian Wang, Zhen-Yi Cai, Hao-Chen Wang, Wen-Ke Ren, Mai Liao, Feng Yuan, Andrzej Zdziarski, Xinwu Cao

Abstract: Quasars are generally divided into jetted radio-loud and non-jetted radio-quiet ones, but why only 10% of quasars are radio loud has been puzzling for decades. Other than jet-induced phenomena, black hole mass, or Eddington ratio, a prominent difference between jetted and non-jetted quasars has scarcely been detected. Here we show a unique distinction between them, and the mystery of jet launching could be disclosed by a prominent excess of radio emission in extremely stable quasars (ESQs, i.e., type 1 quasars with extremely weak variability in UV/optical over 10 years). Specifically, we find that $>$ 25% of the ESQs are detected by the FIRST/VLASS radio survey, while only $\sim$ 6-8% of the control sample, matched in redshift, luminosity, and Eddington ratio, are radio-detected. The excess of radio detection in ESQs has a significance of 4.4 $σ$ (99.9995%), and dominantly occurs at intermediate radio loudness with R $\sim$ 10 - 60. The radio detection fraction of ESQs also tends to increase in the ESQ samples selected with more stringent thresholds. Our results are in contrast to the common view that RL quasars are likely more variable in UV/optical due to jet contribution. The new clues and challenges posed by our findings highlight the importance of extensive follow-up observations to probe the nature of jets in ESQs, and theoretical studies on the link between jet launching and ESQs. Moreover, our results make ESQs, an essential population which has never been explored, unique targets in the burgeoning era of time domain astronomy, like their opposite counterparts of quasars exhibiting extreme variability or changing-look features.
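The pairing of 4.4 $σ$ with 99.9995% in the abstract above can be checked with a short calculation. The abstract does not state which tail convention was used; the sketch below assumes a Gaussian distribution and compares the one-sided and two-sided readings, with hypothetical function names.

```python
import math


def one_sided_confidence(z):
    """Confidence level 1 - P(Z > z) for a standard normal variable Z."""
    return 1.0 - 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_confidence(z):
    """Confidence level 1 - P(|Z| > z) for a standard normal variable Z."""
    return 1.0 - math.erfc(z / math.sqrt(2.0))


one_sided = one_sided_confidence(4.4)  # about 0.9999946
two_sided = two_sided_confidence(4.4)  # about 0.9999892
```

Under these assumptions the quoted percentage matches the one-sided value, which is the natural choice when testing for an excess.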

Submitted 18 June, 2024; originally announced June 2024.

Comments: 11 pages, 16 figures, Accepted by ApJ

arXiv:2406.11134 [pdf, other]

Emergent Wigner phases in moiré superlattice from deep learning

Authors: Xiang Li, Yubing Qian, Weiluo Ren, Yang Xu, Ji Chen

Abstract: Moiré superlattice designed in stacked van der Waals material provides a dynamic platform for hosting exotic and emergent condensed matter phenomena. However, the relevance of strong correlation effects and the large size of moiré unit cells pose significant challenges for traditional computational techniques. To overcome these challenges, we develop an unsupervised deep learning approach to uncover electronic phases emerging from moiré systems based on variational optimization of neural network many-body wavefunction. Our approach has identified diverse quantum states, including novel phases such as generalized Wigner crystals, Wigner molecular crystals, and previously unreported Wigner covalent crystals. These discoveries provide insights into recent experimental studies and suggest new phases for future exploration. They also highlight the crucial role of spin polarization in determining Wigner phases. More importantly, our proposed deep learning approach is proven general and efficient, offering a powerful framework for studying moiré physics.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.06674 [pdf, other]

Systematic Collapse of the Accretion Disc Across the Supermassive Black Hole Population

Authors: Scott Hagen, Chris Done, John D. Silverman, Junyao Li, Teng Liu, Wenke Ren, Johannes Buchner, Andrea Merloni, Tohru Nagao, Mara Salvato

Abstract: The structure of the accretion flow onto supermassive black holes (SMBH) is not well understood. Standard disc models match to zeroth order in predicting substantial energy dissipation within optically-thick material producing a characteristic strong blue/UV continuum. However, they fail at reproducing more detailed comparisons to the observed spectral shapes along with their observed variability. Based on stellar mass black holes within our galaxy, accretion discs should undergo a transition into an X-ray hot, radiatively inefficient flow, below a (mass scaled) luminosity of $\sim 0.02\,L_{\rm{Edd}}$. While this has been seen in limited samples of nearby low-luminosity active galactic nuclei (AGN) and a few rare changing-look AGN, it is not at all clear whether this transition is present in the wider AGN population across cosmic time. A key issue is the difficulty in disentangling a change in spectral state from increased dust obscuration and/or host galaxy contamination, effectively drowning out the AGN emission. Here we use the new eROSITA eFEDS Survey to identify unobscured AGN from their X-ray emission, matched to excellent optical imaging from Subaru's Hyper Suprime-Cam, allowing the subtraction of the host galaxy contamination. The resulting, uncontaminated, AGN spectra reveal a smooth transition from a strongly disc dominated state in bright AGN, to the collapse of the disc into an inefficient X-ray plasma in the low luminosity AGN, with the transition occurring at $\sim 0.02\,L_{\rm{Edd}}$, revealing fundamental aspects of accretion physics in AGN.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures, 1 appendix - submitted to MNRAS, comments welcome

arXiv:2406.02746 [pdf, other]

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Authors: Jinghan Zhang, Xiting Wang, Weijieying Ren, Lu Jiang, Dongjie Wang, Kunpeng Liu

Abstract: Large Language Models (LLMs) gain substantial reasoning and decision-making capabilities from thought structures. However, existing methods such as Tree of Thought and Retrieval Augmented Thoughts often fall short in complex tasks due to the limitations of insufficient local retrieval of factual knowledge and inadequate global selection of strategies. These limitations make it challenging for these methods to balance factual accuracy and comprehensive logical optimization effectively. To address these limitations, we introduce the Retrieval Augmented Thought Tree (RATT), a novel thought structure that considers both overall logical soundness and factual correctness at each step of the thinking process. Specifically, at every point of a thought branch, RATT performs planning and lookahead to explore and evaluate multiple potential reasoning steps, and integrates the fact-checking ability of Retrieval-Augmented Generation (RAG) with the LLM's ability to assess overall strategy. Through this combination of factual knowledge and strategic feasibility, the RATT adjusts and integrates the thought tree structure to search for the most promising branches within the search space. This thought structure significantly enhances the model's coherence in logical inference and efficiency in decision-making, and thus increases the limit of the capacity of LLM to generate reliable inferences and decisions based on thought structures. A broad range of experiments on different types of tasks showcases that the RATT structure significantly outperforms existing methods in factual correctness and logical coherence.

Submitted 11 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01574 [pdf, other]

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Authors: Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU, but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.
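One direct consequence of expanding the choice set from four to ten options, as the abstract above describes, is a lower random-guessing baseline. The sketch below works through that arithmetic. The chance-corrected score is a generic normalization added here for illustration and is not a metric from the paper; both function names are hypothetical.

```python
def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing on one-answer questions."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


def chance_corrected(accuracy, num_options):
    """Rescale accuracy so that guessing maps to 0 and a perfect score to 1.

    Illustrative normalization only; not defined in the MMLU-Pro abstract.
    """
    baseline = chance_accuracy(num_options)
    return (accuracy - baseline) / (1.0 - baseline)


four_option_baseline = chance_accuracy(4)   # 0.25
ten_option_baseline = chance_accuracy(10)   # 0.1
```

With ten options, less of a raw score can be attributed to guessing alone, which may be one reason the benchmark separates models more clearly.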

Submitted 23 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01222 [pdf, other]

Symmetry enforced solution of the many-body Schrödinger equation with deep neural network

Authors: Zhe Li, Zixiang Lu, Ruichen Li, Xuelan Wen, Xiang Li, Liwei Wang, Ji Chen, Weiluo Ren

Abstract: The integration of deep neural networks with the Variational Monte Carlo (VMC) method has marked a significant advancement in solving the Schrödinger equation. In this work, we enforce spin symmetry in the neural network-based VMC calculation with a modified optimization target. Our method is designed to solve for the ground state and multiple excited states with target spin symmetry at a low computational cost. It predicts accurate energies while maintaining the correct symmetry in strongly correlated systems, even in cases where different spin states are nearly degenerate. Our approach also excels at spin-gap calculations, including the singlet-triplet gap in biradical systems, which is of high interest in photochemistry. Overall, this work establishes a robust framework for efficiently calculating various quantum states with specific spin symmetry in correlated systems, paving the way for novel discoveries in quantum science.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.18715 [pdf, other]

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Authors: Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng

Abstract: Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios. In this paper, we introduce NeRF On-the-go, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences. Delving into uncertainty, our method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed. Through comprehensive experiments on various scenes, our method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.

Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: CVPR 2024, first two authors contributed equally. Project Page: https://rwn17.github.io/nerf-on-the-go/

arXiv:2405.07595 [pdf, other]

Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection

Authors: Dehong Kong, Siyuan Liang, Wenqi Ren

Abstract: Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches are natural looking when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack (EMA) to address the issue of optimizing the adversarial patch under the constraints of color. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits strong prior knowledge of a pretrained stable diffusion to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch which is initialized to zero so that the model can better trade off attacking performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets have shown that our work can reach nearly the same attack performance in the digital attack (no greater than 2 in mAP$\%$), surpass the baseline method in the physical specific scenarios, and exhibit a significant advantage in terms of naturalness in visualization and color difference with the environment.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07120 [pdf]

doi 10.1103/PhysRevApplied.21.054019

Quasiparticle and Excitonic Structures of Few-layer and Bulk GaSe: Interlayer Coupling, Self-energy, and Electron-hole Interaction

Authors: Fanhao Jia, Zhao Tang, Greis J. Cruz, Weiwei Gao, Shaowen Xu, Wei Ren, Peihong Zhang

Abstract: Metal monochalcogenide GaSe is a classic layered semiconductor that has received increasing research interest due to its highly tunable electronic and optical properties for ultrathin electronics applications. Despite intense research efforts, a systematic understanding of the layer-dependent electronic and optical properties of GaSe remains to be established, and there appear significant discrepancies between different experiments. We have performed GW plus Bethe-Salpeter equation (BSE) calculations for few-layer and bulk GaSe, aiming at understanding the effects of interlayer coupling and dielectric screening on excited state properties of GaSe, and how the electronic and optical properties evolve from strongly two-dimensional (2D) like to intermediate thick layers, and to three-dimensional (3D) bulk character. Using a new definition of the exciton binding energy, we are able to calculate the binding energies of all excitonic states. Our results reveal an interesting correlation between the binding energy of an exciton and the spread of its wave function in the real and momentum spaces. We find that the existence of (nearly) parallel valence and conduction bands facilitates the formation of excitonic states that spread out in the momentum space. Thus, these excitons tend to be more localized in real space and have large exciton binding energies. The interlayer coupling substantially suppresses the Mexican-hat-like dispersion of the top valence band seen in the monolayer system, explaining the greatly enhanced photoluminescence (PL) as layer thickness increases. Our results also help resolve apparent discrepancies between different experiments. After including the quasiparticle and excitonic effects as well as the optical activities of excitons, our results compare well with available experimental results.

Submitted 11 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. Applied 21, 054019 (2024)

arXiv:2405.04832 [pdf, other]

Hierarchical Characterization of Thermoelectric Performance in Copper-Based Chalcogenide CsCu$_3$S$_2$: Unveiling the role of Anharmonic Lattice Dynamics

Authors: Jincheng Yue, Junda Li, Jiongzhi Zheng, Xingchen Shen, Wenling Ren, Yanhui Liu, Tian Cui

Abstract: We explicitly consider both phonon energy shifts and broadening arising from both cubic and quartic anharmonicities, as well as diagonal/non-diagonal terms of heat flux operators in thermal conductivity. Our findings show that the strong anharmonicity of CsCu$_3$S$_2$ primarily arises from the presence of $p$-$d$ anti-bonding hybridization between Cu and S atoms, coupled with the random oscillations of Cs atoms. Notably, the competition between phonon hardening described by the loop diagram and softening induced by the bubble diagram significantly influences particle-like propagation, predominantly reflected in group velocity and energy-conservation rule. Additionally, the electrical transport properties are determined by employing the precise momentum relaxation-time approximation (MRTA). At high temperatures, the thermoelectric performance of $p$-type CsCu$_3$S$_2$ reaches its optimum theoretical value of 0.94 along the in-plane direction based on advanced phonon renormalization theory. In striking contrast, the harmonic approximation theory significantly overestimates the thermoelectric efficiency at the same temperatures, rendering it an impractical expectation. Conversely, the first-order renormalization approach leads to a serious underestimation of the thermoelectric properties due to the over-correction of phonon energy. Our study not only reveals the pivotal role of anharmonic lattice dynamics in accurately assessing thermoelectric properties but also underscores the potential thermoelectric applications for novel copper-based chalcogenides.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03150 [pdf, other]

Video Diffusion Models: A Survey

Authors: Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter

Abstract: Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.00924 [pdf, ps, other]

Zonotope-based Symbolic Controller Synthesis for Linear Temporal Logic Specifications

Authors: Wei Ren, Raphael M. Jungers, Dimos V. Dimarogonas

Abstract: This paper studies the controller synthesis problem for nonlinear control systems under linear temporal logic (LTL) specifications using zonotope techniques. A local-to-global control strategy is proposed for the desired specification expressed as an LTL formula. First, a novel approach is developed to divide the state space into finite zonotopes and constrained zonotopes, which are called cells and allowed to intersect with the neighbor cells. Second, from the intersection relation, a graph among all cells is generated to verify the realization of the accepting path for the LTL formula. The realization verification determines if there is a need for the control design, and also results in finite local LTL formulas. Third, once the accepting path is realized, a novel abstraction-based method is derived for the controller design. In particular, we only focus on the cells from the realization verification and approximate each cell thanks to properties of zonotopes. Based on local symbolic models and local LTL formulas, an iterative synthesis algorithm is proposed to design all local abstract controllers, whose existence and combination establish the global controller for the LTL formula. Finally, the proposed framework is illustrated via a path planning problem of mobile robots.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 16 pages, 11 figures
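
For readers unfamiliar with the cells used in the zonotope-based synthesis entry above: a zonotope is the set Z = {c + Σ ξ_i g_i : |ξ_i| ≤ 1} with center c and generators g_i. The sketch below only illustrates this standard definition and its axis-aligned bounding box; the function names are hypothetical and it is not the paper's partitioning or synthesis algorithm.

```python
def zonotope_point(center, generators, xi):
    """Return c + sum_i xi_i * g_i for coefficients xi with |xi_i| <= 1."""
    if any(abs(x) > 1.0 for x in xi):
        raise ValueError("each coefficient must lie in [-1, 1]")
    return [
        c + sum(x * g[k] for x, g in zip(xi, generators))
        for k, c in enumerate(center)
    ]


def interval_hull(center, generators):
    """Tightest axis-aligned box containing the zonotope.

    Along each axis the extreme is reached with xi_i = +/-1, so the
    half-width is the sum of absolute generator components on that axis.
    """
    radius = [sum(abs(g[k]) for g in generators) for k in range(len(center))]
    lower = [c - r for c, r in zip(center, radius)]
    upper = [c + r for c, r in zip(center, radius)]
    return lower, upper


# Example: a 2-D zonotope with two generators (illustrative values).
center = [1.0, 0.0]
generators = [[1.0, 0.0], [0.5, 0.5]]
lower, upper = interval_hull(center, generators)
vertex = zonotope_point(center, generators, [1.0, 1.0])
```

A box over-approximation like this is cheap to compute, which is one reason zonotopes are a common set representation in reachability and abstraction-based control.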

arXiv:2404.16683 [pdf]

Observation of intra-unit-cell superconductivity modulation

Authors: Tianheng Wei, Yanzhao Liu, Wei Ren, Ziqiang Wang, Jian Wang

Abstract: In unconventional high-temperature (high-Tc) superconductors, the symmetry-breaking electronic orders intertwined with the superconductivity provide important clues for understanding the nature of the unconventional pairing mechanism. Recently, an exotic superconducting order showing spatially periodic order parameter modulations and translational symmetry breaking, namely the pair density wave (PDW) state, has attracted broad attention. Without breaking translational symmetry, point group symmetry breaking may also induce superconductivity modulations on different atom sites within a single unit cell. However, the intra-unit-cell superconductivity modulation has never been carefully investigated before. Here, using scanning tunneling microscopy/spectroscopy, we report the observation of intra-unit-cell superconductivity modulations in the superconducting gap size and the coherence peak sharpness in monolayer high-Tc Fe(Te,Se) films epitaxially grown on SrTiO3(001) substrates. Further analysis shows that the maxima and minima in the superconductivity modulation are centered at the crystallographic locations of the Te/Se atoms, revealing the breaking of the glide-mirror symmetry of the Te/Se atoms in monolayer high-Tc Fe(Te,Se) films grown on SrTiO3(001). Our findings provide precise microscopic information of superconductivity within the lattice unit cell and indicate that the p-orbital electrons of the Te/Se atoms also play an important role in Cooper pairing in unconventional high-Tc iron-based superconductors.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16452 [pdf, other]

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Authors: Lihua Jing, Rui Wang, Wenqi Ren, Xin Dong, Cong Zou

Abstract: Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, independent of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as distinct image quality of the patch area that differs from the original clean image due to the independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training. PAD offers patch-agnostic defense against various adversarial patches and is compatible with any pre-trained object detector. Our comprehensive digital and physical experiments involving diverse patch types, such as localized noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at https://github.com/Lihua-Jing/PAD.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.09452 [pdf, other]

Python-Based Quantum Chemistry Calculations with GPU Acceleration

Authors: Xiaojie Wu, Qiming Sun, Zhichen Pu, Tianze Zheng, Wenzhi Ma, Wen Yan, Xia Yu, Zhengxiao Wu, Mian Huo, Xiang Li, Weiluo Ren, Sheng Gong, Yumin Zhang, Weihao Gao

Abstract: To meet the increasing demand for quantum chemistry calculations in data-driven chemical research, the collaboration between industrial stakeholders and the quantum chemistry community has led to the development of GPU4PySCF, a GPU-accelerated Python package. This open-source project is accessible via its public GitHub repository at https://github.com/pyscf/gpu4pyscf. This paper outlines the primary features, innovations, and advantages of this package. When performing Density Functional Theory (DFT) calculations on modern GPU platforms, GPU4PySCF delivers a 30-fold speedup over a 32-core CPU node, resulting in approximately 90% cost savings for most DFT tasks. The performance advantages and productivity improvements have been found in multiple industrial applications, such as generating potential energy surfaces, analyzing molecular properties, calculating solvation free energy, identifying chemical reactions in lithium-ion batteries, and accelerating neural-network methods. To make the package easy to extend and integrate with other Python packages, it is designed with PySCF-compatible interfaces and Pythonic implementations. This design choice enhances its coordination with the Python ecosystem.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 32 pages, 14 figures

arXiv:2404.06206 [pdf, other]

Deep Learning Method for Computing Committor Functions with Adaptive Sampling

Authors: Bo Lin, Weiqing Ren

Abstract: The committor function is a central object for quantifying the transitions between metastable states of dynamical systems. Recently, a number of computational methods based on deep neural networks have been developed for computing the high-dimensional committor function. The success of the methods relies on sampling adequate data for the transition, which is still a challenging task for complex systems at low temperatures. In this work, we propose a deep learning method with two novel adaptive sampling schemes (I and II). In the two schemes, the data are generated actively with a modified potential where the bias potential is constructed from the learned committor function. We theoretically demonstrate the advantages of the sampling schemes and show that the data in sampling scheme II are uniformly distributed along the transition tube. This makes it a promising method for studying the transition of complex systems. The efficiency of the method is illustrated in high-dimensional systems including the alanine dipeptide and a solvated dimer system.

Submitted 9 April, 2024; originally announced April 2024.
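
As background for the committor entry above: for one-dimensional overdamped Langevin dynamics with potential V and inverse temperature β, the committor between the states x ≤ a and x ≥ b has the closed form q(x) = ∫_a^x e^{βV(y)} dy / ∫_a^b e^{βV(y)} dy. The sketch below evaluates this formula by trapezoidal quadrature; the double-well potential, the grid size, and the function names are illustrative assumptions, and it is not the paper's deep learning method, which targets high-dimensional systems where no such formula is available.

```python
import math


def committor_1d(potential, a, b, beta, n=2001):
    """Committor q(x) on [a, b] for 1-D overdamped Langevin dynamics.

    q(a) = 0 and q(b) = 1; in between, q is the normalized running
    integral of exp(beta * V), computed with the trapezoidal rule.
    """
    dx = (b - a) / (n - 1)
    xs = [a + i * dx for i in range(n)]
    weights = [math.exp(beta * potential(x)) for x in xs]
    running = [0.0]
    for i in range(1, n):
        running.append(running[-1] + 0.5 * (weights[i - 1] + weights[i]) * dx)
    total = running[-1]
    return xs, [r / total for r in running]


def double_well(x):
    """Illustrative symmetric double well with minima at x = -1 and x = 1."""
    return (x * x - 1.0) ** 2


xs, q = committor_1d(double_well, -1.0, 1.0, beta=5.0)
```

At larger β the integrand concentrates near the barrier top, so q changes from near 0 to near 1 in a narrow region around it; transitions through that region are rare, which is why sampling adequate data there is hard at low temperatures.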

arXiv:2404.05905 [pdf, other]

Computing Transition Pathways for the Study of Rare Events Using Deep Reinforcement Learning

Authors: Bo Lin, Yangzheng Zhong, Weiqing Ren

Abstract: Understanding the transition events between metastable states in complex systems is an important subject in the fields of computational physics, chemistry and biology. The transition pathway plays an important role in characterizing the mechanism underlying the transition, for example, in the study of conformational changes of bio-molecules. In fact, computing the transition pathway is a challenging task for complex and high-dimensional systems. In this work, we formulate the path-finding task as a cost minimization problem over a particular path space. The cost function is adapted from the Freidlin-Wentzell action functional so that it is able to deal with rough potential landscapes. The path-finding problem is then solved using an actor-critic method based on the deep deterministic policy gradient algorithm (DDPG). The method incorporates the potential force of the system in the policy for generating episodes and combines physical properties of the system with the learning process for molecular systems. The exploitation and exploration nature of reinforcement learning enables the method to efficiently sample the transition events and compute the globally optimal transition pathway. We illustrate the effectiveness of the proposed method using three benchmark systems including an extended Mueller system and the Lennard-Jones system of seven particles.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.03327 [pdf, other]

DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement

Authors: Shangquan Sun, Wenqi Ren, Jingyang Peng, Fenglong Song, Xiaochun Cao

Abstract: Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex theory in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked reverse degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low light, as it shows the largest performance gain after low-light enhancement compared to other methods.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.01027 [pdf]

Easy-to-configure zero-dimensional valley-chiral modes in a graphene point junction

Authors: Konstantin Davydov, Xi Zhang, Wei Ren, Matthew Coles, Logan Kline, Bryan Zucker, Kenji Watanabe, Takashi Taniguchi, Ke Wang

Abstract: The valley degree of freedom in 2D materials can be manipulated for low-dissipation quantum electronics called valleytronics. At the boundary between two regions of bilayer graphene with different atomic or electrostatic configuration, valley-polarized current has been realized. However, the demanding fabrication and operation requirements limit device reproducibility and scalability toward more advanced valleytronics circuits. We demonstrate a new device architecture of a point junction where a valley-chiral 0D PN junction is easily configured, switchable, and capable of carrying valley current with an estimated polarization of ~80%. This work provides a new building block in manipulating valley quantum numbers and scalable valleytronics.

Submitted 1 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00996 [pdf]

Charge density wave without long-range structural modulation in canted antiferromagnetic kagome FeGe

Authors: Chenfei Shi, Hanbin Deng, Surya Rohith Kotla, Yi Liu, Sitaram Ramakrishnan, Claudio Eisele, Harshit Agarwal, Leila Noohinejad, Ji-Yong Liu, Tianyu Yang, Guowei Liu, Bishal Baran Maity, Qi Wang, Zhaodi Lin, Baojuan Kang, Wanting Yang, Yongchang Li, Zhihua Yang, Yuke Li, Yanpeng Qi, Arumugam Thamizhavel, Wei Ren, Guang-Han Cao, Jia-Xin Yin, Sander van Smaalen, et al. (2 additional authors not shown)

Abstract: Strongly correlated electron systems with a kagome lattice can host abundant exotic quantum states such as superconductivity and spin/charge density waves (CDW) due to the complicated interactions between different degrees of freedom in the framework of a unique two-dimensional geometrically frustrated lattice structure. Recently, successive orders of A-type antiferromagnetism (AFM), $2\times2\times2$ CDW and canted double-cone AFM have been manifested upon cooling in magnetic kagome FeGe. However, the mechanism of the CDW order and its interaction with magnetism are presently enigmatic at best. Here we investigate the evolution of CDW order with temperature across the spin canting transition in FeGe by single-crystal x-ray diffraction. Refinements of its modulated structure are presented using the superspace approach. Interestingly, the superlattice reflections originating from CDW-induced long-range structural modulation become extremely weak after the system enters the canted AFM while a $2\times2$ CDW in the $ab$ plane persists as a long-range order demonstrated by strong electronic modulation in the d$I$/d$V$ map of scanning tunneling spectroscopy. We discovered a novel CDW order without long-range structural modulation in FeGe probably because of the competition between CDW and canted AFM in determining the underlying crystal structure. In addition, occupational modulations of Ge1 atoms located in the kagome plane and displacive modulations of all the atoms were extracted from the refinements, confirming the existence of Ge atom dimerization along the $c$ axis as the major distortion and indicating a dynamic transformation between different CDW domains.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 22 pages, 6 figures. Comments on the manuscript are welcome

arXiv:2404.00519 [pdf]

Electron Collimation in Twisted Bilayer Graphene via Gate-defined Moiré Barriers

Authors: Wei Ren, Xi Zhang, Ziyan Zhu, Moosa Khan, Kenji Watanabe, Takashi Taniguchi, Efthimios Kaxiras, Mitchell Luskin, Ke Wang

Abstract: Electron collimation via a graphene pn-junction allows electrostatic control of ballistic electron trajectories akin to that of an optical circuit. Similar manipulation of novel correlated electronic phases in twisted-bilayer graphene (tBLG) can provide additional probes to the underlying physics and device components towards advanced quantum electronics. In this work, we demonstrate collimation of the electron flow via gate-defined moiré barriers in a tBLG device, utilizing the band-insulator gap of the moiré superlattice. A single junction can be tuned to host a chosen combination of conventional pseudo barrier and moiré tunnel barriers, from which we demonstrate improved collimation efficiency. By measuring transport through two consecutive moiré collimators separated by 1 μm, we demonstrate evidence of electron collimation in tBLG in the presence of realistic twist-angle inhomogeneity.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.19306 [pdf, other]

Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points

Authors: Tian Ma, Chuyang Shang, Wanzhu Ren, Yuancheng Li, Jiiayi Yang, Jiali Qian

Abstract: In recent years, research on point weakly supervised object detection (PWSOD) methods in the field of computer vision has attracted growing attention. However, existing pseudo-label generation methods perform poorly when only a small amount of supervised annotation data is available and in dense object detection tasks. We consider the generation of weakly supervised pseudo labels as the result of the model's sparse output, and propose a method called Sparse Generation to make pseudo labels sparse. It constructs dense tensors through the relationship between data and detector model, optimizes three of its parameters, and obtains a sparse tensor via coordinated calculation, thereby indirectly obtaining higher-quality pseudo labels and solving the model's density problem when only a small amount of supervised annotation data can be used. On two widely used open-source datasets (RSOD, SIMD) and a self-built dataset (Bullet-Hole), the experimental results show that the proposed method has a significant advantage in overall performance metrics compared with the state-of-the-art method.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.17133 [pdf, other]

RIOJA. Complex Dusty Starbursts in a Major Merger B14-65666 at z=7.15

Authors: Yuma Sugahara, Javier Álvarez-Márquez, Takuya Hashimoto, Luis Colina, Akio K. Inoue, Luca Costantin, Yoshinobu Fudamoto, Ken Mawatari, Yi W. Ren, Santiago Arribas, Tom J. L. C. Bakx, Carmen Blanco-Prieto, Daniel Ceverino, Alejandro Crespo Gómez, Masato Hagimoto, Takeshi Hashigaya, Rui Marques-Chaves, Hiroshi Matsuo, Yurina Nakazato, Miguel Pereira-Santaella, Yoichi Tamura, Mitsutaka Usui, Naoki Yoshida

Abstract: We present JWST NIRCam imaging of B14-65666 ("Big Three Dragons"), a bright Lyman-break galaxy system ($M_\text{UV}=-22.5$ mag) at $z=7.15$. The high angular resolution of NIRCam reveals the complex morphology of two galaxy components: galaxy E has a compact core (E-core), surrounded by diffuse, extended, rest-frame optical emission, which is likely to be tidal tails; and galaxy W has a clumpy and elongated morphology with a blue UV slope ($β_\text{UV}=-2.2\pm0.1$). The flux excess, F356W$-$F444W, peaks at the E-core ($1.05^{+0.08}_{-0.09}$ mag), tracing the presence of strong [OIII] 4960,5008 Å emission. ALMA archival data show that the bluer galaxy W is brighter in dust continua than the redder galaxy E, while the tails are bright in [OIII] 88 $\mathrm{μm}$. The UV/optical and sub-mm SED fitting confirms that B14-65666 is a major merger in a starburst phase as derived from the stellar mass ratio (3:1 to 2:1) and the star-formation rate, $\simeq1$ dex higher than the star-formation main sequence at the same redshift. The galaxy E is a dusty ($A_\text{V}=1.2\pm0.1$ mag) starburst with a possible high dust temperature ($\ge63$-$68$ K). The galaxy W would have a low dust temperature ($\le27$-$33$ K) or patchy stellar-and-dust geometry, as suggested from the infrared excess (IRX) and $β_\text{UV}$ diagram. The high optical-to-FIR [OIII] line ratio of the E-core shows its lower gas-phase metallicity ($\simeq0.2$ Z$_{\odot}$) than the galaxy W. These results agree with a scenario where major mergers disturb morphology and induce nuclear dusty starbursts triggered by less-enriched inflows. B14-65666 shows a picture of complex stellar buildup processes during major mergers in the epoch of reionization.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 18 pages, 6 figures, 4 tables. Submitted to ApJ

arXiv:2403.14468 [pdf, other]

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

Authors: Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

Abstract: In the dynamic field of digital content creation using generative models, state-of-the-art video editing models still do not offer the level of quality and control that users desire. Previous works on video editing either extended from image-based generative models in a zero-shot manner or necessitated extensive fine-tuning, which can hinder the production of fluid video edits. Furthermore, these methods frequently rely on textual input as the editing guidance, leading to ambiguities and limiting the types of edits they can perform. Recognizing these challenges, we introduce AnyV2V, a novel tuning-free paradigm designed to simplify video editing into two primary steps: (1) employing an off-the-shelf image editing model to modify the first frame, (2) utilizing an existing image-to-video generation model to generate the edited video through temporal feature injection. AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods. AnyV2V can also support any video length. Our evaluation indicates that AnyV2V outperforms other baseline methods by a significant margin in automatic and human evaluations, maintaining visual consistency with the source video while achieving high-quality edits across all the editing tasks.

Submitted 10 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: preprint

arXiv:2403.10336 [pdf, other]

How Powerful Potential of Attention on Image Restoration?

Authors: Cong Wang, Jinshan Pan, Yeying Jin, Liyan Wang, Wei Wang, Gang Fu, Wenqi Ren, Xiaochun Cao

Abstract: Transformers have demonstrated their effectiveness in image restoration tasks. Existing Transformer architectures typically comprise two essential components: multi-head self-attention and feed-forward network (FFN). The former captures long-range pixel dependencies, while the latter enables the model to learn complex patterns and relationships in the data. Previous studies have demonstrated that FFNs are key-value memories \cite{geva2020transformer}, which are vital in modern Transformer architectures. In this paper, we conduct an empirical study to explore the potential of attention mechanisms without using FFN and provide novel structures to demonstrate that removing FFN is flexible for image restoration. Specifically, we propose Continuous Scaling Attention (CSAttn), a method that computes attention continuously in three stages without using FFN. To achieve competitive performance, we propose a series of key components within the attention. Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect the model performance. We apply our CSAttn to several image restoration tasks and show that our model can outperform CNN-based and Transformer-based image restoration approaches.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09036 [pdf, other]

Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

Authors: Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

Abstract: In the real-world setting, data often follows a long-tailed distribution, where head classes contain significantly more training samples than tail classes. Consequently, models trained on such data tend to be biased toward head classes. The medium of this bias is imbalanced gradients, which include not only the ratio of scale between positive and negative gradients but also imbalanced gradients from different negative classes. Therefore, we propose the Gradient-Aware Logit Adjustment (GALA) loss, which adjusts the logits based on accumulated gradients to balance the optimization process. Additionally, we find that most of the solutions to long-tailed problems are still biased towards head classes in the end, and we propose a simple post hoc prediction re-balancing strategy to further mitigate the bias toward head classes. Extensive experiments are conducted on multiple popular long-tailed recognition benchmark datasets to evaluate the effectiveness of these two designs. Our approach achieves top-1 accuracy of 48.5%, 41.4%, and 73.3% on CIFAR100-LT, Places-LT, and iNaturalist, outperforming the state-of-the-art method GCL by a significant margin of 3.62%, 0.76% and 1.2%, respectively. Code is available at https://github.com/lt-project-repository/lt-project.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures. Accepted by ICASSP 2024, see https://cmsworkshops.com/ICASSP2024/papers/accepted_papers.php by searching this paper title

arXiv:2403.07969 [pdf, other]

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code-style schema representation method to uniformly transform different schemas into Python classes, with which complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over $\textbf{30,000}$ types of knowledge, which is the largest one for UIE, to the best of our knowledge. To ease the learning process of LLMs, KnowCoder contains a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning. After code pretraining on around $1.5$B automatically constructed data, KnowCoder already attains remarkable generalization ability and achieves a relative improvement of $\textbf{49.8%}$ F1 over LLaMA2 under the few-shot setting. After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves improvements of up to $\textbf{12.5%}$ and $\textbf{21.9%}$ over state-of-the-art baselines under the zero-shot setting and the low-resource setting, respectively. Additionally, based on our unified schema representations, various human-annotated datasets can simultaneously be utilized to refine KnowCoder, which achieves significant improvements of up to $\textbf{7.5%}$ under the supervised setting.
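
The idea of expressing an extraction schema as Python classes can be illustrated with a minimal sketch. The class names (`Entity`, `Person`, `Organization`, `Relation`, `WorksFor`) and the type-check mechanism are hypothetical and are not taken from KnowCoder's schema library; the sketch only shows how inheritance and typed arguments might encode constraints between types.

```python
class Entity:
    """Base class for every entity type in this hypothetical schema."""

    def __init__(self, name: str):
        self.name = name


class Person(Entity):
    """A human being mentioned in the text."""


class Organization(Entity):
    """A company, institution, or similar group."""


class Relation:
    """Base class for a binary relation between two entities."""

    # Subclasses narrow these attributes to express schema constraints.
    head_type = Entity
    tail_type = Entity

    def __init__(self, head: Entity, tail: Entity):
        if not isinstance(head, self.head_type):
            raise TypeError(f"head must be {self.head_type.__name__}")
        if not isinstance(tail, self.tail_type):
            raise TypeError(f"tail must be {self.tail_type.__name__}")
        self.head = head
        self.tail = tail


class WorksFor(Relation):
    """The head person is employed by the tail organization."""

    head_type = Person
    tail_type = Organization


# An extraction result is then a list of instantiated schema objects.
results = [WorksFor(Person("Alice"), Organization("Acme"))]
```

In this sketch, an output that violates the schema (for example, an `Organization` as the head of `WorksFor`) fails at construction time, which is one way a constraint among types could be made explicit in code.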

Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05906 [pdf, other]

Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

Authors: Jingyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing UDC image restoration methods predominantly utilize convolutional neural network architectures, whereas Transformer-based methods have exhibited superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality image restoration. In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise. Furthermore, compared to the ordinary Transformer employing dense attention, the Transformer utilizing sparse attention can alleviate the adverse impact of redundant information and noise. Building upon this discovery, we propose a Segmentation Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images. Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures

arXiv:2403.04562 [pdf, other]

Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes

Authors: Stamatios Georgoulis, Weining Ren, Alfredo Bochicchio, Daniel Eckert, Yuanyou Li, Abel Gawel

Abstract: Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors. Contemporary RGB camera-based methods rely on modeling camera and scene properties; however, they are often under-constrained and fall short in unknown categories. Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments with simplified dynamic objects. This work presents an event-based method for class-agnostic motion segmentation that can successfully be deployed across complex large-scale outdoor environments too. To this end, we introduce a novel divide-and-conquer pipeline that combines: (a) ego-motion compensated events, computed via a scene understanding module that predicts monocular depth and camera pose as auxiliary tasks, and (b) optical flow from a dedicated optical flow module. These intermediate representations are then fed into a segmentation module that predicts motion segmentation masks. A novel transformer-based temporal attention module in the segmentation module builds correlations across adjacent 'frames' to get temporally consistent segmentation masks. Our method sets the new state-of-the-art on the classic EV-IMO benchmark (indoors), where we achieve improvements of 2.19 moving object IoU (2.22 mIoU) and 4.52 point IoU respectively, as well as on a newly-generated motion segmentation and tracking benchmark (outdoors) based on the DSEC event dataset, termed DSEC-MOTS, where we show an improvement of 12.91 moving object IoU.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 3DV 2024, the first two authors contributed equally

arXiv:2403.04391 [pdf, other]

doi 10.1103/PhysRevLett.132.066501

$π$ Phase Interlayer Shift and Stacking Fault in the Kagome Superconductor CsV$_3$Sb$_5$

Authors: Feng Jin, Wei Ren, Mingshu Tan, Mingtai Xie, Bingru Lu, Zheng Zhang, Jianting Ji, Qingming Zhang

Abstract: The stacking degree of freedom is a crucial factor in tuning material properties and has been extensively investigated in layered materials. The kagome superconductor CsV$_3$Sb$_5$ was recently discovered to exhibit a three-dimensional CDW phase below T$_{\rm CDW}$ ~ 94 K. Despite the thorough investigation of in-plane modulation, the out-of-plane modulation has remained ambiguous. Here, our polarization- and temperature-dependent Raman measurements reveal the breaking of C$_6$ rotational symmetry and the presence of three distinct domains oriented at approximately 120° to each other. The observations demonstrate that the CDW phase can be naturally explained as a 2c staggered order phase with adjacent layers exhibiting a relative $π$ phase shift. Further, we discover a first-order structural phase transition at approximately 65 K and suggest that it is a stacking order-disorder phase transition due to stacking fault, supported by the thermal hysteresis behavior of a Cs-related phonon mode. Our findings highlight the significance of the stacking degree of freedom in CsV$_3$Sb$_5$ and offer structural insights to comprehend the entanglement between superconductivity and CDW.

Submitted 7 March, 2024; originally announced March 2024.

Comments: This manuscript was published in Phys. Rev. Lett

Journal ref: Physical Review Letters 132, 066501 (2024)

arXiv:2403.01427 [pdf, other]

Logit Standardization in Knowledge Distillation

Authors: Shangquan Sun, Wenqi Ren, Jingzhi Li, Rui Wang, Xiaochun Cao

Abstract: Knowledge distillation involves transferring soft labels from a teacher to a student using a shared temperature-based softmax function. However, the assumption of a shared temperature between teacher and student implies a mandatory exact match between their logits in terms of logit range and variance. This side-effect limits the performance of student, considering the capacity discrepancy between them and the finding that the innate logit relations of teacher are sufficient for student to learn. To address this issue, we propose setting the temperature as the weighted standard deviation of logit and performing a plug-and-play Z-score pre-process of logit standardization before applying softmax and Kullback-Leibler divergence. Our pre-process enables student to focus on essential logit relations from teacher rather than requiring a magnitude match, and can improve the performance of existing logit-based distillation methods. We also show a typical case where the conventional setting of sharing temperature between teacher and student cannot reliably yield the authentic distillation evaluation; nonetheless, this challenge is successfully alleviated by our Z-score. We extensively evaluate our method for various student and teacher models on CIFAR-100 and ImageNet, showing its significant superiority. The vanilla knowledge distillation powered by our pre-process can achieve favorable performance against state-of-the-art methods, and other distillation variants can obtain considerable gain with the assistance of our pre-process.
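
The Z-score pre-process described in this abstract can be sketched in a few lines of plain Python. This is an illustrative reading of the abstract, not the authors' released code: the function names, the `base_temperature` weight, and the `eps` stabilizer are assumptions introduced here.

```python
import math
from statistics import fmean, pstdev


def standardize(logits, base_temperature=1.0, eps=1e-7):
    """Z-score the logits: subtract the mean, divide by (weighted) std.

    base_temperature plays the role of the weight on the standard
    deviation; it is a hypothetical parameter name.
    """
    mu = fmean(logits)
    sigma = pstdev(logits)
    return [(z - mu) / ((sigma + eps) * base_temperature) for z in logits]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def standardized_kd_loss(student_logits, teacher_logits, base_temperature=1.0):
    """KL divergence between softmaxes of Z-scored teacher and student logits."""
    p_teacher = softmax(standardize(teacher_logits, base_temperature))
    p_student = softmax(standardize(student_logits, base_temperature))
    return kl_divergence(p_teacher, p_student)
```

Because each logit vector is standardized separately, a student whose logits are a shifted and positively rescaled copy of the teacher's incurs essentially zero loss in this sketch, which matches the abstract's point that only the logit relations, not their magnitude, need to match.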

Submitted 3 March, 2024; originally announced March 2024.

Comments: 10 pages, 5 figures, accepted by the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR 2024)

arXiv:2402.19274 [pdf, other]

Mixed-halide perovskite alloys $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$: New insight of configuration entropy effect from first principles and phase diagrams

Authors: Fang Pan, Junni Zhai, Jinyu Chen, Lin Yang, Hua Dong, Fang Yuan, Zhuangde Jiang, Wei Ren, Zuo-Guang Ye, Guo-Xu Zhang, Jingrui Li

Abstract: Stability is one of the key issues in mixed-halide perovskite alloys which are promising in emergent optoelectronics. Previous density-functional-theory (DFT) and machine learning studies indicate that the formation-energy convex hulls of these materials are very shallow, and stable alloy compositions are rare. In this work, we revisit this problem using DFT with special focus on the effects of configuration and vibration entropies. Allowed by the $20$-atomic models for the $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$ series, the partition functions and therewith thermodynamic state functions are calculated by traversing all possible mixed-halide configurations. We can thus evaluate the temperature- and system-dependent configuration entropy, which largely corrects the conventional approach based on the ideal solution model. Finally, temperature-composition phase diagrams that include $α$, $β$, $γ$ and $δ$ phases of both alloys are constructed based on the free energy data, for which the contribution of phonon vibrations is included.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18865 [pdf, other]

Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

Authors: Weijieying Ren, Xinlong Li, Lei Wang, Tianxiang Zhao, Wei Qin

Abstract: Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks, the inference performance on historical tasks decreases dramatically, which is known as a catastrophic forgetting problem. A trade-off needs to be kept between learning plasticity and memory stability. Plenty of existing works have explored strategies like memory replay, regularization and parameter isolation, but little is known about the geometric connection of various adjacent minima in the continual LLMs fine-tuning scenarios. In this work, we investigate the geometric connections of different minima through the lens of mode connectivity, which means different minima can be connected by a low-loss valley. Through extensive experiments, we uncover the mode connectivity phenomenon in the LLMs continual learning scenario and find that it can strike a balance between plasticity and stability. Building upon these findings, we propose a simple yet effective method called Interpolation-based LoRA (I-LoRA), which constructs a dual-memory experience replay framework based on LoRA parameter interpolations. Extensive experiments and analysis on eight domain-specific CL benchmarks demonstrate that I-LoRA consistently shows significant improvement over the previous state-of-the-art approaches with up to $11\%$ performance gains, providing a strong baseline and insights for future research on the large language model continual learning problem. Our code is available at https://github.com/which47/LLMCL.
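
As a rough illustration of what interpolating LoRA parameters could look like, the sketch below linearly blends two sets of adapter weights. The abstract does not give the update rule, so the convex combination, the names `slow`, `fast`, and `lam`, and the representation of each tensor as a flat list of floats are all assumptions made for this example rather than details of I-LoRA.

```python
def interpolate_lora(slow, fast, lam):
    """Return lam * slow + (1 - lam) * fast for every named LoRA tensor.

    slow, fast: dicts mapping a (hypothetical) parameter name to a flat
                list of floats with matching lengths.
    lam:        interpolation weight in [0, 1]; 1.0 keeps `slow` unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if slow.keys() != fast.keys():
        raise KeyError("both adapters must expose the same parameter names")
    return {
        name: [lam * s + (1.0 - lam) * f for s, f in zip(slow[name], fast[name])]
        for name in slow
    }


# Example: blend a slowly changing adapter with a freshly fine-tuned one.
slow_adapter = {"lora_A": [0.0, 2.0]}
fast_adapter = {"lora_A": [1.0, 4.0]}
blended = interpolate_lora(slow_adapter, fast_adapter, lam=0.5)
```

Under the mode-connectivity view in the abstract, points on such a path between two minima may retain low loss on both old and new tasks; the sketch shows only the arithmetic of the blend, not how the weight is chosen.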

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.16671 [pdf, other]

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Authors: Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

Abstract: Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind state-of-the-art (SoTA) models by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, StructLM demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35% and Flan-UL2 20B by an average of 10%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.

Submitted 24 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Technical Report

arXiv:2402.04324 [pdf, other]

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Authors: Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen

Abstract: Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence. A grand challenge in I2V generation is to maintain visual consistency throughout the video: existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame, as well as ensure a fluid and logical progression within the video narrative. To mitigate these issues, we propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos. We also extend the proposed approaches to show their potential to improve consistency in auto-regressive long video generation and camera motion control. To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation. Our automatic and human evaluation results demonstrate the superiority of ConsistI2V over existing methods.

Submitted 30 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Project Page: https://tiger-ai-lab.github.io/ConsistI2V/

arXiv:2401.16527 [pdf, other]

doi 10.1021/acsami.4c02097

Ultra-low glassy thermal conductivity and controllable, promising thermoelectric properties in crystalline o-CsCu5S3

Authors: Jincheng Yue, Jiongzhi Zheng, Junda Li, Siqi Guo, Wenling Ren, Han Liu, Yanhui Liu, Tian Cui

Abstract: We thoroughly investigate the microscopic mechanisms of the thermal transport in orthorhombic \textit{o}-CsCu$_5$S$_3$ by integrating the first-principles-based self-consistent phonon calculations (SCP) with the linearized Wigner transport equation (LWTE). Our methodology takes into account contributions to phonon energy shifts and phonon scattering rates from both three- and four-phonon processes. Additionally, it incorporates the off-diagonal terms of heat flux operators to calculate the total thermal conductivity. The predicted $κ_\mathrm{L}$ shows an extremely weak temperature dependence following $\sim T^{-0.33}$, in good agreement with experimental values measured parallel to the Bridgman growth direction. Such nonstandard temperature dependence of $κ_\mathrm{L}$ can be traced back to the dual particlelike-wavelike behavior exhibited by thermal phonons. Specifically, the coexistence of the stochastic oscillation of Cs atoms and metavalent bonding among interlayer Cu-S atoms limits the particle-like phonon propagation and enhances the wave-like tunneling of phonons. Simultaneously, the electrical transport properties are determined by employing a precise momentum relaxation-time approximation (MRTA) within the framework of the linearized Boltzmann transport equation (LBTE). By properly adjusting the carrier concentration, excellent thermoelectric performance is achieved, with a maximum thermoelectric conversion efficiency of 18.4$\%$ observed at 800 K in \textit{p}-type \textit{o}-CsCu$_5$S$_3$. Our work not only elucidates the anomalous thermal transport behavior in the copper-based chalcogenide \textit{o}-CsCu$_5$S$_3$ but also provides insights for manipulating its thermal and electronic properties for potential thermoelectric applications.

Submitted 15 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15896 [pdf, other]

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

Authors: Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang

Abstract: Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, VLM models supporting multi-language, e.g., in both Chinese and English, have lagged due to the relative scarcity of large-scale pretraining datasets. Toward this end, we introduce a comprehensive bilingual (Chinese-English) dataset BM-6B with over 6 billion image-text pairs, aimed at enhancing multimodal foundation models to well understand images in both languages. To handle such a scale of dataset, we propose a novel grouped aggregation approach for image-text contrastive loss computation, which reduces the communication overhead and GPU memory demands significantly, facilitating a 60% increase in training speed. We pretrain a series of bilingual image-text foundation models with an enhanced fine-grained understanding ability on BM-6B; the resulting models, dubbed $M^2$-Encoders (pronounced "M-Square"), set new benchmarks in both languages for multimodal retrieval and classification tasks. Notably, our largest $M^2$-Encoder-10B model has achieved top-1 accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification setting, surpassing previously reported SoTA methods by 2.2% and 21.1%, respectively. The $M^2$-Encoder series represents one of the most comprehensive bilingual image-text foundation models to date, so we are making it available to the research community for further exploration and development.

Submitted 3 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.11677 [pdf, ps, other]

Emulation-based Stabilization for Networked Control Systems with Stochastic Channels

Authors: Wei Ren, Wei Wang, Zhuo-Rui Pan, Xi-Ming Sun, Andrew R. Teel, Dragan Nesic

Abstract: This paper studies the stabilization problem of networked control systems (NCSs) with random packet dropouts caused by stochastic channels. To describe the effects of stochastic channels on the information transmission, the transmission times are assumed to be deterministic, whereas the packet transmission is assumed to be random. We first propose a stochastic scheduling protocol to model random packet dropouts, and address the properties of the proposed stochastic scheduling protocol. The proposed scheduling protocol provides a unified modelling framework for a general class of random packet dropouts due to different stochastic channels. Next, the proposed scheduling protocol is embedded into the closed-loop system, which leads to a stochastic hybrid model for NCSs with random packet dropouts. Based on this stochastic hybrid model, we follow the emulation approach to establish sufficient conditions to guarantee uniform global asymptotic stability in probability. In particular, an upper bound on the maximally allowable transmission interval is derived explicitly for all stochastic protocols satisfying Lyapunov conditions that guarantee uniform global asymptotic stability in probability. Finally, two numerical examples are presented to demonstrate the derived results.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 12 pages, 4 figures, accepted

arXiv:2401.10666 [pdf, other]

MixNet: Towards Effective and Efficient UHD Low-Light Image Enhancement

Authors: Chen Wu, Zhuoran Zheng, Xiuyi Jia, Wenqi Ren

Abstract: With the continuous advancement of imaging devices, the prevalence of Ultra-High-Definition (UHD) images is rising. Although many image restoration methods have achieved promising results, they are not directly applicable to UHD images on devices with limited computational resources due to the inherently high computational complexity of UHD images. In this paper, we focus on the task of low-light image enhancement (LLIE) and propose a novel LLIE method called MixNet, which is designed explicitly for UHD images. To capture the long-range dependency of features without introducing excessive computational complexity, we present the Global Feature Modulation Layer (GFML). GFML associates features from different views by permuting the feature maps, enabling efficient modeling of long-range dependency. In addition, we also design the Local Feature Modulation Layer (LFML) and Feed-forward Layer (FFL) to capture local features and transform features into a compact representation. This way, our MixNet achieves effective LLIE with few model parameters and low computational complexity. We conducted extensive experiments on both synthetic and real-world datasets, and the comprehensive results demonstrate that our proposed method surpasses the performance of current state-of-the-art methods. The code will be available at https://github.com/zzr-idam/MixNet.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.05676 [pdf, other]

Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

Authors: Weibo Jiang, Weihong Ren, Jiandong Tian, Liangqiong Qu, Zhiyong Wang, Honghai Liu

Abstract: Human-Object Interaction (HOI) detection plays a vital role in scene understanding, which aims to predict the HOI triplet in the form of <human, object, action>. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them together to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation but ignore the potential cross-triplet dependencies, resulting in ambiguity of action prediction. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, we regard each triplet proposal as a graph where Human and Object represent nodes and Action indicates an edge, to aggregate self-triplet correlation. Also, we try to explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. Besides, we leverage the CLIP model to help our SCTC obtain interaction-aware features by knowledge distillation, which provides useful action clues for HOI detection. Extensive experiments on HICO-DET and V-COCO datasets verify the effectiveness of our proposed SCTC.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05667 [pdf, other]

EsaCL: Efficient Continual Learning of Sparse Models

Authors: Weijieying Ren, Vasant G Honavar

Abstract: A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks. Many existing approaches to this problem work by either retraining the model on previous tasks or by expanding the model to accommodate new tasks. However, these approaches typically suffer from increased storage and computational requirements, a problem that is worsened in the case of sparse models due to the need for expensive re-training after sparsification. To address this challenge, we propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power, and circumvent the need for retraining. We conduct a theoretical analysis of loss landscapes with parameter pruning, and design a directional pruning (SDP) strategy that is informed by the sharpness of the loss function with respect to the model parameters. SDP yields a model with minimal loss of predictive accuracy, accelerating the learning of sparse models at each stage. To accelerate model update, we introduce an intelligent data selection (IDS) strategy that can identify critical instances for estimating the loss landscape, yielding substantially improved data efficiency. The results of our experiments show that EsaCL achieves performance that is competitive with the state-of-the-art methods on three continual learning benchmarks, while using substantially reduced memory and computational resources.

Submitted 10 January, 2024; originally announced January 2024.

Comments: SDM 2024 : SIAM International Conference on Data Mining

arXiv:2401.05212 [pdf, other]

Outflow-related radio emission in radio-quiet quasars

Authors: Mai Liao, Junxian Wang, Wenke Ren, Minhua Zhou

Abstract: In this work, we revisit the relationship between [O III] line width $w_{\rm 90}$ (as the indicator of AGN outflow velocity) and the radio emission in RQQs by employing a large sample of Type I quasars ($\sim 37,000$) selected from the Sloan Digital Sky Survey (SDSS) Data Release Sixteen. By median stacking the radio images (to include the dominant fraction of individually radio non-detected RQQs) of Karl G. Jansky Very Large Array (VLA) Sky Survey (VLASS) for subsamples of RQQs with different $w_{\rm 90}$, our study demonstrates that the correlation between $w_{\rm 90}$ and radio emission in our SDSS RQQs is significant, and remains solid after controlling the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This intrinsic link supports that the [O III] outflows in quasars, most likely resulting from wide-angled sub-relativistic quasar winds launched from the accretion disc, could make a dominant contribution to radio emission in the general RQQs. Alternatively, the correlation may be attributed to low-power jets in RQQs if they are ubiquitous and could efficiently enhance the [O III] width through interacting with the ISM. Meanwhile, the star-formation rates traced by the flux ratio of [Ne V]/[O II] emission lines display no dependence on $w_{\rm 90}$ after controlling the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This suggests that the stronger radio emission in RQQs with larger $w_{\rm 90}$ could not be attributed to outflow enhanced (positive feedback) star formation in the hosts. However, this also indicates the outflows, though exhibiting robust correlation with radio power, produce neither positive nor negative feedback to the star formation in their hosts.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 9 pages, 4 figures, accepted by MNRAS

arXiv:2401.03854 [pdf, other]

TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment

Authors: Jiquan Yuan, Xinyan Cao, Jinming Che, Qinyuan Wang, Sen Liang, Wei Ren, Jinlong Lin, Xixin Cao

Abstract: Recently, AIGC image quality assessment (AIGCIQA), which aims to assess the quality of AI-generated images (AIGIs) from a human perception perspective, has emerged as a new topic in computer vision. Unlike common image quality assessment tasks where images are derived from original ones distorted by noise, blur, and compression, \textit{etc.}, in AIGCIQA tasks, images are typically generated by generative models using text prompts. Considerable efforts have been made in the past years to advance AIGCIQA. However, most existing AIGCIQA methods regress predicted scores directly from individual generated images, overlooking the information contained in the text prompts of these images. This oversight partially limits the performance of these AIGCIQA methods. To address this issue, we propose a text-image encoder-based regression (TIER) framework. Specifically, we process the generated images and their corresponding text prompts as inputs, utilizing a text encoder and an image encoder to extract features from these text prompts and generated images, respectively. To demonstrate the effectiveness of our proposed TIER method, we conduct extensive experiments on several mainstream AIGCIQA databases, including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results indicate that our proposed TIER method generally demonstrates superior performance compared to baseline in most cases.

Submitted 11 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2312.05897

arXiv:2401.00529 [pdf, other]

GraphGPT: Graph Learning with Generative Pre-trained Transformers

Authors: Qifang Zhao, Weidong Ren, Tianyu Li, Xiaoxiao Xu, Hong Liu

Abstract: We introduce \textit{GraphGPT}, a novel model for Graph learning by self-supervised Generative Pre-training Transformers. Our model transforms each graph or sampled subgraph into a sequence of tokens representing the node, edge and attributes reversibly using the Eulerian path first. Then we feed the tokens into a standard transformer decoder and pre-train it with the next-token-prediction (NTP) task. Lastly, we fine-tune the GraphGPT model with the supervised tasks. This intuitive, yet effective model achieves superior or close results to the state-of-the-art methods for the graph-, edge- and node-level tasks on the large scale molecular dataset PCQM4Mv2, the protein-protein association dataset ogbl-ppa and the ogbn-proteins dataset from the Open Graph Benchmark (OGB). Furthermore, the generative pre-training enables us to train GraphGPT up to 400M+ parameters with consistently increasing performance, which is beyond the capability of GNNs and previous graph transformers. The source code and pre-trained checkpoints will be released soon\footnote{\url{https://github.com/alibaba/graph-gpt}} to pave the way for the graph foundation model research, and also to assist the scientific discovery in pharmaceutical, chemistry, material and bio-informatics domains, etc.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 9 pages

arXiv:2312.14983 [pdf, other]

A Literature Review of Energy Justice

Authors: Weihang Ren, Yongpei Guan, Feng Qiu, Todd Levin, Miguel Heleno

Abstract: Energy justice, at the intersection of energy and societal ethics, studies the origins, quantification, and resolution of persistent and potential inequities within the energy sector, serving as a foundational pillar for societal harmony. In this review, we overview the historical and modern definitions of energy equity and frameworks of energy justice. We highlight the tools adopted to measure equity in the energy context, unveiling multifaceted inequities that permeate global energy landscapes. We discuss the limitations of prevalent metrics such as the Gini coefficient and Generalized Entropy Indices in the evaluation of energy justice concerns. Finally, we analyze publications that examined current practices and proposed improving methods towards a more equitable energy market for the society from policy, planning, and operation perspectives.

Submitted 21 December, 2023; originally announced December 2023.

Showing 1–50 of 430 results for author: Ren, W