-
Restoring Images in Adverse Weather Conditions via Histogram Transformer
Authors:
Shangquan Sun,
Wenqi Ren,
Xinwei Gao,
Rui Wang,
Xiaochun Cao
Abstract:
Transformer-based image restoration methods in adverse weather have achieved significant progress. Most of them use self-attention along the channel dimension or within spatially fixed-range blocks to reduce computational load. However, such a compromise results in limitations in capturing long-range spatial features. Inspired by the observation that the weather-induced degradation factors mainly cause similar occlusion and brightness, in this work, we propose an efficient Histogram Transformer (Histoformer) for restoring images affected by adverse weather. It is powered by a mechanism dubbed histogram self-attention, which sorts and segments spatial features into intensity-based bins. Self-attention is then applied across bins or within each bin to selectively focus on spatial features of dynamic range and to process similarly degraded pixels across long ranges together. To boost histogram self-attention, we present a dynamic-range convolution that enables conventional convolution to operate over similar pixels rather than neighboring pixels. We also observe that the common pixel-wise losses neglect linear association and correlation between the output and the ground truth. Thus, we propose to leverage the Pearson correlation coefficient as a loss function to enforce that the recovered pixels follow the same order as the ground truth. Extensive experiments demonstrate the efficacy and superiority of our proposed method. We have released the code on GitHub.
Submitted 14 July, 2024;
originally announced July 2024.
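The Histoformer abstract above mentions using the Pearson correlation coefficient as a loss. A minimal NumPy sketch of that idea follows. The function name `pearson_loss`, the `1 - r` form, and the `eps` stabilizer are illustrative assumptions; the abstract does not state the exact form the authors use.

```python
import numpy as np

def pearson_loss(pred, target, eps=1e-8):
    """Return 1 - r, where r is the Pearson correlation of two images.

    Hypothetical helper: the 1 - r form and eps are assumptions,
    not taken from the paper.
    """
    p = np.asarray(pred, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    # Center both signals so only co-variation is measured.
    p = p - p.mean()
    t = t - t.mean()
    # Pearson r = covariance / (std_p * std_t); eps guards constant inputs.
    r = (p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + eps)
    return 1.0 - r

target = np.arange(6, dtype=np.float64).reshape(2, 3)
loss_affine = pearson_loss(2.0 * target + 3.0, target)   # linearly related
loss_flipped = pearson_loss(-target, target)             # anti-correlated
```

A loss of this kind is near zero for any positive affine rescaling of the ground truth and approaches 2 for a sign-flipped output, which is why it would typically be combined with a pixel-wise term rather than used alone.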
-
CountFormer: Multi-View Crowd Counting Transformer
Authors:
Hong Mo,
Xiong Zhang,
Jianchao Tan,
Cheng Yang,
Qiong Gu,
Bo Hang,
Wenqi Ren
Abstract:
Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module that capitalizes on the attention mechanism to transform image-level features into a 3D volume representation for each camera view. Subsequently, the multi-view volume aggregation module attentively aggregates various multi-view volumes to create a comprehensive scene-level volume representation, allowing CountFormer to handle images captured by arbitrary dynamic camera layouts. The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets, demonstrating its greater suitability for real-world deployment compared to conventional MVC frameworks.
Submitted 2 July, 2024;
originally announced July 2024.
-
Deep learning quantum Monte Carlo for solids
Authors:
Yubing Qian,
Xiang Li,
Zhe Li,
Weiluo Ren,
Ji Chen
Abstract:
Deep learning has deeply changed the paradigms of many research fields. At the heart of chemical and physical sciences is the accurate ab initio calculation of the many-body wavefunction, which has become one of the most notable examples to demonstrate the power of deep learning in science. In particular, the introduction of deep learning into quantum Monte Carlo (QMC) has significantly advanced the frontier of ab initio calculation, offering a universal tool to solve the electronic structure of materials and molecules. Deep learning QMC architectures were initially designed and tested on small molecules, focusing on comparisons with other state-of-the-art ab initio methods. Methodological developments, including extensions to real solids and periodic models, have been rapidly progressing and reported applications are fast expanding. This review covers the theoretical foundation of deep learning QMC for solids, the neural network wavefunction ansatz, and various other methodological developments. Applications to computing the energy, electron density, electric polarization, force and stress of real solids are also reviewed. The methods have also been extended to other periodic systems and finite temperature calculations. The review highlights the potential and existing challenges of deep learning QMC in materials chemistry and condensed matter physics.
Submitted 30 June, 2024;
originally announced July 2024.
-
Effects of model size in density-functional-theory study of alloys: A case study of CsPbBr$_2$Cl
Authors:
Fang Pan,
Lin Yang,
Zhuangde Jiang,
Wei Ren,
Zuo-Guang Ye,
Jingrui Li
Abstract:
The primary challenge of density-functional-theory exploration of alloy systems concerns the size of the computational model. Small alloy models can hardly exhibit the chemical disorder properly, while large models induce difficulty in sampling the alignments within the massive material space. We study this problem with the γ phase of the mixed halide inorganic perovskite alloy CsPbBr$_2$Cl. The distribution of alloy formation energy becomes narrower when the size of the model system increases along $\sqrt{2}\times\sqrt{2}\times2$, $2\times2\times2$, and $2\sqrt{2}\times2\sqrt{2}\times2$ models. This is primarily because the distribution of Br distribution parameters, which plays a leading role in determining the formation energy range, is narrower for larger models. As a result, a larger entropy stability effect can be observed with larger models, especially at high temperatures, for which the approximation using mixing entropy based on the ideal solution model becomes better.
Submitted 25 June, 2024;
originally announced June 2024.
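The CsPbBr$_2$Cl abstract above refers to the mixing entropy of the ideal solution model. As a worked illustration, the sketch below evaluates the textbook expression $S = -k_B \sum_i x_i \ln x_i$ per halide site for the nominal composition (Br fraction 2/3, Cl fraction 1/3). The function name and the 300 K temperature are illustrative assumptions, not values from the paper.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def ideal_mixing_entropy(fractions):
    """Ideal-solution mixing entropy in units of k_B per mixing site."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("site fractions must sum to 1")
    # Terms with x = 0 contribute nothing (x ln x -> 0).
    return -sum(x * math.log(x) for x in fractions if x > 0.0)

# Nominal halide-site occupation in CsPbBr2Cl: two Br per one Cl.
s_per_site = ideal_mixing_entropy([2.0 / 3.0, 1.0 / 3.0])

# Entropic free-energy contribution -T*S per halide site, in eV.
temperature_k = 300.0  # assumed temperature, for illustration only
minus_ts_ev = -temperature_k * K_B_EV_PER_K * s_per_site
```

Because the entropy term scales linearly with temperature, its stabilizing contribution grows at high temperature, in line with the abstract's remark about high temperatures.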
-
Prior-Informed AGN-Host Spectral Decomposition Using PyQSOFit
Authors:
Wenke Ren,
Hengxiao Guo,
Yue Shen,
John D. Silverman,
Colin J. Burke,
Shu Wang,
Junxian Wang
Abstract:
We introduce an improved method for decomposing the emission of active galactic nuclei (AGN) and their host galaxies using templates from principal component analysis (PCA). This approach integrates prior information from PCA with a penalized pixel fitting mechanism which improves the precision and effectiveness of the decomposition process. Specifically, we have reduced the degeneracy and over-fitting in AGN-host decomposition, particularly for those with low signal-to-noise ratios (SNR), where traditional methods tend to fail. By applying our method to 76,565 SDSS Data Release 16 quasars with $z<0.8$, we achieve a success rate of $\approx$ 94%, thus establishing the largest host-decomposed spectral catalog of quasars to date. Our fitting results consider the impact of the host galaxy on the overestimation of the AGN luminosity and black hole mass ($M_{\rm BH}$). Furthermore, we obtained stellar velocity dispersion ($σ_*$) measurements for 4,137 quasars. The slope of the $M_{\rm BH}-σ_*$ relation in this subsample is generally consistent with previous quasar studies beyond the local universe. Our method provides a robust and efficient approach to disentangle the AGN and host galaxy components across a wide range of SNRs and redshifts.
Submitted 25 June, 2024;
originally announced June 2024.
-
A surprising excess of radio emission in extremely stable quasars: a unique clue to jet launching?
Authors:
Wen-Yong Kang,
Jun-Xian Wang,
Zhen-Yi Cai,
Hao-Chen Wang,
Wen-Ke Ren,
Mai Liao,
Feng Yuan,
Andrzej Zdziarski,
Xinwu Cao
Abstract:
Quasars are generally divided into jetted radio-loud and non-jetted radio-quiet ones, but why only 10% of quasars are radio-loud has been puzzling for decades. Other than jet-induced phenomena, black hole mass, or Eddington ratio, a prominent difference between jetted and non-jetted quasars has scarcely been detected. Here we show a unique distinction between them, and that the mystery of jet launching could be disclosed by a prominent excess of radio emission in extremely stable quasars (ESQs, i.e., type 1 quasars with extremely weak variability in UV/optical over 10 years). Specifically, we find that $>$ 25% of the ESQs are detected by the FIRST/VLASS radio survey, while only $\sim$ 6-8% of the control sample, matched in redshift, luminosity, and Eddington ratio, are radio-detected. The excess of radio detection in ESQs has a significance of 4.4 $σ$ (99.9995%), and dominantly occurs at intermediate radio loudness with R $\sim$ 10 - 60. The radio detection fraction of ESQs also tends to increase in the ESQ samples selected with more stringent thresholds. Our results are in contrast to the common view that RL quasars are likely more variable in UV/optical due to jet contribution. The new clues and challenges posed by our findings highlight the importance of extensive follow-up observations to probe the nature of jets in ESQs, and of theoretical studies on the link between jet launching and ESQs. Moreover, our results make ESQs, an essential population which has never been explored, unique targets in the burgeoning era of time domain astronomy, like their opposite counterparts of quasars exhibiting extreme variability or changing-look features.
Submitted 18 June, 2024;
originally announced June 2024.
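The ESQ abstract above quotes a significance of 4.4σ alongside 99.9995%. The short sketch below shows how a Gaussian significance maps to a confidence level under the one-sided and two-sided conventions. Which convention the authors used is an assumption here; the quoted percentage is numerically close to the one-sided value.

```python
import math

def one_sided_confidence(n_sigma):
    """Standard normal CDF at n_sigma (one-sided confidence level)."""
    return 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2.0)))

def two_sided_confidence(n_sigma):
    """Probability that a standard normal lies within +/- n_sigma."""
    return math.erf(n_sigma / math.sqrt(2.0))

conf_one = one_sided_confidence(4.4)
conf_two = two_sided_confidence(4.4)
print(f"one-sided: {100 * conf_one:.5f}%  two-sided: {100 * conf_two:.5f}%")
```

The two conventions differ by a factor of two in the tail probability, so stating which one is used matters when comparing significances across papers.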
-
Emergent Wigner phases in moiré superlattice from deep learning
Authors:
Xiang Li,
Yubing Qian,
Weiluo Ren,
Yang Xu,
Ji Chen
Abstract:
Moiré superlattice designed in stacked van der Waals material provides a dynamic platform for hosting exotic and emergent condensed matter phenomena. However, the relevance of strong correlation effects and the large size of moiré unit cells pose significant challenges for traditional computational techniques. To overcome these challenges, we develop an unsupervised deep learning approach to uncover electronic phases emerging from moiré systems based on variational optimization of neural network many-body wavefunction. Our approach has identified diverse quantum states, including novel phases such as generalized Wigner crystals, Wigner molecular crystals, and previously unreported Wigner covalent crystals. These discoveries provide insights into recent experimental studies and suggest new phases for future exploration. They also highlight the crucial role of spin polarization in determining Wigner phases. More importantly, our proposed deep learning approach is proven general and efficient, offering a powerful framework for studying moiré physics.
Submitted 16 June, 2024;
originally announced June 2024.
-
Systematic Collapse of the Accretion Disc Across the Supermassive Black Hole Population
Authors:
Scott Hagen,
Chris Done,
John D. Silverman,
Junyao Li,
Teng Liu,
Wenke Ren,
Johannes Buchner,
Andrea Merloni,
Tohru Nagao,
Mara Salvato
Abstract:
The structure of the accretion flow onto supermassive black holes (SMBH) is not well understood. Standard disc models match to zeroth order in predicting substantial energy dissipation within optically-thick material producing a characteristic strong blue/UV continuum. However they fail at reproducing more detailed comparisons to the observed spectral shapes along with their observed variability. Based on stellar mass black holes within our galaxy, accretion discs should undergo a transition into an X-ray hot, radiatively inefficient flow, below a (mass scaled) luminosity of $\sim 0.02\,L_{\rm{Edd}}$. While this has been seen in limited samples of nearby low-luminosity active galactic nuclei (AGN) and a few rare changing-look AGN, it is not at all clear whether this transition is present in the wider AGN population across cosmic time. A key issue is the difficulty in disentangling a change in spectral state from increased dust obscuration and/or host galaxy contamination, effectively drowning out the AGN emission. Here we use the new eROSITA eFEDS Survey to identify unobscured AGN from their X-ray emission, matched to excellent optical imaging from Subaru's Hyper Suprime-Cam; allowing the subtraction of the host galaxy contamination. The resulting, uncontaminated, AGN spectra reveal a smooth transition from a strongly disc dominated state in bright AGN, to the collapse of the disc into an inefficient X-ray plasma in the low luminosity AGN, with the transition occurring at $\sim 0.02\,L_{\rm{Edd}}$; revealing fundamental aspects of accretion physics in AGN.
Submitted 10 June, 2024;
originally announced June 2024.
-
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Authors:
Jinghan Zhang,
Xiting Wang,
Weijieying Ren,
Lu Jiang,
Dongjie Wang,
Kunpeng Liu
Abstract:
Large Language Models (LLMs) gain substantial reasoning and decision-making capabilities from thought structures. However, existing methods such as Tree of Thought and Retrieval Augmented Thoughts often fall short in complex tasks due to the limitations of insufficient local retrieval of factual knowledge and inadequate global selection of strategies. These limitations make it challenging for these methods to balance factual accuracy and comprehensive logical optimization effectively. To address these limitations, we introduce the Retrieval Augmented Thought Tree (RATT), a novel thought structure that considers both overall logical soundness and factual correctness at each step of the thinking process. Specifically, at every point of a thought branch, RATT performs planning and lookahead to explore and evaluate multiple potential reasoning steps, and integrates the fact-checking ability of Retrieval-Augmented Generation (RAG) with the LLM's ability to assess overall strategy. Through this combination of factual knowledge and strategic feasibility, RATT adjusts and integrates the thought tree structure to search for the most promising branches within the search space. This thought structure significantly enhances the model's coherence in logical inference and efficiency in decision-making, and thus raises the limit of the LLM's capacity to generate reliable inferences and decisions based on thought structures. A broad range of experiments on different types of tasks shows that the RATT structure significantly outperforms existing methods in factual correctness and logical coherence.
Submitted 11 July, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Authors:
Yubo Wang,
Xueguang Ma,
Ge Zhang,
Yuansheng Ni,
Abhranil Chandra,
Shiguang Guo,
Weiming Ren,
Aaran Arulraj,
Xuan He,
Ziyan Jiang,
Tianle Li,
Max Ku,
Kai Wang,
Alex Zhuang,
Rongqi Fan,
Xiang Yue,
Wenhu Chen
Abstract:
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.
Submitted 23 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Symmetry enforced solution of the many-body Schrödinger equation with deep neural network
Authors:
Zhe Li,
Zixiang Lu,
Ruichen Li,
Xuelan Wen,
Xiang Li,
Liwei Wang,
Ji Chen,
Weiluo Ren
Abstract:
The integration of deep neural networks with the Variational Monte Carlo (VMC) method has marked a significant advancement in solving the Schrödinger equation. In this work, we enforce spin symmetry in the neural network-based VMC calculation with modified optimization target. Our method is designed to solve for the ground state and multiple excited states with target spin symmetry at a low computational cost. It predicts accurate energies while maintaining the correct symmetry in strongly correlated systems, even in cases where different spin states are nearly degenerate. Our approach also excels at spin-gap calculations, including the singlet-triplet gap in biradical systems, which is of high interest in photochemistry. Overall, this work establishes a robust framework for efficiently calculating various quantum states with specific spin symmetry in correlated systems, paving the way for novel discoveries in quantum science.
Submitted 3 June, 2024;
originally announced June 2024.
-
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
Authors:
Weining Ren,
Zihan Zhu,
Boyang Sun,
Jiaqi Chen,
Marc Pollefeys,
Songyou Peng
Abstract:
Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios. In this paper, we introduce NeRF On-the-go, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences. Delving into uncertainty, our method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed. Through comprehensive experiments on various scenes, our method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.
Submitted 2 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection
Authors:
Dehong Kong,
Siyuan Liang,
Wenqi Ren
Abstract:
Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches are natural looking when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack (EMA) to address the issue of optimizing the adversarial patch under the constraints of color. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits strong prior knowledge of a pretrained stable diffusion to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch which is initialized to zero so that the model can better trade off attacking performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets have shown that our work can reach nearly the same attack performance in the digital attack (no greater than 2 in mAP$\%$), surpass the baseline method in the physical specific scenarios, and exhibit a significant advantage in terms of naturalness in visualization and color difference with the environment.
Submitted 13 May, 2024;
originally announced May 2024.
-
Quasiparticle and Excitonic Structures of Few-layer and Bulk GaSe: Interlayer Coupling, Self-energy, and Electron-hole Interaction
Authors:
Fanhao Jia,
Zhao Tang,
Greis J. Cruz,
Weiwei Gao,
Shaowen Xu,
Wei Ren,
Peihong Zhang
Abstract:
Metal monochalcogenide GaSe is a classic layered semiconductor that has received increasing research interest due to its highly tunable electronic and optical properties for ultrathin electronics applications. Despite intense research efforts, a systematic understanding of the layer-dependent electronic and optical properties of GaSe remains to be established, and there appear to be significant discrepancies between different experiments. We have performed GW plus Bethe-Salpeter equation (BSE) calculations for few-layer and bulk GaSe, aiming at understanding the effects of interlayer coupling and dielectric screening on excited state properties of GaSe, and how the electronic and optical properties evolve from strongly two-dimensional (2D) like to intermediate thick layers, and to three-dimensional (3D) bulk character. Using a new definition of the exciton binding energy, we are able to calculate the binding energies of all excitonic states. Our results reveal an interesting correlation between the binding energy of an exciton and the spread of its wave function in the real and momentum spaces. We find that the existence of (nearly) parallel valence and conduction bands facilitates the formation of excitonic states that spread out in the momentum space. Thus, these excitons tend to be more localized in real space and have large exciton binding energies. The interlayer coupling substantially suppresses the Mexican-hat-like dispersion of the top valence band seen in the monolayer system, explaining the greatly enhanced photoluminescence (PL) as layer thickness increases. Our results also help resolve apparent discrepancies between different experiments. After including the quasiparticle and excitonic effects as well as the optical activities of excitons, our results compare well with available experimental results.
Submitted 11 May, 2024;
originally announced May 2024.
-
Hierarchical Characterization of Thermoelectric Performance in Copper-Based Chalcogenide CsCu$_3$S$_2$: Unveiling the role of Anharmonic Lattice Dynamics
Authors:
Jincheng Yue,
Junda Li,
Jiongzhi Zheng,
Xingchen Shen,
Wenling Ren,
Yanhui Liu,
Tian Cui
Abstract:
We explicitly consider both phonon energy shifts and broadening arising from both cubic and quartic anharmonicities, as well as diagonal/non-diagonal terms of heat flux operators in thermal conductivity. Our findings show that the strong anharmonicity of CsCu$_3$S$_2$ primarily arises from the presence of $p$-$d$ anti-bonding hybridization between Cu and S atoms, coupled with the random oscillations of Cs atoms. Notably, the competition between phonon hardening described by the loop diagram and softening induced by the bubble diagram significantly influences particle-like propagation, predominantly reflected in group velocity and energy-conservation rule. Additionally, the electrical transport properties are determined by employing the precise momentum relaxation-time approximation (MRTA). At high temperatures, the thermoelectric performance of $p$-type CsCu$_3$S$_2$ reaches its optimum theoretical value of 0.94 along the in-plane direction based on advanced phonon renormalization theory. In striking contrast, the harmonic approximation theory significantly overestimates the thermoelectric efficiency at the same temperatures, rendering it an impractical expectation. Conversely, the first-order renormalization approach leads to a serious underestimation of the thermoelectric properties due to the over-correction of phonon energy. Our study not only reveals the pivotal role of anharmonic lattice dynamics in accurately assessing thermoelectric properties but also underscores the potential thermoelectric applications for novel copper-based chalcogenides.
Submitted 10 May, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
Video Diffusion Models: A Survey
Authors:
Andrew Melnik,
Michal Ljubljanac,
Cong Lu,
Qi Yan,
Weiming Ren,
Helge Ritter
Abstract:
Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
Submitted 6 May, 2024;
originally announced May 2024.
-
Zonotope-based Symbolic Controller Synthesis for Linear Temporal Logic Specifications
Authors:
Wei Ren,
Raphael M. Jungers,
Dimos V. Dimarogonas
Abstract:
This paper studies the controller synthesis problem for nonlinear control systems under linear temporal logic (LTL) specifications using zonotope techniques. A local-to-global control strategy is proposed for the desired specification expressed as an LTL formula. First, a novel approach is developed to divide the state space into finite zonotopes and constrained zonotopes, which are called cells and allowed to intersect with the neighbor cells. Second, from the intersection relation, a graph among all cells is generated to verify the realization of the accepting path for the LTL formula. The realization verification determines if there is a need for the control design, and also results in finite local LTL formulas. Third, once the accepting path is realized, a novel abstraction-based method is derived for the controller design. In particular, we only focus on the cells from the realization verification and approximate each cell thanks to properties of zonotopes. Based on local symbolic models and local LTL formulas, an iterative synthesis algorithm is proposed to design all local abstract controllers, whose existence and combination establish the global controller for the LTL formula. Finally, the proposed framework is illustrated via a path planning problem of mobile robots.
Submitted 1 May, 2024;
originally announced May 2024.
-
Observation of intra-unit-cell superconductivity modulation
Authors:
Tianheng Wei,
Yanzhao Liu,
Wei Ren,
Ziqiang Wang,
Jian Wang
Abstract:
In unconventional high-temperature (high-Tc) superconductors, the symmetry-breaking electronic orders intertwined with the superconductivity provide important clues for understanding the nature of the unconventional pairing mechanism. Recently, an exotic superconducting order showing spatially periodic order parameter modulations and translational symmetry breaking, namely the pair density wave (PDW) state, has attracted broad attention. Without breaking translational symmetry, point group symmetry breaking may also induce superconductivity modulations on different atom sites within a single unit cell. However, the intra-unit-cell superconductivity modulation has never been carefully investigated before. Here, using scanning tunneling microscopy/spectroscopy, we report the observation of intra-unit-cell superconductivity modulations in the superconducting gap size and the coherence peak sharpness in monolayer high-Tc Fe(Te,Se) films epitaxially grown on SrTiO3(001) substrates. Further analysis shows that the maxima and minima in the superconductivity modulation are centered at the crystallographic locations of the Te/Se atoms, revealing the breaking of the glide-mirror symmetry of the Te/Se atoms in monolayer high-Tc Fe(Te,Se) films grown on SrTiO3(001). Our findings provide precise microscopic information of superconductivity within the lattice unit cell and indicate that the p-orbital electrons of the Te/Se atoms also play an important role in Cooper pairing in unconventional high-Tc iron-based superconductors.
Submitted 25 April, 2024;
originally announced April 2024.
-
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Authors:
Lihua Jing,
Rui Wang,
Wenqi Ren,
Xin Dong,
Cong Zou
Abstract:
Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, independent of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as distinct image quality of the patch area that differs from original clean image due to the independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training. PAD offers patch-agnostic defense against various adversarial patches, compatible with any pre-trained object detectors. Our comprehensive digital and physical experiments involving diverse patch types, such as localized noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at https://github.com/Lihua-Jing/PAD.
Submitted 25 April, 2024;
originally announced April 2024.
-
Python-Based Quantum Chemistry Calculations with GPU Acceleration
Authors:
Xiaojie Wu,
Qiming Sun,
Zhichen Pu,
Tianze Zheng,
Wenzhi Ma,
Wen Yan,
Xia Yu,
Zhengxiao Wu,
Mian Huo,
Xiang Li,
Weiluo Ren,
Sheng Gong,
Yumin Zhang,
Weihao Gao
Abstract:
To meet the increasing demand for quantum chemistry calculations in data-driven chemical research, the collaboration between industrial stakeholders and the quantum chemistry community has led to the development of GPU4PySCF, a GPU-accelerated Python package. This open-source project is accessible via its public GitHub repository at https://github.com/pyscf/gpu4pyscf. This paper outlines the primary features, innovations, and advantages of this package. When performing Density Functional Theory (DFT) calculations on modern GPU platforms, GPU4PySCF delivers a 30-fold speedup over a 32-core CPU node, resulting in approximately 90% cost savings for most DFT tasks. The performance advantages and productivity improvements have been found in multiple industrial applications, such as generating potential energy surfaces, analyzing molecular properties, calculating solvation free energy, identifying chemical reactions in lithium-ion batteries, and accelerating neural-network methods. To make the package easy to extend and integrate with other Python packages, it is designed with PySCF-compatible interfaces and Pythonic implementations. This design choice enhances its coordination with the Python ecosystem.
Submitted 15 April, 2024;
originally announced April 2024.
-
Deep Learning Method for Computing Committor Functions with Adaptive Sampling
Authors:
Bo Lin,
Weiqing Ren
Abstract:
The committor function is a central object for quantifying the transitions between metastable states of dynamical systems. Recently, a number of computational methods based on deep neural networks have been developed for computing the high-dimensional committor function. The success of the methods relies on sampling adequate data for the transition, which still is a challenging task for complex systems at low temperatures. In this work, we propose a deep learning method with two novel adaptive sampling schemes (I and II). In the two schemes, the data are generated actively with a modified potential where the bias potential is constructed from the learned committor function. We theoretically demonstrate the advantages of the sampling schemes and show that the data in sampling scheme II are uniformly distributed along the transition tube. This makes a promising method for studying the transition of complex systems. The efficiency of the method is illustrated in high-dimensional systems including the alanine dipeptide and a solvated dimer system.
Submitted 9 April, 2024;
originally announced April 2024.
-
Computing Transition Pathways for the Study of Rare Events Using Deep Reinforcement Learning
Authors:
Bo Lin,
Yangzheng Zhong,
Weiqing Ren
Abstract:
Understanding the transition events between metastable states in complex systems is an important subject in the fields of computational physics, chemistry and biology. The transition pathway plays an important role in characterizing the mechanism underlying the transition, for example, in the study of conformational changes of bio-molecules. In fact, computing the transition pathway is a challenging task for complex and high-dimensional systems. In this work, we formulate the path-finding task as a cost minimization problem over a particular path space. The cost function is adapted from the Freidlin-Wentzell action functional so that it is able to deal with rough potential landscapes. The path-finding problem is then solved using an actor-critic method based on the deep deterministic policy gradient (DDPG) algorithm. The method incorporates the potential force of the system in the policy for generating episodes and combines physical properties of the system with the learning process for molecular systems. The exploitation and exploration nature of reinforcement learning enables the method to efficiently sample the transition events and compute the globally optimal transition pathway. We illustrate the effectiveness of the proposed method using three benchmark systems including an extended Mueller system and the Lennard-Jones system of seven particles.
Submitted 8 April, 2024;
originally announced April 2024.
-
DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement
Authors:
Shangquan Sun,
Wenqi Ren,
Jingyang Peng,
Fenglong Song,
Xiaochun Cao
Abstract:
Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex theory in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked reverse degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low-light conditions, as it shows the largest performance gain after low-light enhancement compared to other methods.
Submitted 4 April, 2024;
originally announced April 2024.
-
Easy-to-configure zero-dimensional valley-chiral modes in a graphene point junction
Authors:
Konstantin Davydov,
Xi Zhang,
Wei Ren,
Matthew Coles,
Logan Kline,
Bryan Zucker,
Kenji Watanabe,
Takashi Taniguchi,
Ke Wang
Abstract:
The valley degree of freedom in 2D materials can be manipulated for low-dissipation quantum electronics called valleytronics. At the boundary between two regions of bilayer graphene with different atomic or electrostatic configuration, valley-polarized current has been realized. However, the demanding fabrication and operation requirements limit device reproducibility and scalability toward more advanced valleytronics circuits. We demonstrate a new device architecture of a point junction where a valley-chiral 0D PN junction is easily configured, switchable, and capable of carrying valley current with an estimated polarization of ~80%. This work provides a new building block in manipulating valley quantum numbers and scalable valleytronics.
Submitted 1 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Charge density wave without long-range structural modulation in canted antiferromagnetic kagome FeGe
Authors:
Chenfei Shi,
Hanbin Deng,
Surya Rohith Kotla,
Yi Liu,
Sitaram Ramakrishnan,
Claudio Eisele,
Harshit Agarwal,
Leila Noohinejad,
Ji-Yong Liu,
Tianyu Yang,
Guowei Liu,
Bishal Baran Maity,
Qi Wang,
Zhaodi Lin,
Baojuan Kang,
Wanting Yang,
Yongchang Li,
Zhihua Yang,
Yuke Li,
Yanpeng Qi,
Arumugam Thamizhavel,
Wei Ren,
Guang-Han Cao,
Jia-Xin Yin,
Sander van Smaalen
, et al. (2 additional authors not shown)
Abstract:
Strongly correlated electron systems with a kagome lattice can host abundant exotic quantum states such as superconductivity and spin/charge density waves (CDW) due to the complicated interactions between different degrees of freedom in the framework of a unique two-dimensional geometrically frustrated lattice structure. Recently, successive orders of A-type antiferromagnetism (AFM), $2\times2\times2$ CDW and canted double-cone AFM have been manifested upon cooling in magnetic kagome FeGe. However, the mechanism of the CDW order and its interaction with magnetism are presently enigmatic at best. Here we investigate the evolution of CDW order with temperature across the spin canting transition in FeGe by single-crystal x-ray diffraction. Refinements of its modulated structure are presented using the superspace approach. Interestingly, the superlattice reflections originating from CDW-induced long-range structural modulation become extremely weak after the system enters the canted AFM while a $2\times2$ CDW in the $ab$ plane persists as a long-range order demonstrated by strong electronic modulation in the d$I$/d$V$ map of scanning tunneling spectroscopy. We discovered a novel CDW order without long-range structural modulation in FeGe probably because of the competition between CDW and canted AFM in determining the underlying crystal structure. In addition, occupational modulations of Ge1 atoms located in the kagome plane and displacive modulations of all the atoms were extracted from the refinements, confirming the existence of Ge atom dimerization along the $c$ axis as the major distortion and indicating a dynamic transformation between different CDW domains.
Submitted 1 April, 2024;
originally announced April 2024.
-
Electron Collimation in Twisted Bilayer Graphene via Gate-defined Moiré Barriers
Authors:
Wei Ren,
Xi Zhang,
Ziyan Zhu,
Moosa Khan,
Kenji Watanabe,
Takashi Taniguchi,
Efthimios Kaxiras,
Mitchell Luskin,
Ke Wang
Abstract:
Electron collimation via a graphene pn-junction allows electrostatic control of ballistic electron trajectories akin to that of an optical circuit. Similar manipulation of novel correlated electronic phases in twisted-bilayer graphene (tBLG) can provide additional probes to the underlying physics and device components towards advanced quantum electronics. In this work, we demonstrate collimation of the electron flow via gate-defined moiré barriers in a tBLG device, utilizing the band-insulator gap of the moiré superlattice. A single junction can be tuned to host a chosen combination of conventional pseudo barrier and moiré tunnel barriers, from which we demonstrate improved collimation efficiency. By measuring transport through two consecutive moiré collimators separated by 1 μm, we demonstrate evidence of electron collimation in tBLG in the presence of realistic twist-angle inhomogeneity.
Submitted 30 March, 2024;
originally announced April 2024.
-
Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points
Authors:
Tian Ma,
Chuyang Shang,
Wanzhu Ren,
Yuancheng Li,
Jiiayi Yang,
Jiali Qian
Abstract:
In recent years, point weakly supervised object detection (PWSOD) has attracted attention in the field of computer vision. However, existing pseudo-label generation methods perform poorly when only a small amount of supervised annotation data is available and in dense object detection tasks. We consider the generation of weakly supervised pseudo labels as the result of the model's sparse output, and propose a method called Sparse Generation to make pseudo labels sparse. It constructs dense tensors through the relationship between the data and the detector model, optimizes three of its parameters, and obtains a sparse tensor via coordinated calculation, thereby indirectly obtaining higher-quality pseudo labels and solving the model's density problem when only a small amount of supervised annotation data can be used. On two widely used open-source datasets (RSOD, SIMD) and a self-built dataset (Bullet-Hole), the experimental results show that the proposed method has a significant advantage in overall performance metrics compared to the state-of-the-art method.
Submitted 28 March, 2024;
originally announced March 2024.
-
RIOJA. Complex Dusty Starbursts in a Major Merger B14-65666 at z=7.15
Authors:
Yuma Sugahara,
Javier Álvarez-Márquez,
Takuya Hashimoto,
Luis Colina,
Akio K. Inoue,
Luca Costantin,
Yoshinobu Fudamoto,
Ken Mawatari,
Yi W. Ren,
Santiago Arribas,
Tom J. L. C. Bakx,
Carmen Blanco-Prieto,
Daniel Ceverino,
Alejandro Crespo Gómez,
Masato Hagimoto,
Takeshi Hashigaya,
Rui Marques-Chaves,
Hiroshi Matsuo,
Yurina Nakazato,
Miguel Pereira-Santaella,
Yoichi Tamura,
Mitsutaka Usui,
Naoki Yoshida
Abstract:
We present JWST NIRCam imaging of B14-65666 ("Big Three Dragons"), a bright Lyman-break galaxy system ($M_\text{UV}=-22.5$ mag) at $z=7.15$. The high angular resolution of NIRCam reveals the complex morphology of two galaxy components: galaxy E has a compact core (E-core), surrounded by diffuse, extended, rest-frame optical emission, which is likely to be tidal tails; and galaxy W has a clumpy and elongated morphology with a blue UV slope ($β_\text{UV}=-2.2\pm0.1$). The flux excess, F356W$-$F444W, peaks at the E-core ($1.05^{+0.08}_{-0.09}$ mag), tracing the presence of strong [OIII] 4960,5008 Å emission. ALMA archival data show that the bluer galaxy W is brighter in dust continua than the redder galaxy E, while the tails are bright in [OIII] 88 $\mathrm{μm}$. The UV/optical and sub-mm SED fitting confirms that B14-65666 is a major merger in a starburst phase as derived from the stellar mass ratio (3:1 to 2:1) and the star-formation rate, $\simeq1$ dex higher than the star-formation main sequence at the same redshift. The galaxy E is a dusty ($A_\text{V}=1.2\pm0.1$ mag) starburst with a possible high dust temperature ($\ge63$-$68$ K). The galaxy W would have a low dust temperature ($\le27$-$33$ K) or patchy stellar-and-dust geometry, as suggested from the infrared excess (IRX) and $β_\text{UV}$ diagram. The high optical-to-FIR [OIII] line ratio of the E-core shows its lower gas-phase metallicity ($\simeq0.2$ Z$_{\odot}$) than the galaxy W. These results agree with a scenario where major mergers disturb morphology and induce nuclear dusty starbursts triggered by less-enriched inflows. B14-65666 shows a picture of complex stellar buildup processes during major mergers in the epoch of reionization.
Submitted 25 March, 2024;
originally announced March 2024.
-
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
Authors:
Max Ku,
Cong Wei,
Weiming Ren,
Harry Yang,
Wenhu Chen
Abstract:
In the dynamic field of digital content creation using generative models, state-of-the-art video editing models still do not offer the level of quality and control that users desire. Previous works on video editing either extended from image-based generative models in a zero-shot manner or necessitated extensive fine-tuning, which can hinder the production of fluid video edits. Furthermore, these methods frequently rely on textual input as the editing guidance, leading to ambiguities and limiting the types of edits they can perform. Recognizing these challenges, we introduce AnyV2V, a novel tuning-free paradigm designed to simplify video editing into two primary steps: (1) employing an off-the-shelf image editing model to modify the first frame, (2) utilizing an existing image-to-video generation model to generate the edited video through temporal feature injection. AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods. AnyV2V can also support any video length. Our evaluation indicates that AnyV2V outperforms other baseline methods by a significant margin in automatic and human evaluations, maintaining visual consistency with the source video while achieving high-quality edits across all the editing tasks.
Submitted 10 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
How Powerful Potential of Attention on Image Restoration?
Authors:
Cong Wang,
Jinshan Pan,
Yeying Jin,
Liyan Wang,
Wei Wang,
Gang Fu,
Wenqi Ren,
Xiaochun Cao
Abstract:
Transformers have demonstrated their effectiveness in image restoration tasks. Existing Transformer architectures typically comprise two essential components: multi-head self-attention and feed-forward network (FFN). The former captures long-range pixel dependencies, while the latter enables the model to learn complex patterns and relationships in the data. Previous studies have demonstrated that FFNs are key-value memories \cite{geva2020transformer}, which are vital in modern Transformer architectures. In this paper, we conduct an empirical study to explore the potential of attention mechanisms without using FFN and provide novel structures to demonstrate that removing FFN is flexible for image restoration. Specifically, we propose Continuous Scaling Attention (CSAttn), a method that computes attention continuously in three stages without using FFN. To achieve competitive performance, we propose a series of key components within the attention. Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect the model performance. We apply our CSAttn to several image restoration tasks and show that our model can outperform CNN-based and Transformer-based image restoration approaches.
Submitted 15 March, 2024;
originally announced March 2024.
-
Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier
Authors:
Fan Zhang,
Wei Qin,
Weijieying Ren,
Lei Wang,
Zetong Chen,
Richang Hong
Abstract:
In the real-world setting, data often follows a long-tailed distribution, where head classes contain significantly more training samples than tail classes. Consequently, models trained on such data tend to be biased toward head classes. The medium of this bias is imbalanced gradients, which include not only the ratio of scale between positive and negative gradients but also imbalanced gradients from different negative classes. Therefore, we propose the Gradient-Aware Logit Adjustment (GALA) loss, which adjusts the logits based on accumulated gradients to balance the optimization process. Additionally, we find that most of the solutions to long-tailed problems are still biased towards head classes in the end, and we propose a simple post hoc prediction re-balancing strategy to further mitigate the bias toward head classes. Extensive experiments are conducted on multiple popular long-tailed recognition benchmark datasets to evaluate the effectiveness of these two designs. Our approach achieves top-1 accuracy of 48.5%, 41.4%, and 73.3% on CIFAR100-LT, Places-LT, and iNaturalist, outperforming the state-of-the-art method GCL by a significant margin of 3.62%, 0.76% and 1.2%, respectively. Code is available at https://github.com/lt-project-repository/lt-project.
Submitted 13 March, 2024;
originally announced March 2024.
-
KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction
Authors:
Zixuan Li,
Yutao Zeng,
Yuxin Zuo,
Weicheng Ren,
Wenxuan Liu,
Miao Su,
Yucan Guo,
Yantao Liu,
Xiang Li,
Zhilei Hu,
Long Bai,
Wei Li,
Yidan Liu,
Pan Yang,
Xiaolong Jin,
Jiafeng Guo,
Xueqi Cheng
Abstract:
In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code-style schema representation method to uniformly transform different schemas into Python classes, with which complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over 30,000 types of knowledge, which is the largest one for UIE, to the best of our knowledge. To ease the learning process of LLMs, KnowCoder contains a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning. After code pretraining on around 1.5B automatically constructed data, KnowCoder already attains remarkable generalization ability and achieves a relative improvement of 49.8% F1, compared to LLaMA2, under the few-shot setting. After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves up to 12.5% and 21.9%, compared to state-of-the-art baselines, under the zero-shot setting and the low resource setting, respectively. Additionally, based on our unified schema representations, various human-annotated datasets can simultaneously be utilized to refine KnowCoder, which achieves significant improvements up to 7.5% under the supervised setting.
Submitted 13 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
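The KnowCoder abstract above describes transforming extraction schemas into Python classes so that constraints can be expressed in code. A minimal sketch of what such a code-style schema might look like follows; the class names (`Entity`, `Person`, `Organization`, `Relation`, `WorksFor`) and the type check are hypothetical illustrations, not taken from the KnowCoder schema library.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical base class for all entity types."""
    name: str


@dataclass
class Person(Entity):
    """Hypothetical entity type for people."""


@dataclass
class Organization(Entity):
    """Hypothetical entity type for organizations."""


@dataclass
class Relation:
    """Hypothetical base class for relations between two entities."""
    head: Entity
    tail: Entity


@dataclass
class WorksFor(Relation):
    """Hypothetical relation whose argument types act as a schema constraint."""
    head: Person
    tail: Organization

    def __post_init__(self):
        # Enforce the constraint that the annotations only document.
        if not isinstance(self.head, Person):
            raise TypeError("WorksFor.head must be a Person")
        if not isinstance(self.tail, Organization):
            raise TypeError("WorksFor.tail must be an Organization")
```

Under this sketch, an extraction result would be written as an instantiation such as `WorksFor(head=Person("Ada"), tail=Organization("Acme"))`, and swapping the argument types raises a `TypeError`.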
-
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Authors:
Jingyun Xue,
Tao Wang,
Jun Wang,
Kaihao Zhang,
Wenhan Luo,
Wenqi Ren,
Zikun Liu,
Hyunhee Park,
Xiaochun Cao
Abstract:
Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing UDC image restoration methods predominantly utilize convolutional neural network architectures, whereas Transformer-based methods have exhibited superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality image restoration. In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise. Furthermore, compared to the ordinary Transformer employing dense attention, the Transformer utilizing sparse attention can alleviate the adverse impact of redundant information and noise. Building upon this discovery, we propose a Segmentation Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images. Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.
Submitted 9 March, 2024;
originally announced March 2024.
-
Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes
Authors:
Stamatios Georgoulis,
Weining Ren,
Alfredo Bochicchio,
Daniel Eckert,
Yuanyou Li,
Abel Gawel
Abstract:
Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors. Contemporary RGB camera-based methods rely on modeling camera and scene properties but are often under-constrained and fall short in unknown categories. Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments with simplified dynamic objects. This work presents an event-based method for class-agnostic motion segmentation that can also be deployed successfully across complex large-scale outdoor environments. To this end, we introduce a novel divide-and-conquer pipeline that combines: (a) ego-motion compensated events, computed via a scene understanding module that predicts monocular depth and camera pose as auxiliary tasks, and (b) optical flow from a dedicated optical flow module. These intermediate representations are then fed into a segmentation module that predicts motion segmentation masks. A novel transformer-based temporal attention module in the segmentation module builds correlations across adjacent 'frames' to get temporally consistent segmentation masks. Our method sets the new state-of-the-art on the classic EV-IMO benchmark (indoors), where we achieve improvements of 2.19 moving object IoU (2.22 mIoU) and 4.52 point IoU respectively, as well as on a newly-generated motion segmentation and tracking benchmark (outdoors) based on the DSEC event dataset, termed DSEC-MOTS, where we show an improvement of 12.91 moving object IoU.
Submitted 7 March, 2024;
originally announced March 2024.
-
$π$ Phase Interlayer Shift and Stacking Fault in the Kagome Superconductor CsV$_3$Sb$_5$
Authors:
Feng Jin,
Wei Ren,
Mingshu Tan,
Mingtai Xie,
Bingru Lu,
Zheng Zhang,
Jianting Ji,
Qingming Zhang
Abstract:
The stacking degree of freedom is a crucial factor in tuning material properties and has been extensively investigated in layered materials. The kagome superconductor CsV$_3$Sb$_5$ was recently discovered to exhibit a three-dimensional CDW phase below $T_{\rm CDW}$ ~ 94 K. Despite the thorough investigation of in-plane modulation, the out-of-plane modulation has remained ambiguous. Here, our polarization- and temperature-dependent Raman measurements reveal the breaking of C$_6$ rotational symmetry and the presence of three distinct domains oriented at approximately 120° to each other. The observations demonstrate that the CDW phase can be naturally explained as a 2c staggered order phase with adjacent layers exhibiting a relative $π$ phase shift. Further, we discover a first-order structural phase transition at approximately 65 K and suggest that it is a stacking order-disorder phase transition due to stacking fault, supported by the thermal hysteresis behavior of a Cs-related phonon mode. Our findings highlight the significance of the stacking degree of freedom in CsV$_3$Sb$_5$ and offer structural insights to comprehend the entanglement between superconductivity and CDW.
Submitted 7 March, 2024;
originally announced March 2024.
-
Logit Standardization in Knowledge Distillation
Authors:
Shangquan Sun,
Wenqi Ren,
Jingzhi Li,
Rui Wang,
Xiaochun Cao
Abstract:
Knowledge distillation involves transferring soft labels from a teacher to a student using a shared temperature-based softmax function. However, the assumption of a shared temperature between teacher and student implies a mandatory exact match between their logits in terms of logit range and variance. This side effect limits the performance of the student, considering the capacity discrepancy between the two and the finding that the innate logit relations of the teacher are sufficient for the student to learn. To address this issue, we propose setting the temperature as the weighted standard deviation of the logits and performing a plug-and-play Z-score pre-process of logit standardization before applying softmax and Kullback-Leibler divergence. Our pre-process enables the student to focus on essential logit relations from the teacher rather than requiring a magnitude match, and can improve the performance of existing logit-based distillation methods. We also show a typical case where the conventional setting of sharing temperature between teacher and student cannot reliably yield the authentic distillation evaluation; nonetheless, this challenge is successfully alleviated by our Z-score. We extensively evaluate our method for various student and teacher models on CIFAR-100 and ImageNet, showing its significant superiority. The vanilla knowledge distillation powered by our pre-process can achieve favorable performance against state-of-the-art methods, and other distillation variants can obtain considerable gain with the assistance of our pre-process.
Submitted 3 March, 2024;
originally announced March 2024.
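The Z-score pre-process described in the logit standardization abstract above can be illustrated with a short pure-Python sketch. It standardizes each logit vector before softmax and Kullback-Leibler divergence, so that only the relative arrangement of logits matters and not their magnitude. The function names, the base temperature `tau`, and the stabilizer `eps` are assumptions made for illustration; this is not the authors' released implementation.

```python
import math


def standardize(logits, tau=1.0, eps=1e-7):
    """Z-score a logit vector, then divide by an assumed base temperature tau."""
    n = len(logits)
    mean = sum(logits) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / n)
    return [(x - mean) / ((std + eps) * tau) for x in logits]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def standardized_kd_loss(student_logits, teacher_logits, tau=1.0):
    """KL divergence between teacher and student after Z-score standardization."""
    p_teacher = softmax(standardize(teacher_logits, tau))
    p_student = softmax(standardize(student_logits, tau))
    return kl_divergence(p_teacher, p_student)
```

With this sketch, a student whose logits are a shifted and positively scaled copy of the teacher's (for example `5 * x + 3`) incurs a loss that is zero up to floating-point error, which reflects the claim that the student needs to match logit relations rather than magnitudes.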
-
Mixed-halide perovskite alloys $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$: New insight of configuration entropy effect from first principles and phase diagrams
Authors:
Fang Pan,
Junni Zhai,
Jinyu Chen,
Lin Yang,
Hua Dong,
Fang Yuan,
Zhuangde Jiang,
Wei Ren,
Zuo-Guang Ye,
Guo-Xu Zhang,
Jingrui Li
Abstract:
Stability is one of the key issues in mixed-halide perovskite alloys, which are promising for emerging optoelectronics. Previous density-functional-theory (DFT) and machine learning studies indicate that the formation-energy convex hulls of these materials are very shallow, and stable alloy compositions are rare. In this work, we revisit this problem using DFT with special focus on the effects of configuration and vibration entropies. Using $20$-atom models for the $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$ series, the partition functions and from them the thermodynamic state functions are calculated by traversing all possible mixed-halide configurations. We can thus evaluate the temperature- and system-dependent configuration entropy, which largely corrects the conventional approach based on the ideal solution model. Finally, temperature-composition phase diagrams that include $α$, $β$, $γ$ and $δ$ phases of both alloys are constructed based on the free energy data, for which the contribution of phonon vibrations is included.
Submitted 29 February, 2024;
originally announced February 2024.
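The mixed-halide perovskite abstract above refers to partition functions obtained by enumerating configurations and to the ideal solution model. The standard textbook relations behind those terms are sketched below; the symbols ($E_i$ for the energy of the $i$-th symmetry-inequivalent configuration at composition $x$, $g_i$ for its degeneracy, $N$ for the number of mixing halide sites) are assumptions for illustration and may differ from the paper's notation.

```latex
% Canonical partition function over enumerated configurations at composition x
Z(x, T) = \sum_i g_i \exp\!\left(-\frac{E_i}{k_B T}\right)

% Free energy, internal energy, and configuration entropy derived from Z
F(x, T) = -k_B T \ln Z(x, T)
U(x, T) = \frac{1}{Z(x, T)} \sum_i g_i E_i \exp\!\left(-\frac{E_i}{k_B T}\right)
S_{\mathrm{conf}}(x, T) = \frac{U(x, T) - F(x, T)}{T}

% Ideal solution model: temperature-independent mixing entropy
S_{\mathrm{ideal}}(x) = -N k_B \left[ x \ln x + (1 - x) \ln (1 - x) \right]
```

In the high-temperature limit, where all configurations become equally likely, $S_{\mathrm{conf}}$ approaches $k_B \ln \sum_i g_i$; at finite temperature it falls below that value because low-energy configurations are favored, which is the kind of correction to the ideal solution model the abstract mentions.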
-
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
Authors:
Weijieying Ren,
Xinlong Li,
Lei Wang,
Tianxiang Zhao,
Wei Qin
Abstract:
Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks, the inference performance on historical tasks decreases dramatically, which is known as the catastrophic forgetting problem. A trade-off needs to be kept between learning plasticity and memory stability. Many existing works have explored strategies like memory replay, regularization and parameter isolation, but little is known about the geometric connection of various adjacent minima in continual LLM fine-tuning scenarios. In this work, we investigate the geometric connections of different minima through the lens of mode connectivity, which means different minima can be connected by a low-loss valley. Through extensive experiments, we uncover the mode connectivity phenomenon in the LLM continual learning scenario and find that it can strike a balance between plasticity and stability. Building upon these findings, we propose a simple yet effective method called Interpolation-based LoRA (I-LoRA), which constructs a dual-memory experience replay framework based on LoRA parameter interpolations. Extensive experiments and analysis on eight domain-specific CL benchmarks demonstrate that I-LoRA consistently shows significant improvement over the previous state-of-the-art approaches with up to $11\%$ performance gains, providing a strong baseline and insights for future research on the large language model continual learning problem. Our code is available at https://github.com/which47/LLMCL.
Submitted 29 February, 2024;
originally announced February 2024.
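The I-LoRA abstract above builds on mode connectivity and on interpolating LoRA parameters. A minimal sketch of the generic operation involved, linear interpolation between two parameter sets, is given below. The function name, the dict-of-lists representation, and the mixing weight `lam` are assumptions for illustration; the abstract does not specify how I-LoRA schedules or applies the interpolation, so this is not the authors' method.

```python
def interpolate_params(params_a, params_b, lam):
    """Return (1 - lam) * params_a + lam * params_b, parameter by parameter.

    params_a and params_b map parameter names to flat lists of floats
    (a stand-in for LoRA weight tensors). lam = 0 returns params_a and
    lam = 1 returns params_b.
    """
    if params_a.keys() != params_b.keys():
        raise ValueError("parameter sets must have the same names")
    result = {}
    for name in params_a:
        a, b = params_a[name], params_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        result[name] = [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]
    return result
```

Evaluating the loss at several values of `lam` between 0 and 1 is the usual way to probe whether two minima are joined by a low-loss path, which is the property the abstract calls mode connectivity.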
-
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Authors:
Alex Zhuang,
Ge Zhang,
Tianyu Zheng,
Xinrun Du,
Junjie Wang,
Weiming Ren,
Stephen W. Huang,
Jie Fu,
Xiang Yue,
Wenhu Chen
Abstract:
Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind state-of-the-art (SoTA) models by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model families, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, StructLM demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35% and Flan-UL2 20B by an average of 10%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.
Submitted 24 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Authors:
Weiming Ren,
Huan Yang,
Ge Zhang,
Cong Wei,
Xinrun Du,
Wenhao Huang,
Wenhu Chen
Abstract:
Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence. A grand challenge in I2V generation is to maintain visual consistency throughout the video: existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame, as well as ensure a fluid and logical progression within the video narrative. To mitigate these issues, we propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, and (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos. We also extend the proposed approaches to show their potential to improve consistency in auto-regressive long video generation and camera motion control. To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation. Our automatic and human evaluation results demonstrate the superiority of ConsistI2V over existing methods.
Submitted 30 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Ultra-low glassy thermal conductivity and controllable, promising thermoelectric properties in crystalline o-CsCu5S3
Authors:
Jincheng Yue,
Jiongzhi Zheng,
Junda Li,
Siqi Guo,
Wenling Ren,
Han Liu,
Yanhui Liu,
Tian Cui
Abstract:
We thoroughly investigate the microscopic mechanisms of the thermal transport in orthorhombic \textit{o}-CsCu$_5$S$_3$ by integrating the first-principles-based self-consistent phonon calculations (SCP) with the linearized Wigner transport equation (LWTE). Our methodology takes into account contributions to phonon energy shifts and phonon scattering rates from both three- and four-phonon processes. Additionally, it incorporates the off-diagonal terms of heat flux operators to calculate the total thermal conductivity. The predicted $κ_\mathrm{L}$ exhibits an extremely weak temperature dependence following $\sim T^{-0.33}$, in good agreement with experimental values along the direction parallel to the Bridgman growth direction. Such nonstandard temperature dependence of $κ_\mathrm{L}$ can be traced back to the dual particlelike-wavelike behavior exhibited by thermal phonons. Specifically, the coexistence of the stochastic oscillation of Cs atoms and metavalent bonding among interlayer Cu-S atoms limits the particle-like phonon propagation and enhances the wave-like tunneling of phonons. Simultaneously, the electrical transport properties are determined by employing a precise momentum relaxation-time approximation (MRTA) within the framework of the linearized Boltzmann transport equation (LBTE). By properly adjusting the carrier concentration, excellent thermoelectric performance is achieved, with a maximum thermoelectric conversion efficiency of 18.4$\%$ observed at 800 K in \textit{p}-type \textit{o}-CsCu$_5$S$_3$. Our work not only elucidates the anomalous thermal transport behavior in the copper-based chalcogenide \textit{o}-CsCu$_5$S$_3$ but also provides insights for manipulating its thermal and electronic properties for potential thermoelectric applications.
Submitted 15 April, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Authors:
Qingpei Guo,
Furong Xu,
Hanxiao Zhang,
Wang Ren,
Ziping Ma,
Lin Ju,
Jian Wang,
Jingdong Chen,
Ming Yang
Abstract:
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, vision-language models supporting multiple languages, e.g., both Chinese and English, have lagged due to the relative scarcity of large-scale pretraining datasets. To this end, we introduce a comprehensive bilingual (Chinese-English) dataset BM-6B with over 6 billion image-text pairs, aimed at enhancing multimodal foundation models to understand images well in both languages. To handle a dataset of this scale, we propose a novel grouped aggregation approach for image-text contrastive loss computation, which reduces the communication overhead and GPU memory demands significantly, facilitating a 60% increase in training speed. We pretrain a series of bilingual image-text foundation models with an enhanced fine-grained understanding ability on BM-6B; the resulting models, dubbed $M^2$-Encoders (pronounced "M-Square"), set new benchmarks in both languages for multimodal retrieval and classification tasks. Notably, our largest $M^2$-Encoder-10B model has achieved top-1 accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification setting, surpassing previously reported SoTA methods by 2.2% and 21.1%, respectively. The $M^2$-Encoder series represents one of the most comprehensive bilingual image-text foundation models to date, so we are making it available to the research community for further exploration and development.
Submitted 3 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Emulation-based Stabilization for Networked Control Systems with Stochastic Channels
Authors:
Wei Ren,
Wei Wang,
Zhuo-Rui Pan,
Xi-Ming Sun,
Andrew R. Teel,
Dragan Nesic
Abstract:
This paper studies the stabilization problem of networked control systems (NCSs) with random packet dropouts caused by stochastic channels. To describe the effects of stochastic channels on the information transmission, the transmission times are assumed to be deterministic, whereas the packet transmission is assumed to be random. We first propose a stochastic scheduling protocol to model random packet dropouts, and address the properties of the proposed stochastic scheduling protocol. The proposed scheduling protocol provides a unified modelling framework for a general class of random packet dropouts due to different stochastic channels. Next, the proposed scheduling protocol is embedded into the closed-loop system, which leads to a stochastic hybrid model for NCSs with random packet dropouts. Based on this stochastic hybrid model, we follow the emulation approach to establish sufficient conditions to guarantee uniform global asymptotic stability in probability. In particular, an upper bound on the maximally allowable transmission interval is derived explicitly for all stochastic protocols satisfying Lyapunov conditions that guarantee uniform global asymptotic stability in probability. Finally, two numerical examples are presented to demonstrate the derived results.
Submitted 21 January, 2024;
originally announced January 2024.
-
MixNet: Towards Effective and Efficient UHD Low-Light Image Enhancement
Authors:
Chen Wu,
Zhuoran Zheng,
Xiuyi Jia,
Wenqi Ren
Abstract:
With the continuous advancement of imaging devices, the prevalence of Ultra-High-Definition (UHD) images is rising. Although many image restoration methods have achieved promising results, they are not directly applicable to UHD images on devices with limited computational resources due to the inherently high computational complexity of UHD images. In this paper, we focus on the task of low-light image enhancement (LLIE) and propose a novel LLIE method called MixNet, which is designed explicitly for UHD images. To capture the long-range dependency of features without introducing excessive computational complexity, we present the Global Feature Modulation Layer (GFML). GFML associates features from different views by permuting the feature maps, enabling efficient modeling of long-range dependency. In addition, we also design the Local Feature Modulation Layer (LFML) and Feed-forward Layer (FFL) to capture local features and transform features into a compact representation. This way, our MixNet achieves effective LLIE with few model parameters and low computational complexity. We conducted extensive experiments on both synthetic and real-world datasets, and the comprehensive results demonstrate that our proposed method surpasses the performance of current state-of-the-art methods. The code will be available at https://github.com/zzr-idam/MixNet.
Submitted 19 January, 2024;
originally announced January 2024.
-
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection
Authors:
Weibo Jiang,
Weihong Ren,
Jiandong Tian,
Liangqiong Qu,
Zhiyong Wang,
Honghai Liu
Abstract:
Human-Object Interaction (HOI) detection, which aims to predict HOI triplets in the form of <human, object, action>, plays a vital role in scene understanding. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them together to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation but ignore the potential cross-triplet dependencies, resulting in ambiguous action predictions. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, we regard each triplet proposal as a graph in which Human and Object are nodes and Action is the edge, to aggregate self-triplet correlation. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. In addition, we leverage the CLIP model to help our SCTC obtain interaction-aware features by knowledge distillation, which provides useful action clues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of our proposed SCTC.
Submitted 11 January, 2024;
originally announced January 2024.
-
EsaCL: Efficient Continual Learning of Sparse Models
Authors:
Weijieying Ren,
Vasant G Honavar
Abstract:
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks. Many existing approaches to this problem work by either retraining the model on previous tasks or by expanding the model to accommodate new tasks. However, these approaches typically suffer from increased storage and computational requirements, a problem that is worsened in the case of sparse models due to the need for expensive re-training after sparsification. To address this challenge, we propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power, and circumvent the need for retraining. We conduct a theoretical analysis of loss landscapes with parameter pruning, and design a directional pruning (SDP) strategy that is informed by the sharpness of the loss function with respect to the model parameters. SDP yields a model with minimal loss of predictive accuracy, accelerating the learning of sparse models at each stage. To accelerate model updates, we introduce an intelligent data selection (IDS) strategy that can identify critical instances for estimating the loss landscape, yielding substantially improved data efficiency. The results of our experiments show that EsaCL achieves performance that is competitive with the state-of-the-art methods on three continual learning benchmarks, while using substantially reduced memory and computational resources.
Submitted 10 January, 2024;
originally announced January 2024.
-
Outflow-related radio emission in radio-quiet quasars
Authors:
Mai Liao,
Junxian Wang,
Wenke Ren,
Minhua Zhou
Abstract:
In this work, we revisit the relationship between [O III] line width $w_{\rm 90}$ (as the indicator of AGN outflow velocity) and the radio emission in RQQs by employing a large sample of Type I quasars ($\sim 37,000$) selected from the Sloan Digital Sky Survey (SDSS) Data Release Sixteen. By median stacking the radio images (to include the dominant fraction of individually radio non-detected RQQs) of the Karl G. Jansky Very Large Array (VLA) Sky Survey (VLASS) for subsamples of RQQs with different $w_{\rm 90}$, our study demonstrates that the correlation between $w_{\rm 90}$ and radio emission in our SDSS RQQs is significant, and remains solid after controlling for the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This intrinsic link supports the view that the [O III] outflows in quasars, most likely resulting from wide-angled sub-relativistic quasar winds launched from the accretion disc, could make a dominant contribution to radio emission in the general RQQs. Alternatively, the correlation may be attributed to low-power jets in RQQs if they are ubiquitous and could efficiently enhance the [O III] width through interacting with the ISM. Meanwhile, the star-formation rates traced by the flux ratio of [Ne V]/[O II] emission lines display no dependence on $w_{\rm 90}$ after controlling for the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This suggests that the stronger radio emission in RQQs with larger $w_{\rm 90}$ could not be attributed to outflow-enhanced (positive feedback) star formation in the hosts. However, this also indicates that the outflows, though exhibiting a robust correlation with radio power, produce neither positive nor negative feedback to the star formation in their hosts.
Submitted 10 January, 2024;
originally announced January 2024.
-
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
Authors:
Jiquan Yuan,
Xinyan Cao,
Jinming Che,
Qinyuan Wang,
Sen Liang,
Wei Ren,
Jinlong Lin,
Xixin Cao
Abstract:
Recently, AIGC image quality assessment (AIGCIQA), which aims to assess the quality of AI-generated images (AIGIs) from a human perception perspective, has emerged as a new topic in computer vision. Unlike common image quality assessment tasks, where images are derived from original ones distorted by noise, blur, compression, etc., images in AIGCIQA tasks are typically generated by generative models using text prompts. Considerable efforts have been made in the past years to advance AIGCIQA. However, most existing AIGCIQA methods regress predicted scores directly from individual generated images, overlooking the information contained in the text prompts of these images. This oversight partially limits the performance of these AIGCIQA methods. To address this issue, we propose a text-image encoder-based regression (TIER) framework. Specifically, we process the generated images and their corresponding text prompts as inputs, utilizing a text encoder and an image encoder to extract features from these text prompts and generated images, respectively. To demonstrate the effectiveness of our proposed TIER method, we conduct extensive experiments on several mainstream AIGCIQA databases, including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results indicate that our proposed TIER method outperforms the baseline in most cases.
Submitted 11 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
GraphGPT: Graph Learning with Generative Pre-trained Transformers
Authors:
Qifang Zhao,
Weidong Ren,
Tianyu Li,
Xiaoxiao Xu,
Hong Liu
Abstract:
We introduce GraphGPT, a novel model for Graph learning by self-supervised Generative Pre-training Transformers. Our model first transforms each graph or sampled subgraph into a sequence of tokens representing the nodes, edges and attributes reversibly, using the Eulerian path. Then we feed the tokens into a standard transformer decoder and pre-train it with the next-token-prediction (NTP) task. Lastly, we fine-tune the GraphGPT model with the supervised tasks. This intuitive yet effective model achieves results superior or close to the state-of-the-art methods for the graph-, edge- and node-level tasks on the large-scale molecular dataset PCQM4Mv2, the protein-protein association dataset ogbl-ppa and the ogbn-proteins dataset from the Open Graph Benchmark (OGB). Furthermore, the generative pre-training enables us to train GraphGPT up to 400M+ parameters with consistently increasing performance, which is beyond the capability of GNNs and previous graph transformers. The source code and pre-trained checkpoints will be released soon (https://github.com/alibaba/graph-gpt) to pave the way for graph foundation model research, and also to assist scientific discovery in the pharmaceutical, chemistry, materials and bio-informatics domains, etc.
Submitted 31 December, 2023;
originally announced January 2024.
-
A Literature Review of Energy Justice
Authors:
Weihang Ren,
Yongpei Guan,
Feng Qiu,
Todd Levin,
Miguel Heleno
Abstract:
Energy justice, at the intersection of energy and societal ethics, studies the origins, quantification, and resolution of persistent and potential inequities within the energy sector, serving as a foundational pillar for societal harmony. In this review, we give an overview of the historical and modern definitions of energy equity and frameworks of energy justice. We highlight the tools adopted to measure equity in the energy context, unveiling multifaceted inequities that permeate global energy landscapes. We discuss the limitations of prevalent metrics such as the Gini coefficient and Generalized Entropy Indices in the evaluation of energy justice concerns. Finally, we analyze publications that examined current practices and proposed improved methods toward a more equitable energy market for society from policy, planning, and operation perspectives.
Submitted 21 December, 2023;
originally announced December 2023.