
Showing 1–50 of 422 results for author: Zuo, W

  1. arXiv:2407.09919

    cs.CV

    Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Yuming Fang, Wangmeng Zuo, Kede Ma

    Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we first describe a strong baseline for AVSR by putting together three variants of elementary building blocks: 1) a flow-guide…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, the code is available at https://github.com/shangwei5/ST-AVSR

    ACM Class: I.4.3

  2. arXiv:2407.09734

    nucl-th

    Ab initio study of Z(N) = 6 magicity

    Authors: H. Li, H. J. Ong, D. Fang, I. A. Mazur, I. J. Shin, A. M. Shirokov, J. P. Vary, P. Yin, X. Zhao, W. Zuo

    Abstract: The existence of magic numbers of protons and neutrons in nuclei is essential for understanding nuclear structure and fundamental nuclear forces. Over decades, researchers have conducted theoretical and experimental studies on the new magic number Z(N) = 6, focusing on observables such as radii, binding energies, electromagnetic transitions, and nucleon separation energies. We perform the ab initio n…

    Submitted 12 July, 2024; originally announced July 2024.

  3. arXiv:2407.07518

    cs.CV

    Multi-modal Crowd Counting via a Broker Modality

    Authors: Haoliang Meng, Xiaopeng Hong, Chenhao Wang, Miao Shang, Wangmeng Zuo

    Abstract: Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images. This task is challenging due to the significant gap between these distinct modalities. In this paper, we propose a novel approach by introducing an auxiliary broker modality and on this basis frame the task as a triple-modal learning problem. We devise a fusion-based method to generate this brok…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of the paper and supplemental material to appear in ECCV 2024. Please cite the final published version. Code is available at https://github.com/HenryCilence/Broker-Modality-Crowd-Counting

  4. arXiv:2407.03422

    astro-ph.GA

    Rest-Frame Optical Spectroscopy of Ten z $\sim$ 2 Weak Emission-Line Quasars

    Authors: Ying Chen, Bin Luo, W. N. Brandt, Wenwen Zuo, Cooper Dix, Trung Ha, Brandon Matthews, Jeremiah D. Paul, Richard M. Plotkin, Ohad Shemmer

    Abstract: We present near-infrared spectroscopy of ten weak emission-line quasars (WLQs) at redshifts of $z\sim2$, obtained with the Palomar 200-inch Hale Telescope. WLQs are an exceptional population of type 1 quasars that exhibit weak or no broad emission lines in the ultraviolet (e.g., the C IV $λ1549$ line), and they display remarkable X-ray properties. We derive H$β$-based single-epoch virial black-hol…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 8 figures, accepted for publication in ApJ

  5. arXiv:2407.01155

    cs.LG

    CPT: Consistent Proxy Tuning for Black-box Optimization

    Authors: Yuanyang He, Zitong Huang, Xinxing Xu, Rick Siow Mong Goh, Salman Khan, Wangmeng Zuo, Yong Liu, Chun-Mei Feng

    Abstract: Black-box tuning has attracted recent attention because the structure or inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It applies the difference of the output logits before and after tuning a smaller white-box "proxy" model to improve the black-box model. However, this technique serv…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages, 2 figures plus supplementary materials

  6. arXiv:2407.01094

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.00884

    nucl-th nucl-ex

    Mechanisms of mirror energy difference for states exhibiting Thomas-Ehrman shift: Gamow shell model case studies of $^{18}$Ne/$^{18}$O and $^{19}$Na/$^{19}$O

    Authors: J. G. Li, K. H. Li, N. Michel, H. H. Li, W. Zuo

    Abstract: The mirror energy difference (MED) of the mirror state, especially for states bearing the Thomas-Ehrman shift, serves as a sensitive probe of isospin symmetry breaking. We employ the Gamow shell model, which includes the inter-nucleon correlation and continuum coupling, to investigate the MED for $sd$-shell nuclei, taking $^{18}$Ne/$^{18}$O and $^{19}$Na/$^{19}$O as examples. Our GSM provide…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures

  8. arXiv:2406.14207

    cs.LG

    LayerMatch: Do Pseudo-labels Benefit All Layers?

    Authors: Chaoqi Liang, Guanglei Yang, Lifeng Qiao, Zitong Huang, Hongliang Yan, Yunchao Wei, Wangmeng Zuo

    Abstract: Deep neural networks have achieved remarkable performance across various tasks when supplied with large-scale labeled data. However, the collection of labeled data can be time-consuming and labor-intensive. Semi-supervised learning (SSL), particularly through pseudo-labeling algorithms that iteratively assign pseudo-labels for self-training, offers a promising solution to mitigate the dependency o…

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.11138

    cs.CV cs.AI

    Diffusion Models in Low-Level Vision: A Survey

    Authors: Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

    Abstract: Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have become widely acclaimed for their ability to produce samples of superior quality and diversity. This ensures the generation of visually compellin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages, 23 figures, 4 tables

  10. arXiv:2406.08151

    nucl-th

    Unveiling potential neutron halos in intermediate-mass nuclei: an ab initio study

    Authors: H. H. Li, J. G. Li, M. R. Xie, W. Zuo

    Abstract: Halos epitomize the fascinating interplay between weak binding, shell evolution, and deformation effects, especially in nuclei near the drip line. In this Letter, we apply the state-of-the-art ab initio valence-space in-medium similarity renormalization group approach to predict potential candidates for one- and two-neutron halos in the intermediate-mass region. Notably, we use spectroscop…

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.07956

    nucl-th

    Ab initio calculations with a new local chiral N3LO nucleon-nucleon force

    Authors: P. Y. Wang, J. G. Li, S. Zhang, Q. Yuan, M. R. Xie, W. Zuo

    Abstract: Ab initio calculations have achieved remarkable success in nuclear structure studies. Numerous works highlight the pivotal role of three-body forces in nuclear ab initio calculations. Concurrently, efforts have been made to replicate these calculations using only realistic nucleon-nucleon (NN) interactions. A novel local chiral next-to-next-to-next-to-leading order (N3LO) NN interaction, distinct…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures

  12. arXiv:2406.07487

    cs.CV

    GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    Authors: Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo

    Abstract: Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Since they are trained with normal data only, diffusion models tend to reconstruct normal counterparts of test images with certain noises added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with dif…

    Submitted 2 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ECCV 2024, code and models: https://github.com/hyao1/GLAD. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  13. arXiv:2406.05669

    nucl-th

    Ab initio valence-space in-medium similarity renormalization group calculations for neutron-rich P, Cl, and K isotopes

    Authors: M. R. Xie, L. Y. Shen, J. G. Li, H. H. Li, Q. Yuan, W. Zuo

    Abstract: Neutron-rich P, Cl, and K isotopes, particularly those with neutron numbers around $N=28$, have attracted extensive experimental and theoretical interest. We utilize the ab initio valence-space in-medium similarity renormalization group approach, based on chiral nucleon-nucleon and three-nucleon forces, to investigate the exotic properties of these isotopes. Systematic calculations of the…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures, 1 table

  14. Spectroscopic factors of resonance states with the Gamow shell model

    Authors: M. R. Xie, J. G. Li, N. Michel, H. H. Li, W. Zuo

    Abstract: We investigate the spectroscopic factors of resonance states in $A = 5-8$ nuclei, utilizing the Gamow shell model (GSM). Within the GSM, configuration mixing is taken into account exactly within the shell model framework, and continuum coupling is addressed via the complex-energy Berggren ensemble, which treats bound, resonance, and non-resonant continuum single-particle states…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures, 1 table

  15. arXiv:2406.01476

    cs.CV

    DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

    Authors: Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

    Abstract: Dynamic 3D interaction has attracted great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former requires assigning precise physical properties to the target object, otherwise the…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Technical report. Codes are released at: https://github.com/tyhuang0428/DreamPhysics

  16. arXiv:2405.20778

    cs.CR cs.LG

    Improved Generation of Adversarial Examples Against Safety-aligned LLMs

    Authors: Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen

    Abstract: Despite numerous efforts to ensure large language models (LLMs) adhere to safety standards and produce harmless content, some successes have been achieved in bypassing these restrictions, known as jailbreak attacks against LLMs. Adversarial prompts generated using gradient-based methods exhibit outstanding performance in performing jailbreak attacks automatically. Nevertheless, due to the discrete…

    Submitted 28 May, 2024; originally announced May 2024.

  17. arXiv:2405.19732

    cs.CV cs.CL cs.LG

    Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

    Authors: Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo

    Abstract: Learning a skill generally relies on both practical experience from a doer and insightful high-level guidance from an instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer acts like a disciplined doer, making a locally optimal update at each step. Recent methods utilize large language models (LLMs) to optimize solutions for…

    Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.11750

    astro-ph.GA

    The Intermediate-Mass Black Hole Reverberation Mapping Project: Initial Results for a candidate IMBH in a nearby Seyfert 1 Galaxy

    Authors: Wenwen Zuo, Hengxiao Guo, Jingbo Sun, Qi Yuan, Paulina Lira, Minfeng Gu, Philip G. Edwards, Alok C. Gupta, Shubham Kishore, Jamie Stevens, Tao An, Zhen-Yi Cai, Haicheng Feng, Luis C. Ho, Dragana Ilić, Andjelka B. Kovačević, ShaSha Li, Mar Mezcua, Luka Č. Popović, Mouyuan Sun, Tushar Tripathi, Vivian U., Oliver Vince, Jianguo Wang, Junxian Wang, et al. (3 additional authors not shown)

    Abstract: To investigate the short-term variability and determine the size of the optical continuum emitting region of intermediate-mass black holes (IMBHs), we carried out high-cadence, multi-band photometric monitoring of a Seyfert 1 galaxy J0249-0815 across two nights, together with a one-night single-band preliminary test. The presence of the broad Hα component in our target was confirmed by recent Paloma…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 14 pages, 6 figures, submitted to ApJ, comments welcome

  19. arXiv:2405.09799

    nucl-th

    Direct ab initio calculation of the $^{4}$He nuclear electric dipole polarizability

    Authors: Peng Yin, Andrey M. Shirokov, Pieter Maris, Patrick J. Fasano, Mark A. Caprio, He Li, Wei Zuo, James P. Vary

    Abstract: The calculation of nuclear electromagnetic sum rules by directly diagonalizing the nuclear Hamiltonian in a large basis is numerically challenging and has not been performed for $A>2$ nuclei. With the significant progress of high performance computing, we show that calculating sum rules using numerous discretized continuum states obtained by directly diagonalizing the ab initio no-core shell model…

    Submitted 15 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  20. arXiv:2405.09555

    eess.SP

    Analysis of Near-Field Effects, Spatial Non-Stationary Characteristics Based on 11-15 GHz Channel Measurement in Indoor Scenario

    Authors: Haiyang Miao, Pan Tang, Weirang Zuo, Qi Wei, Lei Tian, Jianhua Zhang

    Abstract: In sixth-generation (6G) systems, with the further expansion of the number of array elements and of frequency bands, wireless communications are expected to operate in the near-field region. Near-field radio communications (NFRC) will become crucial in 6G communication systems. The new mid-band (6-24 GHz) is a potential candidate spectrum for 6G. In this paper, we investigate the channel measurements…

    Submitted 19 April, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2404.17270

  21. arXiv:2405.08589

    cs.CV

    Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets

    Authors: Wei Lian, Zhesen Cui, Fei Ma, Hang Pan, Wangmeng Zuo

    Abstract: In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations. This research presents a method designed to meet such requirements through minimization of the objective function of the robust point matching (RPM) algorithm. First, we show that the RPM objective is a cubic polynomial. Then, t…

    Submitted 14 May, 2024; originally announced May 2024.

  22. arXiv:2405.05806

    cs.CV

    MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

    Authors: Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, Wangmeng Zuo

    Abstract: Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Although several tuning-free methods have achieved promising identity fidelity, they usually suffer from overfitting issues. The learned identity tends to entangle with irrelevant information…

    Submitted 10 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 34 pages

  23. arXiv:2405.02171

    cs.CV

    Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

    Authors: Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, Wangmeng Zuo

    Abstract: In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. Particularly, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. Firstly, considering the popularity of multiple…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TPAMI in 2024. Extended version of ECCV 2022 paper "Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations" (arXiv:2203.01325)

  24. arXiv:2404.17364

    cs.CV

    MV-VTON: Multi-View Virtual Try-On with Diffusion Models

    Authors: Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo

    Abstract: The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. However, most existing methods solely focus on the frontal try-on using the frontal clothing. When the views of the clothing and person are significantly inconsistent, particularly when the person's view is non-frontal, the results are unsatisfactory. To address this challenge,…

    Submitted 29 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 15 pages

  25. arXiv:2404.17270

    cs.IT eess.SP

    Empirical Studies of Propagation Characteristics and Modeling Based on XL-MIMO Channel Measurement: From Far-Field to Near-Field

    Authors: Haiyang Miao, Jianhua Zhang, Pan Tang, Lei Tian, Weirang Zuo, Qi Wei, Guangyi Liu

    Abstract: In sixth-generation (6G) systems, extremely large-scale multiple-input-multiple-output (XL-MIMO) is considered a promising enabling technology. With the further expansion of the number of array elements and of frequency bands, near-field effects will be more likely to occur in 6G communication systems. Near-field radio communications (NFRC) will become crucial in 6G communication systems. It is known tha…

    Submitted 26 April, 2024; originally announced April 2024.

  26. arXiv:2404.16331

    cs.CV cs.AI

    IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

    Authors: Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, Wangmeng Zuo

    Abstract: Model Weight Averaging (MWA) is a technique that seeks to enhance a model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) vanilla MWA can benefit class-imbalanced learning, and 2) performing model averaging in the early epochs of training yields a greater performance improvement than doing so in later epochs. Inspired by these t…

    Submitted 25 April, 2024; originally announced April 2024.

  27. arXiv:2404.11313

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on KVQ, a dataset collected from a popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  28. arXiv:2404.08514

    cs.CV

    NIR-Assisted Image Denoising: A Selective Fusion Approach and A Real-World Benchmark Dataset

    Authors: Rongjian Xu, Zhilu Zhang, Renlong Wu, Wangmeng Zuo

    Abstract: Despite the significant progress in image denoising, it is still challenging to restore fine-scale details while removing noise, especially in extremely low-light environments. Leveraging near-infrared (NIR) images to assist visible RGB image denoising shows the potential to address this issue, becoming a promising technology. Nonetheless, existing works still struggle with taking advantage of NIR…

    Submitted 18 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages

  29. arXiv:2404.07846

    cs.CV eess.IV

    TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

    Authors: Junyi Li, Zhilu Zhang, Wangmeng Zuo

    Abstract: Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID). Existing BSNs are mostly built with convolution layers. Although transformers offer potential solutions to the limitations of convolutions and have demonstrated success in various image restoration tasks, their attention mechanisms may violate the blind-spot requirement, thus restrict…

    Submitted 11 April, 2024; originally announced April 2024.

  30. arXiv:2404.06451

    cs.CV

    SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

    Authors: Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, Wangmeng Zuo

    Abstract: Human visual imagination usually begins with analogies or rough sketches. For example, given an image of a girl playing guitar in front of a building, one may analogously imagine what it would look like if Iron Man were playing guitar in front of the Pyramids in Egypt. Nonetheless, the visual condition may not be precisely aligned with the imaginary result indicated by the text prompt, and existing layout-controllable text-to-im…

    Submitted 9 April, 2024; originally announced April 2024.

  31. arXiv:2404.05580

    cs.CV

    Responsible Visual Editing

    Authors: Minheng Ni, Yeli Shen, Lei Zhang, Wangmeng Zuo

    Abstract: With recent advancements in visual synthesis, there is a growing risk of encountering images with detrimental effects, such as hate, discrimination, or privacy violations. The research on transforming harmful images into responsible ones remains unexplored. In this paper, we formulate a new task, responsible visual editing, which entails modifying specific concepts within an image to render it mor…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 24 pages, 12 figures

  32. arXiv:2404.05268

    cs.CV

    MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation

    Authors: Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wangmeng Zuo

    Abstract: Customized text-to-image generation aims to synthesize instantiations of user-specified concepts and has achieved unprecedented progress in handling individual concepts. However, when extending to multiple customized concepts, existing methods exhibit limitations in terms of flexibility and fidelity, only accommodating the combination of limited types of models and potentially resulting in a mix of…

    Submitted 12 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  33. arXiv:2404.04908

    cs.CV

    Dual-Camera Smooth Zoom on Mobile Phones

    Authors: Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo

    Abstract: When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address th…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 24

  34. arXiv:2404.04833

    cs.CV

    ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

    Authors: Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, Wangmeng Zuo

    Abstract: With the development of large-scale diffusion models, Artificial Intelligence Generated Content (AIGC) techniques have recently become popular. However, how to truly make them serve our daily lives remains an open question. To this end, in this paper, we focus on employing AIGC techniques in one field of E-commerce marketing, i.e., generating hyper-realistic advertising images for displaying user-specifi…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 16 pages

  35. arXiv:2404.04317

    stat.ML cs.LG q-bio.QM

    DeepLINK-T: deep learning inference for time series data using knockoffs and LSTM

    Authors: Wenxuan Zuo, Zifan Zhu, Yuxuan Du, Yi-Chun Yeh, Jed A. Fuhrman, Jinchi Lv, Yingying Fan, Fengzhu Sun

    Abstract: High-dimensional longitudinal time series data is prevalent across various real-world applications. Many such applications can be modeled as regression problems with high-dimensional time series covariates. Deep learning has been a popular and powerful tool for fitting these regression models. Yet, the development of interpretable and reproducible deep-learning models is challenging and remains un…

    Submitted 5 April, 2024; originally announced April 2024.

  36. arXiv:2403.11192

    cs.CV

    Self-Supervised Video Desmoking for Laparoscopic Surgery

    Authors: Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, Wangmeng Zuo

    Abstract: Due to the difficulty of collecting real paired data, most existing desmoking methods train the models by synthesizing smoke, generalizing poorly to real surgical scenarios. Although a few works have explored single-image real-world desmoking in unpaired learning manners, they still encounter challenges in handling dense smoke. In this work, we address these issues together by introducing the self…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 28 pages

  37. arXiv:2403.07290

    cs.CV

    Learning Hierarchical Color Guidance for Depth Map Super-Resolution

    Authors: Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, Wangmeng Zuo, Yao Zhao, Sam Kwong

    Abstract: Color information is the most commonly used prior knowledge for depth map super-resolution (DSR), which can provide high-frequency boundary guidance for detail restoration. However, its role and functionality in DSR have not been fully developed. In this paper, we rethink the utilization of color information and propose a hierarchical color guidance network to achieve DSR. On the one hand, the low…

    Submitted 11 March, 2024; originally announced March 2024.

  38. arXiv:2403.05807

    cs.CV eess.IV

    A self-supervised CNN for image watermark removal

    Authors: Chunwei Tian, Menghua Zheng, Tiancai Jiao, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

    Abstract: Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal. However, watermarked images do not have reference images in the real world, which results in poor robustness of image watermark removal techniques. In this paper, we propose a self-supervised convolutional neural network (CNN) for image watermark removal (SWCNN). SWCNN uses a self-supervi…

    Submitted 9 March, 2024; originally announced March 2024.

  39. arXiv:2403.05438

    cs.CV

    VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

    Authors: Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo

    Abstract: Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. In contrast, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free and plug-and-play method, which elevates…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project page: https://videoelevator.github.io Code: https://github.com/YBYBZhang/VideoElevator

  40. arXiv:2403.05428

    cs.MM

    Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

    Authors: Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu

    Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations based on the context, which necessitates a comprehensive understanding of stickers and support for multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset comprising a collected tag set with 461 tags and 13,571 sticker-tag pairs, design…

    Submitted 16 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  41. arXiv:2403.01852

    cs.CV

    PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

    Authors: Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong

    Abstract: Recent advancements in large-scale pre-trained text-to-image models have led to remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality images with consistent semantics and layout remains a challenge. In this paper, we propose the adaPtive LAyout-semantiC fusion modulE (PLACE) that harnesses pre-trained models to alleviate the aforementioned issues. Specifically, w…

    Submitted 4 March, 2024; originally announced March 2024.

  42. arXiv:2402.16674

    cs.CV

    ConSept: Continual Semantic Segmentation via Adapter-based Vision Transformer

    Authors: Bowen Dong, Guanglei Yang, Wangmeng Zuo, Lei Zhang

    Abstract: In this paper, we delve into the realm of vision transformers for continual semantic segmentation, a problem that has not been sufficiently explored in previous literature. Empirical investigations on the adaptation of existing frameworks to vanilla ViT reveal that incorporating visual adapters into ViTs or fine-tuning ViTs with distillation terms is advantageous for enhancing the segmentation cap…

    Submitted 26 February, 2024; originally announced February 2024.

  43. arXiv:2402.15704  [pdf, other]

    eess.IV cs.CV

    A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

    Authors: Chunwei Tian, Xuanyu Zhang, Jia Ren, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

    Abstract: Convolutional neural networks can automatically learn features via deep network architectures and given input samples. However, robustness of obtained models may have challenges in varying scenes. Bigger differences of a network architecture are beneficial to extract more complementary structural information to enhance robustness of an obtained super-resolution model. In this paper, we present a h…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures

  44. arXiv:2402.05044  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

    Authors: Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

    Abstract: In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this crucial need, we propose SALAD-Bench, a safety benchmark specifically designed for evaluating LLMs, attack, and defense methods. Distinguished by its breadth, SALAD-Bench transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy s…

    Submitted 7 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 Findings

  45. arXiv:2402.01166  [pdf, other]

    cs.CV cs.AI

    A Comprehensive Survey on 3D Content Generation

    Authors: Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, Wangmeng Zuo, Junjun Jiang, Xianming Liu

    Abstract: Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D. The 3D is the most close visual modality to real-world 3D environment and carries enormous knowledge. The 3D content generation shows both academic and practical values while also presenting formidable technical challenges. This…

    Submitted 19 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: under review

  46. arXiv:2401.17138  [pdf, other]

    nucl-th quant-ph

    Nuclear scattering via quantum computing

    Authors: Peiyan Wang, Weijie Du, Wei Zuo, James P. Vary

    Abstract: We propose a hybrid quantum-classical framework to solve the elastic scattering phase shift of two well-bound nuclei in an uncoupled channel. Within this framework, we develop a many-body formalism in which the continuum scattering states of the two colliding nuclei are regulated by a weak external harmonic oscillator potential with varying strength. Based on our formalism, we propose an approach…

    Submitted 15 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: We welcome comments!

  47. arXiv:2401.01598  [pdf, other]

    cs.CV

    Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning

    Authors: Zitong Huang, Ze Chen, Zhixing Chen, Erjin Zhou, Xinxing Xu, Rick Siow Mong Goh, Yong Liu, Wangmeng Zuo, Chunmei Feng

    Abstract: Few-shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes based on very limited training data without forgetting the old ones encountered. Existing studies solely relied on pure visual networks, while in this paper we solved FSCIL by leveraging the Vision-Language model (e.g., CLIP) and propose a simple yet effective framework, named Learning Prompt with Distribution-based…

    Submitted 5 April, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  48. arXiv:2401.00766  [pdf, other]

    cs.CV eess.IV

    Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks

    Authors: Zhilu Zhang, Shuohao Zhang, Renlong Wu, Zifei Yan, Wangmeng Zuo

    Abstract: It is highly desired but challenging to acquire high-quality photos with clear content in low-light environments. Although multi-image processing methods (using burst, dual-exposure, or multi-exposure images) have made significant progress in addressing this issue, they typically focus on specific restoration or enhancement problems, and do not fully explore the potential of utilizing multiple ima…

    Submitted 31 May, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: 21 pages

  49. arXiv:2312.17334  [pdf, other]

    cs.CV

    Improving Image Restoration through Removing Degradations in Textual Representations

    Authors: Jingbo Lin, Zhilu Zhang, Yuxiang Wei, Dongwei Ren, Dongsheng Jiang, Wangmeng Zuo

    Abstract: In this paper, we introduce a new perspective for improving image restoration by removing degradation in the textual representations of a given degraded image. Intuitively, restoration is much easier on text modality than image one. For example, it can be easily conducted by removing degradation-related words while keeping the content-aware words. Hence, we combine the advantages of images in deta…

    Submitted 28 December, 2023; originally announced December 2023.

  50. arXiv:2312.17051  [pdf, other]

    cs.CV

    FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

    Authors: Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng Zuo

    Abstract: Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. While the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL faces limitations. These limitations arise from feature space misalignment and signif…

    Submitted 28 December, 2023; originally announced December 2023.