Showing 1–50 of 111 results for author: Zhao, N

  1. arXiv:2406.16560  [pdf]

    cs.SI physics.soc-ph

    GNNTAL: A Novel Model for Identifying Critical Nodes in Complex Networks

    Authors: Hao Wang, Ting Luo, Shuang-ping Yang, Ming Jing, Jian Wang, Na Zhao

    Abstract: Identification of critical nodes is a prominent topic in the study of complex networks. Numerous methods have been proposed, yet most exhibit inherent limitations. Traditional approaches primarily analyze specific structural features of the network; however, node influence is typically the result of a combination of multiple factors. Machine learning-based methods struggle to effectively represent…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.11311  [pdf, other]

    cs.CV

    Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The use of synthetic data in indoor 3D object detection offers the potential of greatly reducing the manual labor involved in 3D annotations and training effective zero-shot detectors. However, the complicated domain shifts across syn-to-real indoor datasets remain underexplored. In this paper, we propose a novel Object-wise Hierarchical Domain Alignment (OHDA) framework for syn-to-real unsupervi…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.11283  [pdf, other]

    cs.CV

    Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets. However, it continues to be hindered by the lack of diverse, large-scale, real-world 3D scene datasets for source data. To address this shortfall, we propose Generalizable Representation Learning (GRL), where we devi…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09305  [pdf, other]

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for a specific subject from an arbitrary testing image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.08152  [pdf, other]

    cs.CV

    CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

    Abstract: The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two fram…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  6. arXiv:2406.07670  [pdf]

    cs.RO

    Design and Control of a Compact Series Elastic Actuator Module for Robots in MRI Scanners

    Authors: Binghan He, Naichen Zhao, David Y. Guo, Charles H. Paxson, Ronald S. Fearing

    Abstract: In this study, we introduce a novel MRI-compatible rotary series elastic actuator module utilizing velocity-sourced ultrasonic motors for force-controlled robots operating within MRI scanners. Unlike previous MRI-compatible SEA designs, our module incorporates a transmission force sensing series elastic actuator structure, with four off-the-shelf compression springs strategically placed between th…

    Submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2405.20195  [pdf, other]

    cs.HC

    Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations

    Authors: Zilin Ma, Susannah, Su, Nathan Zhao, Linn Bieske, Blake Bullwinkel, Yanyi Zhang, Sophia, Yang, Ziqing Luo, Siyao Li, Gekai Liao, Boxiang Wang, Jinglun Gao, Zihan Wen, Claude Bruderlein, Weiwei Pan

    Abstract: Humanitarian negotiations in conflict zones, called frontline negotiation, are often highly adversarial, complex, and high-risk. Several best practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid d…

    Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  8. arXiv:2405.16099  [pdf, other]

    cs.CV

    Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation

    Authors: Huizhou Chen, Jiangyi Wang, Yuxin Li, Na Zhao, Jun Cheng, Xulei Yang

    Abstract: 3D environment recognition is essential for autonomous driving systems, as autonomous vehicles require a comprehensive understanding of surrounding scenes. Recently, the predominant approach to define this real-life problem is through 3D occupancy prediction. It attempts to predict the occupancy states and semantic labels for all voxels in 3D space, which enhances the perception capability. Birds-…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, accepted by IEEE CAI 2024

  9. arXiv:2405.15217  [pdf, other]

    cs.CV cs.GR

    NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

    Authors: Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac

    Abstract: The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics. Unfortunately due to their variable structure and scarcity of vector training data, directly applying diffusion models on this domain remains a challenging problem. Using workarounds like optimization…

    Submitted 24 May, 2024; originally announced May 2024.

  10. Last-Level Cache Side-Channel Attacks Are Feasible in the Modern Public Cloud (Extended Version)

    Authors: Zirui Neil Zhao, Adam Morrison, Christopher W. Fletcher, Josep Torrellas

    Abstract: Last-level cache side-channel attacks have been mostly demonstrated in highly-controlled, quiescent local environments. Hence, it is unclear whether such attacks are feasible in a production cloud environment. In the cloud, side channels are flooded with noise from activities of other tenants and, in Function-as-a-Service (FaaS) workloads, the attacker has a very limited time window to mount the a…

    Submitted 20 May, 2024; originally announced May 2024.

    Journal ref: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024), Volume 2, pages 582-600, La Jolla, CA, USA, May 2024

  11. arXiv:2405.10317  [pdf, other]

    cs.CV cs.GR

    Text-to-Vector Generation with Neural Path Representation

    Authors: Peiying Zhang, Nanxuan Zhao, Jing Liao

    Abstract: Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGGRAPH 2024. Project page: https://intchous.github.io/T2V-NPR

  12. arXiv:2404.19702  [pdf, other]

    cs.CV

    GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

    Authors: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

    Abstract: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian para…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project webpage: https://sai-bi.github.io/project/gs-lrm/

  13. arXiv:2404.13522  [pdf, other]

    cs.AI cs.LG stat.ML

    Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective

    Authors: Ningsheng Zhao, Jia Yuan Yu, Krzysztof Dzieciolowski, Trang Bui

    Abstract: Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outp…

    Submitted 29 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  14. arXiv:2404.05717  [pdf, other]

    cs.CV cs.AI

    SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

    Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

    Abstract: Effective editing of personal content holds a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. Therefore, in this work, we introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by the reference, w…

    Submitted 6 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 18 pages, 16 figures, 3 tables

  15. arXiv:2403.11868  [pdf, other]

    cs.GR cs.CV

    View-Consistent 3D Editing with Gaussian Splatting

    Authors: Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

    Abstract: The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance imag…

    Submitted 4 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: accepted to ECCV 2024

  16. arXiv:2403.00644  [pdf, other]

    cs.CV

    Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

    Authors: Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with handling diverse low-level tasks that require detail preservation. To overcome this limitation, we present a new Diff-Plugin framework to enable a single pre-trained diffusion model to generate high-fidelity result…

    Submitted 28 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024. Replaced some celebrity images to avoid copyright disputes

  17. arXiv:2403.00095  [pdf]

    cs.CY physics.soc-ph

    Solving Jigsaw Puzzles using Iterative Random Sampling: Parallels with Development of Skill Mastery

    Authors: Neil Zhao, Diana Zheng

    Abstract: Skill mastery is a priority for success in all fields. We present a parallel between the development of skill mastery and the process of solving jigsaw puzzles. We show that iterative random sampling solves jigsaw puzzles in two phases: a lag phase that is characterized by little change and occupies the majority of the time, and a growth phase that marks rapid and imminent puzzle completion. Chang…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 26 pages, 15 figures, 1 table

  18. arXiv:2402.03549  [pdf]

    cs.CV

    AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising

    Authors: Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang

    Abstract: We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters. Our goal is to accurately transfer motions from a 2D driving video onto a source character, with its identity, in terms of appearance and natural movement, well preserved, even when there may be significant discrepancies between the source and driving c…

    Submitted 5 February, 2024; originally announced February 2024.

  19. arXiv:2401.15560  [pdf]

    cs.IT cs.CL

    An Analysis of Letter Dynamics in the English Alphabet

    Authors: Neil Zhao, Diana Zheng

    Abstract: The frequency with which the letters of the English alphabet appear in writings has been applied to the field of cryptography, the development of keyboard mechanics, and the study of linguistics. We expanded on the statistical analysis of the English alphabet by examining the average frequency with which each letter appears in different categories of writings. We evaluated news articles, novels, plays,…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: 22 pages, 6 figures, 5 tables

    MSC Class: 94A15

  20. arXiv:2401.12023  [pdf]

    cs.DM

    A Simulation of Optimal Dryness When Moving in the Rain or Snow Using MATLAB

    Authors: Neil Zhao, Emilee Brockner, Asia Winslow, Megan Seraydarian

    Abstract: The classic question of whether one should walk or run in the rain to remain the least wet has inspired a myriad of solutions ranging from physically performing test runs in rainy conditions to mathematically modeling human movement through rain. This manuscript approaches the classical problem by simulating movement through rainfall using MATLAB. Our simulation was generalizable to include snow…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 15 pages, 9 figures

    MSC Class: 68U20

  21. arXiv:2401.05011  [pdf, other]

    cs.CV

    Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

    Authors: Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang

    Abstract: Semi-supervised 3D object detection is a promising yet under-explored direction to reduce data annotation costs, especially for cluttered indoor scenes. A few prior works, such as SESS and 3DIoUMatch, attempt to solve this task by utilizing a teacher model to generate pseudo-labels for unlabeled samples. However, the availability of unlabeled samples in the 3D domain is relatively limited compared…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Code is available at https://github.com/tingxueronghua/DPKE

  22. arXiv:2312.14216  [pdf, other]

    cs.CV

    DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

    Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge

    Abstract: The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating…

    Submitted 21 December, 2023; originally announced December 2023.

  23. arXiv:2312.11306  [pdf]

    eess.SY cs.RO

    Human-machine cooperation: optimization of drug retrieval sequencing in automated drug dispensing systems

    Authors: Mengge Yuan, Kan Wu, Ning Zhao

    Abstract: Automated drug dispensing systems (ADDSs) are increasingly in demand in today's pharmacies, primarily driven by the growing ageing population. Recognizing the practical challenges faced by pharmacies implementing ADDSs, this study aims to optimize the layout design and sequencing issues within a human-machine cooperation environment to enhance the system throughput of ADDSs. Specifically, we devel…

    Submitted 16 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  24. arXiv:2312.10078  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Early ChatGPT User Portrait through the Lens of Data

    Authors: Yuyang Deng, Ni Zhao, Xin Huang

    Abstract: Since its launch, ChatGPT has achieved remarkable success as a versatile conversational AI platform, drawing millions of users worldwide and garnering widespread recognition across academic, industrial, and general communities. This paper aims to paint a portrait of early GPT users and understand how they evolved. Specific questions include their topics of interest and their potential careers; and…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 6 pages, 5 figures, 2023 IEEE International Conference on Big Data (BigData), to be published

    Report number: SP02207

  25. DECLASSIFLOW: A Static Analysis for Modeling Non-Speculative Knowledge to Relax Speculative Execution Security Measures (Full Version)

    Authors: Rutvik Choudhary, Alan Wang, Zirui Neil Zhao, Adam Morrison, Christopher W. Fletcher

    Abstract: Speculative execution attacks undermine the security of constant-time programming, the standard technique used to prevent microarchitectural side channels in security-sensitive software such as cryptographic code. Constant-time code must therefore also deploy a defense against speculative execution attacks to prevent leakage of secret data stored in memory or the processor registers. Unfortunately…

    Submitted 14 December, 2023; originally announced December 2023.

    Journal ref: In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23). Association for Computing Machinery, New York, NY, USA, 2053-2067

  26. arXiv:2312.06488  [pdf, other]

    cs.CR

    Performance-lossless Black-box Model Watermarking

    Authors: Na Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu

    Abstract: With the development of deep learning, high-value and high-cost models have become valuable assets, and related intellectual property protection technologies have become a hot topic. However, existing model watermarking work in black-box scenarios mainly originates from training-based backdoor methods, which probably degrade primary task performance. To address this, we propose a branch backdoor-b…

    Submitted 14 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  27. arXiv:2311.15302  [pdf]

    math.OC cs.RO

    A Quick Response Algorithm for Dynamic Autonomous Mobile Robot Routing Problem with Time Windows

    Authors: Lulu Cheng, Ning Zhao, Mengge Yuan, Kan Wu

    Abstract: This paper investigates the optimization problem of scheduling autonomous mobile robots (AMRs) in hospital settings, considering dynamic requests with different priorities. The primary objective is to minimize the daily service cost by dynamically planning routes for the limited number of available AMRs. The total cost consists of AMR's purchase cost, transportation cost, delay penalty cost, and l…

    Submitted 26 November, 2023; originally announced November 2023.

  28. arXiv:2311.11574  [pdf, other]

    cs.IT

    A Framework on Complex Matrix Derivatives with Special Structure Constraints for Wireless Systems

    Authors: Xin Ju, Shiqi Gong, Nan Zhao, Chengwen Xing, Arumugam Nallanathan, Dusit Niyato

    Abstract: Matrix-variate optimization plays a central role in advanced wireless system designs. In this paper, we aim to explore optimal solutions of matrix variables under two special structure constraints using complex matrix derivatives, including diagonal structure constraints and constant modulus constraints, both of which are closely related to the state-of-the-art wireless applications. Specifically,…

    Submitted 20 November, 2023; originally announced November 2023.

  29. arXiv:2310.17327  [pdf, ps, other]

    cs.IT eess.SP

    Near-Field Positioning and Attitude Sensing Based on Electromagnetic Propagation Modeling

    Authors: Ang Chen, Li Chen, Yunfei Chen, Nan Zhao, Changsheng You

    Abstract: Positioning and sensing over wireless networks are imperative for many emerging applications. However, since traditional wireless channel models over-simplify the user equipment (UE) as a point target, they cannot be used for sensing the attitude of the UE, which is typically described by the spatial orientation. In this paper, a comprehensive electromagnetic propagation modeling (EPM) based on el…

    Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 19 pages, 13 figures. Accepted by IEEE Journal on Selected Areas in Communications

  30. arXiv:2310.13730  [pdf, other]

    cs.CV

    Localizing and Editing Knowledge in Text-to-Image Generative Models

    Authors: Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha

    Abstract: Text-to-Image Diffusion Models such as Stable-Diffusion and Imagen have achieved unprecedented quality of photorealism with state-of-the-art FID scores on MS-COCO and other generation benchmarks. Given a caption, image generation requires fine-grained knowledge about attributes such as object structure, style, and viewpoint amongst others. Where does this information reside in text-to-image genera…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 61 pages

  31. arXiv:2309.12318  [pdf]

    cs.RO math.OC

    Stochastic scheduling of autonomous mobile robots at hospitals

    Authors: Lulu Cheng, Ning Zhao, Mengge Yuan, Kan Wu

    Abstract: This paper studies the scheduling of autonomous mobile robots (AMRs) at hospitals where the stochastic travel times and service times of AMRs are affected by the surrounding environment. The routes of AMRs are planned to minimize the daily cost of the hospital (including the AMR fixed cost, penalty cost of violating the time window, and transportation cost). To efficiently generate high-quality so…

    Submitted 23 November, 2023; v1 submitted 30 July, 2023; originally announced September 2023.

  32. arXiv:2309.12302  [pdf, other]

    cs.CV cs.GR

    Text-Guided Vector Graphics Customization

    Authors: Peiying Zhang, Nanxuan Zhao, Jing Liao

    Abstract: Vector graphics are widely used in digital art and valued by designers for their scalability and layer-wise topological properties. However, the creation and editing of vector graphics necessitate creativity and design expertise, leading to a time-consuming process. In this paper, we propose a novel pipeline that generates high-quality customized vector graphics based on textual prompts while pres…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by SIGGRAPH Asia 2023. Project page: https://intchous.github.io/SVGCustomization

  33. arXiv:2309.11228  [pdf, other]

    cs.CV

    Towards Robust Few-shot Point Cloud Semantic Segmentation

    Authors: Yating Xu, Na Zhao, Gim Hee Lee

    Abstract: Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples. However, the noise-free assumption in the support set can be easily violated in many practical real-world settings. In this paper, we focus on improving the robustness of few-shot point cloud segmentation under the detrimental influence of noisy suppor…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: BMVC 2023

  34. arXiv:2309.11222  [pdf, other]

    cs.CV

    Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

    Authors: Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee

    Abstract: Existing fully-supervised point cloud segmentation methods suffer in the dynamic testing environment with emerging new classes. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the sacrifice of segmentation accuracy for the base classes, which severely impedes its practicality. This largely motivates us to present the first attempt at a more…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  35. arXiv:2309.05956  [pdf, other]

    cs.CV

    Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation

    Authors: Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet

    Abstract: We propose a new paradigm to automatically generate training data with accurate labels at scale using the text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion, etc.). The proposed approach decouples training data generation into foreground object generation and contextually coherent background generation. To generate foreground objects, we employ a straightforward textual template,…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Code in https://github.com/gyhandy/Text2Image-for-Detection

  36. arXiv:2308.12163  [pdf, other]

    cs.CV

    NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

    Authors: Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He

    Abstract: Non-photorealistic videos are in demand with the wave of the metaverse, but lack sufficient research studies. This work aims to take a step forward to understand how humans perceive non-photorealistic videos with eye fixation (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To fill in the gap of missing a suitable dataset fo…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  37. arXiv:2308.06928  [pdf, other]

    cs.RO

    Refining 6-DoF Grasps with Context-Specific Classifiers

    Authors: Tasbolat Taunyazov, Heng Zhang, John Patrick Eala, Na Zhao, Harold Soh

    Abstract: In this work, we present GraspFlow, a refinement approach for generating context-specific grasps. We formulate the problem of grasp synthesis as a sampling problem: we seek to sample from a context-conditioned probability distribution of successful grasps. However, this target distribution is unknown. As a solution, we devise a discriminator gradient-flow method to evolve grasps obtained from a si…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: IROS 2023, Code and Datasets are available at https://github.com/tasbolat1/graspflow

  38. arXiv:2308.03059  [pdf, other]

    cs.CV cs.AI cs.GR

    Language-based Photo Color Adjustment for Graphic Designs

    Authors: Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau

    Abstract: Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuiti…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 15 pages, 19 figures. Accepted by SIGGRAPH 2023. Project page: https://zhenwwang.github.io/langrecol

  39. arXiv:2308.00543  [pdf, other]

    cs.IT eess.SP

    On the Performance Tradeoff of an ISAC System with Finite Blocklength

    Authors: Xiao Shen, Na Zhao, Yuan Shen

    Abstract: Integrated sensing and communication (ISAC) has been proposed as a promising paradigm in future wireless networks, where the spectral and hardware resources are shared to provide a considerable performance gain. It is essential to understand how sensing and communication (S&C) influence each other to guide the practical algorithm and system design in ISAC. In this paper, we investigate the p…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted by ICC 2023

  40. arXiv:2307.16215  [pdf]

    cs.RO math.OC

    The Multi-Trip Autonomous Mobile Robot Scheduling Problem with Time Windows in a Stochastic Environment at Smart Hospitals

    Authors: Lulu Cheng, Ning Zhao, Kan Wu, Zhibin Chen

    Abstract: Autonomous mobile robots (AMRs) play a crucial role in transportation and service tasks at hospitals, contributing to enhanced efficiency and meeting medical demands. This paper investigates the optimization problem of scheduling strategies for AMRs at smart hospitals, where the service and travel times of AMRs are stochastic. A stochastic mixed-integer programming model is formulated to minimize…

    Submitted 23 November, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

  41. arXiv:2307.08631  [pdf, ps, other]

    cs.IT eess.SP

    Dual-Functional MIMO Beamforming Optimization for RIS-Aided Integrated Sensing and Communication

    Authors: Xin Zhao, Heng Liu, Shiqi Gong, Xin Ju, Chengwen Xing, Nan Zhao

    Abstract: Aiming at providing wireless communication systems with environment-perceptive capacity, emerging integrated sensing and communication (ISAC) technologies face multiple difficulties, especially in balancing the performance trade-off between the communication and radar functions. In this paper, we introduce a reconfigurable intelligent surface (RIS) to assist both data transmission and target detec…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 30 pages, 8 figures, manuscript submitted to IEEE TCOM

  42. arXiv:2305.18286  [pdf, other]

    cs.CV cs.AI

    Photoswap: Personalized Subject Swapping in Images

    Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

    Abstract: In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity. Envision seamlessly substituting a tabby cat lounging on a sunlit window sill in a photograph with your own playful puppy, all while preserving the original charm and composition of the image. We present Photoswap, a novel approach that enables th…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 14 pages

  43. FashionTex: Controllable Virtual Try-on with Text and Texture

    Authors: Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, Xiaoguang Han

    Abstract: Virtual try-on attracts increasing research attention as a promising way for enhancing the user experience for online cloth shopping. Though existing methods can generate impressive results, users need to provide a well-designed reference image containing the target fashion clothes that often do not exist. To support user-friendly fashion customization in full-body portraits, we propose a multi-mo…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to SIGGRAPH 2023 (Conference Proceedings)

  44. arXiv:2303.14001  [pdf, other]

    cs.CV

    Grid-guided Neural Radiance Fields for Large Urban Scenes

    Authors: Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin

    Abstract: Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to linear scale-up in training costs and the number of sub-NeRFs as the scene expands. An alternat…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR2023, Project page at https://city-super.github.io/gridnerf/

  45. arXiv:2303.13953  [pdf, other]

    cs.CV cs.AI

    AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation

    Authors: Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin

    Abstract: Both indoor and outdoor environments are inherently structured and repetitive. Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice. Inspired by this observation, we propose AssetField, a novel neural scene representation that learns a set of object-aware ground feature planes to represent the scene, where an…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Project page can be found at https://city-super.github.io/assetfield/

  46. arXiv:2303.13511  [pdf, other]

    cs.CV cs.AI cs.LG

    Neural Preset for Color Style Transfer

    Authors: Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding ar…

    Submitted 24 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Project page with demos: https://zhkkke.github.io/NeuralPreset . Artifact-free real-time 4K color style transfer via AI-generated presets. CVPR 2023

  47. arXiv:2212.09068  [pdf, other]

    cs.CV

    Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

    Authors: Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee

    Abstract: Domain shift widely exists in the visual world, while modern deep neural networks commonly suffer from severe performance degradation under domain shift due to the poor generalization ability, which limits the real-world applications. The domain shift mainly lies in the limited source environmental variations and the large distribution gap between source and unseen target data. To this end, we pro…

    Submitted 24 November, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted by IJCV. Journal extension of arXiv:2204.02548. Code is available at https://github.com/HeliosZhao/SHADE-VisualDG

  48. arXiv:2212.07629  [pdf, other]

    cs.CV

    EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation

    Authors: Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet

    Abstract: We propose EM-PASTE: an Expectation Maximization (EM) guided Cut-Paste compositional dataset augmentation approach for weakly-supervised instance segmentation using only image-level supervision. The proposed method consists of three main components. The first component generates high-quality foreground object masks. To this end, an EM-like approach is proposed that iteratively refines an initial se…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: 15 pages (including appendix), 7 figures

  49. arXiv:2212.04668  [pdf, other]

    cs.CV

    Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds

    Authors: Yuyang Zhao, Na Zhao, Gim Hee Lee

    Abstract: Semantic segmentation in 3D indoor scenes has achieved remarkable performance under the supervision of large-scale annotated data. However, previous works rely on the assumption that the training and testing data are of the same distribution, which may suffer from performance degradation when evaluated on the out-of-distribution scenes. To alleviate the annotation cost and the performance degradat…

    Submitted 9 December, 2022; originally announced December 2022.

  50. arXiv:2212.02084  [pdf, other]

    cs.SD eess.AS

    End-to-end Recording Device Identification Based on Deep Representation Learning

    Authors: Chunyan Zeng, Dongliang Zhu, Zhifeng Wang, Minghu Wu, Wei Xiong, Nan Zhao

    Abstract: Deep learning techniques have achieved specific results in recording device source identification. The recording device source features include spatial information and certain temporal information. However, most recording device source identification methods based on deep learning only use spatial representation learning from recording device source features, which cannot make full use of recordin…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 20 pages, 5 figures, recording device identification