Spin-valley-locked Electroluminescence for High-Performance Circularly-Polarized Organic Light-Emitting Diodes

Authors: Yibo Deng, Teng Long, Pingyang Wang, Han Huang, Zijian Deng, Chunling Gu, Cunbin An, Bo Liao, Guillaume Malpuech, Dmitry Solnyshkov, Hongbing Fu, Qing Liao

Abstract: Circularly polarized (CP) organic light-emitting diodes (OLEDs) have attracted attention in potential applications including novel display and photonic technologies. However, conventional approaches cannot meet the requirements of device performance, such as high dissymmetry factor, high directionality, narrowband emission, simplified device structure and low costs. Here, we demonstrate spin-valley-locked CP-OLEDs without chiral emitters, but based on photonic spin-orbit coupling, where photons with opposite CP characteristics are emitted from different optical valleys. These spin-valley locked OLEDs exhibit a narrowband emission of 16 nm, a high EQE of 3.65, a maximum luminance of near 98000 cd/m2 and a gEL of up to 1.80, which are among the best performances of active single-crystal CP-OLEDs, achieved with a simple device structure. This strategy opens an avenue for practical applications towards three-dimensional displays and on-chip CP-OLEDs.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06584 [pdf, other]

HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general to different downstream tasks, and therefore, can be utilized in high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks using a quadrupedal robot in the real world. These tasks span from leveraging state-based policy to vision-based policy, from training purely from the simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2407.05918 [pdf, other]

In vacuum metasurface for optical microtrap array

Authors: Donghao Li, Qiming Liao, Beining Xu, Yaoting Zhou, Keyu Qin, Zhongxiao Xu, Heng Shen, Lingling Huang

Abstract: Optical tweezer arrays of laser-cooled and individually controlled particles have revolutionized atomic, molecular and optical physics, and they afford exquisite capabilities for applications in quantum simulation of many-body physics, quantum computation and quantum sensing. Underlying this development is the technical maturity of generating scalable optical beams, enabled by active components and high numerical aperture objective. However, such a complex combination of bulk optics outside the vacuum chamber is very sensitive to any vibration and drift. Here we demonstrate the generation of a 3×3 static tweezer array with a single chip-scale multifunctional metasurface element in vacuum, replacing the meter-long free space optics. Fluorescence counts on the camera validate the successful trapping of the atomic ensemble array. Further, we discuss the strategy to achieve low scattering and crosstalk, where a metasurface design featuring dual-wavelength independent control is included. Our results, together with other recent developments in integrated photonics for cold atoms, could pave the way for compact and portable quantum sensors and simulators in platforms of neutral atom arrays.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.08723 [pdf, other]

ECBD: Evidence-Centered Benchmark Design for NLP

Authors: Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

Abstract: Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity of the benchmark's measurements. To address this gap, we draw on evidence-centered design in educational assessments and propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules. ECBD specifies the role each module plays in helping practitioners collect evidence about capabilities of interest. Specifically, each module requires benchmark designers to describe, justify, and support benchmark design choices -- e.g., clearly specifying the capabilities the benchmark aims to measure or how evidence about those capabilities is collected from model responses. To demonstrate the use of ECBD, we conduct case studies with three benchmarks: BoolQ, SuperGLUE, and HELM. Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.19618 [pdf, other]

Spectral multiplexing based on multi-distance lensless imaging

Authors: Qijun You, Lingshuo Meng, Yun Gao, Qing Liao, Wei Cao, Peixiang Lu

Abstract: We have demonstrated the capability of spectral multiplexing in multi-distance diffractive imaging, enabling the reconstruction of samples with diverse spectral responses. While previous methods like ptychography utilize redundancy in radial diffraction data to achieve information multiplexing, they typically require capturing a substantial amount of diffraction data. In contrast, our approach effectively harnesses the redundancy information in axial diffraction data. This significantly reduces the amount of diffraction data required and relaxes the stringent requirements on optical path stability.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 4 pages, 5 figures

arXiv:2405.19609 [pdf, other]

SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture Annotations

Authors: Yujiao Jiang, Qingmin Liao, Zhaolong Wang, Xiangru Lin, Zongqing Lu, Yuxi Zhao, Hanqing Wei, Jingrui Ye, Yu Zhang, Zhijing Shao

Abstract: Recovering photorealistic and drivable full-body avatars is crucial for numerous applications, including virtual reality, 3D games, and tele-presence. Most methods, whether reconstruction or generation, require large numbers of human motion sequences and corresponding textured meshes. To easily learn a drivable avatar, a reasonable parametric body model with unified topology is paramount. However, existing human body datasets either have images or textured models and lack parametric models which fit clothes well. We propose a new parametric model SMPLX-Lite-D, which can fit detailed geometry of the scanned mesh while maintaining stable geometry in the face, hand and foot regions. We present SMPLX-Lite dataset, the most comprehensive clothing avatar dataset with multi-view RGB sequences, keypoints annotations, textured scanned meshes, and textured SMPLX-Lite-D models. With the SMPLX-Lite dataset, we train a conditional variational autoencoder model that takes human pose and facial keypoints as input, and generates a photorealistic drivable human avatar.

Submitted 29 May, 2024; originally announced May 2024.

Comments: ICME 2024; Project page: https://alex-jyj.github.io/SMPLX-Lite/

arXiv:2405.17037 [pdf, other]

BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Authors: Zongkai Zhang, Zidong Xu, Wenming Yang, Qingmin Liao, Jing-Hao Xue

Abstract: Existing 3D occupancy networks demand significant hardware resources, hindering the deployment of edge devices. Binarized Neural Networks (BNN) offer substantially reduced computational and memory requirements. However, their performance decreases notably compared to full-precision networks. Moreover, it is challenging to enhance the performance of binarized models by increasing the number of binarized convolutional layers, which limits their practicability for 3D occupancy prediction. To bridge these gaps, we propose a novel binarized deep convolution (BDC) unit that effectively enhances performance while increasing the number of binarized convolutional layers. Firstly, through theoretical analysis, we demonstrate that $1 \times 1$ binarized convolutions introduce minimal binarization errors. Therefore, additional binarized convolutional layers are constrained to $1 \times 1$ in the BDC unit. Secondly, we introduce the per-channel weight branch to mitigate the impact of binarization errors from unimportant channel features on the performance of binarized models, thereby improving performance while increasing the number of binarized convolutional layers. Furthermore, we decompose the 3D occupancy network into four convolutional modules and utilize the proposed BDC unit to binarize these modules. Our BDC-Occ model is created by applying the proposed BDC unit to binarize the existing 3D occupancy networks. Comprehensive quantitative and qualitative experiments demonstrate that the proposed BDC-Occ is the state-of-the-art binarized 3D occupancy network algorithm.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 19 pages, 8 figures

arXiv:2405.11764 [pdf, other]

doi 10.1145/3626772.3657802

Modeling User Fatigue for Sequential Recommendation

Authors: Nian Li, Xin Ban, Cheng Ling, Chen Gao, Lantao Hu, Peng Jiang, Kun Gai, Yong Li, Qingmin Liao

Abstract: Recommender systems filter out information that meets user interests. However, users may be tired of the recommendations that are too similar to the content they have been exposed to in a short historical period, which is the so-called user fatigue. Despite the significance for a better user experience, user fatigue is seldom explored by existing recommenders. In fact, there are three main challenges to be addressed for modeling user fatigue, including what features support it, how it influences user interests, and how its explicit signals are obtained. In this paper, we propose to model user Fatigue in interest learning for sequential Recommendations (FRec). To address the first challenge, based on a multi-interest framework, we connect the target item with historical items and construct an interest-aware similarity matrix as features to support fatigue modeling. Regarding the second challenge, built upon feature cross, we propose a fatigue-enhanced multi-interest fusion to capture long-term interest. In addition, we develop a fatigue-gated recurrent unit for short-term interest learning, with temporal fatigue representations as important inputs for constructing update and reset gates. For the last challenge, we propose a novel sequence augmentation to obtain explicit fatigue signals for contrastive learning. We conduct extensive experiments on real-world datasets, including two public datasets and one large-scale industrial dataset. Experimental results show that FRec can improve AUC and GAUC up to 0.026 and 0.019 compared with state-of-the-art models, respectively. Moreover, large-scale online experiments demonstrate the effectiveness of FRec for fatigue reduction. Our codes are released at https://github.com/tsinghua-fib-lab/SIGIR24-FRec.

Submitted 22 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: SIGIR 2024

arXiv:2405.11233 [pdf, other]

Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code

Authors: Yujia Chen, Cuiyun Gao, Zezhou Yang, Hongyu Zhang, Qing Liao

Abstract: In the field of code intelligence, effectively modeling long-range code poses a significant challenge. Existing pre-trained language models (PLMs) such as UniXcoder have achieved remarkable success, but they still face difficulties with long code inputs. This is mainly due to their limited capacity to maintain contextual continuity and memorize the key information over long-range code. To alleviate the difficulties, we propose EXPO, a framework for EXtending Pre-trained language models for lOng-range code. EXPO incorporates two innovative memory mechanisms we propose in this paper: Bridge Memory and Hint Memory. Bridge Memory uses a tagging mechanism to connect disparate snippets of long-range code, helping the model maintain contextual coherence. Hint Memory focuses on crucial code elements throughout the global context, such as package imports, by integrating a kNN attention layer to adaptively select the relevant code elements. This dual-memory approach bridges the gap between understanding local code snippets and maintaining global code coherence, thereby enhancing the model's overall comprehension of long code sequences. We validate the effectiveness of EXPO on five popular pre-trained language models such as UniXcoder and two code intelligence tasks including API recommendation and vulnerability detection. Experimental results demonstrate that EXPO significantly improves the pre-training language models.

Submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted by ISSTA 2024

arXiv:2405.02810 [pdf, other]

Adaptive deep density approximation for stochastic dynamical systems

Authors: Junjie He, Qifeng Liao, Xiaoliang Wan

Abstract: In this paper we consider adaptive deep neural network approximation for stochastic dynamical systems. Based on the Liouville equation associated with the stochastic dynamical systems, a new temporal KRnet (tKRnet) is proposed to approximate the probability density functions (PDFs) of the state variables. The tKRnet gives an explicit density model for the solution of the Liouville equation, which alleviates the curse of dimensionality issue that limits the application of traditional grid based numerical methods. To efficiently train the tKRnet, an adaptive procedure is developed to generate collocation points for the corresponding residual loss function, where samples are generated iteratively using the approximate density function at each iteration. A temporal decomposition technique is also employed to improve the long-time integration. Theoretical analysis of our proposed method is provided, and numerical examples are presented to demonstrate its performance.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 24 pages, 13 figures

MSC Class: 34F05; 60H35; 62M45; 65C30

arXiv:2405.00623 [pdf, other]

doi 10.1145/3630106.3658941

"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust

Authors: Sunnie S. Y. Kim, Q. Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, Jennifer Wortman Vaughan

Abstract: Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.

Submitted 15 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted to FAccT 2024. This version includes the appendix

arXiv:2405.00297 [pdf, ps, other]

Generalized Cayley graphs of complete groups

Authors: Qianfen Liao, Liu Weijun

Abstract: A group $G$ is a complete group if it satisfies $Z(G)=e$ and $Aut(G)=Inn(G)$. In this paper, on the one hand, we study the basic properties of generalized Cayley graphs and characterize two classes of isomorphic generalized Cayley graphs of complete groups. On the other hand, we give the sufficient and necessary conditions for a complete group to be a $GCI$ group and a restricted $GCI$ group. As an application, we complete the classification of restricted $GCI$-groups for symmetric groups.

Submitted 6 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.08449 [pdf, other]

OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

Authors: Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu

Abstract: Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. A previous method utilizes NeRF for surface rendering to recover the occluded areas, but it requires more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions; the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the feature along with the occlusion-aware loss functions to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method. We also improve training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.

Submitted 14 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.17320 [pdf, other]

Leveraging Symmetry in RL-based Legged Locomotion Control

Authors: Zhi Su, Xiaoyu Huang, Daniel Ordoñez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massimiliano Pontil, Claudio Semini, Yi Wu, Koushil Sreenath

Abstract: Model-free reinforcement learning is a promising approach for autonomously solving challenging robotics control problems, but faces exploration difficulty without information of the robot's kinematics and dynamics morphology. The under-exploration of multiple modalities with symmetric states leads to behaviors that are often unnatural and sub-optimal. This issue becomes particularly pronounced in the context of robotic systems with morphological symmetries, such as legged robots for which the resulting asymmetric and aperiodic behaviors compromise performance, robustness, and transferability to real hardware. To mitigate this challenge, we can leverage symmetry to guide and improve the exploration in policy learning via equivariance/invariance constraints. In this paper, we investigate the efficacy of two approaches to incorporate symmetry: modifying the network architectures to be strictly equivariant/invariant, and leveraging data augmentation to approximate equivariant/invariant actor-critics. We implement the methods on challenging loco-manipulation and bipedal locomotion tasks and compare with an unconstrained baseline. We find that the strictly equivariant policy consistently outperforms other methods in sample efficiency and task performance in simulation. In addition, symmetry-incorporated approaches exhibit better gait quality, higher robustness and can be deployed zero-shot in real-world experiments.

Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.11589 [pdf, other]

UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling

Authors: Yujiao Jiang, Qingmin Liao, Xiaoyu Li, Li Ma, Qi Zhang, Chaopeng Zhang, Zongqing Lu, Ying Shan

Abstract: Reconstructing photo-realistic drivable human avatars from multi-view image sequences has been a popular and challenging topic in the field of computer vision and graphics. While existing NeRF-based methods can achieve high-quality novel view rendering of human models, both training and inference processes are time-consuming. Recent approaches have utilized 3D Gaussians to represent the human body, enabling faster training and rendering. However, they undermine the importance of the mesh guidance and directly predict Gaussians in 3D space with coarse mesh guidance. This hinders the learning procedure of the Gaussians and tends to produce blurry textures. Therefore, we propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We utilize the embedding of UV map to learn Gaussian textures in 2D space, leveraging the capabilities of powerful 2D networks to extract features. Additionally, through an independent Mesh network, we optimize pose-dependent geometric deformations, thereby guiding Gaussian rendering and significantly enhancing rendering quality. We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art synthesis of novel view and novel pose. The code and data will be made available on the homepage https://alex-jyj.github.io/UV-Gaussians/ once the paper is accepted.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.09284 [pdf, other]

DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning

Authors: Xu Yang, Jiyuan Feng, Songyue Guo, Ye Wang, Ye Ding, Binxing Fang, Qing Liao

Abstract: Personalized federated learning becomes a hot research topic that can learn a personalized learning model for each client. Existing personalized federated learning models prefer to aggregate similar clients with similar data distribution to improve the performance of learning models. However, similarity-based personalized federated learning methods may exacerbate the class imbalanced problem. In this paper, we propose a novel Dynamic Affinity-based Personalized Federated Learning model (DA-PFL) to alleviate the class imbalanced problem during federated learning. Specifically, we build an affinity metric from a complementary perspective to guide which clients should be aggregated. Then we design a dynamic aggregation strategy to dynamically aggregate clients based on the affinity metric in each round to reduce the class imbalanced risk. Extensive experiments show that the proposed DA-PFL model can significantly improve the accuracy of each client in three real-world datasets with state-of-the-art comparison methods.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07429 [pdf]

Dual orthogonally-polarized lasing assisted by imaginary Fermi arcs in organic microcavities

Authors: Teng Long, Jiahuan Ren, Peng Li, Feng Yun, Guillaume Malpuech, Dmitry Solnyshkov, Hongbing Fu, Feng Li, Qing Liao

Abstract: The polarization control of micro/nano lasers is an important topic in nanophotonics. Up to now, the simultaneous generation of two distinguishable orthogonally-polarized lasing modes from a single organic microlaser remains a critical challenge. Here, we demonstrate simultaneously orthogonally-polarized dual lasing from a microcavity filled with an organic single crystal exhibiting selective strong coupling. We show that the non-Hermiticity due to polarization-dependent losses leads to the formation of real and imaginary Fermi arcs with exceptional points. Simultaneous orthogonally-polarized lasing becomes possible thanks to the eigenstate mixing by the photonic spin-orbit coupling at the imaginary Fermi arcs. Our work provides a novel way to develop linearly-polarized lasers and paves the way for the future fundamental research in topological photonics, non-Hermitian optics, and other fields.

Submitted 12 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2110.13456

arXiv:2403.02630 [pdf, other]

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

Authors: Hongyu Zhang, Dongyi Zheng, Lin Zhong, Xu Yang, Jiyuan Feng, Yunqing Feng, Qing Liao

Abstract: In recent years, Cross-Domain Recommendation (CDR) has drawn significant attention, which utilizes user data from multiple domains to enhance the recommendation performance. However, current CDR methods require sharing user data across domains, thereby violating the General Data Protection Regulation (GDPR). Consequently, numerous approaches have been proposed for Federated Cross-Domain Recommendation (FedCDR). Nevertheless, the data heterogeneity across different domains inevitably influences the overall performance of federated learning. In this study, we propose FedHCDR, a novel Federated Cross-Domain Recommendation framework with Hypergraph signal decoupling. Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features. The approach employs high-pass and low-pass hypergraph filters to decouple domain-exclusive and domain-shared user representations, which are trained by the local-global bi-directional transfer algorithm. In addition, a hypergraph contrastive learning (HCL) module is devised to enhance the learning of domain-shared user relationship information by perturbing the user hypergraph. Extensive experiments conducted on three real-world scenarios demonstrate that FedHCDR outperforms existing baselines significantly.

Submitted 10 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 16 pages, 5 figures

arXiv:2403.01428 [pdf, other]

Localization matters too: How localization error affects UAV flight

Authors: Suquan Zhang, Yuanfan Xu, Shu'ang Yu, Qingmin Liao, Jincheng Yu, Yu Wang

Abstract: The maximum safe flight speed of an Unmanned Aerial Vehicle (UAV) is an important indicator for measuring its efficiency in completing various tasks. This indicator is influenced by numerous parameters such as UAV localization error, perception range, and system latency. However, in terms of localization errors, although there have been many studies dedicated to improving the localization capability of UAVs, there is a lack of quantitative research on their impact on speed. In this work, we model the relationship between various parameters of the UAV and its maximum flight speed. We consider a scenario similar to navigating through dense forests, where the UAV needs to quickly avoid obstacles directly ahead and swiftly reorient after avoidance. Based on this scenario, we studied how parameters such as localization error affect the maximum safe speed during UAV flight, as well as the coupling relationships between these parameters. Furthermore, we validated our model in a simulation environment, and the results showed that the predicted maximum safe speed had an error of less than 20% compared to the test speed. In high-density situations, localization error has a significant impact on the UAV's maximum safe flight speed. This model can help designers utilize more suitable software and hardware to construct a UAV system.

Submitted 7 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures

arXiv:2402.05880 [pdf, other]

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking

Authors: Nikhil Sharma, Q. Vera Liao, Ziang Xiao

Abstract: Large language model (LLM)-powered conversational search systems have already been used by hundreds of millions of people, and are believed to bring many benefits over conventional search. However, while decades of research and public discourse interrogated the risk of search systems in increasing selective exposure and creating echo chambers -- limiting exposure to diverse opinions and leading to opinion polarization -- little is known about such a risk of LLM-powered conversational search. We conduct two experiments to investigate: 1) whether and how LLM-powered conversational search increases selective exposure compared to conventional search; 2) whether and how LLMs with opinion biases that either reinforce or challenge the user's view change the effect. Overall, we found that participants engaged in more biased information querying with LLM-powered conversational search, and an opinionated LLM reinforcing their views exacerbated this bias. These results present critical implications for the development of LLMs and conversational search systems, and the policy governing these technologies.

Submitted 10 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted in CHI'24. Supplementary material will be available online with the official submission in CHI 2024

arXiv:2402.02060 [pdf, other]

DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication

Authors: Yanjun Liu, Wenming Yang, Qingmin Liao

Abstract: Finger vein authentication, recognized for its high security and specificity, has become a focal point in biometric research. Traditional methods predominantly concentrate on vein feature extraction for discriminative modeling, with a limited exploration of generative approaches. Suffering from verification failure, existing methods often fail to obtain authentic vein patterns by segmentation. To fill this gap, we introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks. DiffVein is composed of two dedicated branches: one for segmentation and the other for denoising. For better feature interaction between these two branches, we introduce two specialized modules to improve their collective performance. The first, a mask condition module, incorporates the semantic information of vein patterns from the segmentation branch into the denoising process. Additionally, we propose a Semantic Difference Transformer (SD-Former), which employs Fourier-space self-attention and cross-attention modules to extract category embedding before feeding it to the segmentation task. In this way, our framework allows for a dynamic interplay between diffusion and segmentation embeddings, so that vein segmentation and authentication tasks can inform and enhance each other in the joint training. To further optimize our model, we introduce a Fourier-space Structural Similarity (FourierSIM) loss function, which is tailored to improve the denoising network's learning efficacy. Extensive experiments on the USM and THU-MVFV3V datasets substantiate DiffVein's superior performance, setting new benchmarks in both vein segmentation and authentication tasks.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.15845 [pdf]

doi 10.1103/PhysRevB.109.205403

Extremely intrinsic chirality in two-dimensional planar waveguide grating induced by quasi-bound states in the continuum

Authors: Dandan Zhang, Tingting Liu, Linlin Lei, Weimin Deng, Tongbiao Wang, Qinghua Liao, Wenxing Liu, Shuyuan Xiao, Tianbao Yu

Abstract: The strong chiral light-matter interaction is crucial for various important fields such as chiral optics, quantum optics, and biomedical optics, driving a quest for the extreme intrinsic chirality assisted by ultrahigh quality ($Q$-) factor resonances. In this quest, we propose a straightforward method to achieve extreme intrinsic chirality in lossless planar structures by manipulating the quasi-BIC through in-plane perturbation. The temporal coupled-mode theory is employed to derive the conditions necessary for achieving maximal intrinsic chirality. The quasi-BIC should be excited within the transparent spectral range of the structure and couple with $x$- and $y$-polarized waves with the same intensity but a phase difference of $\pi/2$. As an illustration, a planar chiral dielectric dimeric waveguide grating is designed that strongly interacts with left circularly polarized (LCP) light while decoupling from right circularly polarized (RCP) light through in-plane symmetry engineering. Furthermore, by adjusting the magnitude of the in-plane asymmetry, we can independently manipulate the $Q$-factors of the chiral quasi-BIC while maintaining nearly unity circular dichroism. Our results provide a simple yet powerful paradigm for achieving extreme intrinsic chirality on an easily manufacturable platform, which may have potential applications in chiral emission, chiral sensing, and enantiomer separation.

Submitted 28 January, 2024; originally announced January 2024.

Journal ref: Physical Review B 109 (20), 205403 (2024)

arXiv:2401.15843 [pdf, other]

APIGen: Generative API Method Recommendation

Authors: Yujia Chen, Cuiyun Gao, Muyijie Zhu, Qing Liao, Yong Wang, Guoai Xu

Abstract: Automatic API method recommendation is an essential task of code intelligence, which aims to suggest suitable APIs for programming queries. Existing approaches can be categorized into two primary groups: retrieval-based and learning-based approaches. Although these approaches have achieved remarkable success, they still come with notable limitations. The retrieval-based approaches rely on the text representation capabilities of embedding models, while the learning-based approaches require extensive task-specific labeled data for training. To mitigate the limitations, we propose APIGen, a generative API recommendation approach through enhanced in-context learning (ICL). APIGen involves two main components: (1) Diverse Examples Selection. APIGen searches for similar posts to the programming queries from the lexical, syntactical, and semantic perspectives, providing more informative examples for ICL. (2) Guided API Recommendation. APIGen enables large language models (LLMs) to perform reasoning before generating API recommendations, where the reasoning involves fine-grained matching between the task intent behind the queries and the factual knowledge of the APIs. With the reasoning process, APIGen makes recommended APIs better meet the programming requirement of queries and also enhances the interpretability of results. We compare APIGen with four existing approaches on two publicly available benchmarks. Experiments show that APIGen outperforms the best baseline CLEAR by 105.8% in method-level API recommendation and 54.3% in class-level API recommendation in terms of SuccessRate@1. Besides, APIGen achieves an average 49.87% increase compared to the zero-shot performance of popular LLMs such as GPT-4 in method-level API recommendation regarding the SuccessRate@3 metric.

Submitted 28 January, 2024; originally announced January 2024.

Comments: To appear in the proceedings of the 31st IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2024)

arXiv:2401.13169 [pdf, other]

ReposVul: A Repository-Level High-Quality Vulnerability Dataset

Authors: Xinchen Wang, Ruida Hu, Cuiyun Gao, Xin-Cheng Wen, Yujia Chen, Qing Liao

Abstract: Open-Source Software (OSS) vulnerabilities bring great challenges to software security and pose potential risks to our society. Enormous efforts have been devoted to automated vulnerability detection, among which deep learning (DL)-based approaches have proven to be the most effective. However, the current labeled data present the following limitations: (1) Tangled Patches: Developers may submit code changes unrelated to vulnerability fixes within patches, leading to tangled patches. (2) Lacking Inter-procedural Vulnerabilities: The existing vulnerability datasets typically contain function-level and file-level vulnerabilities, ignoring the relations between functions, thus rendering the approaches unable to detect the inter-procedural vulnerabilities. (3) Outdated Patches: The existing datasets usually contain outdated patches, which may bias the model during training. To address the above limitations, in this paper, we propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul. The proposed framework mainly contains three modules: (1) A vulnerability untangling module, aiming at distinguishing vulnerability-fixing related code changes from tangled patches, in which the Large Language Models (LLMs) and static analysis tools are jointly employed. (2) A multi-granularity dependency extraction module, aiming at capturing the inter-procedural call relationships of vulnerabilities, in which we construct multiple-granularity information for each vulnerability patch, including repository-level, file-level, function-level, and line-level. (3) A trace-based filtering module, aiming at filtering the outdated patches, which leverages the file path trace-based filter and commit time trace-based filter to construct an up-to-date dataset.

Submitted 8 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted by ICSE 2024 Industry Challenge Track

arXiv:2401.11731 [pdf, other]

Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods

Authors: Tianlun Hu, Qi Liao, Qiang Liu, Antonio Massaro, Georg Carle

Abstract: Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services. Many network slicing solutions rely on deep learning to manage complex and high-dimensional resource allocation problems. However, deep learning models suffer from limited generalization and adaptability to dynamic slicing configurations. In this paper, we propose a novel framework that integrates constrained optimization methods and deep learning models, resulting in strong generalization and superior approximation capability. Based on the proposed framework, we design a new neural-assisted algorithm to allocate radio resources to slices to maximize the network utility under inter-slice resource constraints. The algorithm exhibits high scalability, accommodating varying numbers of slices and slice configurations with ease. We implement the proposed solution in a system-level network simulator and evaluate its performance extensively by comparing it to state-of-the-art solutions including deep reinforcement learning approaches. The numerical results show that our solution obtains near-optimal quality-of-service satisfaction and promising generalization performance under different network slicing scenarios.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures, IEEE Global Communications Conference 2023

arXiv:2401.09051 [pdf, other]

Canvil: Designerly Adaptation for LLM-Powered User Experiences

Authors: K. J. Kevin Feng, Q. Vera Liao, Ziang Xiao, Jennifer Wortman Vaughan, Amy X. Zhang, David W. McDonald

Abstract: Advancements in large language models (LLMs) are poised to spark a proliferation of LLM-powered user experiences. In product teams, designers are often tasked with crafting user experiences that align with user needs. To involve designers and leverage their user-centered perspectives to create effective and responsible LLM-powered products, we introduce the practice of designerly adaptation for engaging with LLMs as an adaptable design material. We first identify key characteristics of designerly adaptation through a formative study with designers experienced in designing for LLM-powered products (N=12). These characteristics are: 1) a low technical barrier to entry, 2) leveraging designers' unique perspectives bridging users and technology, and 3) encouraging model tinkering. Based on this characterization, we build Canvil, a Figma widget that operationalizes designerly adaptation. Canvil supports structured authoring of system prompts to adapt LLM behavior, testing of adapted models on diverse user inputs, and integration of model outputs into interface designs. We use Canvil as a technology probe in a group-based design study (6 groups, N=17) to investigate the implications of integrating designerly adaptation into design workflows. We find that designers are able to iteratively tinker with different adaptation approaches and reason about interface affordances to enhance end-user interaction with LLMs. Furthermore, designers identified promising collaborative workflows for designerly adaptation. Our work opens new avenues for collaborative processes and tools that foreground designers' user-centered expertise in the crafting and deployment of LLM-powered user experiences.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.08360 [pdf, other]

AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization

Authors: Qi Liao, Tze-Yang Tung

Abstract: Recently, deep autoencoders have gained traction as a powerful method for implementing goal-oriented semantic communications systems. The idea is to train a mapping from the source domain directly to channel symbols, and vice versa. However, prior studies often focused on rate-distortion tradeoff and transmission delay, at the cost of increasing end-to-end complexity and thus latency. Moreover, the datasets used are often not reflective of real-world environments, and the results were not validated against real-world baseline systems, leading to an unfair comparison. In this paper, we study the problem of remote camera pose estimation and propose AdaSem, an adaptive semantic communications approach that optimizes the tradeoff between inference accuracy and end-to-end latency. We develop an adaptive semantic codec model, which encodes the source data into a dynamic number of symbols, based on the latent space distribution and the channel state feedback. We utilize a lightweight model for both transmitter and receiver to ensure comparable complexity to the baseline implemented in a real-world system. Extensive experiments on real-environment data show the effectiveness of our approach. When compared to a real implementation of a client-server camera relocalization service, AdaSem outperforms the baseline by reducing the end-to-end delay and estimation error by over 75% and 63%, respectively.

Submitted 24 May, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: IEEE INFOCOM 2024

arXiv:2401.08131 [pdf, other]

Game Rewards Vulnerabilities: Software Vulnerability Detection with Zero-Sum Game and Prototype Learning

Authors: Xin-Cheng Wen, Cuiyun Gao, Xinchen Wang, Ruiqi Wang, Tao Zhang, Qing Liao

Abstract: Recent years have witnessed a growing focus on automated software vulnerability detection. Notably, deep learning (DL)-based methods, which employ source code for the implicit acquisition of vulnerability patterns, have demonstrated superior performance compared to other approaches. However, DL-based approaches still struggle to capture the vulnerability-related information from the whole code snippet, since the vulnerable parts usually account for only a small proportion. As evidenced by our experiments, the approaches tend to excessively emphasize semantic information, potentially leading to limited vulnerability detection performance in practical scenarios. First, they cannot distinguish well between the code snippets before (i.e., vulnerable code) and after (i.e., non-vulnerable code) developers' fixes due to the minimal code changes. Besides, substituting user-defined identifiers with placeholders (e.g., "VAR1" and "FUN1") results in obvious performance degradation of up to 14.53% with respect to the F1 score. To mitigate these issues, we propose to leverage the vulnerable and corresponding fixed code snippets, in which the minimal changes can provide hints about semantic-agnostic features for vulnerability detection. In this paper, we propose a software vulneRability dEteCtion framework with zerO-sum game and prototype learNing, named RECON. In RECON, we propose a zero-sum game construction module. Distinguishing the vulnerable code from the corresponding fixed code is regarded as one player (i.e., Calibrator), while the conventional vulnerability detection is another player (i.e., Detector) in the zero-sum game. The goal is to capture the semantic-agnostic features of the first player for enhancing the second player's performance for vulnerability detection. Experiments on the public benchmark dataset show that RECON outperforms the state-of-the-art baseline by 6.29% in F1 score.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 17 pages, 8 figures

arXiv:2401.08083 [pdf, other]

UV-SAM: Adapting Segment Anything Model for Urban Village Identification

Authors: Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, Yong Li

Abstract: Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities. Traditionally, governments heavily depend on field survey methods to monitor the urban villages, which however are time-consuming, labor-intensive, and possibly delayed. Thanks to widely available and timely updated satellite images, recent studies develop computer vision techniques to detect urban villages efficiently. However, existing studies either focus on simple urban village image classification or fail to provide accurate boundary information. To accurately identify urban village boundaries from satellite images, we harness the power of the vision foundation model and adapt the Segment Anything Model (SAM) to urban village segmentation, named UV-SAM. Specifically, UV-SAM first leverages a small-sized semantic segmentation model to produce mixed prompts for urban villages, including mask, bounding box, and image representations, which are then fed into SAM for fine-grained boundary identification. Extensive experimental results on two datasets in China demonstrate that UV-SAM outperforms existing baselines, and identification results over multiple years show that both the number and area of urban villages are decreasing over time, providing deeper insights into the development trends of urban villages and shedding light on the vision foundation models for sustainable cities. The dataset and codes of this study are available at https://github.com/tsinghua-fib-lab/UV-SAM.

Submitted 1 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.03877 [pdf]

Optical spin Hall effect pattern switching in polariton condensates in organic single-crystal microbelts

Authors: Jiahuan Ren, Teng Long, Chunling Gu, Hongbing Fu, Dmitry Solnyshkov, Guillaume Malpuech, Qing Liao

Abstract: Topological polaritons, combining the robustness of the topologically protected edge states to defects and disorder with the strong nonlinear properties of polariton bosons, represent an excellent platform to investigate novel photonic topological phases. In this work, we demonstrated the optical spin Hall effect (OSHE) and its symmetry switching in the exciton-polariton regime of pure DPAVBi crystals. Benefiting from the photonic Rashba-Dresselhaus spin-orbit coupling in organic crystals, we observed the separation of left- and right-circularly-polarized polariton emission in two-dimensional momentum space and real space, a signature of the OSHE. Above the lasing threshold, the OSHE pattern changes due to transverse quantization in the microbelt. This device without superlattice structure has great potential applications in topological polaritonics, such as information transmission, photonic integrated chips and quantum information.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.16805 [pdf, other]

DarkShot: Lighting Dark Images with Low-Compute and High-Quality

Authors: Jiazhang Zheng, Lei Li, Qiuping Liao, Cheng Li, Li Li, Yangxing Liu

Abstract: Nighttime photography encounters escalating challenges in extremely low-light conditions, primarily attributable to the ultra-low signal-to-noise ratio. For real-world deployment, a practical solution must not only produce visually appealing results but also require minimal computation. However, most existing methods are either focused on improving restoration performance or employ lightweight models at the cost of quality. This paper proposes a lightweight network that outperforms existing state-of-the-art (SOTA) methods in low-light enhancement tasks while minimizing computation. The proposed network incorporates Siamese Self-Attention Block (SSAB) and Skip-Channel Attention (SCA) modules, which enhance the model's capacity to aggregate global information and are well-suited for high-resolution images. Additionally, based on our analysis of the low-light image restoration process, we propose a Two-Stage Framework that achieves superior results. Our model can restore a UHD 4K resolution image with minimal computation while keeping SOTA restoration quality.

Submitted 9 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE ICASSP 2024

arXiv:2312.15224 [pdf, other]

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Authors: Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

Abstract: AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed where players could communicate with natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while keeping real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

Submitted 9 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: This paper is accepted by AAMAS 2024. More demonstrations can be seen on our website https://sites.google.com/view/overcooked-hla/

arXiv:2312.13701 [pdf, ps, other]

Infinite families $2$-designs from binary projective three-weight codes

Authors: Canze Zhu, Qunying Liao, Haibo Liu

Abstract: Combinatorial designs are closely related to linear codes. In recent years, many $t$-designs have been constructed from certain linear codes. In this paper, we aim to construct $2$-designs from binary three-weight codes. For any binary three-weight code $\mathcal{C}$ with length $n$, let $A_{n}(\mathcal{C})$ be the number of codewords in $\mathcal{C}$ with Hamming weight $n$; we then show that $\mathcal{C}$ holds $2$-designs when $\mathcal{C}$ is projective and $A_{n}(\mathcal{C})=1$. Furthermore, by extending certain binary projective two-weight codes and using the defining set method, we construct two classes of binary projective three-weight codes which are suitable for holding $2$-designs.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.01536 [pdf, other]

CalliPaint: Chinese Calligraphy Inpainting with Diffusion Model

Authors: Qisheng Liao, Zhinuo Wang, Muhammad Abdul-Mageed, Gus Xia

Abstract: Chinese calligraphy can be viewed as a unique form of visual art. Recent advancements in computer vision hold significant potential for the future development of generative models in the realm of Chinese calligraphy. Nevertheless, methods of Chinese calligraphy inpainting, which can be effectively used in the art and education fields, remain relatively unexplored. In this paper, we introduce a new model that harnesses recent advancements in both Chinese calligraphy generation and image inpainting. We demonstrate that our proposed model CalliPaint can produce convincing Chinese calligraphy.

Submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted as a Machine Learning for Creativity and Design (ML4CD) workshop paper at NeurIPS 2023. https://neurips.cc/virtual/2023/workshop/66545#wse-detail-75063

arXiv:2311.09919 [pdf, other]

DSR-Diff: Depth Map Super-Resolution with Diffusion Model

Authors: Yuan Shi, Bin Xia, Rui Zhu, Qingmin Liao, Wenming Yang

Abstract: Color-guided depth map super-resolution (CDSR) improves the spatial resolution of a low-quality depth map with the corresponding high-quality color map, benefiting various applications such as 3D reconstruction, virtual reality, and augmented reality. While conventional CDSR methods typically rely on convolutional neural networks or transformers, diffusion models (DMs) have demonstrated notable effectiveness in high-level vision tasks. In this work, we present a novel CDSR paradigm that utilizes a diffusion model within the latent space to generate guidance for depth map super-resolution. The proposed method comprises a guidance generation network (GGN), a depth map super-resolution network (DSRN), and a guidance recovery network (GRN). The GGN is specifically designed to generate the guidance while managing its compactness. Additionally, we integrate a simple but effective feature fusion module and a transformer-style feature extraction module into the DSRN, enabling it to leverage guided priors in the extraction, fusion, and reconstruction of multi-model images. Taking into account both accuracy and efficiency, our proposed method has shown superior performance in extensive experiments when compared to state-of-the-art methods. Our codes will be made available at https://github.com/shiyuan7/DSR-Diff.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09696 [pdf, other]

Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

Authors: Wei-Rui Chen, Ife Adebara, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

Abstract: ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT 'knows', we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 24 language families spoken on five continents. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource. We then study ChatGPT's (both GPT-3.5 and GPT-4) ability to (i) identify language names and language codes, (ii) under zero- and few-shot conditions, (iii) with and without provision of a label set. When compared to smaller finetuned LID tools, we find that ChatGPT lags behind. For example, it has poor performance on African languages. We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.

Submitted 8 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024 Findings

arXiv:2310.17397 [pdf, ps, other]

Simultaneous manipulation of electromagnetic and elastic waves via glide symmetry phoxonic crystal waveguides

Authors: Linlin Lei, Lingjuan He, Qinghua Liao, Wenxing Liu, Tianbao Yu

Abstract: A phoxonic crystal waveguide with glide symmetry is designed, in which both electromagnetic and elastic waves can propagate along the glide plane at the same time. Due to the band-sticking effect, super-cell bands of the waveguide degenerate in pairs at the boundary of the Brillouin zone, causing the appearance of gapless guided-modes in the bandgaps. The gapless guided-modes are single-modes over a relatively large frequency range. By adjusting the magnitude of the glide dislocation, the edge bandgaps of the guided-modes can be further adjusted, so as to achieve photonic and phononic single-mode guided-bands with relatively flat dispersion relations. In addition, there exists acousto-optic interaction in the cavity constructed by the glide plane. The proposed waveguide has potential applications in the design of novel optomechanical devices.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 16 pages, 9 figures

arXiv:2310.14557 [pdf, other]

The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages

Authors: Chiyu Zhang, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

Abstract: Instruction-tuned large language models (LLMs), such as ChatGPT, demonstrate remarkable performance in a wide range of tasks. Despite numerous recent studies that examine the performance of instruction-tuned LLMs on various NLP benchmarks, there remains a lack of comprehensive investigation into their ability to understand cross-lingual sociopragmatic meaning (SM), i.e., meaning embedded within social and interactive contexts. This deficiency arises partly from SM not being adequately represented in any of the existing benchmarks. To address this gap, we present SPARROW, an extensive multilingual benchmark specifically designed for SM understanding. SPARROW comprises 169 datasets covering 13 task types across six primary categories (e.g., anti-social language detection, emotion recognition). SPARROW datasets encompass 64 different languages originating from 12 language families representing 16 writing scripts. We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning. Our comprehensive analysis reveals that existing open-source instruction-tuned LLMs still struggle to understand SM across various languages, performing close to a random baseline in some cases. We also find that although ChatGPT outperforms many LLMs, it still falls behind task-specific finetuned models with a gap of 12.19 SPARROW score. Our benchmark is available at: https://github.com/UBC-NLP/SPARROW

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023 Main conference

arXiv:2310.14488 [pdf, other]

doi 10.1088/1674-4527/ad013c

Density Functional Theory Calculations on the Interstellar Formation of Biomolecules

Authors: Qingli Liao, Junzhi Wang, Peng Xie, Enwei Liang, Zhao Wang

Abstract: The density functional theory (DFT) is the most versatile electronic structure method used in quantum chemical calculations, and is increasingly applied in astrochemical research. This mini-review provides an overview of the applications of DFT calculations in understanding the chemistry that occurs in star-forming regions. We survey investigations into the formation of biologically-relevant compounds such as nucleobases in the interstellar medium, and also cover the formation of both achiral and chiral amino acids, as well as biologically-relevant molecules such as sugars, and nitrogen-containing polycyclic aromatic hydrocarbons. Additionally, DFT calculations are used to estimate the potential barriers for chemical reactions in astronomical environments. We conclude by noting several areas that require more research, such as the formation pathways of chiral amino acids, complex sugars and other biologically-important molecules, and the role of environmental factors in the formation of interstellar biomolecules.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.10436 [pdf, other]

EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities

Authors: Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao

Abstract: The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors is often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents' decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.

Submitted 23 May, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: ACL 2024 (main conference)

arXiv:2310.08024 [pdf]

Effective potential engineering by emergent anisotropy in a tunable open-access microcavity

Authors: Yiming Li, Xiaoxuan Luo, Yaxin Guo, Jiahuan Ren, Teng Long, Bohao Wang, Yin Cai, Chaowei Guo, Yuanbin Qin, Hongbing Fu, Yanpeng Zhang, Feng Yun, Qing Liao, Feng Li

Abstract: Photonic spin-orbit (SO) coupling is an important physical mechanism leading to numerous interesting phenomena in the systems of microcavity photons and exciton-polaritons. We report the effect of SO coupling in a tunable open-access microcavity embedded with anisotropic active media. The SO coupling associated with the TE-TM splitting results in an emergent anisotropy, which further leads to fine energy splittings allowing clear observation of the full set of eigenstates, in sharp contrast with the isotropic situation which leads to the isotropic eigenstates of spin vortices. We show that the photonic potential can be engineered by playing with the relation between the emergent anisotropy and the cavity ellipticity. All the experimental results are well reproduced by the degenerate perturbation theory. Our results constitute a significant extension to the research field of microcavity spinoptronics, with potential applications in polarization control and optical property measurement of photonic devices and materials.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 12 pages, 4 figures

arXiv:2309.09496 [pdf, other]

CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval

Authors: Yating Liu, Yaowei Li, Zimo Liu, Wenming Yang, Yaowei Wang, Qingmin Liao

Abstract: Text-based Person Retrieval (TPR) aims to retrieve the target person images given a textual query. The primary challenge lies in bridging the substantial gap between vision and language modalities, especially when dealing with limited large-scale datasets. In this paper, we introduce a CLIP-based Synergistic Knowledge Transfer (CSKT) approach for TPR. Specifically, to explore CLIP's knowledge on the input side, we first propose a Bidirectional Prompts Transferring (BPT) module constructed by text-to-image and image-to-text bidirectional prompts and coupling projections. Secondly, Dual Adapters Transferring (DAT) is designed to transfer knowledge on the output side of Multi-Head Attention (MHA) in vision and language. This synergistic two-way collaborative mechanism promotes the early-stage feature fusion and efficiently exploits the existing knowledge of CLIP. CSKT outperforms the state-of-the-art approaches across three benchmark datasets when the training parameters merely account for 7.4% of the entire model, demonstrating its remarkable efficiency, effectiveness and generalization.

Submitted 2 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: ICASSP 2024 (accepted). Minor typo revisions compared to version 1 on arXiv

arXiv:2309.08420 [pdf, other]

doi 10.1137/1.9781611978032.62

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

Authors: Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Abstract: Cross-domain Sequential Recommendation (CSR), which leverages user sequence data from multiple domains, has received extensive attention in recent years. However, the existing CSR methods require sharing original user data across domains, which violates the General Data Protection Regulation (GDPR). Thus, it is necessary to combine federated learning (FL) and CSR to fully utilize knowledge from different domains while preserving data privacy. Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL. In this paper, we propose FedDCSR, a novel federated cross-domain sequential recommendation framework via disentangled representation learning. Specifically, to address the sequence feature heterogeneity across domains, we introduce an approach called inter-intra domain sequence representation disentanglement (SRD) to disentangle the user sequence features into domain-shared and domain-exclusive features. In addition, we design an intra domain contrastive infomax (CIM) strategy to learn richer domain-exclusive features of users by performing data augmentation on user sequences. Extensive experiments on three real-world scenarios demonstrate that FedDCSR achieves significant improvements over existing baselines.

Submitted 16 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.07652 [pdf]

Photochemical reaction enabling the engineering of photonic spin-orbit coupling in organic-crystal optical microcavities

Authors: Qian Liang, Xuekai Ma, Jiahuan Ren, Teng Long, Chunling Gu, Cunbin An, Hongbing Fu, Stefan Schumacher, Qing Liao

Abstract: The control and active manipulation of spin-orbit coupling (SOC) in photonic systems is fundamental in the development of modern spin optics and topological photonic devices. Here, we demonstrate the control of an artificial Rashba-Dresselhaus (RD) SOC mediated by photochemical reactions in a microcavity filled with an organic single-crystal of photochromic phase-change character. Splitting of the circular polarization components of the optical modes induced by photonic RD SOC is observed experimentally in momentum space. By applying an ultraviolet light beam, we control the spatial molecular orientation through a photochemical reaction and with that we control the energies of the photonic modes. This way we realize a reversible conversion of spin-splitting of the optical modes with different energies, leading to an optically controlled switching between circularly and linearly polarized emission from our device. Our strategy of in situ and reversible engineering of SOC induced by a light field provides a promising approach to actively design and manipulate synthetic gauge fields towards future on-chip integration in photonics and topological photonic devices.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.06169 [pdf, other]

Elucidating the solution space of extended reverse-time SDE for diffusion models

Authors: Qinpeng Cui, Xinyi Zhang, Zongqing Lu, Qingmin Liao

Abstract: Diffusion models (DMs) demonstrate potent image generation capabilities in various generative modeling tasks. Nevertheless, their primary limitation lies in slow sampling speed, requiring hundreds or thousands of sequential function evaluations through large neural networks to generate high-quality images. Sampling from DMs can be seen alternatively as solving corresponding stochastic differential equations (SDEs) or ordinary differential equations (ODEs). In this work, we formulate the sampling process as an extended reverse-time SDE (ER SDE), unifying prior explorations into ODEs and SDEs. Leveraging the semi-linear structure of ER SDE solutions, we offer exact solutions and arbitrarily high-order approximate solutions for VP SDE and VE SDE, respectively. Based on the solution space of the ER SDE, we yield mathematical insights elucidating the superior performance of ODE solvers over SDE solvers in terms of fast sampling. Additionally, we unveil that VP SDE solvers stand on par with their VE SDE counterparts. Finally, we devise fast and training-free samplers, ER-SDE-Solvers, achieving state-of-the-art performance across all stochastic samplers. Experimental results demonstrate 3.45 FID in 20 function evaluations and 2.24 FID in 50 function evaluations on the ImageNet $64\times64$ dataset.

Submitted 26 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.03283 [pdf, other]

High-rate discretely-modulated continuous-variable quantum key distribution using quantum machine learning

Authors: Qin Liao, Jieyu Liu, Anqi Huang, Lei Huang, Zhuoying Fei, Xiquan Fu

Abstract: We propose a high-rate scheme for discretely-modulated continuous-variable quantum key distribution (DM CVQKD) using quantum machine learning technologies, which divides the whole CVQKD system into three parts, i.e., the initialization part that is used for training and estimating the quantum classifier, the prediction part that is used for generating highly correlated raw keys, and the data-postprocessing part that generates the final secret key string shared by Alice and Bob. To this end, a low-complexity quantum k-nearest neighbor (QkNN) classifier is designed for predicting the lossy discretely-modulated coherent states (DMCSs) at Bob's side. The performance of the proposed QkNN-based CVQKD, especially in terms of machine learning metrics and complexity, is analyzed, and its theoretical security is proved by using the semi-definite program (SDP) method. Numerical simulation shows that the secret key rate of our proposed scheme is explicitly superior to the existing DM CVQKD protocols, and it can be further enhanced with the increase of modulation variance.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 18 pages, 17 figures

arXiv:2307.05276 [pdf, other]

Unbiased Scene Graph Generation via Two-stage Causal Modeling

Authors: Shuzhou Sun, Shuaifeng Zhi, Qing Liao, Janne Heikkilä, Li Liu

Abstract: Despite the impressive performance of recent unbiased Scene Graph Generation (SGG) methods, the current debiasing literature mainly focuses on the long-tailed distribution problem, whereas it overlooks another source of bias, i.e., semantic confusion, which makes the SGG model prone to yield false predictions for similar relationships. In this paper, we explore a debiasing procedure for the SGG task leveraging causal inference. Our central insight is that the Sparse Mechanism Shift (SMS) in causality allows independent intervention on multiple biases, thereby potentially preserving head category performance while pursuing the prediction of highly informative tail relationships. However, the noisy datasets lead to unobserved confounders for the SGG task, and thus the constructed causal models are always causal-insufficient to benefit from SMS. To remedy this, we propose Two-stage Causal Modeling (TsCM) for the SGG task, which takes the long-tailed distribution and semantic confusion as confounders to the Structural Causal Model (SCM) and then decouples the causal intervention into two stages. The first stage is causal representation learning, where we use a novel Population Loss (P-Loss) to intervene in the semantic confusion confounder. The second stage introduces the Adaptive Logit Adjustment (AL-Adjustment) to eliminate the long-tailed distribution confounder to complete causal calibration learning. These two stages are model agnostic and thus can be used in any SGG model that seeks unbiased predictions. Comprehensive experiments conducted on the popular SGG backbones and benchmarks show that our TsCM can achieve state-of-the-art performance in terms of mean recall rate. Furthermore, TsCM can maintain a higher recall rate than other debiasing methods, which indicates that our method can achieve a better tradeoff between head and tail relationships.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 17 pages, 9 figures. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2307.03476 [pdf, other]

Unpaired Multi-View Graph Clustering with Cross-View Structure Matching

Authors: Yi Wen, Siwei Wang, Qing Liao, Weixuan Liang, Ke Liang, Xinhang Wan, Xinwang Liu

Abstract: Multi-view clustering (MVC), which effectively fuses information from multiple views for better performance, has received increasing attention. Most existing MVC methods assume that multi-view data are fully paired, which means that the mappings of all corresponding samples between views are pre-defined or given in advance. However, the data correspondence is often incomplete in real-world applications due to data corruption or sensor differences, referred to as the data-unpaired problem (DUP) in the multi-view literature. Although several attempts have been made to address the DUP issue, they suffer from the following drawbacks: 1) Most methods focus on the feature representation while ignoring the structural information of multi-view data, which is essential for clustering tasks; 2) Existing methods for partially unpaired problems rely on pre-given cross-view alignment information, resulting in their inability to handle fully unpaired problems; 3) Their inevitable parameters degrade the efficiency and applicability of the models. To tackle these issues, we propose a novel parameter-free graph clustering framework termed Unpaired Multi-view Graph Clustering framework with Cross-View Structure Matching (UPMGC-SM). Specifically, unlike the existing methods, UPMGC-SM effectively utilizes the structural information from each view to refine cross-view correspondences. Besides, our UPMGC-SM is a unified framework for both the fully and partially unpaired multi-view graph clustering. Moreover, existing graph clustering methods can adopt our UPMGC-SM to enhance their ability for unpaired scenarios. Extensive experiments demonstrate the effectiveness and generalization of our proposed framework for both paired and unpaired datasets.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 15 pages

arXiv:2306.11552 [pdf, other]

doi 10.1109/OJCOMS.2023.3273310

Inter-Cell Network Slicing With Transfer Learning Empowered Multi-Agent Deep Reinforcement Learning

Authors: Tianlun Hu, Qi Liao, Qiang Liu, Georg Carle

Abstract: Network slicing enables operators to efficiently support diverse applications on a common physical infrastructure. The ever-increasing densification of network deployment leads to complex and non-trivial inter-cell interference, which requires more than inaccurate analytic models to dynamically optimize resource management for network slices. In this paper, we develop a DIRP algorithm with multiple deep reinforcement learning (DRL) agents to cooperatively optimize resource partition in individual cells to fulfill the requirements of each slice, based on two alternative reward functions. Nevertheless, existing DRL approaches usually tie the pretrained model parameters to specific network environments with poor transferability, which raises practical deployment concerns in large-scale mobile networks. Hence, we design a novel transfer learning-aided DIRP (TL-DIRP) algorithm to ease the transfer of DIRP agents across different network environments in terms of sample efficiency, model reproducibility, and algorithm scalability. The TL-DIRP algorithm first centrally trains a generalized model and then transfers the "generalist" to each local agent as "specialist" with distributed finetuning and execution. TL-DIRP consists of two steps: 1) centralized training of a generalized distributed model, 2) transferring the "generalist" to each "specialist" with distributed finetuning and execution. The numerical results show that not only does DIRP outperform existing baseline approaches in terms of faster convergence and higher reward, but more importantly, TL-DIRP significantly improves the service performance, with reduced exploration cost, accelerated convergence rate, and enhanced model reproducibility. As compared to a traffic-aware baseline, TL-DIRP provides about 15% less violation ratio of the quality of service (QoS) for the worst slice service and 8.8% less violation on the average service QoS.

Submitted 20 June, 2023; originally announced June 2023.

Comments: 14 pages, 14 figures, IEEE Open Journal of the Communications Society

Journal ref: Volume 4, 2023, Pages 1141 - 1155

arXiv:2306.06935 [pdf, other]

LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types

Authors: Xin-Cheng Wen, Cuiyun Gao, Feng Luo, Haoyu Wang, Ge Li, Qing Liao

Abstract: Prior studies generally focus on software vulnerability detection and have demonstrated the effectiveness of Graph Neural Network (GNN)-based approaches for the task. Considering the various types of software vulnerabilities and the associated different degrees of severity, it is also beneficial for developers to determine the type of each piece of vulnerable code. In this paper, we observe that the distribution of vulnerability types is long-tailed in practice, where a small portion of classes have massive samples (i.e., head classes) but the others contain only a few samples (i.e., tail classes). Directly adopting previous vulnerability detection approaches tends to result in poor detection performance, mainly for two reasons. First, it is difficult to effectively learn the vulnerability representation due to the over-smoothing issue of GNNs. Second, vulnerability types in the tail are hard to predict because they have extremely few associated samples. To alleviate these issues, we propose a Long-taIled software VulnerABiLity typE classification approach, called LIVABLE. LIVABLE mainly consists of two modules: (1) a vulnerability representation learning module, which improves the propagation steps in the GNN to distinguish node representations by a differentiated propagation method, and which also involves a sequence-to-sequence model to enhance the vulnerability representations; and (2) an adaptive re-weighting module, which adjusts the learning weights for different types according to the training epochs and the numbers of associated samples by a novel training loss.

Submitted 12 June, 2023; originally announced June 2023.

Showing 1–50 of 285 results for author: Liao, Q