
GWPT: A Green Word-Embedding-based POS Tagger

Authors: Chengwei Wei, Runqi Pang, C. -C. Jay Kuo

Abstract: As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns the POS label to each word in a sentence. A novel lightweight POS tagger based on word embeddings is proposed and named GWPT (green word-embedding-based POS tagger) in this work. Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning modules. The main novelty of GWPT lies in representation learning. It uses non-contextual or contextual word embeddings, partitions embedding dimension indices into low-, medium-, and high-frequency sets, and represents them with different N-grams. Experimental results show that GWPT offers state-of-the-art accuracy with fewer model parameters and significantly lower computational complexity in both training and inference than deep-learning-based methods.

Submitted 15 January, 2024; originally announced January 2024.
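To make "represents them with different N-grams" concrete, here is a minimal sketch under loudly stated assumptions: the partition of dimension indices and the N-gram width for each set are supplied by hand (the abstract does not say how frequency is measured or which N goes with which set), and an "N-gram" here means concatenating the selected dimensions of the N words centred on the target word, zero-padded at sentence edges. The function `ngram_features` and the set names are hypothetical, not GWPT's implementation; the toy example uses two sets for brevity.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def ngram_features(
    sentence: Sequence[Sequence[float]],
    dim_sets: Dict[str, List[int]],
    window: Dict[str, int],
) -> List[Vector]:
    """Per-word features from set-specific N-gram windows (hypothetical sketch).

    sentence: one embedding per word, all of the same length.
    dim_sets: embedding dimension indices for each frequency set.
    window:   odd N-gram width for each set (assumed, not from the paper).
    """
    features: List[Vector] = []
    for i in range(len(sentence)):
        feat: Vector = []
        for name, dims in dim_sets.items():
            n = window[name]
            if n < 1 or n % 2 == 0:
                raise ValueError(f"window for {name!r} must be a positive odd integer")
            half = n // 2
            # Concatenate the selected dimensions of the n words centred on word i,
            # padding with zeros where the window runs past the sentence boundary.
            for j in range(i - half, i + half + 1):
                if 0 <= j < len(sentence):
                    feat.extend(float(sentence[j][d]) for d in dims)
                else:
                    feat.extend(0.0 for _ in dims)
        features.append(feat)
    return features


# Toy usage: 3-dimensional embeddings for a 2-word sentence.
sent = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
feats = ngram_features(sent, {"low": [0], "high": [1, 2]}, {"low": 3, "high": 1})
print(feats[0])  # [0.0, 1.0, 4.0, 2.0, 3.0]
```

Giving a wider window to some dimension sets and a narrower one to others lets each set contribute context at a different scale while keeping the feature vector small.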

arXiv:2312.16660

doi 10.1007/JHEP06(2024)125

Unveiling chiral states in the XXZ chain: Finite-size scaling probing symmetry-enriched $c=1$ conformal field theories

Authors: Chenan Wei, Vagharsh V. Mkhitaryan, Tigran A. Sedrakyan

Abstract: We study the low-energy properties of the one-dimensional spin-1/2 XXZ chain with time-reversal symmetry-breaking pseudo-scalar chiral interaction and propose a phase diagram for the model. In the integrable case of the isotropic Heisenberg model with the chiral interaction, we employ the thermodynamic Bethe ansatz to find "chiralization", the response of the ground state versus the strength of the pseudo-scalar chiral interaction of a chiral Heisenberg chain. Unlike the magnetization case, the chirality of the ground state remains zero until the transition point corresponding to critical coupling $\alpha_c = 2J/\pi$ with $J$ being the antiferromagnetic spin-exchange interaction. The central-charge $c=1$ conformal field theories (CFTs) describe the two phases with zero and finite chirality. We show for this particular case and conjecture more generally for similar phase transitions that the difference between two emergent CFTs with identical central charges lies in the symmetry of their ground state (lightest weight) primary fields, i.e., the two phases are symmetry-enriched CFTs. At finite but small temperatures, the non-chiral Heisenberg phase acquires a finite chirality that scales with the temperature quadratically. We show that the finite-size effect around the transition point probes the transition.

Submitted 20 June, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 33 pages, 8 figures

Journal ref: J. High Energ. Phys. 2024, 125 (2024)
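As a worked example of the quoted critical coupling $\alpha_c = 2J/\pi$: for $J = 1$ the transition sits at $\alpha_c \approx 0.6366$. The helper below evaluates this and labels which side of the transition a coupling falls on, following the statement that the ground-state chirality stays zero below $\alpha_c$. It covers only the zero-temperature isotropic Heisenberg case described above, assumes $\alpha \ge 0$, and the function names are illustrative.

```python
import math


def critical_chiral_coupling(J: float) -> float:
    """Critical coupling alpha_c = 2J/pi quoted for the chiral Heisenberg chain."""
    return 2.0 * J / math.pi


def ground_state_phase(alpha: float, J: float = 1.0, tol: float = 1e-12) -> str:
    """Classify a chiral coupling alpha >= 0 (assumption) relative to alpha_c at T = 0."""
    if alpha < 0:
        raise ValueError("this sketch assumes alpha >= 0")
    alpha_c = critical_chiral_coupling(J)
    if abs(alpha - alpha_c) <= tol:
        return "critical"
    return "zero chirality" if alpha < alpha_c else "finite chirality"


print(round(critical_chiral_coupling(1.0), 4))  # 0.6366
print(ground_state_phase(0.5))  # zero chirality
print(ground_state_phase(0.7))  # finite chirality
```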

arXiv:2312.16430

Preference as Reward, Maximum Preference Optimization with Importance Sampling

Authors: Zaifan Jiang, Xing Huang, Chao Wei

Abstract: Preference learning is a key technology for aligning language models with human values. Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm to optimize preference learning, which first fits a reward model for preference scores and then optimizes the generating policy with an on-policy PPO algorithm to maximize the reward. The RLHF pipeline is complex, time-consuming, and unstable. The Direct Preference Optimization (DPO) algorithm uses an off-policy algorithm to directly optimize the generating policy and eliminates the need for a reward model. DPO is more data-efficient and stable. However, DPO tends to overfit the preference data and to ignore the KL-regularization term when the preference is deterministic. Identity mapping Preference Optimization (IPO) uses a root-finding MSE loss to incorporate KL-regularization. However, both DPO and IPO fail to properly address the KL-regularization term because the support of the preference distribution is not equal to that of the reference distribution. In this paper, we propose a simple and intuitive off-policy preference optimization algorithm from an importance sampling view, which we call Maximum Preference Optimization (MPO). MPO incorporates the off-policy KL-regularization term, making regularization truly effective. MPO achieves the best of both worlds by combining the objectives of RLHF and IPO while being an off-policy algorithm. Furthermore, MPO eliminates the need for a reward model and reference policy, simplifying the learning process and reducing memory usage.

Submitted 25 March, 2024; v1 submitted 27 December, 2023; originally announced December 2023.
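For reference, the two baselines contrasted above can be written per preference pair from the policy and reference log-probabilities of the preferred response $y_w$ and the rejected response $y_l$. The sketch uses the standard DPO loss $-\log\sigma(\beta h)$ and the IPO squared loss $(h - 1/(2\tau))^2$, where $h$ is the difference of log-ratios; the $\beta$ and $\tau$ defaults are arbitrary. It does not implement MPO, whose objective is not given in the abstract.

```python
import math


def log_ratio_margin(pi_w: float, ref_w: float, pi_l: float, ref_l: float) -> float:
    """h = [log pi(y_w) - log ref(y_w)] - [log pi(y_l) - log ref(y_l)]; inputs are log-probs."""
    return (pi_w - ref_w) - (pi_l - ref_l)


def dpo_loss(h: float, beta: float = 0.1) -> float:
    """DPO per-pair loss -log(sigmoid(beta * h)), computed stably as softplus(-beta * h)."""
    z = -beta * h
    # softplus(z) = log(1 + exp(z)), rearranged to avoid overflow for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ipo_loss(h: float, tau: float = 0.1) -> float:
    """IPO per-pair loss: squared distance of h from the target margin 1 / (2 * tau)."""
    return (h - 1.0 / (2.0 * tau)) ** 2


h = log_ratio_margin(-1.0, -1.5, -2.0, -1.0)
print(h)  # 1.5
print(dpo_loss(h), ipo_loss(h))
```

The contrast matches the abstract's remark on overfitting: the DPO loss keeps decreasing as $h$ grows, so with deterministic preferences it can push the margin without bound, whereas the IPO loss penalizes overshooting the target $1/(2\tau)$.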

arXiv:2312.16374

LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis

Authors: Jinwen He, Yujia Gong, Kai Chen, Zijin Lin, Chengan Wei, Yue Zhao

Abstract: Large Language Models (LLMs) have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.

Submitted 29 December, 2023; v1 submitted 26 December, 2023; originally announced December 2023.
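A Siamese network applies one shared encoder to two inputs and compares the resulting embeddings. The sketch below shows that generic pattern, not the factoscope's architecture: a stand-in encoder (one linear layer with ReLU and hand-set weights), a triplet loss that would train it to pull same-label inner-state vectors together, and nearest-support classification at inference. The toy 2-D "inner states", labels, and weights are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def encode(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Shared encoder applied to every input: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.dist(a, b)


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Zero once the same-label pair is at least `margin` closer than the different-label pair."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def classify(query: Sequence[float], support: List[Tuple[Sequence[float], str]],
             weights: Sequence[Sequence[float]]) -> str:
    """Give the query the label of its nearest labelled support example in embedding space."""
    q = encode(query, weights)
    nearest = min(support, key=lambda item: distance(q, encode(item[0], weights)))
    return nearest[1]


# Toy usage: identity encoder and one support example per class (both hypothetical).
W = [[1.0, 0.0], [0.0, 1.0]]
support = [([1.0, 0.0], "factual"), ([0.0, 1.0], "non-factual")]
print(classify([0.9, 0.2], support, W))  # factual
```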

arXiv:2312.14867

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

Authors: Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

Abstract: In the rapidly advancing field of conditional image generation research, challenges such as limited explainability make it difficult to effectively evaluate the performance and capabilities of various models. This paper introduces VIEScore, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation task. VIEScore leverages general knowledge from Multimodal Large Language Models (MLLMs) as the backbone and does not require training or fine-tuning. We evaluate VIEScore on seven prominent conditional image tasks and find: (1) VIEScore (GPT-4o) achieves a high Spearman correlation of 0.4 with human evaluations, while the human-to-human correlation is 0.45. (2) VIEScore (with open-source MLLM) is significantly weaker than GPT-4o and GPT-4v in evaluating synthetic images. (3) VIEScore achieves a correlation on par with human ratings in the generation tasks but struggles in editing tasks. With these results, we believe VIEScore shows great potential to replace human judges in evaluating image synthesis tasks.

Submitted 3 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Accepted to ACL2024 main
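The agreement figures above are Spearman rank correlations. A minimal sketch of how such a number is computed between a metric's scores and human ratings for the same images: rank each list (tied values share their average rank) and take the Pearson correlation of the ranks. The example scores are made up for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


metric_scores = [0.2, 0.5, 0.9, 0.4]  # hypothetical metric outputs
human_ratings = [1, 3, 5, 2]          # hypothetical human scores
print(spearman(metric_scores, human_ratings))  # 1.0
```

Because only ranks matter, a metric on a 0 to 1 scale can be compared directly with human ratings on a 1 to 5 scale.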

arXiv:2312.11420

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Authors: Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

Abstract: This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreover, when benchmarked against other tuning approaches like full parameter finetuning or LoRA, its efficiency benefits are substantial. For example, compared to LoRA at the 13B model scale, it improves performance by an average of over 20% across five multi-modal tasks while reducing trainable parameters by 41.9% and GPU memory usage by 17.6%. On top of this LayerNorm strategy, we showcase that selectively tuning only with conversational data can improve efficiency further. Beyond these empirical outcomes, we provide a comprehensive analysis to explore the role of LayerNorm in adapting LLMs to the multi-modal domain and improving the expressive power of the model.

Submitted 18 December, 2023; originally announced December 2023.

Comments: The first two authors contributed equally
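The core recipe, tuning only LayerNorm and freezing everything else, amounts to a parameter-selection rule. A minimal framework-free sketch, assuming parameters are identified by name and that LayerNorm parameters can be recognized by substrings such as `layernorm` or `ln_`; naming conventions vary by codebase, so the markers and the toy parameter names are hypothetical. In PyTorch the same rule would typically be applied by iterating `model.named_parameters()` and setting `requires_grad` on each parameter.

```python
from typing import Dict, Iterable, Set, Tuple


def select_norm_params(
    param_counts: Dict[str, int],
    norm_markers: Iterable[str] = ("layernorm", "ln_"),
) -> Tuple[Set[str], float]:
    """Return the names left trainable and the trainable fraction of all parameters.

    A parameter stays trainable if its lower-cased name contains any marker;
    every other parameter is treated as frozen.
    """
    markers = tuple(m.lower() for m in norm_markers)
    trainable = {name for name in param_counts if any(m in name.lower() for m in markers)}
    total = sum(param_counts.values())
    if total == 0:
        raise ValueError("no parameters")
    kept = sum(param_counts[name] for name in trainable)
    return trainable, kept / total


# Toy single block with hidden size 4 (names and sizes are hypothetical).
params = {
    "block0.attn.q_proj.weight": 16,
    "block0.attn.k_proj.weight": 16,
    "block0.input_layernorm.weight": 4,
    "block0.input_layernorm.bias": 4,
}
names, frac = select_norm_params(params)
print(sorted(names), frac)
```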

arXiv:2312.04547

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Authors: Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

Abstract: In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing themselves through articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models personalities with systematic few-shot exemplars, incorporates a reflection process based on psychology principles, and emulates autonomy by initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis paradigm for controlling the character's digital body. It integrates motion matching, a proven industry technique to ensure motion quality, with cutting-edge advancements in motion generation for diversity. Extensive experiments demonstrate that each module achieves state-of-the-art performance in its respective domain. Collectively, they enable virtual characters to initiate and sustain dialogues autonomously, while evolving their socio-psychological states. Concurrently, these characters can perform contextually relevant bodily movements. Additionally, a motion captioning module further allows the virtual character to recognize and appropriately respond to human players' actions. Homepage: https://digital-life-project.com/

Submitted 7 December, 2023; originally announced December 2023.

Comments: Homepage: https://digital-life-project.com/
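MoMat-MoGen builds on motion matching, which in its common industry form searches a database of animation frames for the frame whose feature vector (for example, current pose plus desired future trajectory) best matches a query under a weighted squared distance, then continues playback from that frame. A minimal brute-force sketch of that search; the feature layout, weights, and toy database are hypothetical, and this is not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def best_match(
    query: Sequence[float],
    database: List[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[int, float]:
    """Return (frame index, cost) of the database frame closest to the query.

    Cost is a weighted squared Euclidean distance over the feature vector.
    """
    if not database:
        raise ValueError("empty motion database")
    best_idx, best_cost = -1, float("inf")
    for idx, frame in enumerate(database):
        cost = sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, frame))
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx, best_cost


# Toy 2-feature database (hypothetical).
db = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(best_match([0.9, 1.2], db, weights=[1.0, 1.0]))
```

The per-feature weights let a system trade off pose fidelity against trajectory following; larger systems often replace the linear scan with an acceleration structure.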

arXiv:2311.18320

BN-embedded monolayer graphene with tunable electronic and topological properties

Authors: Chih-Piao Chuu, Wei-En Tseng, Kuan-Hung Liu, Ching-Ming Wei, Mei-Yin Chou

Abstract: Finding an effective and controllable way to create a sizable energy gap in graphene-based systems has been a challenging topic of intensive research. We propose that the hybrid of boron nitride and graphene (h-BNC) at low BN doping serves as an ideal platform for band-gap engineering and valleytronic applications. We report a systematic first-principles study of the atomic configurations and band gap opening for energetically favorable BN patches embedded in graphene. Based on first-principles calculations, we construct a tight-binding model to simulate general doping configurations in large supercells. Unexpectedly, the calculations find a linear dependence of the band gap on the effective BN concentration at low doping, arising from an induced effective on-site energy difference at the two C sublattices as they are substituted by B and N dopants alternately. The significant and tunable band gap of a few hundred meV, with preserved topological properties of graphene and feasible sample preparation in the laboratory, presents great opportunities to realize valley physics applications in graphene systems at room temperature.

Submitted 30 November, 2023; originally announced November 2023.
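The mechanism quoted above, a gap driven by an effective on-site energy difference between the two carbon sublattices, can be illustrated with the textbook nearest-neighbour two-band honeycomb model rather than the paper's supercell tight-binding model. With on-site energies $\pm\Delta/2$ the eigenvalues are $\pm\sqrt{\Delta^2/4 + |f(\mathbf{k})|^2}$, and since $f(K) = 0$ at the Dirac point the gap there equals $\Delta$; so if the effective $\Delta$ grows linearly with BN concentration, the gap does too. The hopping $t = 2.7$ eV and the $\Delta$ values are illustrative assumptions.

```python
import cmath
import math


def band_gap_at_k(kx: float, ky: float, delta: float, t: float = 2.7, a: float = 1.0) -> float:
    """Direct gap E+ - E- of a two-band honeycomb model with sublattice energies +/- delta/2.

    H(k) = [[delta/2, f(k)], [conj(f(k)), -delta/2]],  f(k) = -t * sum_j exp(i k . d_j),
    with eigenvalues +/- sqrt(delta**2 / 4 + |f(k)|**2). Energies share the units of t (eV).
    """
    # Nearest-neighbour vectors for C-C bond length a.
    d = [(a / 2, a * math.sqrt(3) / 2), (a / 2, -a * math.sqrt(3) / 2), (-a, 0.0)]
    f = -t * sum(cmath.exp(1j * (kx * dx + ky * dy)) for dx, dy in d)
    return 2.0 * math.sqrt(delta**2 / 4 + abs(f) ** 2)


# Dirac point K for this geometry (bond length 1), where f(K) = 0.
K_POINT = (2 * math.pi / 3, 2 * math.pi / (3 * math.sqrt(3)))

for delta in (0.1, 0.2, 0.3):  # illustrative on-site differences in eV
    print(delta, round(band_gap_at_k(*K_POINT, delta=delta), 6))
```

Away from $K$ the gap is much larger (at $\Gamma$, $|f| = 3t$), so the minimum gap, and hence the one relevant to valley physics, sits at the Dirac points.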

arXiv:2311.17136

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Authors: Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen

Abstract: Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image. To address such diverse information-seeking demands, we introduce UniIR, a unified instruction-guided multimodal retriever capable of handling eight distinct retrieval tasks across modalities. UniIR, a single retrieval system jointly trained on ten diverse multimodal-IR datasets, interprets user instructions to execute various retrieval tasks, demonstrating robust performance across existing datasets and zero-shot generalization to new tasks. Our experiments highlight that multi-task training and instruction tuning are key to UniIR's generalization ability. Additionally, we construct M-BEIR, a multimodal retrieval benchmark with comprehensive results, to standardize the evaluation of universal multimodal information retrieval.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Our code and dataset are available on this project page: https://tiger-ai-lab.github.io/UniIR/
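One simple way a single retriever can serve text, image, and mixed queries against mixed candidates is to fuse whichever modality embeddings are present into one vector and rank by cosine similarity. The sketch below shows that generic fusion-then-rank pattern under stated assumptions: embeddings are given (no encoder, no instruction handling), both modalities share one dimensionality, and fusion is an element-wise sum of unit vectors. It is not UniIR's trained model, and the candidate IDs and vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale to unit L2 norm; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def fuse(text: Optional[Sequence[float]], image: Optional[Sequence[float]]) -> Vector:
    """Element-wise sum of whichever modality embeddings are present, then L2-normalize."""
    parts = [normalize(p) for p in (text, image) if p is not None]
    if not parts:
        raise ValueError("need at least one modality")
    return normalize([sum(xs) for xs in zip(*parts)])


def retrieve(query: Vector, pool: Dict[str, Vector], k: int = 1) -> List[Tuple[str, float]]:
    """Rank candidates by cosine similarity (dot product of unit vectors); return the top k."""
    scores = [(cid, sum(q * c for q, c in zip(query, emb))) for cid, emb in pool.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Toy text query against an image-only and a text+image candidate (hypothetical vectors).
query = fuse([1.0, 0.0], None)
pool = {"img_a": fuse(None, [0.9, 0.1]), "doc_b": fuse([0.0, 1.0], [0.1, 0.9])}
print(retrieve(query, pool, k=2))
```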

arXiv:2311.16502

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

Abstract: We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 14 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

Submitted 13 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Oral

arXiv:2311.15551

Instruct2Attack: Language-Guided Semantic Adversarial Attacks

Authors: Jiang Liu, Chen Wei, Yuxiang Guo, Heng Yu, Alan Yuille, Soheil Feizi, Chun Pong Lau, Rama Chellappa

Abstract: We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions. We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction. Compared to existing noise-based and semantic attacks, I2A generates more natural and diverse adversarial examples while providing better controllability and interpretability. We further automate the attack process with GPT-4 to generate diverse image-specific text instructions. We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses, and demonstrate great transferability among a variety of network architectures.

Submitted 27 November, 2023; originally announced November 2023.

Comments: under submission, code coming soon

arXiv:2311.12666

SSVEP-DAN: A Data Alignment Network for SSVEP-based Brain Computer Interfaces

Authors: Sung-Yu Chen, Chi-Min Chang, Kuan-Jung Chiang, Chun-Shu Wei

Abstract: Steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) offer a non-invasive means of communication through high-speed speller systems. However, their efficiency heavily relies on individual training data obtained during time-consuming calibration sessions. To address the challenge of data insufficiency in SSVEP-based BCIs, we present SSVEP-DAN, the first dedicated neural network model designed for aligning SSVEP data across different domains, which can encompass various sessions, subjects, or devices. Our experimental results across multiple cross-domain scenarios demonstrate SSVEP-DAN's capability to transform existing source SSVEP data into supplementary calibration data, significantly enhancing SSVEP decoding accuracy in scenarios with limited calibration data. We envision SSVEP-DAN as a catalyst for practical SSVEP-based BCI applications with minimal calibration. The source code for this work is available at: https://github.com/CECNL/SSVEP-DAN.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.08994

Thickness dependent mechanical properties of soft ferromagnetic two-dimensional CoTe2

Authors: Surbhi Slathia, Cencen Wei, Manoj Tripathi, Raphael Tromer, Solomon Demiss Negedu, Conor Boland, Suman Sarkar, Douglas S. Galvao, Alan Dalton, Chandra Sekhar Tiwary

Abstract: Two-dimensional (2D) layered transition-metal-based tellurides (chalcogenides) are known to harness the characteristics of their surface atoms to enhance topographical activities for energy conversion, storage, and magnetic applications. High surface energy due to unsaturated dangling bonds and a lateral size larger than their thickness (volume) make them potential candidates for emerging electronics. Nevertheless, the gradual stacking of each sheet alters subtle features of the surface atoms, such as lattice expansion, leading to several phenomena and rendering tunable properties. In the present work, we have monitored thickness-dependent properties of 2D CoTe2 sheets, spanning nanoscale mechanics, tribology, surface potential distributions, interfacial interaction, and magnetism, using atomically resolved spectroscopy and different surface probe techniques, in conjunction with theoretical investigations: density functional theory (DFT) and molecular dynamics (MD). The variation in properties observed in the theoretical investigation reveals the crucial role of the crystal planes of CoTe2. The presented results are beneficial in expanding the use of the 2D telluride family in flexible electronics, piezo sensors, tribo-generators, and next-generation memory devices.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.00618

De-Diffusion Makes Text a Strong Cross-Modal Interface

Authors: Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu

Abstract: We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is trained to transform an input image into text, which is then fed into the fixed text-to-image diffusion decoder to reconstruct the original input -- a process we term De-Diffusion. Experiments validate both the precision and comprehensiveness of De-Diffusion text representing images, such that it can be readily ingested by off-the-shelf text-to-image tools and LLMs for diverse multi-modal tasks. For example, a single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools, and also achieves a new state of the art on open-ended vision-language tasks by simply prompting large language models with few-shot examples.

Submitted 1 November, 2023; originally announced November 2023.

Comments: Technical report. Project page: https://dediffusion.github.io

arXiv:2310.18891

Social Interaction-Aware Dynamical Models and Decision Making for Autonomous Vehicles

Authors: Luca Crosato, Kai Tian, Hubert P. H Shum, Edmond S. L. Ho, Yafei Wang, Chongfeng Wei

Abstract: Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) that are capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to be able to understand and predict the behaviour of human road users. This literature review surveys the current state of IAAD research. Commencing with an examination of terminology, it draws attention to challenges and existing models employed for modelling the behaviour of drivers and pedestrians. Next, a comprehensive review is conducted on various techniques proposed for interaction modelling, encompassing cognitive methods, machine learning approaches, and game-theoretic methods. The review concludes with a discussion of potential advantages and risks associated with IAAD, along with the pivotal research questions that require future exploration.

Submitted 30 October, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.17380

Bott Vanishing via Hodge Theory

Authors: Chuanhao Wei

Abstract: In this paper, we revise the Bott Vanishing on projective toric varieties by giving it an alternative proof under a condition that is compatible with the condition of Kawamata-Viehweg Vanishing. This proof can also be adapted to generalize Bott Vanishing to the setting of mixed Hodge modules. Lastly, we give a counterexample to relative Bott Vanishing for birational morphisms.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 8 pages; suggestions are welcome

MSC Class: 14M25; 14F17; 14C30
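For orientation, the classical statements being compared, in their standard forms: Bott Vanishing for a projective toric variety $X$ of dimension $n$, with $\widehat{\Omega}^{\,j}_X$ the sheaf of reflexive differential $j$-forms and $L$ an ample line bundle, and Kawamata-Viehweg Vanishing for a smooth projective $X$ with $L$ nef and big. The modified condition used in the paper is not given in the abstract and is not reproduced here.

```latex
% Bott Vanishing (projective toric X, L ample):
H^i\big(X,\ \widehat{\Omega}^{\,j}_X \otimes L\big) = 0 \qquad \text{for all } i > 0,\ j \ge 0.

% Kawamata-Viehweg Vanishing (smooth projective X, L nef and big):
H^i\big(X,\ \omega_X \otimes L\big) = 0 \qquad \text{for all } i > 0.
```

Since $\widehat{\Omega}^{\,n}_X = \omega_X$, the $j = n$ case of Bott Vanishing has the same shape as Kawamata-Viehweg Vanishing, with "ample" in place of "nef and big"; this is the sense in which the two hypotheses can be compared.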

arXiv:2310.11550

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert

Abstract: We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge of transitions or access to simulators. We introduce two algorithms that achieve improved regret performance compared to existing approaches. The first algorithm, although computationally inefficient, ensures a regret of $\widetilde{\mathcal{O}}\left(\sqrt{K}\right)$, where $K$ is the number of episodes. This is the first result with the optimal $K$ dependence in the considered setting. The second algorithm, which is based on the policy optimization framework, guarantees a regret of $\widetilde{\mathcal{O}}\left(K^{\frac{3}{4}} \right)$ and is computationally efficient. Both our results significantly improve over the state-of-the-art: a computationally inefficient algorithm by Kong et al. [2023] with $\widetilde{\mathcal{O}}\left(K^{\frac{4}{5}}+\mathrm{poly}\left(\frac{1}{\lambda_{\min}}\right) \right)$ regret, for some problem-dependent constant $\lambda_{\min}$ that can be arbitrarily close to zero, and a computationally efficient algorithm by Sherman et al. [2023b] with $\widetilde{\mathcal{O}}\left(K^{\frac{6}{7}} \right)$ regret.

Submitted 17 October, 2023; originally announced October 2023.
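To give a feel for how far apart the quoted rates are, the snippet below evaluates only the leading power of $K$ in each bound at a chosen number of episodes. It is a rough illustration: constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are dropped, and so is the additive $\mathrm{poly}(1/\lambda_{\min})$ term of Kong et al., which makes their bound look better than it is when $\lambda_{\min}$ is small.

```python
from fractions import Fraction

# Exponent of K in each quoted regret bound (constants, log factors, and the
# additive poly(1/lambda_min) term of Kong et al. are ignored in this sketch).
EXPONENTS = {
    "first algorithm (inefficient)": Fraction(1, 2),
    "second algorithm (efficient)": Fraction(3, 4),
    "Kong et al. [2023]": Fraction(4, 5),
    "Sherman et al. [2023b]": Fraction(6, 7),
}


def leading_regret(K: int) -> dict:
    """Map each bound's name to K raised to its exponent."""
    return {name: K ** float(e) for name, e in EXPONENTS.items()}


for name, value in leading_regret(10**6).items():
    print(f"{name:32s} ~ {value:,.0f}")
```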

arXiv:2310.04642

On two conjectural series involving Riemann zeta function

Authors: Chuanan Wei, Ce Xu

Abstract: The Riemann zeta function is important in many branches of number theory. With the help of the operator method and several transformation formulas for hypergeometric series, we prove four series involving the Riemann zeta function. Two of them are series expansions for $\zeta(7)$ and $\zeta(3)^2$ recently conjectured by Z.-W. Sun.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.17448

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Authors: Zhongang Cai, Wanqi Yin, Ailing Zeng, Chen Wei, Qingping Sun, Yanjun Wang, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

Abstract: Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation of 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/

Submitted 30 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: Homepage: https://caizhongang.github.io/projects/SMPLer-X/

arXiv:2309.10670 [pdf, other]

Symmetry considerations in exact diagonalization: spin-1/2 pyrochlore magnets

Authors: C. Wei, S. H. Curnoe

Abstract: We describe how the methods of group theory (symmetry) are used to optimize the problem of exact diagonalization of a quantum system on a 16-site pyrochlore lattice. By analytically constructing a complete set of symmetrized states, we completely block-diagonalize the Hamiltonian. As an example, we consider a spin-1/2 system with nearest neighbour exchange interactions.

Submitted 19 September, 2023; originally announced September 2023.
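The effect of a symmetry on exact diagonalization can be illustrated with the simplest conserved quantity. The sketch below is not the authors' group-theoretic construction. It uses only conservation of total $S^z$ (a U(1) symmetry), and a hypothetical nearest-neighbour Heisenberg ring stands in for the 16-site pyrochlore cluster. Fixing the number of up spins already splits the $2^{N}$-dimensional Hilbert space into independent blocks. For $N = 16$ spins the full space has $2^{16} = 65536$ states, while the largest block has $\binom{16}{8} = 12870$. All function names are illustrative.

```python
from math import comb

import numpy as np


def ring_bonds(n_sites: int) -> list[tuple[int, int]]:
    """Nearest-neighbour bonds of a periodic ring (hypothetical stand-in for the pyrochlore lattice)."""
    return [(i, (i + 1) % n_sites) for i in range(n_sites)]


def sector_states(n_sites: int, n_up: int) -> list[int]:
    """Basis states (bit strings, 1 = spin up) with exactly n_up up spins, i.e. fixed total S^z."""
    return [s for s in range(2**n_sites) if bin(s).count("1") == n_up]


def build_hamiltonian(states, bonds, J=1.0):
    """Matrix of H = J * sum_<ij> S_i . S_j on a basis closed under H (full space or one S^z sector)."""
    index = {s: k for k, s in enumerate(states)}
    h = np.zeros((len(states), len(states)))
    for k, s in enumerate(states):
        for i, j in bonds:
            if ((s >> i) & 1) == ((s >> j) & 1):
                h[k, k] += 0.25 * J  # S^z_i S^z_j = +1/4 for parallel spins
            else:
                h[k, k] -= 0.25 * J  # S^z_i S^z_j = -1/4 for antiparallel spins
                # (J/2)(S^+_i S^-_j + S^-_i S^+_j) swaps the antiparallel pair,
                # which keeps the number of up spins (and hence the sector) unchanged.
                flipped = s ^ ((1 << i) | (1 << j))
                h[index[flipped], k] += 0.5 * J
    return h


def block_spectrum(n_sites, bonds, J=1.0):
    """Diagonalize each fixed-S^z block separately and merge the eigenvalues."""
    eigenvalues = []
    for n_up in range(n_sites + 1):
        block = build_hamiltonian(sector_states(n_sites, n_up), bonds, J)
        eigenvalues.extend(np.linalg.eigvalsh(block))
    return np.sort(np.array(eigenvalues))


if __name__ == "__main__":
    n = 4
    bonds = ring_bonds(n)
    full = np.linalg.eigvalsh(build_hamiltonian(list(range(2**n)), bonds))
    print(np.allclose(full, block_spectrum(n, bonds)))  # True: same spectrum either way
    print(full.min())  # ground-state energy of the 4-site ring, -2J up to rounding
    print(comb(16, 8), 2**16)  # largest S^z block vs full dimension for 16 spins
```

The merged block spectra must coincide with the full spectrum; that is the basic check any symmetry reduction has to pass. Additional lattice symmetries, of the kind exploited by the group-theoretic approach in the abstract, split each block further.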

arXiv:2309.08836 [pdf, other]

Bias and Fairness in Chatbots: An Overview

Authors: Jintang Xue, Yun-Cheng Wang, Chengwei Wei, Xiaofeng Liu, Jonghye Woo, C. -C. Jay Kuo

Abstract: Chatbots have been studied for more than half a century. With the rapid development of natural language processing (NLP) technologies in recent years, chatbots using large language models (LLMs) have received much attention. Compared with traditional ones, modern chatbots are more powerful and have been used in real-world applications. However, there are bias and fairness concerns in modern chatbot design. Due to the huge amounts of training data, extremely large model sizes, and lack of interpretability, bias mitigation and fairness preservation of modern chatbots are challenging. Thus, a comprehensive overview of bias and fairness in chatbot systems is given in this paper. The history of chatbots and their categories are first reviewed. Then, bias sources and potential harms in applications are analyzed. Considerations in designing fair and unbiased chatbot systems are examined. Finally, future research directions are discussed.

Submitted 10 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.07120 [pdf, other]

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

Authors: Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie

Abstract: Multi-modal large language models (MLLMs) are built on large language models (LLMs), with an enhanced capability to comprehend multi-modal inputs and generate textual responses. While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. In this study, we think outside the box and unveil an intriguing characteristic of MLLMs: our preliminary results suggest that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment in the pure NLP context. For example, a visual-instruction-tuned LLaMA2 7B model surpasses the performance of the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on TruthfulQA-mc and Ethics benchmarks. Further analysis reveals that the improved alignment can be attributed to the superior instruction quality inherent to visual-text data. In releasing our code at github.com/UCSC-VLAA/Sight-Beyond-Text, we aspire to foster further exploration into the intrinsic value of visual-text synergies and, in a broader scope, multi-modal interactions in alignment research.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.01560 [pdf]

Quasi 1D Nanobelts from the Sustainable Liquid Exfoliation of Terrestrial Minerals for Future Martian based Electronics

Authors: Cencen Wei, Abhijit Roy, Adel K. A. Aljarid, Yi Hu, S. Mark Roe, Dimitrios G. Papageorgiou, Raul Arenal, Conor S. Boland

Abstract: The sky is the limit with regard to the societal impact nanomaterials can have on our lives. However, in this study we show that their potential is out of this world. The planet Mars has an abundant source of calcium sulfate minerals, and in our work we show that these deposits can be the basis of transformative nanomaterials to potentially support future space endeavors. Using a scalable, eco-friendly liquid processing technique applied to two common types of terrestrial gypsum, our simple method provides a cost-efficient procedure to yield the commercially valuable intermediate phase of gypsum, known as bassanite. Through the liquid exfoliation of bassanite powders, suspensions of large-aspect-ratio anhydrite nanobelts with long-term stability were produced and characterized by scanning electron microscopy and Raman spectroscopy. Transmission electron microscopy showed the nanobelts to have a mesocrystal structure, with distinct nanoparticle constituents making up the lattice. Unexpectedly, anhydrite nanobelts had remarkable electronic properties, namely a bandgap that was easily tuned between semiconducting (~2.2 eV) and insulating (~4 eV) behaviors through dimensional control measured via atomic force microscopy. To demonstrate the application potential of our nanobelts, optoelectronic, electrochemical, and nanocomposite measurements were made. For the hydrogen evolution reaction and mechanical reinforcement, selenite-based anhydrite nanobelts displayed superlative performances.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.00814 [pdf, ps, other]

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert

Abstract: We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed distribution. Existing methods for this problem either require access to a simulator to generate free i.i.d. contexts, achieve a sub-optimal regret no better than $\widetilde{O}(T^{\frac{5}{6}})$, or are computationally inefficient. We greatly improve these results by achieving a regret of $\widetilde{O}(\sqrt{T})$ without a simulator, while maintaining computational efficiency when the action set in each round is small. In the special case of sleeping bandits with adversarial loss and stochastic arm availability, our result affirmatively answers the open question by Saha et al. [2020] on whether there exists a polynomial-time algorithm with $\mathrm{poly}(d)\sqrt{T}$ regret. Our approach naturally handles the case where the loss is linear up to an additive misspecification error, and our regret shows near-optimal dependence on the magnitude of the error.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2309.00348 [pdf, other]

doi 10.1007/978-3-031-41498-5_10

MuraNet: Multi-task Floor Plan Recognition with Relation Attention

Authors: Lingxiao Huang, Jung-Hsuan Wu, Chiching Wei, Wilson Li

Abstract: The recognition of information in floor plan data requires the use of detection and segmentation models. However, relying on several single-task models can result in ineffective utilization of relevant information when multiple tasks are present simultaneously. To address this challenge, we introduce MuraNet, an attention-based multi-task model for segmentation and detection tasks in floor plan data. In MuraNet, we adopt a unified encoder called MURA as the backbone with two separate branches: an enhanced segmentation decoder branch and a decoupled detection head branch based on YOLOX, for segmentation and detection tasks respectively. The architecture of MuraNet is designed to leverage the fact that walls, doors, and windows usually constitute the primary structure of a floor plan's architecture. By jointly training the model on both detection and segmentation tasks, we believe MuraNet can effectively extract and utilize relevant features for both tasks. Our experiments on the CubiCasa5k public dataset show that MuraNet improves convergence speed during training compared to single-task models like U-Net and YOLOv3. Moreover, we observe improvements in the average AP and IoU in detection and segmentation tasks, respectively. Our ablation experiments demonstrate that the attention-based unified backbone of MuraNet achieves better feature extraction in floor plan recognition tasks, and the use of decoupled multi-head branches for different tasks further improves model performance.
We believe that our proposed MuraNet model can address the disadvantages of single-task models and improve the accuracy and efficiency of floor plan data recognition.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Document Analysis and Recognition - ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham

arXiv:2309.00218 [pdf, other]

Catastrophic Emission of Charges from Near-Extremal Nariai Black Holes

Authors: Chiang-Mei Chen, Chun-Chih Huang, Sang Pyo Kim, Chun-Yu Wei

Abstract: Using the in-out formalism and also the monodromy method, we study the emission of charges from near-extremal charged Nariai black holes with the black hole and cosmological horizons close to each other. The emission becomes catastrophic for a charge with energy greater than its chemical potential, whose leading exponential factor grows in inverse proportion to the separation of the two horizons. This implies that near-extremal Nariai black holes quickly evaporate through charge emission and end in de Sitter space, in contrast to near-extremal RN-dS black holes, which have a Breitenlohner-Friedman bound below which they become stable against Hawking radiation and the Schwinger effect of charge emission. We illuminate the origin of the catastrophic emission in the phase-integral formulation by comparing near-extremal charged Nariai black holes with near-extremal RN-dS black holes.

Submitted 31 August, 2023; originally announced September 2023.

Comments: 15 pages

arXiv:2308.14492 [pdf, other]

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

Authors: Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

Abstract: Human pose and shape estimation (HPS) has attracted increasing attention in recent years. While most existing studies focus on HPS from 2D images or videos with inherent depth ambiguity, there is a surging need to investigate HPS from 3D point clouds as depth sensors have been frequently employed in commercial devices. However, real-world sensory 3D points are usually noisy and incomplete, and human bodies can take highly diverse poses. To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture. Specifically, each stage of PointHPS performs a series of downsampling and upsampling operations to extract and collate both local and global cues, which are further enhanced by two novel modules: 1) Cross-stage Feature Fusion (CFF) for multi-scale feature propagation that allows information to flow effectively through the stages, and 2) Intermediate Feature Enhancement (IFE) for body-aware feature aggregation that improves feature quality after each stage. To facilitate a comprehensive study under various scenarios, we conduct our experiments on two large-scale benchmarks, comprising i) a dataset that features diverse subjects and actions captured by real commercial sensors in a laboratory environment, and ii) controlled synthetic data generated with realistic considerations such as clothed humans in crowded outdoor scenes.
Extensive experiments demonstrate that PointHPS, with its powerful point feature extraction and processing scheme, outperforms state-of-the-art methods by significant margins across the board. Homepage: https://caizhongang.github.io/projects/PointHPS/.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.13904 [pdf, other]

doi 10.14722/ndss.2024.23238

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors

Authors: Chengkun Wei, Wenlong Meng, Zhikun Zhang, Min Chen, Minghu Zhao, Wenjing Fang, Lei Wang, Zihui Zhang, Wenzhi Chen

Abstract: Prompt-tuning has emerged as an attractive paradigm for deploying large-scale language models due to its strong downstream task performance and efficient multitask serving ability. Despite its wide adoption, we empirically show that prompt-tuning is vulnerable to downstream task-agnostic backdoors, which reside in the pretrained models and can affect arbitrary downstream tasks. The state-of-the-art backdoor detection approaches cannot defend against task-agnostic backdoors since they hardly converge in reversing the backdoor triggers. To address this issue, we propose LMSanitator, a novel approach for detecting and removing task-agnostic backdoors on Transformer models. Instead of directly inverting the triggers, LMSanitator aims to invert the predefined attack vectors (pretrained models' output when the input is embedded with triggers) of the task-agnostic backdoors, which achieves much better convergence performance and backdoor detection accuracy. LMSanitator further leverages prompt-tuning's property of freezing the pretrained model to perform accurate and fast output monitoring and input purging during the inference phase. Extensive experiments on multiple language models and NLP tasks illustrate the effectiveness of LMSanitator. For instance, LMSanitator achieves 92.8% backdoor detection accuracy on 960 models and decreases the attack success rate to less than 1% in most scenarios.

Submitted 14 October, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: To Appear in the Network and Distributed System Security (NDSS) Symposium 2024, 26 February - 1 March 2024, San Diego, CA, USA; typos corrected

arXiv:2308.13465 [pdf]

doi 10.1002/adma.202311157

Ptychographic nanoscale imaging of the magnetoelectric coupling in freestanding BiFeO$_3$

Authors: Tim A. Butcher, Nicholas W. Phillips, Chun-Chien Chiu, Chia-Chun Wei, Sheng-Zhu Ho, Yi-Chun Chen, Erik Fröjdh, Filippo Baruffaldi, Maria Carulla, Jiaguo Zhang, Anna Bergamaschi, Carlos A. F. Vaz, Armin Kleibert, Simone Finizio, Jan-Chi Yang, Shih-Wen Huang, Jörg Raabe

Abstract: Understanding the magnetic and ferroelectric ordering of magnetoelectric multiferroic materials at the nanoscale necessitates a versatile imaging method with high spatial resolution. Here, soft X-ray ptychography is employed to simultaneously image the ferroelectric and antiferromagnetic domains in an 80 nm thin freestanding film of the room-temperature multiferroic BiFeO$_3$ (BFO). The antiferromagnetic spin cycloid of period 64 nm is resolved by reconstructing the corresponding resonant elastic X-ray scattering in real space and visualized together with mosaic-like ferroelectric domains in a linear dichroic contrast image at the Fe L$_3$ edge. The measurements reveal a near-perfect coupling between the antiferromagnetic and ferroelectric ordering by which the propagation direction of the spin cycloid is locked orthogonally to the ferroelectric polarization. In addition, the study evinces both a preference for in-plane propagation of the spin cycloid and changes of the ferroelectric polarization by 71° between multiferroic domains in the epitaxial strain-free, freestanding BFO film. The results provide a direct visualization of the strong magnetoelectric coupling in BFO and of its fine multiferroic domain structure, emphasizing the potential of ptychographic imaging for the study of multiferroics and non-collinear magnetic materials with soft X-rays.

Submitted 29 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: Supporting information available with published version: https://doi.org/10.1002/adma.202311157

Journal ref: Adv. Mater. 2024, 36, 2311157

arXiv:2308.08198 [pdf, other]

doi 10.1145/3616855.3635788

DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting

Authors: Tianyu Fu, Chiyue Wei, Yu Wang, Rex Ying

Abstract: We introduce DeSCo, a scalable neural deep subgraph counting pipeline, designed to accurately predict both the count and occurrence position of queries on target graphs after a single training run. First, DeSCo uses a novel canonical partition to divide the large target graph into small neighborhood graphs, greatly reducing the count variation while guaranteeing no missing or double-counting. Second, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately count in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with a 137x improvement in the mean squared error of count prediction, while maintaining polynomial runtime complexity. Our open source project is at https://github.com/fuvty/DeSCo.

Submitted 19 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 8 pages main text, 2 pages references, 11 pages appendix; open source at https://github.com/fuvty/DeSCo

ACM Class: I.2.8

Journal ref: WSDM'24, March 4-8, 2024, Merida, Mexico

arXiv:2308.06440 [pdf, ps, other]

On some conjectural series containing harmonic numbers of 3-order

Authors: Chuanan Wei, Ce Xu

Abstract: Harmonic numbers are important in many branches of number theory. By means of the derivative operator, the integral operator, and several summation and transformation formulas for hypergeometric series, we prove four series containing harmonic numbers of order 3. Three of them were recently conjectured by Z.-W. Sun.

Submitted 11 August, 2023; originally announced August 2023.
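For reference, the harmonic numbers of order 3 are $H_n^{(3)} = \sum_{k=1}^{n} k^{-3}$, with $H_n^{(3)} \to \zeta(3)$ as $n \to \infty$. The short sketch below evaluates them exactly with rational arithmetic, which is convenient for checking conjectured identities numerically term by term. It is a hypothetical helper, not taken from the paper, and it does not reproduce any of the four series proved there.

```python
from fractions import Fraction


def harmonic(n: int, order: int = 3) -> Fraction:
    """Exact generalized harmonic number H_n^(order) = sum_{k=1}^{n} 1 / k**order."""
    if n < 0:
        raise ValueError("n must be non-negative")
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(1, k**order)
    return total


if __name__ == "__main__":
    print(harmonic(3))  # 251/216 = 1 + 1/8 + 1/27
    # The partial sums approach zeta(3) = 1.2020569...; the tail after n terms is about 1/(2 n^2).
    print(float(harmonic(1000)))
```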

arXiv:2307.14624 [pdf, other]

FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene

Authors: Chengrui Wei, Meng Yang, Lei He, Nanning Zheng

Abstract: Predicting absolute depth maps from single images in real (unseen) indoor scenes has long been an ill-posed problem. We observe that this is essentially due not only to the scale-ambiguous problem but also to the focal-ambiguous problem, both of which decrease the generalization ability of monocular depth estimation. That is, images may be captured by cameras of different focal lengths in scenes of different scales. In this paper, we develop a focal-and-scale depth estimation model to learn absolute depth maps from single images in unseen indoor scenes. First, a relative depth estimation network is adopted to learn relative depths from single images with diverse scales/semantics. Second, multi-scale features are generated by mapping a single focal length value to focal length features and concatenating them with intermediate features of different scales in relative depth estimation. Finally, relative depths and multi-scale features are jointly fed into an absolute depth estimation network. In addition, a new pipeline is developed to augment the diversity of focal lengths of public datasets, which are often captured with cameras of the same or similar focal lengths. Our model is trained on augmented NYUDv2 and tested on three unseen datasets. Compared with five recent state-of-the-art methods, our model improves the generalization ability of depth estimation by 41%/13% (RMSE) with/without data augmentation and effectively alleviates the deformation problem in 3D reconstruction. Notably, our model maintains the accuracy of depth estimation on the original NYUDv2.

Submitted 27 July, 2023; originally announced July 2023.
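The focal ambiguity can be made concrete with the standard pinhole-camera relation $h = f H / Z$. Here an object of height $H$ at depth $Z$ appears with image height $h$ under focal length $f$ (in pixels). This is textbook projection geometry, not part of FS-Depth itself, and the function names below are hypothetical. Doubling both $f$ and $Z$ leaves the image unchanged, so a model that never sees $f$ cannot tell the two scenes apart.

```python
import math


def projected_height(focal_px: float, object_height_m: float, depth_m: float) -> float:
    """Image height in pixels of an object under an ideal pinhole camera: h = f * H / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * object_height_m / depth_m


def depth_from_height(focal_px: float, object_height_m: float, image_height_px: float) -> float:
    """Invert the pinhole relation: Z = f * H / h. A wrong f gives a proportionally wrong Z."""
    return focal_px * object_height_m / image_height_px


if __name__ == "__main__":
    # A 1.7 m person at 3.4 m seen with f = 500 px, versus the same person at 6.8 m with f = 1000 px.
    near = projected_height(500.0, 1.7, 3.4)
    far = projected_height(1000.0, 1.7, 6.8)
    print(math.isclose(near, far))  # True: identical images, different absolute depths
    # Assuming f = 500 px for an image actually taken at f = 1000 px halves the estimated depth.
    print(depth_from_height(500.0, 1.7, far))  # about 3.4 m instead of 6.8 m
```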

arXiv:2307.10455 [pdf, other]

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

Authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth

Abstract: In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically based proxies for species classification. This paper presents a curated million-image dataset, primarily intended to train computer-vision models capable of providing image-based taxonomic assessment; however, the dataset also presents compelling characteristics whose study would be of interest to the broader machine learning community. Owing to its biological nature, the dataset exhibits a characteristic long-tailed class-imbalance distribution. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.

Submitted 13 November, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.07930 [pdf, other]

GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT

Authors: Yifan Zhang, Cheng Wei, Shangyou Wu, Zhengting He, Wenhao Yu

Abstract: Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks. For example, in the task of facility siting, the Buffer tool is usually first used to locate areas close to or far from some specific entities; then, the Intersect or Erase tool is used to select candidate areas that satisfy multiple requirements. Though professionals can easily understand and solve these geospatial tasks by sequentially utilizing relevant tools, it is difficult for non-professionals to handle these problems. Recently, Generative Pre-trained Transformers (e.g., ChatGPT) have shown strong performance in semantic understanding and reasoning. In particular, AutoGPT can further extend the capabilities of large language models (LLMs) by automatically reasoning and calling externally defined tools. Inspired by these studies, we attempt to lower the barrier for non-professional users to solve geospatial tasks by integrating the semantic understanding ability inherent in LLMs with mature tools within the GIS community. Specifically, we develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner with only natural-language instructions. In other words, GeoGPT understands the demands of non-professional users based merely on input natural language descriptions, and then thinks, plans, and executes defined GIS tools to output final effective results.
Several cases including geospatial data crawling, spatial query, facility siting, and mapping validate the effectiveness of our framework. Though only a limited number of cases are presented in this paper, GeoGPT can be further extended to various tasks by equipping it with more GIS tools, and we think the paradigm of "foundational plus professional" implied in GeoGPT provides an effective way to develop next-generation GIS in this era of large foundation models.

Submitted 15 July, 2023; originally announced July 2023.

Comments: 23 pages, 4 figures

arXiv:2306.17170 [pdf, other]

doi 10.36227/techrxiv.23272271

An Overview on Generative AI at Scale with Edge-Cloud Computing

Authors: Yun-Cheng Wang, Jintang Xue, Chengwei Wei, C. -C. Jay Kuo

Abstract: As a specific category of artificial intelligence (AI), generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Currently, GenAI services rely on the traditional cloud computing framework due to the need for large computation resources. However, such services will encounter high latency because of data transmission and a high volume of requests. On the other hand, edge-cloud computing can provide adequate computation power and low latency at the same time through the collaboration between edges and the cloud. Thus, it is attractive to build GenAI systems at scale by leveraging the edge-cloud computing paradigm. In this overview paper, we review recent developments in GenAI and edge-cloud computing, respectively. Then, we use two exemplary GenAI applications to discuss technical challenges in scaling up their solutions using edge-cloud collaborative systems. Finally, we list design considerations for training and deploying GenAI systems at scale and point out future research directions.

Submitted 9 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.12624 [pdf, other]

DreamEdit: Subject-driven Image Editing

Authors: Tianle Li, Max Ku, Cong Wei, Wenhu Chen

Abstract: Subject-driven image generation aims at generating images containing customized subjects, which has recently drawn enormous attention from the research community. However, previous works cannot precisely control the background and position of the target subject. In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Addition. The new tasks are challenging in multiple aspects: replacing a subject with a customized one can change its shape, texture, and color, while adding a target subject to a designated position in a provided scene necessitates a context-aware posture. To conquer these two novel tasks, we first manually curate a new dataset, DreamEditBench, containing 22 different types of subjects and 440 source images with different difficulty levels. We plan to host DreamEditBench as a platform and hire trained evaluators for standard human evaluation. We also devise an innovative method, DreamEditor, to resolve these tasks by performing iterative generation, which enables a smooth adaptation to the customized subject. In this project, we conduct automatic and human evaluations to understand the performance of DreamEditor and baselines on DreamEditBench. For Subject Replacement, we found that existing models are sensitive to the shape and color of the original subject, and the model failure rate increases dramatically when the source and target subjects are highly different. For Subject Addition, we found that existing models cannot easily blend the customized subjects into the background smoothly, leading to noticeable artifacts in the generated image. We hope DreamEditBench can become a standard platform to enable future investigations toward building more controllable subject-driven image editing. Our project homepage is https://dreameditbenchteam.github.io/.

Submitted 16 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.11700 [pdf, other]

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

Authors: Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro

Abstract: We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, leading to issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that simultaneously updates the policy using an entropy-regularized policy gradient and the dual variable via a quadratic-regularized gradient ascent. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point at a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in the policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge at a linear rate to a saddle point that contains an optimal constrained policy. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
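To make the RPG-PD idea concrete, here is a minimal sketch in its spirit on a one-state toy problem: maximize $r \cdot \pi$ subject to $g \cdot \pi \ge b$. It is not the paper's algorithm as stated. Assumptions: a single state (so the CMDP degenerates to a constrained bandit), an entropy-regularized multiplicative-weights policy step, a projected gradient step on the multiplier (treated here as the min player of $\lambda (g \cdot \pi - b) + \frac{\tau}{2}\lambda^2$; the paper's sign conventions may differ), and made-up step sizes and rewards.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [x / s for x in w]


def rpg_pd_toy(r, g, b, tau=0.05, eta=0.1, iters=5000, lam_max=10.0):
    """Toy regularized primal-dual iteration on a single-state constrained problem.

    Maximize sum(pi * r) subject to sum(pi * g) >= b, with entropy regularization
    on the policy and quadratic regularization on the multiplier (both weighted by tau).
    """
    logits = [0.0] * len(r)  # uniform initial policy
    lam = 0.0                # Lagrange multiplier, kept in [0, lam_max]
    for _ in range(iters):
        pi = softmax(logits)
        v_g = sum(p * gi for p, gi in zip(pi, g)) - b  # >= 0 iff the constraint holds
        q = [ri + lam * gi for ri, gi in zip(r, g)]    # per-action Lagrangian payoff
        # Primal step (uses current lam): pi_new proportional to pi^(1 - eta*tau) * exp(eta * q),
        # written in logit space; softmax ignores the additive normalizing constant.
        logits = [(1 - eta * tau) * l + eta * qa for l, qa in zip(logits, q)]
        # Dual step (uses current pi): projected gradient descent on lam * v_g + (tau / 2) * lam**2.
        lam = min(lam_max, max(0.0, lam - eta * (v_g + tau * lam)))
    return softmax(logits), lam


r = [1.0, 0.5, 0.0]  # illustrative rewards
g = [0.0, 0.5, 1.0]  # illustrative constraint utilities
pi, lam = rpg_pd_toy(r, g, b=0.6)
```

With these numbers the unconstrained optimum puts almost all mass on action 0, where $g \cdot \pi \approx 0$. The regularized iterates instead settle near a policy whose constraint value lies within roughly $\tau\lambda$ of $b$; that small residual violation is the price of the regularization, which the paper's analysis controls by tuning $\tau$.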

Submitted 16 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 65 pages, 17 figures, and 1 table; NeurIPS 2023

arXiv:2306.11189 [pdf]

BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets

Authors: Po-Ting Lai, Chih-Hsuan Wei, Ling Luo, Qingyu Chen, Zhiyong Lu

Abstract: Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, raising the state of the art on the recently released BioRED corpus from 74.4% to 79.6% F1. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.06659 [pdf]

Ferromagnetic Superconductivity in Two-dimensional Niobium Diselenide

Authors: Tingyu Qu, Shangjian Jin, Fuchen Hou, Deyi Fu, Junye Huang, Darryl Foo Chuan Wei, Xiao Chang, Kenji Watanabe, Takashi Taniguchi, Junhao Lin, Shaffique Adam, Barbaros Özyilmaz

Abstract: The co-existence of ferromagnetism and superconductivity becomes possible through unconventional pairing in the superconducting state. Such materials are exceedingly rare in solid-state systems but are promising platforms to explore topological phases, such as Majorana bound states. Theoretical investigations date back to the late 1950s, but only a few systems have so far been experimentally identified as potential hosts. Here, we show that atomically thin niobium diselenide (NbSe$_2$) intercalated with dilute cobalt atoms spontaneously displays ferromagnetism below the superconducting transition temperature ($T_c$). We elucidate the origin of this phase by constructing a magnetic tunnel junction that consists of cobalt and cobalt-doped niobium diselenide (Co-NbSe$_2$) as the two ferromagnetic electrodes, with ultra-thin boron nitride as the tunnelling barrier. At a temperature well below $T_c$, the tunnelling magnetoresistance shows a bistable state, suggesting ferromagnetic order in Co-NbSe$_2$. We propose an RKKY exchange coupling mechanism based on the spin-triplet superconducting order parameter to mediate such ferromagnetism. We further perform non-local lateral spin valve measurements to confirm the origin of the ferromagnetism. The observed Hanle precession signals show spin diffusion lengths of up to micrometres below $T_c$, demonstrating an intrinsic spin-triplet nature in superconducting NbSe$_2$. Our discovery of superconductivity-mediated ferromagnetism opens the door to an alternative design of ferromagnetic superconductors.

Submitted 11 June, 2023; originally announced June 2023.

Comments: 26 pages, 13 figures

arXiv:2306.02641 [pdf, ps, other]

On some conjectural series containing binomial coefficients and harmonic numbers

Authors: Chuanan Wei

Abstract: Binomial coefficients and harmonic numbers are important in many branches of number theory. With the help of the operator method and several summation and transformation formulas for hypergeometric series, we prove eight conjectural series of Z.-W. Sun containing binomial coefficients and harmonic numbers in this paper.
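One standard way harmonic numbers enter binomial-coefficient series is through differentiation: since $\binom{x+n}{n} = \prod_{k=1}^{n}(1 + x/k)$, its derivative at $x = 0$ is $H_n = \sum_{k=1}^{n} 1/k$. Whether this matches the exact operator used in the paper is our assumption. The sketch below checks the identity in exact rational arithmetic by expanding the product and reading off the coefficient of $x$; the function names are illustrative.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def linear_coefficient_of_product(n):
    """Coefficient of x in prod_{k=1}^{n} (1 + x/k), i.e. d/dx binom(x+n, n) at x = 0."""
    coeffs = [Fraction(1)]  # polynomial coefficients, lowest degree first
    for k in range(1, n + 1):
        # Multiply the current polynomial by (1 + x/k).
        new = coeffs + [Fraction(0)]
        for i, c in enumerate(coeffs):
            new[i + 1] += c / k
        coeffs = new
    return coeffs[1] if len(coeffs) > 1 else Fraction(0)


for n in range(1, 6):
    print(n, linear_coefficient_of_product(n), harmonic(n))
```

Differentiating a hypergeometric summation formula with respect to such a parameter therefore turns binomial terms into binomial terms weighted by harmonic numbers.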

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.01747 [pdf, other]

UMDFood: Vision-language models boost food composition compilation

Authors: Peihua Ma, Yixin Wu, Ning Yu, Yang Zhang, Michael Backes, Qin Wang, Cheng-I Wei

Abstract: Nutrition information is crucial in precision nutrition and the food industry. The current food composition compilation paradigm relies on laborious and experience-dependent methods. However, these methods struggle to keep up with the dynamic consumer market, resulting in delayed and incomplete nutrition data. In addition, earlier machine learning methods overlook the information in food ingredient statements or ignore the features of food images. To this end, we propose a novel vision-language model, UMDFood-VL, using front-of-package labeling and product images to accurately estimate food composition profiles. To empower model training, we established UMDFood-90k, the most comprehensive multimodal food database to date, containing 89,533 samples, each labeled with image and text-based ingredient descriptions and 11 nutrient annotations. UMDFood-VL achieves a macro-AUCROC of up to 0.921 for fat content estimation, which is significantly higher than existing baseline methods and satisfies the practical requirements of food composition compilation. Meanwhile, for up to 82.2% of selected products, the error between model estimates and chemical analysis results is less than 10%. This performance sheds light on generalization to other food- and nutrition-related data compilation tasks and could catalyze the evolution of generative AI-based technology in other food applications that require personalization.
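The macro-AUCROC figure quoted above is, in its usual definition, the unweighted mean of one-vs-rest ROC AUCs over classes. A minimal pure-Python sketch follows, assuming (our assumption, not stated in the abstract) that fat content was binned into discrete classes and the model outputs one probability per class. The AUC is computed via its rank interpretation: the probability that a random positive outscores a random negative, with ties counted as one half.

```python
def binary_auc(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_auc(prob_rows, y_true):
    """Unweighted mean of one-vs-rest AUCs; prob_rows[i][c] is P(class c) for sample i."""
    n_classes = len(prob_rows[0])
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in prob_rows]
        labels = [y == c for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes
```

Because every class counts equally, a rare class (for example, a hypothetical "very high fat" bin) weighs as much in the macro average as a common one.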

Submitted 6 November, 2023; v1 submitted 17 May, 2023; originally announced June 2023.

Comments: 13 pages, 9 figures

arXiv:2306.00989 [pdf, other]

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Authors: Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

Abstract: Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.

Submitted 1 June, 2023; originally announced June 2023.

Comments: ICML 2023 Oral version. Code+Models: https://github.com/facebookresearch/hiera

arXiv:2305.17380 [pdf, ps, other]

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Authors: Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo

Abstract: Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{O}(\sqrt{T} + C^{\textsf{P}})$ regret, where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{O}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.

Submitted 26 October, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Update the camera-ready version for NeurIPS 2023

ACM Class: I.2.6

arXiv:2305.07861 [pdf]

doi 10.37188/lam.2023.030

Ultra-wideband Waveguide-coupled Photodiodes Heterogeneously Integrated on a Thin-film Lithium Niobate Platform

Authors: Chao Wei, Youren Yu, Ziyun Wang, Lin Jiang, Zhongming Zeng, Jia Ye, Xihua Zou, Wei Pan, Xiaojun Xie, Lianshan Yan

Abstract: With the advantages of a large electro-optic coefficient, a wide transparency window, and strong optical confinement, thin-film lithium niobate (TFLN) technology has enabled the development of various high-performance optoelectronic devices, ranging from ultra-wideband electro-optic modulators to highly efficient quantum sources. However, the TFLN platform does not natively provide lasers and photodiodes. This study presents an InP/InGaAs modified uni-traveling carrier (MUTC) photodiode heterogeneously integrated on the TFLN platform with a record-high 3-dB bandwidth of 110 GHz and a responsivity of 0.4 A/W at a 1550-nm wavelength. It is implemented on a wafer-level TFLN-InP heterogeneous integration platform and is suitable for large-scale, multi-function, and high-performance TFLN photonic integrated circuits.
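As a quick consistency check on the quoted figures, the standard photodetector relation converts responsivity $R$ into external quantum efficiency, using $hc/q \approx 1.2398\ \mathrm{W\,\mu m/A}$. The conversion below is ours and is not reported in the abstract.

```latex
\eta_{\mathrm{ext}} = \frac{R\,h c}{q\,\lambda}
\approx \frac{(0.4\ \mathrm{A/W})\,(1.2398\ \mathrm{W\,\mu m/A})}{1.55\ \mu\mathrm{m}}
\approx 0.32
```

That is, roughly one electron is collected per three incident photons at 1550 nm.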

Submitted 1 July, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: 17 pages, 8 figures

Journal ref: Light: Advanced Manufacturing, 4, Article number: 30 (2023)

arXiv:2305.05900 [pdf, other]

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Authors: Chengkun Wei, Minghu Zhao, Zhikun Zhang, Min Chen, Wenlong Meng, Bo Liu, Yuan Fan, Wenzhi Chen

Abstract: Differential privacy (DP), as a rigorous mathematical definition quantifying privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning techniques, differentially private machine learning (DPML) is increasingly important. As the classic DPML algorithm, DP-SGD incurs a significant loss of utility, which hinders DPML's deployment in practice. Many studies have recently proposed improved algorithms based on DP-SGD to mitigate utility loss. However, these studies are isolated and cannot comprehensively measure the performance of improvements proposed in algorithms. More importantly, there is a lack of comprehensive research comparing improvements in these DPML algorithms across utility, defensive capabilities, and generalizability. We fill this gap by performing a holistic measurement of improved DPML algorithms on utility and defense capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where improvements are located in the machine learning life cycle. Based on our taxonomy, we jointly perform an extensive measurement study of the improved DPML algorithms. We also cover state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in defense. We also explore some improvements that can maintain model utility and defend against MIAs more effectively. Experiments show that Label DP algorithms incur less utility loss but are fragile to MIAs. To support our evaluation, we implement DPMLBench, a modular, reusable software package that enables sensitive data owners to deploy DPML algorithms and serves as a benchmark tool for researchers and practitioners.
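The abstract singles out per-sample gradient clipping as a sensitivity-bounding technique. The sketch below shows the standard DP-SGD aggregation step in pure Python: clip each example's gradient to L2 norm at most `max_norm`, sum, add Gaussian noise with standard deviation `noise_multiplier * max_norm`, and average. It is illustrative only, not DPMLBench code; the function names are ours, and privacy accounting (choosing the noise multiplier for a target epsilon) is deliberately omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_noisy_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """One DP-SGD aggregation: clip per example, sum, add Gaussian noise, average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Clipping bounds any single example's contribution (the sensitivity) by max_norm,
    # so the noise scale is calibrated to max_norm.
    sigma = noise_multiplier * max_norm
    batch = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


rng = random.Random(0)
update = dp_sgd_noisy_gradient([[3.0, 4.0], [0.3, 0.4]], max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping is what makes the noise meaningful: without it, a single outlier gradient could have unbounded influence on the update, which is also the signal a membership inference attack tries to exploit.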

Submitted 14 October, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: To appear in the ACM Conference on Computer and Communications Security (CCS), November 2023, Tivoli Congress Center, Copenhagen, Denmark

arXiv:2305.03976 [pdf, ps, other]

doi 10.1088/1674-4527/acd589

The chromatic Point Spread Function of weak lensing measurement in Chinese Space Station survey Telescope

Authors: Q. Y. Liu, X. Z. Er, Z. H. Fan, D. Z. Liu, G. L. Li, C. L. Wei, Z. Ban, X. B. Li, D. Yue

Abstract: Weak gravitational lensing is a powerful tool in modern cosmology. To accurately measure the weak lensing signal, one has to control the systematic bias to a small level. One of the most difficult problems is how to correct the smearing effect of the Point Spread Function (PSF) on the shapes of galaxies. The chromaticity of the PSF for a broad-band observation can lead to new, subtle effects. Since the PSF is wavelength dependent and the spectral energy distributions of stars and galaxies are different, the effective PSF measured from star images will differ from the one that smears the galaxies. Such a bias is called colour bias. We estimate it in the optical bands of the Chinese Space Station Survey Telescope (CSST) from simulated PSFs, and show its dependence on the colour and redshift of the galaxies. Moreover, due to the spatial variation of spectra over the galaxy image, there exists another, higher-order bias: colour gradient bias. Our results show that both colour bias and colour gradient bias are generally below $0.1$ percent in CSST. Only for small galaxies does one need to be careful about the colour gradient bias in weak lensing analyses using CSST data.
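The origin of colour bias can be illustrated with a toy model: the effective PSF for a source is its SED-weighted stack of monochromatic PSFs, so a redder source sees a broader PSF if the PSF widens with wavelength. Assumptions (ours, not the paper's simulation): circular Gaussian monochromatic PSFs with $\sigma(\lambda) = \sigma_0 \lambda / \lambda_0$, SED weights standing in for photon counts through the band, and a hypothetical band and SEDs. For a mixture of centred profiles, the second moment $R^2 = \langle x^2 + y^2 \rangle$ is the weighted mean of the components' second moments, and a 2D Gaussian has $R^2 = 2\sigma^2$.

```python
def effective_psf_size_sq(wavelengths_nm, sed_weights, sigma0, lam0_nm):
    """R^2 = <x^2 + y^2> of an SED-weighted stack of circular Gaussian PSFs.

    Toy model (assumption): sigma(lam) = sigma0 * lam / lam0, i.e. width grows
    linearly with wavelength, as for a diffraction-limited optic.
    """
    total = sum(sed_weights)
    acc = 0.0
    for lam, w in zip(wavelengths_nm, sed_weights):
        sigma = sigma0 * lam / lam0_nm
        acc += w * 2.0 * sigma ** 2  # second moment of one Gaussian component
    return acc / total


def relative_size_mismatch(wavelengths_nm, star_sed, galaxy_sed, sigma0=1.0, lam0_nm=700.0):
    """(R^2_gal - R^2_star) / R^2_star: error from using the star-derived PSF size for the galaxy."""
    r2_star = effective_psf_size_sq(wavelengths_nm, star_sed, sigma0, lam0_nm)
    r2_gal = effective_psf_size_sq(wavelengths_nm, galaxy_sed, sigma0, lam0_nm)
    return (r2_gal - r2_star) / r2_star


band = list(range(600, 901, 50))        # hypothetical band sampling in nm
star = [1.0] * len(band)                # flat SED (illustrative)
galaxy = [lam / 600 for lam in band]    # redder SED (illustrative)
print(relative_size_mismatch(band, star, galaxy))
```

A positive mismatch means the star-calibrated PSF is too small for the redder galaxy; propagating such a size error into shear estimates is what the paper quantifies with realistic CSST PSFs and SEDs.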

Submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.03563 [pdf, other]

Cooperative Driving of Connected Autonomous Vehicles in Heterogeneous Mixed Traffic: A Game Theoretic Approach

Authors: Shiyu Fang, Peng Hang, Chongfeng Wei, Yang Xing, Jian Sun

Abstract: High-density unsignalized intersections have always been a bottleneck for efficiency and safety. The emergence of Connected Autonomous Vehicles (CAVs) results in mixed traffic conditions, further increasing the complexity of the transportation system. Against this background, this paper studies the intricate and heterogeneous interaction of vehicles and conflict resolution at high-density, mixed, unsignalized intersections. Theoretical insights about the interaction between CAVs and Human-driven Vehicles (HVs) and the cooperation of CAVs are synthesized, based on which a novel cooperative decision-making framework for heterogeneous mixed traffic is proposed. A normalized cooperative game is concatenated with a level-k game (NCL game) to generate a system-optimal solution. Then a lattice planner generates optimal and collision-free trajectories for CAVs. To reproduce HVs in mixed traffic, interactions from naturalistic human driving data are extracted as prior knowledge. A non-cooperative game and Inverse Reinforcement Learning (IRL) are integrated to mimic the decision making of heterogeneous HVs. Finally, three case studies are conducted to verify the performance of the proposed algorithm: a comparative analysis with different methods, a case study under different Rates of Penetration (ROP), and an interaction analysis with heterogeneous HVs. The proposed cooperative decision-making framework is found to benefit driving conflict resolution and improve traffic efficiency at the mixed unsignalized intersection. Moreover, because driving heterogeneity is considered, the framework achieves better human-machine interaction and cooperation.
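The level-k component of the NCL game rests on a simple recursion: a level-0 agent follows a fixed, non-strategic rule, and a level-k agent best-responds to an opponent it assumes is level-(k-1). The sketch below shows only that recursion for two vehicles at a conflict point. The actions, payoffs, and level-0 rule are hypothetical, and the normalized cooperative game the paper couples it with is not modelled.

```python
ACTIONS = ("go", "yield")

# Hypothetical symmetric payoffs: PAYOFF[(my_action, other_action)] -> my payoff.
PAYOFF = {
    ("go", "go"): -10.0,       # both enter the conflict zone: collision risk
    ("go", "yield"): 2.0,      # pass first
    ("yield", "go"): 0.0,      # wait for the other vehicle
    ("yield", "yield"): -1.0,  # mutual hesitation wastes time
}


def level_k_action(k, level0_action="go"):
    """Action of a level-k driver who assumes the other driver reasons at level k-1."""
    if k == 0:
        return level0_action  # non-strategic baseline behaviour
    other = level_k_action(k - 1, level0_action)
    return max(ACTIONS, key=lambda a: PAYOFF[(a, other)])


for k in range(4):
    print(k, level_k_action(k))
```

With these payoffs the recursion alternates between "go" and "yield", which shows why the assumed reasoning depth of human drivers matters when a CAV plans against them.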

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.00832 [pdf, ps, other]

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei

Abstract: We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Assuming the $d$-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of $T$ rounds is known to scale as $\tilde O(\sqrt{Kd T})$. Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order $\tilde O(K\sqrt{d V_T})$ in terms of the cumulative second moment of the learner's losses $V_T$, and a closely related first-order bound of order $\tilde O(K\sqrt{d L_T^*})$ in terms of the cumulative loss of the best policy $L_T^*$. Since $V_T$ or $L_T^*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.

Submitted 24 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.09753 [pdf, ps, other]

On some conjectures of Z.-W. Sun involving harmonic numbers

Authors: Chuanan Wei

Abstract: Harmonic numbers are significant in various branches of number theory. With the help of the digamma function, we prove ten conjectural series of Z.-W. Sun involving harmonic numbers. Several of them are also series expansions of $\log 2/\pi^2$.
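For readers less familiar with the link between the two objects, the standard identities connecting the digamma function $\psi = \Gamma'/\Gamma$ to harmonic numbers are recalled below, with $\gamma$ the Euler-Mascheroni constant. At integer $x = n$ the series telescopes to $H_n$, which is what lets digamma identities be turned into statements about harmonic-number series; how the paper applies them to Sun's conjectures is not reproduced here.

```latex
\psi(x+1) + \gamma = \sum_{k=1}^{\infty}\left(\frac{1}{k} - \frac{1}{k+x}\right),
\qquad
\psi(n+1) + \gamma = \sum_{k=1}^{n}\frac{1}{k} = H_n \quad (n = 0, 1, 2, \dots).
```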

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.07745 [pdf, other]

Framework for Quality Evaluation of Smart Roadside Infrastructure Sensors for Automated Driving Applications

Authors: Laurent Kloeker, Chenghua Liu, Chao Wei, Lutz Eckstein

Abstract: The use of smart roadside infrastructure sensors is highly relevant for future applications of connected and automated vehicles. External sensor technology in the form of intelligent transportation system stations (ITS-Ss) can provide safety-critical real-time information about road users in the form of a digital twin. The choice of sensor setups has a major influence on the downstream function as well as the data quality. To date, there is insufficient research on which sensor setups result in which levels of ITS-S data quality. We present a novel approach to perform detailed quality assessment for smart roadside infrastructure sensors. Our framework is multimodal across different sensor types and is evaluated on the DAIR-V2X dataset. We analyze the composition of different lidar and camera sensors and assess them in terms of accuracy, latency, and reliability. The evaluations show that the framework can be used reliably for several future ITS-S applications.

Submitted 16 April, 2023; originally announced April 2023.

Comments: Accepted to be published as part of the 34th IEEE Intelligent Vehicles Symposium (IV), Anchorage, Alaska, USA, June 4-7, 2023

Showing 51–100 of 412 results for author: Wei, C