Showing 1–50 of 74 results for author: Xian, Y

  1. arXiv:2407.00503  [pdf, other]

    cs.CV

    Toward a Diffusion-Based Generalist for Dense Vision Tasks

    Authors: Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

    Abstract: Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring results. In this paper, we explore diffusion-based vision generalists, where we unify different types of dense prediction tasks as conditional image g…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Published at CVPR 2024 as a workshop paper

  2. arXiv:2404.13535  [pdf, ps, other]

    cs.CR cs.DC

    DesTest: A Decentralised Testing Architecture for Improving Data Accuracy of Blockchain Oracle

    Authors: Xueying Zeng, Youquan Xian, Chunpei Li, Zhengdong Hu, Peng Liu

    Abstract: Blockchain technology ensures secure and trustworthy data flow between multiple participants on the chain, but interoperability of on-chain and off-chain data has always been a difficult problem that needs to be solved. To solve the problem that blockchain systems cannot access off-chain data, the oracle is introduced. However, existing research mainly focuses on the consistency and integrity of data,…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2404.08263  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Relational Prompt-based Pre-trained Language Models for Social Event Detection

    Authors: Pu Li, Xiaoyan Yu, Hao Peng, Yantuan Xian, Linqin Wang, Li Sun, Jingyun Zhang, Philip S. Yu

    Abstract: Social Event Detection (SED) aims to identify significant events from social streams, and has a wide application ranging from public opinion analysis to risk management. In recent years, Graph Neural Network (GNN) based solutions have achieved state-of-the-art performance. However, GNN-based methods often struggle with noisy and missing edges between messages, affecting the quality of learned mess…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: ACM TOIS Under Review

  4. arXiv:2404.05342  [pdf, other]

    cs.IR

    Beyond the Sequence: Statistics-Driven Pre-training for Stabilizing Sequential Recommendation Model

    Authors: Sirui Wang, Peiguang Li, Yunsen Xian, Hongzhi Zhang

    Abstract: The sequential recommendation task aims to predict the item that a user is interested in according to his/her historical action sequence. However, inevitable random actions, i.e., a user randomly accessing an item among multiple candidates or clicking several items in random order, cause the sequence to fail to provide stable and high-quality signals. To alleviate the issue, we propose the StatisTics-Driven…

    Submitted 8 April, 2024; originally announced April 2024.

  5. arXiv:2403.19596  [pdf, other]

    cs.CV

    LocCa: Visual Pretraining with Location-aware Captioners

    Authors: Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai

    Abstract: Image captioning has been shown to be an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper, we propose a simple visual pretraining method with location-aware captioners (LocCa). LocCa uses a simple image captioner task interface to teach a model to read…

    Submitted 28 March, 2024; originally announced March 2024.

  6. arXiv:2402.18320  [pdf, other]

    cs.CV cs.AI

    Location-guided Head Pose Estimation for Fisheye Image

    Authors: Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, Dah-Jye Lee

    Abstract: A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of the existing head pose estimation models trained on undistorted images. This paper presents a new approach for head pose estimation that uses the knowledge of head location i…

    Submitted 10 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Revised Introduction and Related Work; Submitted to IEEE Transactions on Cognitive and Developmental Systems for review

  7. arXiv:2402.17256  [pdf, other]

    cs.CL

    Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection

    Authors: Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

    Abstract: Out-of-domain (OOD) intent detection aims to examine whether the user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems. Previous methods address it by fine-tuning discriminative models. Recently, some studies have been exploring the application of large language models (LLMs) represented by ChatGPT to var…

    Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Journal ref: LREC-COLING 2024

  8. arXiv:2402.02543  [pdf, other]

    cs.GT cs.CE cs.DC

    Safeguarding the Truth of High-Value Price Oracle Task: A Dynamically Adjusted Truth Discovery Method

    Authors: Youquan Xian, Peng Liu, Dongcheng Li, Xueying Zeng

    Abstract: In recent years, the Decentralized Finance (DeFi) market has witnessed numerous attacks on the price oracle, leading to substantial economic losses. Although the advent of truth discovery methods has opened up new avenues for oracle development, these methods fall short in addressing high-value attacks on price oracle tasks. Consequently, this paper introduces a dynamically adjusted truth discovery method safeg…

    Submitted 22 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 10 pages, 7 figures

  9. arXiv:2401.11107  [pdf, other]

    cs.CL cs.IR

    Exploiting Duality in Open Information Extraction with Predicate Prompt

    Authors: Zhen Chen, Jingping Liu, Deqing Yang, Yanghua Xiao, Huimin Xu, Zongyu Wang, Rui Xie, Yunsen Xian

    Abstract: Open information extraction (OpenIE) aims to extract the schema-free triplets in the form of (subject, predicate, object) from a given sentence. Compared with general information extraction (IE), OpenIE poses more challenges for the IE models, especially when multiple complicated triplets exist in a sentence. To extract these complicated triplets more effectively, in this pap…

    Submitted 19 January, 2024; originally announced January 2024.

  10. arXiv:2312.11897  [pdf, other]

    cs.CV

    Text-Conditioned Resampler For Long Form Video Understanding

    Authors: Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari

    Abstract: In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task. TCR localises relevant visual features from the video given a text condition and provides them to an LLM to generate a text response. Due to its lightweight design and use of cross-attention, TCR can pro…

    Submitted 25 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  11. arXiv:2312.09256  [pdf, other]

    cs.CV

    LIME: Localized Image Editing via Attention Regularization in Diffusion Models

    Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

    Abstract: Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant challenge within this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper in…

    Submitted 14 December, 2023; originally announced December 2023.

  12. arXiv:2312.08870  [pdf, other]

    cs.CV

    Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

    Authors: Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

    Abstract: Recent advances in large video-language models have displayed promising outcomes in video comprehension. Current approaches straightforwardly convert video into language tokens and employ large language models for multi-modal tasks. However, this method often leads to the generation of irrelevant content, commonly known as "hallucination", as the length of the text increases and the impact of the…

    Submitted 12 December, 2023; originally announced December 2023.

  13. arXiv:2311.17944  [pdf, other]

    cs.CV

    LALM: Long-Term Action Anticipation with Language Models

    Authors: Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. While traditional methods heavily rely on representation learning trained on extensive video data, there exists a significant limitation: obtaining effective video representations proves challenging due to the inherent complexi…

    Submitted 28 November, 2023; originally announced November 2023.

  14. arXiv:2311.16678  [pdf, other]

    cs.CL

    Entity-Aspect-Opinion-Sentiment Quadruple Extraction for Fine-grained Sentiment Analysis

    Authors: Dan Ma, Jun Xu, Zongyu Wang, Xuezhi Cao, Yunsen Xian

    Abstract: Product reviews often contain a large number of implicit aspects and object-attribute co-existence cases. Unfortunately, many existing studies in Aspect-Based Sentiment Analysis (ABSA) have overlooked this issue, which can make it difficult to extract opinions comprehensively and fairly. In this paper, we propose a new task called Entity-Aspect-Opinion-Sentiment Quadruple Extraction (EASQE), which…

    Submitted 28 November, 2023; originally announced November 2023.

  15. arXiv:2311.08268  [pdf, other]

    cs.CL

    A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

    Authors: Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate potentially harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods eith…

    Submitted 6 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by NAACL 2024, 18 pages, 7 figures, 13 tables

  16. APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection

    Authors: Pei Wang, Keqing He, Yutao Mou, Xiaoshuai Song, Yanan Wu, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

    Abstract: Detecting out-of-domain (OOD) intents from user queries is essential for a task-oriented dialogue system. Previous OOD detection studies generally work on the assumption that plenty of labeled IND intents exist. In this paper, we focus on a more practical few-shot OOD setting where there are only a few labeled IND data and massive unlabeled mixed data that may belong to IND or OOD. The new scenari…

    Submitted 20 October, 2023; originally announced October 2023.

    Journal ref: EMNLP2023, Findings

  17. arXiv:2310.13355  [pdf, other]

    cs.CV

    SILC: Improving Vision Language Pretraining with Self-Distillation

    Authors: Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari

    Abstract: Image-Text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also used CLIP features for dense prediction tasks and have shown the emergence of open-set abilities. However, the contrastive objective used by these models only focuses on image-text al…

    Submitted 7 December, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  18. arXiv:2310.10176  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

    Authors: Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

    Abstract: The tasks of out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent classifier to open-world intent sets, which is crucial to task-oriented dialogue (TOD) systems. Previous methods address them by fine-tuning discriminative models. Recently, although some studies have been exploring the application of large language models (LLMs) represented by Ch…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference)

  19. arXiv:2310.07165  [pdf, other]

    cs.DC

    A Microgrid Trading Framework Based on PoC Consensus

    Authors: Lianghaojie Zhou, Youquan Xian, Yipeng Yang, Jianyong Jiang, Peng Liu, Xianxian Li

    Abstract: In the field of energy Internet, blockchain-based distributed energy trading mode is a promising way to replace the traditional centralized trading mode. However, the current power blockchain platform based on public chain has problems such as low consensus efficiency and waste of computing resources. The energy trading platform based on the consortium chain has problems such as inability to attra…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 14 pages, 11 figures

  20. arXiv:2310.04975  [pdf, ps, other]

    cs.CR cs.DC

    A Trustworthy and Consistent Blockchain Oracle Scheme for Industrial Internet of Things

    Authors: Peng Liu, Youquan Xian, Chuanjian Yao, Peng Wang, Li-e Wang, Xianxian Li

    Abstract: Blockchain provides decentralization and trustlessness features for the Industrial Internet of Things (IIoT), which expands the application scenarios of IIoT. To address the problem that the blockchain cannot actively obtain off-chain data, the blockchain oracle is proposed as a bridge between the blockchain and external data. However, the existing oracle schemes are difficult to solve the problem…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Rejected after the third round of review of IEEE Internet of Things Journal

  21. arXiv:2310.00254  [pdf, other]

    cs.NI cs.DC cs.ET

    A Distributed Efficient Blockchain Oracle Scheme for Internet of Things

    Authors: Youquan Xian, Lianghaojie Zhou, Jianyong Jiang, Boyi Wang, Hao Huo, Peng Liu

    Abstract: In recent years, blockchain has been widely applied in the Internet of Things (IoT). Blockchain oracle, as a bridge for data communication between blockchain and off-chain, has also received significant attention. However, the numerous and heterogeneous devices in the IoT pose great challenges to the efficiency and security of data acquisition for oracles. We find that the matching relationship be…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 10 pages, 9 figures

  22. Towards Visual Taxonomy Expansion

    Authors: Tinghui Zhu, Jingping Liu, Jiaqing Liang, Haiyun Jiang, Yanghua Xiao, Zongyu Wang, Rui Xie, Yunsen Xian

    Abstract: The taxonomy expansion task is essential in organizing the ever-increasing volume of new concepts into existing taxonomies. Most existing methods focus exclusively on using textual semantics, leading to an inability to generalize to unseen terms and the "Prototypical Hypernym Problem." In this paper, we propose Visual Taxonomy Expansion (VTE), introducing visual features into the taxonomy expansion ta…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: ACMMM accepted paper

  23. arXiv:2309.04689  [pdf, other]

    cs.CR cs.CE cs.DC

    A Data Middleware for Obtaining Trusted Price Data for Blockchain

    Authors: Youquan Xian, Xueying Zeng, Lianghaojie Zhou, Boyi Wang, Li-e Wang, Peng Liu

    Abstract: As a trusted middleware connecting the blockchain and the real world, the blockchain oracle can obtain trusted real-time price information for financial applications such as payment and settlement, and asset valuation on the blockchain. However, the current oracle schemes face the dilemma of security and service quality in the process of node selection, and the implicit interest relationship in fi…

    Submitted 9 September, 2023; originally announced September 2023.

    Comments: 12 pages, 8 figures

  24. arXiv:2309.02190  [pdf, other]

    cs.CV cs.AI

    Exchanging-based Multimodal Fusion with Transformer

    Authors: Renyu Zhu, Chengcheng Han, Yong Qian, Qiushi Sun, Xiang Li, Ming Gao, Xuezhi Cao, Yunsen Xian

    Abstract: We study the problem of multimodal fusion in this paper. Recent exchanging-based methods have been proposed for vision-vision fusion, which aim to exchange embeddings learned from one modality to the other. However, most of them project inputs of multimodalities into different low-dimensional spaces and cannot be applied to the sequential input data. To solve these issues, in this paper, we propos…

    Submitted 5 September, 2023; originally announced September 2023.

  25. arXiv:2308.14436  [pdf, other]

    cs.CL cs.IR

    Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA

    Authors: Guanting Dong, Rumei Li, Sirui Wang, Yupeng Zhang, Yunsen Xian, Weiran Xu

    Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural language questions with factual information such as entities and relations in KBs. However, traditional Pre-trained Language Models (PLMs) are directly pre-trained on large-scale natural language corpus, which poses challenges for them in understanding and representing complex subgraphs in structured KBs. To bridge the gap between tex…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted as a short paper at CIKM 2023

  26. arXiv:2308.13259  [pdf, other]

    cs.CL cs.AI

    Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

    Authors: Keheng Wang, Feiyu Duan, Sirui Wang, Peiguang Li, Yunsen Xian, Chuantao Yin, Wenge Rong, Zhang Xiong

    Abstract: Equipped with Chain-of-Thought (CoT), Large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often come with incorrect or unfaithful intermediate reasoning steps, especially in the context of answering knowledge-intensive tasks such as KBQA. To alleviate this issue…

    Submitted 28 October, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  27. arXiv:2307.04420  [pdf, ps, other]

    cs.DC cs.AI

    FedDCT: A Dynamic Cross-Tier Federated Learning Scheme in Wireless Communication Networks

    Authors: Peng Liu, Youquan Xian, Chuanjian Yao, Xiaoyun Gan, Lianghaojie Zhou, Jianyong Jiang, Dongcheng Li

    Abstract: With the rapid proliferation of Internet of Things (IoT) devices and the growing concern for data privacy among the public, Federated Learning (FL) has gained significant attention as a privacy-preserving machine learning paradigm. FL enables the training of a global model among clients without exposing local data. However, when a federated learning system runs on wireless communication networks,…

    Submitted 10 July, 2023; originally announced July 2023.

  28. arXiv:2306.10315  [pdf, other]

    cs.CL

    FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue

    Authors: Weihao Zeng, Keqing He, Yejie Wang, Chen Zeng, Jingang Wang, Yunsen Xian, Weiran Xu

    Abstract: Pre-trained language models based on general text enable huge success in the NLP scenario. But the intrinsic difference of linguistic patterns between general text and task-oriented dialogues makes existing pre-trained language models less useful in practice. Current dialogue pre-training methods rely on a contrastive framework and face the challenges of both selecting true positives and hard ne…

    Submitted 17 June, 2023; originally announced June 2023.

    Comments: ACL 2023 Main Conference

  29. arXiv:2306.00014  [pdf, other]

    cs.CL cs.LG

    PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

    Authors: Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan

    Abstract: While transformer-based pre-trained language models (PLMs) have dominated a number of NLP applications, these models are heavy to deploy and expensive to use. Therefore, effectively compressing large-scale PLMs becomes an increasingly important problem. Quantization, which represents high-precision tensors with low-bit fixed-point format, is a viable solution. However, most existing quantization met…

    Submitted 30 May, 2023; originally announced June 2023.

    Comments: Findings of ACL2023

  30. arXiv:2305.17699  [pdf, other]

    cs.CL

    Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery

    Authors: Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian, Weiran Xu

    Abstract: Generalized intent discovery aims to extend a closed-set in-domain intent classifier to an open-world intent set including in-domain and out-of-domain intents. The key challenges lie in pseudo label disambiguation and representation learning. Previous methods suffer from a coupling of pseudo label disambiguation and representation learning, that is, the reliability of pseudo labels relies on repre…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL2023 main conference

  31. arXiv:2305.16726  [pdf, other]

    cs.CL cs.AI

    RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank

    Authors: Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Kai Chen, Rui Yan

    Abstract: Unsupervised sentence representation learning is one of the fundamental problems in natural language processing with various downstream applications. Recently, contrastive learning has been widely adopted which derives high-quality sentence representations by pulling similar semantics closer and pushing dissimilar ones away. However, these methods fail to capture the fine-grained ranking informati…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  32. arXiv:2305.12129  [pdf, other]

    cs.CL cs.LG

    Lifting the Curse of Capacity Gap in Distilling Language Models

    Authors: Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song

    Abstract: Pretrained language models (LMs) have shown compelling performance on various downstream tasks, but unfortunately they require a tremendous amount of inference compute. Knowledge distillation finds a path to compress LMs to small ones with a teacher-student paradigm. However, when the capacity gap between the teacher and the student is large, a curse of capacity gap appears, invoking a deficiency…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: 17 pages, 6 figures, 13 tables, accepted to ACL 2023. Code is available at https://github.com/GeneZC/MiniMoE

  33. arXiv:2304.11359  [pdf, other]

    cs.CV cs.AI

    Detecting Adversarial Faces Using Only Real Face Self-Perturbations

    Authors: Qian Wang, Yongqin Xian, Hefei Ling, Jinyuan Zhang, Xiaorui Lin, Ping Li, Jiazhong Chen, Ning Yu

    Abstract: Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely di…

    Submitted 3 May, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: IJCAI2023

  34. arXiv:2302.00491  [pdf, other]

    cs.CV cs.LG

    Learning Prototype Classifiers for Long-Tailed Recognition

    Authors: Saurabh Sharma, Yongqin Xian, Ning Yu, Ambuj Singh

    Abstract: The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real world. Most recent works in LTR use softmax classifiers that are biased in that they correlate classifier norm with the amount of training data for a given class. In this work, we show that learning prototype classifiers addresses the biased softm…

    Submitted 26 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted at IJCAI-23

  35. arXiv:2212.07911  [pdf, other]

    cs.CV

    Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

    Authors: Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

    Abstract: For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as we…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023

  36. arXiv:2212.04362  [pdf, other]

    cs.CV

    CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

    Authors: Jiezhang Cao, Qin Wang, Yongqin Xian, Yawei Li, Bingbing Ni, Zhiming Pi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Learning continuous image representations has recently been gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no l…

    Submitted 13 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  37. arXiv:2212.02291  [pdf, other]

    cs.CV

    I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

    Authors: Muhammad Ferjad Naeem, Muhammad Gul Zain Ali Khan, Yongqin Xian, Muhammad Zeshan Afzal, Didier Stricker, Luc Van Gool, Federico Tombari

    Abstract: Recent works have shown that unstructured text (documents) from online sources can serve as useful auxiliary information for zero-shot image classification. However, these methods require access to a high-quality source like Wikipedia and are limited to a single source of information. Large Language Models (LLM) trained on web-scale text show impressive abilities to repurpose their learned knowled…

    Submitted 5 December, 2022; originally announced December 2022.

  38. arXiv:2209.10304  [pdf, other]

    cs.CV

    I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

    Authors: Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

    Abstract: Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale. An unsupervised alternative is to represent each class using the word embedding associated with its semantic class name. However, word embeddings extracted from pre-trained language models do not necessarily capture visual…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  39. arXiv:2207.12515  [pdf, other]

    cs.IR

    A Survey on Trustworthy Recommender Systems

    Authors: Yingqiang Ge, Shuchang Liu, Zuohui Fu, Juntao Tan, Zelong Li, Shuyuan Xu, Yunqi Li, Yikun Xian, Yongfeng Zhang

    Abstract: Recommender systems (RS), serving at the forefront of Human-centered AI, are widely deployed in almost every corner of the web and facilitate the human decision-making process. However, despite their enormous capabilities and potential, RS may also lead to undesired effects on users, items, producers, platforms, or even the society at large, such as compromised user trust due to non-transparency,…

    Submitted 21 February, 2024; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM Transactions on Recommender Systems (TORS)

  40. arXiv:2205.09394  [pdf, other]

    cs.IR

    AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System

    Authors: Xiang Li, Xiaojiang Zhou, Yao Xiao, Peihao Huang, Dayao Chen, Sheng Chen, Yunsen Xian

    Abstract: Industrial search and recommendation systems mostly follow the classic multi-stage information retrieval paradigm: matching, pre-ranking, ranking, and re-ranking stages. To account for system efficiency, simple vector-product based models are commonly deployed in the pre-ranking stage. Recent works consider distilling the high knowledge of large ranking models to small pre-ranking models for bette…

    Submitted 19 May, 2022; originally announced May 2022.

  41. arXiv:2204.01208  [pdf, other]

    cs.CV

    Attribute Prototype Network for Any-Shot Learning

    Authors: Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

    Abstract: Any-shot image classification allows recognizing novel classes with only a few or even zero samples. For the task of zero-shot learning, visual attributes have been shown to play an important role, while in the few-shot regime, the effect of attributes is under-explored. To better transfer attribute-based knowledge from seen to unseen classes, we argue that an image representation with integrated…

    Submitted 3 April, 2022; originally announced April 2022.

    Comments: arXiv admin note: text overlap with arXiv:2008.08290

  42. arXiv:2203.10444  [pdf, other]

    cs.CV

    VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

    Authors: Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

    Abstract: Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning. However, their annotation process is labor-intensive and needs expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes. However, word embeddings do not always reflect visual similarities and result in inferior zero-shot performance. We propo…

    Submitted 26 May, 2023; v1 submitted 19 March, 2022; originally announced March 2022.

  43. arXiv:2111.14673  [pdf, other

    cs.CV

    3D Compositional Zero-shot Learning with DeCompositional Consensus

    Authors: Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc Van Gool, Federico Tombari

    Abstract: Parts represent a basic unit of geometric and semantic similarity across different objects. We argue that part knowledge should be composable beyond the observed object classes. Towards this, we present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes for semantic segmentation. We provide a structured study through benchmarking the task wit… ▽ More

    Submitted 15 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

  44. Discriminator-Free Generative Adversarial Attack

    Authors: Shaohao Lu, Yuqiao Xian, Ke Yan, Yi Hu, Xing Sun, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng

    Abstract: The Deep Neural Networks are vulnerable to adversarial examples (Figure 1), making the DNNs-based systems collapsed by adding the inconspicuous perturbations to the images. Most of the existing works for adversarial attack are gradient-based and suffer from the latency efficiencies and the load on GPU memory. The generative-based adversarial attacks can get rid of this limitation, and some relative w… ▽ More

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: 9 pages, 6 figures, 4 tables

  45. arXiv:2106.10683  [pdf, other

    cs.CV cs.AI

    Solution for Large-scale Long-tailed Recognition with Noisy Labels

    Authors: Yuqiao Xian, Jia-Xin Zhuang, Fufu Yu

    Abstract: This is a technical report for CVPR 2021 AliProducts Challenge. AliProducts Challenge is a competition proposed for studying the large-scale and fine-grained commodity image recognition problem encountered by world-leading ecommerce companies. The large-scale product recognition simultaneously meets the challenge of noisy annotations, imbalanced (long-tailed) data distribution and fine-grained clas… ▽ More

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 3 pages

    Journal ref: CVPR 2021 AliProducts Challenge: Large-scale Product Recognition, Technical Report

  46. arXiv:2105.01017  [pdf, other

    cs.CV

    Learning Graph Embeddings for Open World Compositional Zero-Shot Learning

    Authors: Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

    Abstract: Compositional Zero-Shot learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training. A problem with standard CZSL is the assumption of knowing which unseen compositions will be available at test time. In this work, we overcome this assumption operating on the open world setting, where no limit is imposed on the compositional space at test time,… ▽ More

    Submitted 8 April, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: Accepted by T-PAMI in March, 2022. arXiv admin note: text overlap with arXiv:2101.12609

  47. arXiv:2104.11692  [pdf, other

    cs.CV

    A Closer Look at Self-training for Zero-Label Semantic Segmentation

    Authors: Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo

    Abstract: Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation. Prior zero-label semantic segmentation works approach this task by learning visual-semantic embeddings or generative models. However, they are prone to overfitting on the seen classes… ▽ More

    Submitted 21 April, 2021; originally announced April 2021.

  48. arXiv:2104.10955  [pdf, other

    cs.CV cs.AI cs.LG

    Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

    Authors: Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata

    Abstract: Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality. In this work, we propose to transfer knowledge across heterogeneous modalities, even though these data modalities may not be semantically correlated. Rather than directly aligning the representations of different modalities, we compose audio, image,… ▽ More

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR2021

  49. arXiv:2104.07869  [pdf, other

    cs.IR

    Faithfully Explainable Recommendation via Neural Logic Reasoning

    Authors: Yaxin Zhu, Yikun Xian, Zuohui Fu, Gerard de Melo, Yongfeng Zhang

    Abstract: Knowledge graphs (KG) have become increasingly important to endow modern recommender systems with the ability to generate traceable reasoning paths to explain the recommendation process. However, prior research rarely considers the faithfulness of the derived explanations to justify the decision making process. To the best of our knowledge, this is the first work that models and evaluates faithful… ▽ More

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted in NAACL 2021

  50. arXiv:2102.01987  [pdf, other

    cs.CV

    Learning Graph Embeddings for Compositional Zero-shot Learning

    Authors: Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

    Abstract: In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of observed visual primitives states (e.g. old, cute) and objects (e.g. car, dog) in the training set. This is challenging because the same state can for example alter the visual appearance of a dog drastically differently from a car. As a solution, we propose a novel graph formulation called Compositi… ▽ More

    Submitted 3 May, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted in IEEE CVPR 2021