Showing 1–50 of 52 results for author: Lv, H

  1. arXiv:2406.19776

    cs.MM cs.IR

    MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection

    Authors: Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng

    Abstract: Fake news detection has received increasing attention from researchers in recent years, especially multi-modal fake news detection containing both text and images. However, many previous works have fed two modal features, text and image, into a binary classifier after a simple concatenation or attention mechanism, in which the features contain a large amount of noise inherent in the data, which in…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18118

    cs.CR cs.CL

    SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

    Authors: Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2403.17297

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  4. arXiv:2403.04780

    cs.CL cs.AI

    MuseGraph: Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining

    Authors: Yanchao Tan, Hang Lv, Xinyi Huang, Jiawei Zhang, Shiping Wang, Carl Yang

    Abstract: Graphs with abundant attributes are essential in modeling interconnected entities and improving predictions in various real-world applications. Traditional Graph Neural Networks (GNNs), which are commonly used for modeling attributed graphs, need to be re-trained every time when applied to different graph tasks and datasets. Although the emergence of Large Language Models (LLMs) has introduced a n…

    Submitted 13 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  5. arXiv:2402.19282

    cs.CL

    WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

    Authors: Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan, et al. (1 additional author not shown)

    Abstract: This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy…

    Submitted 17 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  6. Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge Graphs

    Authors: Tianyu Zhang, Chengbin Hou, Rui Jiang, Xuegong Zhang, Chenghu Zhou, Ke Tang, Hairong Lv

    Abstract: Node Importance Estimation (NIE) is a task of inferring importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent research interests of NIE have been dedicating to knowledge graphs for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model by available labels, and they consider every interested node e…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE TNNLS

  7. arXiv:2402.16717

    cs.CL cs.AI cs.CR

    CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

    Authors: Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothes…

    Submitted 26 February, 2024; originally announced February 2024.

  8. arXiv:2401.16762

    cs.CV

    Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization

    Authors: Henglei Lv, Jiayu Xiao, Liang Li, Qingming Huang

    Abstract: Diffusion-based text-to-image personalization has achieved great success in generating subjects specified by users among various contexts. Even though, existing finetuning-based methods still suffer from model overfitting, which greatly harms the generative diversity, especially when given subject images are few. To this end, we propose Pick-and-Draw, a training-free semantic guidance approach to…

    Submitted 30 January, 2024; originally announced January 2024.

  9. arXiv:2401.05702

    cs.CV

    Video Anomaly Detection and Explanation via Large Language Models

    Authors: Hui Lv, Qianru Sun

    Abstract: Video Anomaly Detection (VAD) aims to localize abnormal events on the timeline of long-range surveillance videos. Anomaly-scoring-based methods have been prevailing for years but suffer from the high complexity of thresholding and low explainability of detection results. In this paper, we conduct pioneer research on equipping video-based large language models (VLLMs) in the framework of VAD, making…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures

  10. arXiv:2311.13562

    cs.CV cs.AI

    Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object

    Authors: Junhao Chen, Peng Rong, Jingbo Sun, Chao Li, Xiang Li, Hongwu Lv

    Abstract: Image style transfer occupies an important place in both computer graphics and computer vision. However, most current methods require reference to stylized images and cannot individually stylize specific objects. To overcome this limitation, we propose the "Soulstyler" framework, which allows users to guide the stylization of specific objects in an image through simple textual descriptions. We int…

    Submitted 29 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, ICASSP 2024

  11. arXiv:2310.14278

    cs.SD cs.CL eess.AS

    Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

    Authors: Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the C…

    Submitted 27 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

  12. arXiv:2310.10195

    cs.LG cs.CL

    AdaLomo: Low-memory Optimization with Adaptive Learning Rate

    Authors: Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu

    Abstract: Large language models have achieved remarkable success, but their extensive parameter size necessitates substantial memory for training, thereby setting a high threshold. While the recently proposed low-memory optimization (LOMO) reduces memory footprint, its optimization technique, akin to stochastic gradient descent, is sensitive to hyper-parameters and exhibits suboptimal convergence, failing t…

    Submitted 6 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ACL 2024 camera ready version

  13. arXiv:2310.08872

    cs.CV

    R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

    Authors: Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang

    Abstract: Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images given text-prompts as input. However, these models fail to convey appropriate spatial composition specified by a layout instruction. In this work, we probe into zero-shot grounded T2I generation with diffusion models, that is, generating images corresponding to the input layout informati…

    Submitted 27 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Preprint. Under review. Project page: https://sagileo.github.io/Region-and-Boundary

  14. arXiv:2310.02064

    cs.GT

    Auction Design for Bidders with Ex Post ROI Constraints

    Authors: Hongtao Lv, Xiaohui Bei, Zhenzhe Zheng, Fan Wu

    Abstract: Motivated by practical constraints in online advertising, we investigate single-parameter auction design for bidders with constraints on their Return On Investment (ROI) -- a targeted minimum ratio between the obtained value and the payment. We focus on ex post ROI constraints, which require the ROI condition to be satisfied for every realized value profile. With ROI-constrained bidders, we first…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted by WINE2023

  15. arXiv:2309.13373

    cs.SD cs.LG eess.AS

    Asca: less audio data is more insightful

    Authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv

    Abstract: Audio recognition in specialized areas such as birdsong and submarine acoustics faces challenges in large-scale pre-training due to the limitations in available samples imposed by sampling environments and specificity requirements. While the Transformer model excels in audio recognition, its dependence on vast amounts of data becomes restrictive in resource-limited settings. Addressing this, we in…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures

  16. arXiv:2308.12647

    cs.NE

    Multitasking Evolutionary Algorithm Based on Adaptive Seed Transfer for Combinatorial Problem

    Authors: Haoyuan Lv, Ruochen Liu

    Abstract: Evolutionary computing (EC) is widely used in dealing with combinatorial optimization problems (COP). Traditional EC methods can only solve a single task in a single run, while real-life scenarios often need to solve multiple COPs simultaneously. In recent years, evolutionary multitasking optimization (EMTO) has become an emerging topic in the EC community. And many methods have been designed to d…

    Submitted 24 August, 2023; originally announced August 2023.

  17. arXiv:2304.04972

    cs.LG

    Federated Learning with Classifier Shift for Class Imbalance

    Authors: Yunheng Shen, Haoxiang Wang, Hairong Lv

    Abstract: Federated learning aims to learn a global model collaboratively while the training data belongs to different clients and is not allowed to be exchanged. However, the statistical heterogeneity challenge on non-IID data, such as class imbalance in classification, will cause client drift and significantly reduce the performance of the global model. This paper proposes a simple and effective approach…

    Submitted 11 April, 2023; originally announced April 2023.

  18. arXiv:2303.12369

    cs.CV

    Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

    Authors: Hui Lv, Zhongqi Yue, Qianru Sun, Bin Luo, Zhen Cui, Hanwang Zhang

    Abstract: Weakly Supervised Video Anomaly Detection (WSVAD) is challenging because the binary anomaly label is only given on the video level, but the output requires snippet-level predictions. So, Multiple Instance Learning (MIL) is prevailing in WSVAD. However, MIL is notoriously known to suffer from many false alarms because the snippet-level detector is easily biased towards the abnormal snippets with si…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 11 pages, 10 figures

  19. Variation Enhanced Attacks Against RRAM-based Neuromorphic Computing System

    Authors: Hao Lv, Bing Li, Lei Zhang, Cheng Liu, Ying Wang

    Abstract: The RRAM-based neuromorphic computing system has amassed explosive interests for its superior data processing capability and energy efficiency than traditional architectures, and thus being widely used in many data-centric applications. The reliability and security issues of the NCS therefore become an essential problem. In this paper, we systematically investigated the adversarial threats to the…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

  20. arXiv:2302.08062

    cs.CV cs.AI q-bio.PE

    Fossil Image Identification using Deep Learning Ensembles of Data Augmented Multiviews

    Authors: Chengbin Hou, Xinyu Lin, Hanhui Huang, Sheng Xu, Junxuan Fan, Yukun Shi, Hairong Lv

    Abstract: Identification of fossil species is crucial to evolutionary studies. Recent advances from deep learning have shown promising prospects in fossil image identification. However, the quantity and quality of labeled fossil images are often limited due to fossil preservation, conditioned sampling, and expensive and inconsistent label annotation by domain experts, which pose great challenges to training…

    Submitted 1 February, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: published in Methods in Ecology and Evolution

    Journal ref: Methods in Ecology and Evolution, 14, 3020-3034 (2023)

  21. arXiv:2302.07493

    cs.LG cs.AI cs.DC

    Adaptive incentive for cross-silo federated learning: A multi-agent reinforcement learning approach

    Authors: Shijing Yuan, Hongze Liu, Hongtao Lv, Zhanbo Feng, Jie Li, Hongyang Chen, Chentao Wu

    Abstract: Cross-silo federated learning (FL) is a typical FL that enables organizations (e.g., financial or medical entities) to train global models on isolated data. Reasonable incentive is key to encouraging organizations to contribute data. However, existing works on incentivizing cross-silo FL lack consideration of the environmental dynamics (e.g., precision of the trained global model and data owned by…

    Submitted 15 February, 2023; originally announced February 2023.

  22. ChameleMon: Shifting Measurement Attention as Network State Changes

    Authors: Kaicheng Yang, Yuhan Wu, Ruijie Miao, Tong Yang, Zirui Liu, Zicang Xu, Rui Qiu, Yikai Zhao, Hanglong Lv, Zhigang Ji, Gaogang Xie

    Abstract: Flow-level network measurement is critical to many network applications. Among various measurement tasks, packet loss detection and heavy-hitter detection are the two most important measurement tasks, which we call the two key tasks. In practice, the two key tasks are often required at the same time, but existing works seldom handle both tasks. In this paper, we design ChameleMon to support the two ke…

    Submitted 20 July, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: This is a preprint of ChameleMon: Shifting Measurement Attention as Network State Changes, to appear in SIGCOMM 2023

    Journal ref: ACM SIGCOMM (2023) 881-903

  23. arXiv:2211.16716

    cs.SE

    Automated Generating Natural Language Requirements based on Domain Ontology

    Authors: Ziyan Zhao, Li Zhang, Xiaoyun Gao, Xiaoli Lian, Heyang Lv, Lin Shi

    Abstract: Software requirements specification is undoubtedly critical for the whole software life-cycle. Nowadays, writing software requirements specifications primarily depends on human work. Although massive studies have been proposed to fasten the process via proposing advanced elicitation and analysis techniques, it is still a time-consuming and error-prone task that needs to take domain knowledge and b…

    Submitted 29 November, 2022; originally announced November 2022.

  24. arXiv:2211.16251

    cs.GT

    Utility Maximizer or Value Maximizer: Mechanism Design for Mixed Bidders in Online Advertising

    Authors: Hongtao Lv, Zhilin Zhang, Zhenzhe Zheng, Jinghan Liu, Chuan Yu, Lei Liu, Lizhen Cui, Fan Wu

    Abstract: Digital advertising constitutes one of the main revenue sources for online platforms. In recent years, some advertisers tend to adopt auto-bidding tools to facilitate advertising performance optimization, making the classical utility maximizer model in auction theory not fit well. Some recent studies proposed a new model, called value maximizer, for auto-bidding advertisers with retu…

    Submitted 30 November, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: accepted by AAAI2023

  25. arXiv:2210.04287

    cs.CV

    Learning to Decompose Visual Features with Latent Textual Prompts

    Authors: Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing, Heng Ji

    Abstract: Recent advances in pre-training vision-language models like CLIP have shown great potential in learning transferable visual representations. Nonetheless, for downstream inference, CLIP-like models suffer from either 1) degraded accuracy and robustness in the case of inaccurate text descriptions during retrieval-based inference (the challenge for zero-shot protocol); or 2) breaking the well-establi…

    Submitted 9 October, 2022; originally announced October 2022.

  26. arXiv:2209.13116

    cs.CV

    Spatio-Temporal Relation Learning for Video Anomaly Detection

    Authors: Hui Lv, Zhen Cui, Biao Wang, Jian Yang

    Abstract: Anomaly identification is highly dependent on the relationship between the object and the scene, as different/same object actions in same/different scenes may lead to various degrees of normality and anomaly. Therefore, object-scene relation actually plays a crucial role in anomaly detection but is inadequately explored in previous works. In this paper, we propose a Spatial-Temporal Relation Learn…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 8 pages, 5 figures, Journal

  27. arXiv:2209.08933

    eess.IV cs.CV

    Estimating Brain Age with Global and Local Dependencies

    Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

    Abstract: The brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, the brain age is hard to be exploited accurately with models using feature engineering and local processing such a…

    Submitted 19 September, 2022; originally announced September 2022.

  28. Hardware-in-the-Loop Simulation for Evaluating Communication Impacts on the Wireless-Network-Controlled Robots

    Authors: Honghao Lv, Zhibo Pang, Ming Xiao, Geng Yang

    Abstract: More and more robot automation applications have changed to wireless communication, and network performance has a growing impact on robotic systems. This study proposes a hardware-in-the-loop (HiL) simulation methodology for connecting the simulated robot platform to real network devices. This project seeks to provide robotic engineers and researchers with the capability to experiment without heav…

    Submitted 28 September, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 6 pages, 11 figures, to appear in 48th Annual Conference of the Industrial Electronics Society IECON 2022 Conference

  29. arXiv:2207.01261

    cs.SD eess.AS

    Minimizing Sequential Confusion Error in Speech Command Recognition

    Authors: Zhanheng Yang, Hang Lv, Xiong Wang, Ao Zhang, Lei Xie

    Abstract: Speech command recognition (SCR) has been commonly used on resource constrained devices to achieve hands-free user experience. However, in real applications, confusion among commands with similar pronunciations often happens due to the limited capacity of small models deployed on edge devices, which drastically affects the user experience. In this paper, inspired by the advances of discriminative…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  30. arXiv:2203.16539

    cs.LG eess.SP physics.optics

    Identification of diffracted vortex beams at different propagation distances using deep learning

    Authors: Heng Lv, Yan Guo, Zi-Xiang Yang, Chunling Ding, Wu-Hao Cai, Chenglong You, Rui-Bo Jin

    Abstract: Orbital angular momentum of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this articl…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: 9 pages, 4 figures

    Journal ref: Frontiers in Physics 10, 843932 (2022)

  31. arXiv:2203.15455

    cs.SD cs.CL eess.AS

    WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

    Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

    Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) W…

    Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  32. arXiv:2110.14636

    cs.CL

    Pay attention to emoji: Feature Fusion Network with EmoGraph2vec Model for Sentiment Analysis

    Authors: Xiaowei Yuan, Jingyuan Hu, Xiaodan Zhang, Honglei Lv

    Abstract: With the explosive growth of social media, opinionated postings with emojis have increased explosively. Many emojis are used to express emotions, attitudes, and opinions. Emoji representation learning can be helpful to improve the performance of emoji-related natural language processing tasks, especially in text sentiment analysis. However, most studies have only utilized the fixed descriptions pr…

    Submitted 23 May, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Camera-ready version accepted by ICPR 2022

  33. Emoji-based Co-attention Network for Microblog Sentiment Analysis

    Authors: Xiaowei Yuan, Jingyuan Hu, Xiaodan Zhang, Honglei Lv, Hao Liu

    Abstract: Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotional-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information that fails to resolve the problem of ambiguity noise. Recent researches have utilized em…

    Submitted 14 January, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: There are technical details that need to be changed, and the replacement version will take time to complete

  34. arXiv:2110.03370

    cs.SD cs.CL

    WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

    Authors: Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

    Abstract: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition…

    Submitted 23 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  35. arXiv:2110.00959

    cs.LG cs.AI

    Boost Neural Networks by Checkpoints

    Authors: Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv

    Abstract: Training multiple deep neural networks (DNNs) and averaging their outputs is a simple way to improve the predictive performance. Nevertheless, the multiplied training cost prevents this ensemble method to be practical and efficient. Several recent works attempt to save and ensemble the checkpoints of DNNs, which only requires the same computational cost as training a single network. However, these…

    Submitted 25 October, 2021; v1 submitted 3 October, 2021; originally announced October 2021.

  36. arXiv:2109.07045

    eess.IV cs.AI cs.CV

    Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net

    Authors: Yanwu Yang, Xutao Guo, Yiwei Pan, Pengcheng Shi, Haiyan Lv, Ting Ma

    Abstract: Accurate medical image segmentation is crucial for diagnosis and analysis. However, the models without calibrated uncertainty estimates might lead to errors in downstream analysis and exhibit low levels of robustness. Estimating the uncertainty in the measurement is vital to making definite, informed conclusions. Especially, it is difficult to make accurate predictions on ambiguous areas and focus…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: MICCAI_QUBIQ challenge, conference, Uncertainty qualification

  37. arXiv:2108.10623

    cs.LG cs.GT

    Data-Free Evaluation of User Contributions in Federated Learning

    Authors: Hongtao Lv, Zhenzhe Zheng, Tie Luo, Fan Wu, Shaojie Tang, Lifeng Hua, Rongfei Jia, Chengfei Lv

    Abstract: Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources. A critical issue is to evaluate individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can be detected and removed. The state-of-the-art sol…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: accepted by WiOpt 2021

  38. arXiv:2106.03593

    cs.GT cs.AI cs.LG

    Neural Auction: End-to-End Learning of Auction Mechanisms for E-Commerce Advertising

    Authors: Xiangyu Liu, Chuan Yu, Zhilin Zhang, Zhenzhe Zheng, Yu Rong, Hongtao Lv, Da Huo, Yiqing Wang, Dagui Chen, Jian Xu, Fan Wu, Guihai Chen, Xiaoqiang Zhu

    Abstract: In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal due to their fixed allocation rules to optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from…

    Submitted 13 July, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: To appear in the Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021

  39. Global Information Guided Video Anomaly Detection

    Authors: Hui Lv, Chunyan Xu, Zhen Cui

    Abstract: Video anomaly detection (VAD) is currently a challenging task due to the complexity of anomaly as well as the lack of labor-intensive temporal annotations. In this paper, we propose an end-to-end Global Information Guided (GIG) anomaly detection framework for anomaly detection using the video-level annotations (i.e., weak labels). We propose to first mine the global pattern cues by leveraging the…

    Submitted 14 April, 2021; originally announced April 2021.

  40. arXiv:2104.06689

    cs.CV

    Learning Normal Dynamics in Videos with Meta Prototype Network

    Authors: Hui Lv, Chen Chen, Zhen Cui, Chunyan Xu, Yong Li, Jian Yang

    Abstract: Frame reconstruction (current or future frame) based on Auto-Encoder (AE) is a popular method for video anomaly detection. With models trained on the normal data, the reconstruction errors of anomalous scenes are usually much larger than those of normal ones. Previous methods introduced the memory bank into AE, for encoding diverse normal patterns across the training videos. However, they are memo…

    Submitted 10 May, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: 9 pages, 4 figures, 6 tables

  41. arXiv:2103.09063

    cs.SD eess.AS

    An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

    Authors: Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: We introduce asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models in the one-pass decoding for large vocabulary continuous speech recognition. Unlike standard one-pass decoding with on-the-fly composition decoder which might induce a significant computation overhead, the asynchronous dynamic decoder has a novel design where it has two fronts, with…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: 5 pages, 5 figures, ICASSP

  42. arXiv:2102.04488

    cs.CL cs.SD eess.AS

    Wake Word Detection with Streaming Transformers

    Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: Modern wake word detection systems usually rely on neural networks for acoustic modeling. Transformers have recently shown superior performance over LSTM and convolutional networks in various sequence modeling tasks with their better temporal modeling power. However, it is not clear whether this advantage still holds for short-range temporal modeling like wake word detection. Besides, the vanilla Tr…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted at IEEE ICASSP 2021. 5 pages, 3 figures

  43. A Survey on Ensemble Learning under the Era of Deep Learning

    Authors: Yongquan Yang, Haijun Lv, Ning Chen

    Abstract: Due to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, ensemble learning based on deep neural networks (ensemble deep learning) has recently shown significant performance in improving the generalization of learning systems. However, since modern deep neural networks usually have millions to billions of parameters, the time and…

    Submitted 27 September, 2022; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: 47 pages, 8 figures, 15 tables

    ACM Class: A.1

    Journal ref: Artificial Intelligence Review, 2022

  44. arXiv:2012.01295  [pdf, other]

    cs.CL cs.CV

    Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

    Authors: Jing Su, Chenghua Lin, Mian Zhou, Qingyun Dai, Haoyu Lv

    Abstract: In this paper, we propose an end-to-end CNN-LSTM model for generating descriptions for sequential images with a local-object attention mechanism. To generate coherent descriptions, we capture global semantic context using a multi-layer perceptron, which learns the dependencies between sequential images. A parallel LSTM network is exploited for decoding the sequence descriptions. Experimental res…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted by INLG 2018

  45. arXiv:2011.09301  [pdf, other]

    cs.SD eess.AS

    Context-aware RNNLM Rescoring for Conversational Speech Recognition

    Authors: Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie

    Abstract: Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. Prior work has explored modeling long-range context through RNNLM rescoring, with improved performance. To further take advantage of the persistent nature of a conversation, such as topics or speaker turns, we extend the rescoring procedure to a new cont…

    Submitted 18 November, 2020; originally announced November 2020.

  46. Localizing Anomalies from Weakly-Labeled Videos

    Authors: Hui Lv, Chuanwei Zhou, Chunyan Xu, Zhen Cui, Jian Yang

    Abstract: Video anomaly detection under video-level labels is currently a challenging task. Previous works have made progress on discriminating whether a video sequence contains anomalies. However, most of them fail to accurately localize the anomalous events within videos in the temporal domain. In this paper, we propose a Weakly Supervised Anomaly Localization (WSAL) method focusing on temporally localiz…

    Submitted 14 April, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

  47. arXiv:2007.13058  [pdf]

    cs.IR cs.CL cs.LG

    Do recommender systems function in the health domain: a system review

    Authors: Jia Su, Yi Guan, Yuge Li, Weile Chen, He Lv, Yageng Yan

    Abstract: Recommender systems have fulfilled an important role in everyday life. Recommendations such as news by Google, videos by Netflix, and goods by e-commerce providers have heavily changed everyone's lifestyle. Health domains contain similar decision-making problems, such as what to eat, how to exercise, and what the proper medicine for a patient is. Recently, studies focused on recommender systems to…

    Submitted 26 July, 2020; originally announced July 2020.

    Comments: 32 pages, 1 table, 1 figure, 38 discussed articles

    MSC Class: 68U35

    ACM Class: H.4.0

  48. arXiv:2005.08347  [pdf, other]

    eess.AS cs.CL cs.SD

    Wake Word Detection with Alignment-Free Lattice-Free MMI

    Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-tra…

    Submitted 28 July, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Accepted at Interspeech 2020. 5 pages, 3 figures

  49. arXiv:1911.07706  [pdf, other]

    cs.GT cs.MA

    Mechanism Design with Predicted Task Revenue for Bike Sharing Systems

    Authors: Hongtao Lv, Chaoli Zhang, Zhenzhe Zheng, Tie Luo, Fan Wu, Guihai Chen

    Abstract: Bike sharing systems have been widely deployed around the world in recent years. A core problem in such systems is to reposition the bikes so that the distribution of bike supply is reshaped to better match the dynamic bike demand. When the bike-sharing company or platform is able to predict the revenue of each reposition task based on historic data, an additional constraint is to cap the payment…

    Submitted 3 July, 2023; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020; This is the full version that contains all the proofs

  50. arXiv:1909.08723  [pdf, other]

    cs.CL cs.SD eess.AS

    Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

    Authors: Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language…

    Submitted 14 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted to ASRU 2019