Showing 1–50 of 64 results for author: Lei, M

  1. arXiv:2406.18849  [pdf, other]

    cs.CV

    Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

    Authors: Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

    Abstract: Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2405.11846  [pdf, other]

    cs.CV

    EPPS: Advanced Polyp Segmentation via Edge Information Injection and Selective Feature Decoupling

    Authors: Mengqi Lei, Xin Wang

    Abstract: Accurate segmentation of polyps in colonoscopy images is essential for early-stage diagnosis and management of colorectal cancer. Despite advancements in deep learning for polyp segmentation, enduring limitations persist. The edges of polyps are typically ambiguous, making them difficult to discern from the background, and the model performance is often compromised by the influence of irrelevant o…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2405.07553  [pdf]

    cs.RO

    Space Domain based Ecological Cooperative and Adaptive Cruise Control on Rolling Terrain

    Authors: Mingyue Lei, Haoran Wang, Duo Li, Zhenning Li, Ashish Dhamaniya, Jia Hu

    Abstract: Ecological Cooperative and Adaptive Cruise Control (Eco-CACC) is widely focused to enhance sustainability of CACC. However, state-of-the-art Eco-CACC studies are still facing challenges in adopting on rolling terrain. Furthermore, they cannot ensure both ecology optimality and computational efficiency. Hence, this paper proposes a nonlinear optimal control based Eco-CACC controller. It has the fol…

    Submitted 13 May, 2024; originally announced May 2024.

  4. arXiv:2405.07543  [pdf]

    cs.LG cs.RO

    Accelerating the Evolution of Personalized Automated Lane Change through Lesson Learning

    Authors: Jia Hu, Mingyue Lei, Duo Li, Zhenning Li, Jaehyun So, Haoran Wang

    Abstract: Personalization is crucial for the widespread adoption of advanced driver assistance systems. To match up with each user's preference, the online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot of computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: lea…

    Submitted 13 May, 2024; originally announced May 2024.

  5. arXiv:2405.01923  [pdf, other]

    cs.RO

    Task-Driven Computational Framework for Simultaneously Optimizing Design and Mounted Pose of Modular Reconfigurable Manipulators

    Authors: Maolin Lei, Edoardo Romiti, Arturo Laurenz, Nikos G. Tsagarakis

    Abstract: Modular reconfigurable manipulators enable quick adaptation and versatility to address different application environments and tailor to the specific requirements of the tasks. Task performance significantly depends on the manipulator's mounted pose and morphology design, therefore posing the need of methodologies for selecting suitable modular robot configurations and mounted pose that can address…

    Submitted 3 May, 2024; originally announced May 2024.

  6. arXiv:2403.16034  [pdf, other]

    cs.CV

    V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  7. arXiv:2403.11091  [pdf, other]

    cs.SD cs.CV eess.AS

    Multitask frame-level learning for few-shot sound event detection

    Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

    Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often providing detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures, conference

  8. arXiv:2312.11547  [pdf, other]

    cs.AI cs.LG

    A Unified Pre-training and Adaptation Framework for Combinatorial Optimization on Graphs

    Authors: Ruibin Zeng, Minglong Lei, Lingfeng Niu, Lan Cheng

    Abstract: Combinatorial optimization (CO) on graphs is a classic topic that has been extensively studied across many scientific and industrial fields. Recently, solving CO problems on graphs through learning methods has attracted great attention. Advanced deep learning methods, e.g., graph neural networks (GNNs), have been used to effectively assist the process of solving COs. However, current frameworks ba…

    Submitted 16 December, 2023; originally announced December 2023.

  9. arXiv:2311.08188  [pdf, ps, other]

    cs.IT eess.SP

    Fast List Decoding of High-Rate Polar Codes

    Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

    Abstract: Due to the ability to provide superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developi…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures

  10. arXiv:2308.03729  [pdf, other]

    cs.CV cs.AI

    Tiny LVLM-eHub: Early Multimodal Experiments with Bard

    Authors: Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

    Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, promoting comprehensive comprehension and reasoning across various domains. This work presents an early and holistic evaluation of LVLMs' multimodal abilit…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 24 pages, 24 figures, 7 Tables. Project Page: http://lvlm-ehub.opengvlab.com/

  11. arXiv:2306.09265  [pdf, other]

    cs.CV cs.AI

    LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

    Authors: Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo

    Abstract: Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation of their efficacy. This paper presents a comprehensive evaluation of publicly available large multimodal models by building a LVLM evaluation Hub (LVLM-eHub). Our LVLM-eHub consists of $8$ representative LVLMs such as InstructBL…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 28 pages, 10 figures, a comprehensive evaluation of large vision-language models

  12. arXiv:2306.06877  [pdf, other]

    cs.CV

    Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers

    Authors: AnLan Sun, Zhao Zhang, Meng Lei, Yuting Dai, Dong Wang, Liwei Wang

    Abstract: Breast ultrasound videos contain richer information than ultrasound images, therefore it is more meaningful to develop video models for this diagnosis task. However, the collection of ultrasound video datasets is much harder. In this paper, we explore the feasibility of enhancing the performance of ultrasound video classification using the static image dataset. To this end, we propose KGA-Net and…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Medical Image Computing and Computer-Assisted Intervention 2023

  13. arXiv:2302.14510  [pdf, other]

    stat.ML cs.LG

    Bayesian Kernelized Tensor Factorization as Surrogate for Bayesian Optimization

    Authors: Mengying Lei, Lijun Sun

    Abstract: Bayesian optimization (BO) primarily uses Gaussian processes (GP) as the key surrogate model, mostly with a simple stationary and separable kernel function such as the squared-exponential kernel with automatic relevance determination (SE-ARD). However, such simple kernel specifications are deficient in learning functions with complex features, such as being nonstationary, nonseparable, and multimo…

    Submitted 26 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

  14. arXiv:2302.01109  [pdf, other]

    cs.CV

    GraphReg: Dynamical Point Cloud Registration with Geometry-aware Graph Signal Processing

    Authors: Zhao Mingyang, Ma Lei, Jia Xiaohong, Yan Dong-Ming, Huang Tiejun

    Abstract: This study presents a high-accuracy, efficient, and physically induced method for 3D point cloud registration, which is the core of many important 3D vision problems. In contrast to existing physics-based methods that merely consider spatial point information and ignore surface geometry, we explore geometry aware rigid-body dynamics to regulate the particle (point) motion, which results in more pr…

    Submitted 2 February, 2023; originally announced February 2023.

  15. arXiv:2301.06051  [pdf, other]

    cs.CV

    DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

    Authors: Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, Liwei Wang

    Abstract: Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D perception. Compared with the customized sparse convolution, the attention mechanism in Transformers is more appropriate for flexibly modeling long-range relationships and is easier to be deployed in real-world applications. However, due to the sparse characteristics of point clo…

    Submitted 20 March, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

    Comments: Accepted by CVPR2023

  16. arXiv:2211.01505  [pdf, other]

    physics.ins-det cs.CV

    Implicit Neural Representation as a Differentiable Surrogate for Photon Propagation in a Monolithic Neutrino Detector

    Authors: Minjie Lei, Ka Vang Tsang, Sean Gasiorowski, Chuan Li, Youssef Nashed, Gianluca Petrillo, Olivia Piazza, Daniel Ratner, Kazuhiro Terao

    Abstract: Optical photons are used as signal in a wide variety of particle detectors. Modern neutrino experiments employ hundreds to tens of thousands of photon detectors to observe signal from millions to billions of scintillation photons produced from energy deposition of charged particles. These neutrino detectors are typically large, containing kilotons of target volume, with different optical propertie…

    Submitted 2 November, 2022; originally announced November 2022.

  17. arXiv:2210.01063   

    cs.LG

    On Stability and Generalization of Bilevel Optimization Problem

    Authors: Meng Ding, Mingxi Lei, Yunwen Lei, Di Wang, Jinhui Xu

    Abstract: (Stochastic) bilevel optimization is a frequently encountered problem in machine learning with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most of the existing studies on this problem only focused on analyzing the convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization…

    Submitted 15 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: This paper currently contains unresolved technical flaws that have the potential to mislead readers. However, we are committed to addressing these issues and improving the quality of the paper in the future

  18. arXiv:2208.09978  [pdf, other]

    stat.ML cs.LG

    Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data

    Authors: Mengying Lei, Aurelie Labbe, Lijun Sun

    Abstract: Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. As real-world spatiotemporal data often exhibits complex dependencies that are nonstationary and nonseparable, developing effective and computationally efficient statistical models to accommodate nonstationary/nonseparable processes containing both long-range and short-scale variations becom…

    Submitted 30 May, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

  19. arXiv:2204.12115  [pdf, ps, other]

    cs.IT eess.SP

    Fast Successive-Cancellation Decoding of Polar Codes with Sequence Nodes

    Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

    Abstract: Due to the sequential nature of the successive-cancellation (SC) algorithm, the decoding of polar codes suffers from significant decoding latencies. Fast SC decoding is able to speed up the SC decoding process, by implementing parallel decoders at the intermediate levels of the SC decoding tree for some special nodes with specific information and frozen bit patterns. To further improve the paralle…

    Submitted 18 November, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: 30 pages, 6 figures, submitted for possible journal publication

  20. arXiv:2203.07691  [pdf, other]

    cs.LG

    Supervised Contrastive Learning with Structure Inference for Graph Classification

    Authors: Hao Jia, Junzhong Ji, Minglong Lei

    Abstract: Advanced graph neural networks have shown great potentials in graph classification tasks recently. Different from node classification where node embeddings aggregated from local neighbors can be directly used to learn node labels, graph classification requires a hierarchical accumulation of different levels of topological information to generate discriminative graph embeddings. Still, how to fully…

    Submitted 15 March, 2022; originally announced March 2022.

  21. arXiv:2202.07816  [pdf, other]

    eess.AS cs.CL cs.SD

    ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

    Authors: Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

    Abstract: Expressive text-to-speech (TTS) has become a hot research topic recently, mainly focusing on modeling prosody in speech. Prosody modeling has several challenges: 1) the extracted pitch used in previous prosody modeling works have inevitable errors, which hurts the prosody modeling; 2) different attributes of prosody (e.g., pitch, duration and energy) are dependent on each other and produce the nat…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  22. arXiv:2112.01174  [pdf, other]

    cs.LG

    Multi-task Self-distillation for Graph-based Semi-Supervised Learning

    Authors: Yating Ren, Junzhong Ji, Lingfeng Niu, Minglong Lei

    Abstract: Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges are prone to have similar attributes and labels, so that the features smoothed by local graph structures can reveal the class similarities. However, there often exist mismatches between graph structures and labels in many real-world scena…

    Submitted 9 June, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

  23. arXiv:2111.13694  [pdf, other]

    cs.SD cs.LG eess.AS

    Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

    Authors: Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

    Abstract: Overlapping speech diarization is always treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set. Specifically, we propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels according to the similarities between speech feat…

    Submitted 28 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022, 5 pages, 2 figures

  24. arXiv:2111.12063  [pdf, other]

    cs.PL

    Quantum Advantage for All

    Authors: Christoph M. Kirsch, Stefanie Muroya Lei

    Abstract: We show that the algorithmic complexity of any classical algorithm written in a Turing-complete programming language polynomially bounds the number of quantum bits that are required to run and even symbolically execute the algorithm on a quantum computer. In particular, we show that any classical algorithm $A$ that runs in $\mathcal{O}(f(n))$ time and $\mathcal{O}(g(n))$ space requires no more tha…

    Submitted 6 November, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  25. arXiv:2110.13337  [pdf, other]

    cs.CV

    Robust Ellipsoid-specific Fitting via Expectation Maximization

    Authors: Zhao Mingyang, Jia Xiaohong, Ma Lei, Qiu Xinlin, Jiang Xin, Yan Dong-Ming

    Abstract: Ellipsoid fitting is of general interest in machine vision, such as object detection and shape approximation. Most existing approaches rely on the least-squares fitting of quadrics, minimizing the algebraic or geometric distances, with additional constraints to enforce the quadric as an ellipsoid. However, they are susceptible to outliers and non-ellipsoid or biased results when the axis ratio exc…

    Submitted 25 October, 2021; originally announced October 2021.

  26. FedSpeech: Federated Text-to-Speech with Continual Learning

    Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

    Abstract: Federated learning enables collaborative training of machine learning models under strict privacy restrictions and federated text-to-speech aims to synthesize natural speech of multiple users with a few audio training samples stored in their devices locally. However, federated text-to-speech faces several challenges: very few training samples from each speaker are available, training samples are a…

    Submitted 22 May, 2023; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by IJCAI 2021

    Journal ref: 2021. Main Track. Pages 3829-3835

  27. arXiv:2109.15257  [pdf, other]

    cs.LG cs.SI

    Latent Network Embedding via Adversarial Auto-encoders

    Authors: Minglong Lei, Yong Shi, Lingfeng Niu

    Abstract: Graph auto-encoders have proved to be useful in network embedding task. However, current models only consider explicit structures and fail to explore the informative latent structures cohered in networks. To address this issue, we propose a latent network embedding model based on adversarial graph auto-encoders. Under this framework, the problem of discovering latent structures is formulated as in…

    Submitted 30 September, 2021; originally announced September 2021.

  28. arXiv:2109.12144  [pdf, other]

    cs.LG

    Spatial Aggregation and Temporal Convolution Networks for Real-time Kriging

    Authors: Yuankai Wu, Dingyi Zhuang, Mengying Lei, Aurelie Labbe, Lijun Sun

    Abstract: Spatiotemporal kriging is an important application in spatiotemporal data analysis, aiming to recover/interpolate signals for unsampled/unobserved locations based on observed signals. The principal challenge for spatiotemporal kriging is how to effectively model and leverage the spatiotemporal dependencies within the data. Recently, graph neural networks (GNNs) have shown great promise for spatiot…

    Submitted 24 September, 2021; originally announced September 2021.

  29. arXiv:2109.04049  [pdf, other]

    cs.SD cs.AI eess.AS

    BeamTransformer: Microphone Array-based Overlapping Speech Detection

    Authors: Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

    Abstract: We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling. BeamTransformer seeks to optimize modeling of sequential relationship among signals from different spatial direction. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper we effectively ap…

    Submitted 9 September, 2021; originally announced September 2021.

  30. arXiv:2109.00046  [pdf, other]

    stat.ML cs.LG

    Scalable Spatiotemporally Varying Coefficient Modelling with Bayesian Kernelized Tensor Regression

    Authors: Mengying Lei, Aurelie Labbe, Lijun Sun

    Abstract: As a regression technique in spatial statistics, the spatiotemporally varying coefficient model (STVC) is an important tool for discovering nonstationary and interpretable response-covariate associations over both space and time. However, it is difficult to apply STVC for large-scale spatiotemporal analyses due to its high computational cost. To address this challenge, we summarize the spatiotempo…

    Submitted 13 April, 2024; v1 submitted 31 August, 2021; originally announced September 2021.

    Journal ref: Bayesian Analysis (2024)

  31. arXiv:2108.04236  [pdf]

    cs.CV physics.optics

    An optical biomimetic eyes with interested object imaging

    Authors: Jun Li, Shimei Chen, Shangyuan Wang, Miao Lei, Xiaofang Dai, Chuangxue Liang, Kunyuan Xu, Shuxin Lin, Yuhui Li, Yuer Fan, Ting Zhong

    Abstract: We presented an optical system to perform imaging interested objects in complex scenes, like the creature easy see the interested prey in the hunt for complex environments. It utilized Deep-learning network to learn the interested objects' vision features and designed the corresponding "imaging matrices", furthermore the learned matrixes act as the measurement matrix to complete compressive imagi…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: 19 pages, 7 figures, 3 tables

  32. arXiv:2106.09317  [pdf, other]

    cs.CL cs.SD eess.AS

    EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

    Authors: Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao

    Abstract: Recently, there has been an increasing interest in neural speech synthesis. While the deep neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to generate a more emotional and more expressive speech is becoming a new challenge to researchers due to the scarcity of high-quality emotion speech dataset and the lack of advanced emotional TTS model. In this paper, we…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted by Interspeech 2021

  33. Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation

    Authors: Xinyu Chen, Mengying Lei, Nicolas Saunier, Lijun Sun

    Abstract: Spatiotemporal traffic time series (e.g., traffic volume/speed) collected from sensing systems are often incomplete with considerable corruption and large amounts of missing values, preventing users from harnessing the full power of the data. Missing data imputation has been a long-standing research topic and critical application for real-world intelligent transportation systems. A widely applied…

    Submitted 30 April, 2021; originally announced April 2021.

    Journal ref: IEEE Transactions on Intelligent Transportation Systems (2022)

  34. arXiv:2104.05784  [pdf, other]

    cs.SD eess.AS

    Extremely Low Footprint End-to-End ASR System for Smart Device

    Authors: Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

    Abstract: Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate the acoustic, pronunciation and language models into a single neural network, which outperforms conventional models. Among E2E approaches, attention-based models, e.g. Transformer, have emerged as being superior. Such models have opened the door to deployment of ASR on smart devices, however they still suffer…

    Submitted 6 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021

  35. arXiv:2011.11384  [pdf, other]

    cs.CY

    Influence of Murder Incident of Ride-hailing Drivers on Ride-hailing User's Consuming Willingness in Nanchang

    Authors: Guangxin He, Shenghuan Yang, Miaomiao Lei, Xing Wu, Yixin Sun, Yimeng Dang

    Abstract: Due to the frequent murder incidents of ride-hailing drivers in China in 2018, ride-hailing companies took a series of measures to prevent such incidents and ensure ride-hailing passengers' safety. This study investigated users' willingness to use ride-hailing apps after murder incidents and users' attitudes toward Safety Rectification. We found that murder incidents of ride-hailing drivers had a…

    Submitted 27 November, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

  36. arXiv:2011.10363  [pdf]

    cs.AI cs.HC

    SophiaPop: Experiments in Human-AI Collaboration on Popular Music

    Authors: David Hanson, Frankie Storm, Wenwei Huang, Vytas Krisciunas, Tiger Darrow, Audrey Brown, Mengna Lei, Matthew Aylett, Adam Pickrell, Sophia the Robot

    Abstract: A diverse team of engineers, artists, and algorithms, collaborated to create songs for SophiaPop, via various neural networks, robotics technologies, and artistic tools, and animated the results on Sophia the Robot, a robotic celebrity and animated character. Sophia is a platform for arts, research, and other uses. To advance the art and technology of Sophia, we combine various AI with a fictional…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: 7 pages, 4 figures

  37. arXiv:2010.15311  [pdf, other]

    eess.AS cs.SD

    DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech

    Authors: Zhiying Huang, Hao Li, Ming Lei

    Abstract: With the number of smart devices increasing, the demand for on-device text-to-speech (TTS) increases rapidly. In recent years, many prominent End-to-End TTS methods have been proposed, and have greatly improved the quality of synthesized speech. However, to ensure the qualified speech, most TTS systems depend on large and complex neural network models, and it's hard to deploy these TTS systems on-…

    Submitted 14 January, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, Submitted to ICASSP2021

  38. arXiv:2010.14099  [pdf, other]

    cs.SD eess.AS

    Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: Recently, online end-to-end ASR has gained increasing attention. However, the performance of online systems still lags far behind that of offline systems, with a large gap in quality of recognition. For specific scenarios, we can trade-off between performance and latency, and can train multiple systems with different delays to match the performance and latency requirements of various application s…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2021

  39. arXiv:2006.12761  [pdf, other]

    cs.CV eess.IV

    Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative

    Authors: Mingxi Lei, Bino Varghese, Darryl Hwang, Steven Cen, Xiaomeng Lei, Afshin Azadikhah, Bhushan Desai, Assad Oberai, Vinay Duddalwar

    Abstract: There is no consensus regarding the radiomic feature terminology, the underlying mathematics, or their implementation. This creates a scenario where features extracted using different toolboxes could not be used to build or validate the same model leading to a non-generalization of radiomic results. In this study, the image biomarker standardization initiative (IBSI) established phantom and benchm…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 21 pages, 8 figures

  40. arXiv:2006.06240  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    A PDD Decoder for Binary Linear Codes With Neural Check Polytope Projection

    Authors: Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei

    Abstract: Linear Programming (LP) is an important decoding technique for binary linear codes. However, the advantages of LP decoding, such as low error floor and strong theoretical guarantee, etc., come at the cost of high computational complexity and poor performance at the low signal-to-noise ratio (SNR) region. In this letter, we adopt the penalty dual decomposition (PDD) framework and propose a PDD algo…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

  41. arXiv:2006.01713  [pdf, other]

    cs.SD eess.AS

    SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: End-to-end speech recognition has become popular in recent years, since it can integrate the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as being superior. For example, Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by Transformer is the utilization of self-att…

    Submitted 20 May, 2020; originally announced June 2020.

    Comments: submitted to INTERSPEECH2020

  42. arXiv:2006.01712  [pdf, other]

    cs.SD eess.AS

    Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

    Authors: Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

    Abstract: Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention. Many efforts have been paid to turn the non-streaming attention-based E2E-ASR system into streaming architecture. In this work, we propose a novel online E2E-ASR system by using Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency control memory equipped self-attention network (LC-SA…

    Submitted 20 May, 2020; originally announced June 2020.

    Comments: submitted to INTERSPEECH2020

  43. arXiv:2005.10463  [pdf, other]

    cs.SD cs.CL eess.AS

    Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

    Authors: Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

    Abstract: Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained through the use of very large neural networks. Transformer models mainly include two submodules - position-wise feedforward layers and self-attention (SAN) layers.…

    Submitted 17 November, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted to SLT 2021

  44. arXiv:2002.07601  [pdf, other

    cs.IT cs.LG eess.SP stat.ML

    ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning

    Authors: Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei

    Abstract: Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms th…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE Communications Letters

  45. arXiv:1911.11354  [pdf, other

    cs.DS

    Finding Route Hotspots in Large Labeled Networks

    Authors: Mingtao Lei, Xi Zhang, Lingyang Chu, Zhefeng Wang, Philip S. Yu, Binxing Fang

    Abstract: In many advanced network analysis applications, like social networks, e-commerce, and network security, hotspots are generally considered to be groups of vertices that are tightly connected owing to similar characteristics, such as common habits and location proximity. In this paper, we investigate the formation of hotspots from an alternative perspective that considers the routes along the netw…

    Submitted 26 November, 2019; originally announced November 2019.

  46. arXiv:1906.03814  [pdf, other

    eess.SP cs.IT cs.LG stat.ML

    Learned Conjugate Gradient Descent Network for Massive MIMO Detection

    Authors: Yi Wei, Ming-Min Zhao, Mingyi Hong, Min-jian Zhao, Ming Lei

    Abstract: In this work, we consider the use of model-driven deep learning techniques for massive multiple-input multiple-output (MIMO) detection. Compared with conventional MIMO systems, massive MIMO promises improved spectral efficiency, coverage and range. Unfortunately, these benefits come at the cost of significantly increased computational complexity. To reduce the complexity of signal detection…

    Submitted 1 June, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Part of this work has been accepted by IEEE ICC 2020

  47. arXiv:1904.10045  [pdf, other

    eess.AS cs.NE cs.SD

    Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

    Authors: Shiliang Zhang, Ming Lei, Zhijie Yan

    Abstract: Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model by using WFST-based decoding in order to achieve promising results. This is even more essential for Mandarin speech recognition, since Mandarin has a special phenomenon, namely homophones, which cause a lot of substitution errors. The linguistic information introduced b…

    Submitted 27 March, 2019; originally announced April 2019.

    Comments: 6 pages, 5 figures

  48. arXiv:1811.02353  [pdf

    eess.SP cs.HC cs.LG

    An amplitudes-perturbation data augmentation method in convolutional neural networks for EEG decoding

    Authors: Xian-Rui Zhang, Meng-Ying Lei, Yang Li

    Abstract: A Brain-Computer Interface (BCI) system provides a pathway between humans and the outside world by analyzing brain signals which contain potential neural information. Electroencephalography (EEG) is one of the most commonly used brain signals and EEG recognition is an important part of a BCI system. Recently, convolutional neural networks (ConvNet) in deep learning are becoming the new cutting-edge tools…

    Submitted 6 November, 2018; originally announced November 2018.

  49. arXiv:1810.10353  [pdf

    cs.CV cs.HC

    Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis

    Authors: Yang Li, Mengying Lei, Xianrui Zhang, Weigang Cui, Yuzhu Guo, Ting-Wen Huang, Hua-Liang Wei

    Abstract: Decoding EEG signals of different mental states is a challenging task for brain-computer interfaces (BCIs) due to nonstationarity of perceptual decision processes. This paper presents a novel boosted convolutional neural networks (ConvNets) decoding scheme for motor imagery (MI) EEG signals assisted by the multiwavelet-based time-frequency (TF) causality analysis. Specifically, multiwavelet basis…

    Submitted 22 October, 2018; originally announced October 2018.

  50. arXiv:1805.03504  [pdf, other

    cs.LG stat.ML

    Diffusion Based Network Embedding

    Authors: Yong Shi, Minglong Lei, Peng Zhang, Lingfeng Niu

    Abstract: In network embedding, random walks play a fundamental role in preserving network structures. However, random walk based embedding methods have two limitations. First, random walk methods are fragile when the sampling frequency or the number of node sequences changes. Second, in disequilibrium networks such as highly biased networks, random walk methods often perform poorly due to the lack of globa…

    Submitted 11 May, 2018; v1 submitted 9 May, 2018; originally announced May 2018.