
Showing 1–50 of 147 results for author: Lai, J

  1. arXiv:2407.11588

    cs.CV

    Progressive Pretext Task Learning for Human Trajectory Prediction

    Authors: Xiaotong Lin, Tianming Liang, Jianhuang Lai, Jian-Fang Hu

    Abstract: Human trajectory prediction is a practical task of predicting the future positions of pedestrians on the road, which typically covers all temporal ranges from short-term to long-term within a trajectory. However, existing works attempt to address the entire trajectory prediction with a singular, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in huma…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.04368

    cs.CL cs.SD eess.AS

    Romanization Encoding For Multilingual ASR

    Authors: Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg

    Abstract: We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. By adopting romanization encoding alongside a balanced concatenated tokenizer within a FastConformer-RNNT framework equipped with a Roman2Char module, we significantly reduce vocabulary and output dimensions, enabling larger training batches and redu…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.01905

    cs.CV

    Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning

    Authors: Jiawei Zhan, Jinxiang Lai, Bin-Bin Gao, Jun Liu, Xiaochen Chen, Chengjie Wang

    Abstract: Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often struggle with limited generalization capabilities due to their focus on a single category, and can fail when encountering variations in products. Recent feature reconstruction methods, as representatives in one-model…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2407.01894

    cs.CV cs.HC

    Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

    Authors: Zixing Li, Chao Yan, Zhen Lan, Xiaojia Xiang, Han Zhou, Jun Lai, Dengqing Tang

    Abstract: Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and ver…

    Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 15 figures

  5. arXiv:2406.14976

    eess.IV cs.CV

    CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows subjects to be exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent studies employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based…

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.14964

    cs.CV

    VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

    Authors: Zixuan Chen, Ruijie Su, Jiahao Zhu, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net Jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bi…

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2405.19736

    cs.AI

    Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications

    Authors: Dayang Liang, Jinyang Lai, Yunlong Liu

    Abstract: How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications, and current approaches usually learn task-relevant state representations within visual reinforcement learning to address this problem. While prior work typically introduces one-step behavioral similarity metrics with elements (e.g., rewards and actions) to extract task-relevant state…

    Submitted 30 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  8. arXiv:2405.13053

    cs.CL cs.AI

    MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models

    Authors: Jingwei Xu, Junyu Lai, Yunpeng Huang

    Abstract: The pretrain+fine-tune paradigm is foundational in deploying large language models (LLMs) across a diverse range of downstream applications. Among these, Low-Rank Adaptation (LoRA) stands out for its parameter-efficient fine-tuning (PEFT), producing numerous off-the-shelf task-specific LoRA adapters. However, this approach requires explicit task intention selection, posing challenges for automatic…

    Submitted 24 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: 23 pages

    ACM Class: I.2.7

  9. arXiv:2404.18416

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  10. arXiv:2403.14513

    cs.CV

    View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

    Authors: Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai

    Abstract: Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representation by dr…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  11. arXiv:2403.11463

    cs.CV

    Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

    Authors: Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Video Paragraph Grounding (VPG) is an emerging task in video-language understanding, which aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video. However, existing VPG approaches are heavily reliant on a considerable number of temporal labels that are laborious and time-consuming to acquire. In this work, we introduce and explore Weakly-Supervised…

    Submitted 14 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. v2: fix a typo in figure 1

  12. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  13. arXiv:2402.18078

    cs.CV

    Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

    Authors: Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Diffusion models are a promising approach to image generation and have been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance to the target pose, they are prone to overfitting due to the lack of a high-level semantic understanding of the source person image. In this paper, we propose a novel Coarse-to-Fine L…

    Submitted 9 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024 (Highlight)

  14. arXiv:2402.01148

    math.ST cs.LG stat.ML

    The Optimality of Kernel Classifiers in Sobolev Space

    Authors: Jianfa Lai, Zhifan Li, Dongming Huang, Qian Lin

    Abstract: Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performance of kernel classifiers. With some mild assumptions on the conditional probability $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a k…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 21 pages, 2 figures

    MSC Class: 62G08 (Primary); 68T07; 46E22 (secondary)

    ACM Class: G.3

  15. arXiv:2401.01755

    cs.SD cs.AI eess.AS

    Incremental FastPitch: Chunk-based High Quality Text to Speech

    Authors: Muyang Du, Chuan Liu, Junjie Lai

    Abstract: Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they are naturally unsuited to incremental synthesis due to their fully parallel architecture, such as the transformer. In this work, we…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table

  16. arXiv:2312.17508

    eess.AS cs.AI cs.SD

    Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion

    Authors: Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie

    Abstract: Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components. Existing approaches cannot express fine-grained emotional attributes well. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effect…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by INTERSPEECH 2023

  17. arXiv:2312.16954

    cs.CR

    Blockchain-based Privacy-Preserving Public Key Searchable Encryption with Strong Traceability

    Authors: Yue Han, Jinguang Han, Weizhi Meng, Jianchang Lai, Ge Wu

    Abstract: A public key searchable encryption (PKSE) scheme allows data users to search over encrypted data. To identify illegal users, many traceable PKSE schemes have been proposed. However, existing schemes cannot trace the keywords which illegal users searched and protect users' privacy simultaneously. In some practical applications, tracing both illegal users' identities and the keywords which they search…

    Submitted 28 December, 2023; originally announced December 2023.

  18. arXiv:2312.12049

    cs.CR cs.LG

    EncryIP: A Practical Encryption-Based Framework for Model Intellectual Property Protection

    Authors: Xin Mu, Yu Wang, Zhengan Huang, Junzuo Lai, Yehong Zhang, Hui Wang, Yue Yu

    Abstract: In the rapidly growing digital economy, protecting intellectual property (IP) associated with digital products has become increasingly important. Within this context, machine learning (ML) models, being highly valuable digital assets, have gained significant attention for IP protection. This paper introduces a practical encryption-based framework called EncryIP, which seamlessly integrate…

    Submitted 19 December, 2023; originally announced December 2023.

  19. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  20. arXiv:2312.10983

    cs.CV

    MatchDet: A Collaborative Framework for Image Matching and Object Detection

    Authors: Jinxiang Lai, Wenlong Wu, Bin-Bin Gao, Jun Liu, Jiawei Zhan, Congchong Nie, Yi Zeng, Chengjie Wang

    Abstract: Image matching and object detection are two fundamental and challenging tasks, while many related applications consider them two individual tasks (i.e. task-individual). In this paper, a collaborative framework called MatchDet (i.e. task-collaborative) is proposed for image matching and object detection to obtain mutual improvements. To achieve the collaborative learning of the two tasks, we propo…

    Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Journal ref: AAAI 2024

  21. arXiv:2312.07871

    cs.CV

    MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation

    Authors: Yanzuo Lu, Meng Shen, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Universal domain adaptation (UniDA) is a practical but challenging problem, in which information about the relation between the source and the target domains is not given for knowledge transfer. Existing UniDA methods may suffer from the problems of overlooking intra-domain variations in the target domain and of difficulty in separating similar known and unknown classes. To address these is…

    Submitted 27 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 (Poster)

  22. arXiv:2312.03718

    cs.CL cs.AI

    Large Language Models in Law: A Survey

    Authors: Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu

    Abstract: The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. More recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI wi…

    Submitted 25 November, 2023; originally announced December 2023.

    Comments: Preprint

  23. arXiv:2311.12351

    cs.CL cs.LG

    Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

    Authors: Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, Zenan Li, Yuan Yao, Xiaoxing Ma, Lijuan Yang, Hao Chen, Shupeng Li, Penghao Zhao

    Abstract: Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, marking a stride towards achieving Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness in processing the long-context prompts that are frequently encou…

    Submitted 23 February, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 40 pages, 3 figures, 4 tables

    ACM Class: I.2.7; I.2.6; I.2.11

  24. arXiv:2310.13259

    eess.IV cs.CV

    Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

    Authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner

    Abstract: Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, the development of such models is often limited by the availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 4 main tables, 3 main figures, additional supplemental tables and figures

  25. arXiv:2310.07654

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  26. arXiv:2309.09843

    cs.CL cs.LG cs.SD eess.AS

    Instruction-Following Speech Recognition

    Authors: Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

    Abstract: Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions. With the advent of Large Language Models (LLMs) in speech processing, more organic, text-prompt-based interactions have become possible. However, the mechanisms behind these models' speech understanding and "reasoning" capabilities remai…

    Submitted 18 September, 2023; originally announced September 2023.

  27. arXiv:2307.05270

    eess.IV cs.CV

    APRF: Anti-Aliasing Projection Representation Field for Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie

    Abstract: Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views…

    Submitted 11 July, 2023; originally announced July 2023.

  28. arXiv:2306.13923

    cs.LG cs.AI

    Active Data Acquisition in Autonomous Driving Simulation

    Authors: Jianyu Lai, Zexuan Jia, Boao Li

    Abstract: Autonomous driving algorithms rely heavily on learning-based models, which require large datasets for training. However, there is often a large amount of redundant information in these datasets, while collecting and processing these datasets can be time-consuming and expensive. To address this issue, this paper proposes the concept of an active data-collecting strategy. For high-quality data, incr…

    Submitted 24 June, 2023; originally announced June 2023.

  29. arXiv:2306.09614

    cs.LG cs.SI

    HomoGCL: Rethinking Homophily in Graph Contrastive Learning

    Authors: Wen-Zhi Li, Chang-Dong Wang, Hui Xiong, Jian-Huang Lai

    Abstract: Contrastive learning (CL) has become the de facto learning paradigm in self-supervised learning on graphs, which generally follows the "augmenting-contrasting" learning scheme. However, we observe that unlike CL in the computer vision domain, CL in the graph domain performs decently even without augmentation. We conduct a systematic analysis of this phenomenon and argue that homophily, i.e., the principle…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to KDD 2023 Research Track

  30. arXiv:2306.09612

    cs.LG

    GraphSHA: Synthesizing Harder Samples for Class-Imbalanced Node Classification

    Authors: Wen-Zhi Li, Chang-Dong Wang, Hui Xiong, Jian-Huang Lai

    Abstract: Class imbalance is the phenomenon that some classes have far fewer instances than others, which is ubiquitous in real-world graph-structured scenarios. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) would under-represent minor class samples. We investigate this phenomenon and discover that the subspaces of minor classes being squeezed by those of the major ones in the latent…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to KDD 2023 Research Track

  31. arXiv:2306.06288

    cs.CV

    SAGE-NDVI: A Stereotype-Breaking Evaluation Metric for Remote Sensing Image Dehazing Using Satellite-to-Ground NDVI Knowledge

    Authors: Zepeng Liu, Zhicheng Yang, Mingye Zhu, Andy Wong, Yibing Wei, Mei Han, Jun Yu, Jui-Hsin Lai

    Abstract: Image dehazing is a meaningful low-level computer vision task and can be applied to a variety of contexts. In our industrial deployment scenario based on remote sensing (RS) images, the quality of image dehazing directly affects the grade of our crop identification and growth monitoring products. However, the widely used peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) prov…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted by ICME 2023 Industry Track

  32. arXiv:2305.18506

    stat.ML cs.LG

    Generalization Ability of Wide Residual Networks

    Authors: Jianfa Lai, Zixiong Yu, Songtao Tian, Qian Lin

    Abstract: In this paper, we study the generalization ability of the wide residual network on $\mathbb{S}^{d-1}$ with the ReLU activation function. We first show that as the width $m\rightarrow\infty$, the residual network kernel (RNK) uniformly converges to the residual neural tangent kernel (RNTK). This uniform convergence further guarantees that the generalization error of the residual network converges t…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 28 pages, 3 figures

    MSC Class: 62G08 (Primary); 68T07; 46E22 (secondary)

    ACM Class: G.3

  33. arXiv:2305.14091

    cs.CL cs.AI

    Revisiting Acceptability Judgements

    Authors: Hai Hu, Ziyin Zhang, Weifang Huang, Jackie Yan-Ki Lai, Aini Li, Yina Patterson, Jiahui Huang, Peng Zhang, Chien-Jer Charles Lin, Rui Wang

    Abstract: In this work, we revisit linguistic acceptability in the context of large language models. We introduce CoLAC (Corpus of Linguistic Acceptability in Chinese), the first large-scale acceptability dataset for a non-Indo-European language. It is verified by native speakers and is the first acceptability dataset that comes with two sets of labels: a linguist label and a crowd label. Our experiments sh…

    Submitted 27 September, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  34. arXiv:2305.11686

    eess.IV cs.CV cs.RO

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual envi…

    Submitted 27 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883

  35. arXiv:2305.10883

    cs.AI cs.CV eess.IV

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real d…

    Submitted 27 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release

  36. arXiv:2305.07386

    cs.LG

    One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering

    Authors: Si-Guo Fang, Dong Huang, Chang-Dong Wang, Jian-Huang Lai

    Abstract: The bipartite graph structure has shown its promising ability in facilitating the subspace clustering and spectral clustering algorithms for large-scale datasets. To avoid the post-processing via k-means during the bipartite graph partitioning, the constrained Laplacian rank (CLR) is often utilized for constraining the number of connected components (i.e., clusters) in the bipartite graph, which,…

    Submitted 12 May, 2023; originally announced May 2023.

  37. arXiv:2304.10093

    cs.CV

    Clustered-patch Element Connection for Few-shot Learning

    Authors: Jinxiang Lai, Siqian Yang, Junhong Zhou, Wenlong Wu, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Chengjie Wang

    Abstract: The weak feature representation problem has long affected the performance of few-shot classification. To alleviate this problem, recent researchers build connections between support and query instances through embedding patch features to generate discriminative representations. However, we observe that there exist semantic mismatches (foreground/background) among these local patche…

    Submitted 10 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Journal ref: IJCAI 2023

  38. arXiv:2303.16242

    eess.IV cs.CV

    CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution

    Authors: Zixuan Chen, Jian-Huang Lai, Lingxiao Yang, Xiaohua Xie

    Abstract: Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to super-sample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their application in various scenarios. To overcome these li…

    Submitted 16 April, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by the International Conference on Computer Vision (ICCV) 2023

  39. arXiv:2303.16191

    cs.CV

    Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection

    Authors: Zixuan Chen, Xiaohua Xie, Lingxiao Yang, Jianhuang Lai

    Abstract: Anomaly detectors are widely used in industrial production to detect and localize unknown defects in query images. These detectors are trained on nominal images and have shown success in distinguishing anomalies from most normal samples. However, because hard-nominal examples are scattered and far apart from most normalities, they are often mistaken for anomalies by existing anomaly detectors. To address…

    Submitted 4 April, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  40. arXiv:2303.14133

    eess.IV cs.CR cs.CV

    Adversarial Attack and Defense for Medical Image Analysis: Methods and Applications

    Authors: Junhao Dong, Junxi Chen, Xiaohua Xie, Jianhuang Lai, Hao Chen

    Abstract: Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Conversely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in deep medical diagnosis systems. In this expositio…

    Submitted 24 March, 2023; originally announced March 2023.

  41. arXiv:2303.09281

    cs.CV

    SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

    Authors: Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang

    Abstract: Recent Few-Shot Learning (FSL) methods put emphasis on generating discriminative embedding features to precisely measure the similarity between support and query sets. Current CNN-based cross-attention approaches generate discriminative representations by enhancing the mutually semantically similar regions of support and query pairs. However, they suffer from two problems: the CNN structure produces ina…

    Submitted 15 March, 2023; originally announced March 2023.

    Journal ref: AAAI 2023

  42. arXiv:2303.09085

    cs.LG

    Preoperative Prognosis Assessment of Lumbar Spinal Surgery for Low Back Pain and Sciatica Patients based on Multimodalities and Multimodal Learning

    Authors: Li-Chin Chen, Jung-Nien Lai, Hung-En Lin, Hsien-Te Chen, Kuo-Hsuan Hung, Yu Tsao

    Abstract: Low back pain (LBP) and sciatica may require surgical therapy when they are symptomatic of severe pain. However, there are no effective measures to evaluate the surgical outcomes in advance. This work combined elements of Eastern medicine and machine learning and developed a preoperative assessment tool to predict the prognosis of lumbar spinal surgery in LBP and sciatica patients. Standard operat…

    Submitted 16 March, 2023; originally announced March 2023.

  43. arXiv:2302.05933

    stat.ML cs.LG

    Generalization Ability of Wide Neural Networks on $\mathbb{R}$

    Authors: Jianfa Lai, Manyun Xu, Rui Chen, Qian Lin

    Abstract: We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; $b)$ $\lambda_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: $i)$ when the width…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 47 pages, 4 figures

    MSC Class: 62G08 (Primary); 68T07 (secondary); 46E22

    ACM Class: G.3

  44. arXiv:2302.00564

    cs.LG stat.ML

    Automatically Marginalized MCMC in Probabilistic Programming

    Authors: Jinlin Lai, Javier Burroni, Hui Guan, Daniel Sheldon

    Abstract: Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models. The advent of probabilistic programming languages (PPLs) frees users from writing inference algorithms and lets users focus on modeling. However, many models are difficult for HMC to solve directly, and often require tricks like model reparameterization. We are motivated by the fact that many of…

    Submitted 1 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted to the 40th International Conference on Machine Learning (ICML 2023)

  45. arXiv:2211.13939

    cs.SD cs.LG eess.AS

    Efficient Incremental Text-to-Speech on GPUs

    Authors: Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai

    Abstract: Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most of the existing speech synthesis pipelines deployed on GPU are still non-incremental, which reveals limitations in high-concurrency scenarios, especially when the pipeline is built with end…

    Submitted 5 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures

  46. arXiv:2211.04717

    cs.SD cs.CL eess.AS

    Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition

    Authors: Yu Chen, Wen Ding, Junjie Lai

    Abstract: Noisy Student Training (NST) has recently demonstrated extremely strong performance in Automatic Speech Recognition (ASR). In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. Hypotheses with and without a Language Model are generated and the CER differences between them are utilized as a filter threshold. Resu…

    Submitted 1 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: This paper is accepted by the ICASSP 2023 conference

  47. arXiv:2211.00890

    cs.CV

    Rethinking the Metric in Few-shot Learning: From an Adaptive Multi-Distance Perspective

    Authors: Jinxiang Lai, Siqian Yang, Guannan Jiang, Xi Wang, Yuxi Li, Zihui Jia, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang

    Abstract: The few-shot learning problem focuses on recognizing unseen classes given a few labeled images. In recent efforts, more attention has been paid to fine-grained feature embedding, ignoring the relationship among different distance metrics. In this paper, for the first time, we investigate the contributions of different distance metrics, and propose an adaptive fusion scheme, bringing significant improvements…

    Submitted 2 November, 2022; originally announced November 2022.

    Journal ref: Proceedings of the 30th ACM International Conference on Multimedia 2022

  48. arXiv:2211.00868

    cs.CV

    tSF: Transformer-based Semantic Filter for Few-Shot Learning

    Authors: Jinxiang Lai, Siqian Yang, Wenlong Liu, Yi Zeng, Zhongyi Huang, Wenlong Wu, Jun Liu, Bin-Bin Gao, Chengjie Wang

    Abstract: Few-Shot Learning (FSL) alleviates the data shortage challenge via embedding discriminative target-aware features among plentiful seen (base) and few unseen (novel) labeled samples. Most feature embedding modules in recent FSL methods are specially designed for corresponding learning tasks (e.g., classification, segmentation, and object detection), which limits the utility of embedding features. To t…

    Submitted 2 November, 2022; originally announced November 2022.

    Journal ref: European Conference on Computer Vision (ECCV 2022)

  49. arXiv:2211.00525

    cs.CV cs.LG

    The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training

    Authors: Junhao Dong, Seyed-Mohsen Moosavi-Dezfooli, Jianhuang Lai, Xiaohua Xie

    Abstract: Although current deep learning techniques have yielded superior performance on various computer vision tasks, they are still vulnerable to adversarial examples. Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples. These methods usually regularize the difference between output probabilities for an adversarial and its c…

    Submitted 1 November, 2022; originally announced November 2022.

  50. arXiv:2209.07809

    cs.LG cs.AI

    M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network

    Authors: Zhe Zhang, Yukun Zou, Junjie Lai, Qing Xu

    Abstract: Deep Q-learning Network (DQN) is a successful approach that combines reinforcement learning with deep neural networks and has led to the widespread application of reinforcement learning. One challenging problem when applying DQN or other reinforcement learning algorithms to real-world problems is data collection. Therefore, how to improve data efficiency is one of the most important problems in the researc…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: 5 pages