Showing 1–50 of 85 results for author: Zhai, S

  1. arXiv:2406.17532  [pdf, other]

    cs.AI cs.CL cs.LO

    Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study

    Authors: Keyu Wang, Guilin Qi, Jiaqi Li, Songlin Zhai

    Abstract: Large language models (LLMs) have shown significant achievements in solving a wide range of tasks. Recently, LLMs' capability to store, retrieve and infer with symbolic knowledge has drawn a great deal of attention, showing their potential to understand structured information. However, it is not yet known whether LLMs can understand Description Logic (DL) ontologies. In this work, we empirically a…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.06521  [pdf, other]

    cs.CV

    PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction

    Authors: Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering, and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on sur…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: project page: https://zju3dv.github.io/pgsr/

  3. arXiv:2406.04523  [pdf, other]

    cs.CL cs.LG

    Proofread: Fixes All Errors with One Tap

    Authors: Renjie Liu, Yanxiang Zhang, Yun Zhu, Haicheng Sun, Yuanbo Zhang, Michael Xuelin Huang, Shanqing Cai, Lei Meng, Shumin Zhai

    Abstract: The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to mode…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 2 tables

  4. arXiv:2406.01528  [pdf, other]

    cs.LG

    Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data

    Authors: Mehmet Velioglu, Song Zhai, Sophia Rupprecht, Alexander Mitsos, Andreas Jupke, Manuel Dahmen

    Abstract: In chemical engineering, process data are expensive to acquire, and complex phenomena are difficult to fully model. We explore the use of physics-informed neural networks (PINNs) for dynamic processes with incomplete mechanistic semi-explicit differential-algebraic equation systems and scarce process data. In particular, we focus on estimating states for which neither direct observational data nor…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: manuscript (32 pages, 9 figures, 11 tables), supporting materials (14 pages, 4 figures, 5 tables)

  5. arXiv:2406.00633  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Improving GFlowNets for Text-to-Image Diffusion Alignment

    Authors: Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai

    Abstract: Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained diffusion models to achieve this goal throu…

    Submitted 16 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2405.21048  [pdf, other]

    cs.CV

    Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

    Authors: Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind

    Abstract: Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To address this issue, we present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregr…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 22 pages, 14 figures

  7. arXiv:2405.14800  [pdf, other]

    cs.CR cs.CV

    Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

    Authors: Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu

    Abstract: Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image d…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 17 pages, 5 figures; minor typos corrected

  8. arXiv:2404.03109  [pdf, other]

    cs.CV

    Many-to-many Image Generation with Auto-regressive Diffusion Models

    Authors: Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu

    Abstract: Recent advancements in image generation have made significant progress, yet existing models present limitations in perceiving and generating an arbitrary number of interrelated images within a broad context. This limitation becomes increasingly critical as the demand for multi-image scenarios, such as multi-view images and visual narratives, grows with the expansion of multimedia platforms. This p…

    Submitted 3 April, 2024; originally announced April 2024.

  9. arXiv:2403.04732  [pdf, other]

    cs.AI cs.CL cs.CV

    How Far Are We from Intelligent Visual Deductive Reasoning?

    Authors: Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

    Abstract: Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blindspots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs), to assess VLMs' abilities to perform multi-hop relati…

    Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 AGI workshop. https://github.com/apple/ml-rpm-bench

  10. arXiv:2402.07562  [pdf, other]

    cs.CR cs.AI

    Discovering Universal Semantic Triggers for Text-to-Image Synthesis

    Authors: Shengfang Zhai, Weilong Wang, Jiajun Li, Yinpeng Dong, Hang Su, Qingni Shen

    Abstract: Recently text-to-image models have gained widespread attention in the community due to their controllable and high-quality generation ability. However, the robustness of such models and their potential ethical issues have not been fully explored. In this paper, we introduce Universal Semantic Trigger, a meaningless token sequence that can be added at any location within the input text yet can indu…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures. Work in progress

  11. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

    Authors: Susan Lin, Jeremy Warner, J. D. Zamfirescu-Pereira, Matthew G. Lee, Sauhard Jain, Michael Xuelin Huang, Piyawat Lertvittayakumjorn, Shanqing Cai, Shumin Zhai, Björn Hartmann, Can Liu

    Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates key…

    Submitted 7 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: To appear at ACM CHI 2024

  12. arXiv:2401.08541  [pdf, other]

    cs.CV

    Scalable Pre-training of Large Autoregressive Image Models

    Authors: Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

    Abstract: This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scale with both the model capacity and the quantity of data, (2) the value o…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: https://github.com/apple/ml-aim

  13. arXiv:2401.05431  [pdf, other]

    eess.SP cs.AI cs.LG

    TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal Processing

    Authors: Luyuan Xie, Cong Li, Xin Zhang, Shengfang Zhai, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the numerous excellent progresses have been made in previous works, we observe the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get m…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: This paper is accepted by ICASSP 2024. This is a more detailed version

  14. arXiv:2401.00006  [pdf, other]

    cs.AI

    Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

    Authors: Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu

    Abstract: Building embodied agents on integrating Large Language Models (LLMs) and Reinforcement Learning (RL) have revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research faces challenges in meeting the requirement of open-endedness. They typically either train LLM/RL models to adapt to a fixed counterp…

    Submitted 6 February, 2024; v1 submitted 12 December, 2023; originally announced January 2024.

  15. arXiv:2312.14408  [pdf]

    cs.CY

    Extended p-median problems for balancing service efficiency and equality

    Authors: Yunfeng Kong, Chenchen Lian, Guangli Zhang, Shiyan Zhai

    Abstract: This article deals with the location problem for balancing the service efficiency and equality. In public service systems, some people may feel envy in case that they need longer travel distance to access services than others. The strength of the envy can be measured by comparing one's travel distance to service facility with a threshold distance. Using the total envy function, four extended p-med…

    Submitted 25 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 38 pages, 4 tables, 5 figures

    MSC Class: 90C27 ACM Class: J.6

  16. arXiv:2311.05075  [pdf]

    cs.LG cs.AI cs.CL

    Mental Health Diagnosis in the Digital Age: Harnessing Sentiment Analysis on Social Media Platforms upon Ultra-Sparse Feature Content

    Authors: Haijian Shao, Ming Zhu, Shengjie Zhai

    Abstract: Amid growing global mental health concerns, particularly among vulnerable groups, natural language processing offers a tremendous potential for early detection and intervention of people's mental disorders via analyzing their postings and discussions on social media platforms. However, ultra-sparse training data, often due to vast vocabularies and low-frequency words, hinders the analysis accuracy…

    Submitted 8 November, 2023; originally announced November 2023.

  17. arXiv:2310.15111  [pdf, other]

    cs.CV cs.LG

    Matryoshka Diffusion Models

    Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

    Abstract: Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion M…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 28 pages, 18 figures

  18. arXiv:2310.07805  [pdf, other]

    cs.LG cs.AI

    Generative Modeling with Phase Stochastic Bridges

    Authors: Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai

    Abstract: Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented spac…

    Submitted 12 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  19. arXiv:2309.10077  [pdf]

    cs.LG cs.AI

    GAME: Generalized deep learning model towards multimodal data integration for early screening of adolescent mental disorders

    Authors: Zhicheng Du, Chenyao Jiang, Xi Yuan, Shiyao Zhai, Zhengyang Lei, Shuyue Ma, Yang Liu, Qihui Ye, Chufan Xiao, Qiming Huang, Ming Xu, Dongmei Yu, Peiwu Qin

    Abstract: The timely identification of mental disorders in adolescents is a global public health challenge. Single factor is difficult to detect the abnormality due to its complex and subtle nature. Additionally, the generalized multimodal Computer-Aided Screening (CAS) systems with interactive robots for adolescent mental disorders are not available. Here, we design an android application with mini-games an…

    Submitted 18 September, 2023; originally announced September 2023.

  20. arXiv:2309.04145  [pdf, other]

    cs.CV

    Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM

    Authors: Weijian Xie, Guanyi Chu, Quanhao Qian, Yihao Yu, Hai Li, Danpeng Chen, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang

    Abstract: Dense SLAM based on monocular cameras does indeed have immense application value in the field of AR/VR, especially when it is performed on a mobile device. In this paper, we propose a novel method that integrates a light-weight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifica…

    Submitted 20 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

  21. arXiv:2308.16552  [pdf, other]

    cs.CV

    Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation

    Authors: Yang Liu, Xiaoyun Zhong, Shiyao Zhai, Zhicheng Du, Zhenyuan Gao, Qiming Huang, Canyang Zhang, Bin Jiang, Vijay Kumar Pandey, Sanyang Han, Runming Wang, Yuxing Han, Peiwu Qin

    Abstract: The vast majority of people who suffer unexpected cardiac arrest are performed cardiopulmonary resuscitation (CPR) by passersby in a desperate attempt to restore life, but endeavors turn out to be fruitless on account of disqualification. Fortunately, many pieces of research manifest that disciplined training will help to elevate the success rate of resuscitation, which constantly desires a seamle…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Transformer for Cardiopulmonary Resuscitation

  22. arXiv:2308.16551  [pdf]

    eess.IV cs.CV

    Object Detection for Caries or Pit and Fissure Sealing Requirement in Children's First Permanent Molars

    Authors: Chenyao Jiang, Shiyao Zhai, Hengrui Song, Yuqing Ma, Yachen Fan, Yancheng Fang, Dongmei Yu, Canyang Zhang, Sanyang Han, Runming Wang, Yong Liu, Jianbo Li, Peiwu Qin

    Abstract: Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective methods that is widely used in prevention of pit…

    Submitted 31 August, 2023; originally announced August 2023.

  23. arXiv:2306.14793  [pdf, other]

    cs.CR

    Private Federated Learning in Gboard

    Authors: Yuanbo Zhang, Daniel Ramage, Zheng Xu, Yanxiang Zhang, Shumin Zhai, Peter Kairouz

    Abstract: This white paper describes recent advances in Gboard (Google Keyboard)'s use of federated learning, DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm, and secure aggregation techniques to train machine learning (ML) models for suggestion, prediction and correction intelligence from many users' typing data. Gboard's investment in those privacy technologies allows users' typing data to be processe…

    Submitted 26 June, 2023; originally announced June 2023.

  24. arXiv:2306.05544  [pdf, other]

    cs.CV cs.LG

    BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

    Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind

    Abstract: Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require signi…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: In progress

  25. arXiv:2306.02531  [pdf, other]

    cs.CL

    PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

    Authors: Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

    Abstract: Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation. This issue is often attributed to exposure bias - the difference between how a model is trained, and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they…

    Submitted 22 March, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023, code at https://github.com/apple/ml-planner

  26. arXiv:2305.04175  [pdf, other]

    cs.CR cs.CV cs.MM

    Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

    Authors: Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su

    Abstract: With the help of conditioning mechanisms, the state-of-the-art diffusion models have achieved tremendous success in guided image generation, particularly in text-to-image synthesis. To gain a better understanding of the training process and potential risks of text-to-image synthesis, we perform a systematic investigation of backdoor attack on text-to-image diffusion models and propose BadT2I, a ge…

    Submitted 22 October, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Camera-ready version. To appear in ACM MM 2023. Code will be released at: https://github.com/sf-zhai/BadT2I

  27. arXiv:2304.12406  [pdf, other]

    cs.CV

    AutoFocusFormer: Image Segmentation off the Grid

    Authors: Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

    Abstract: Real world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tas…

    Submitted 25 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

    ACM Class: I.4.6; I.4.8

  28. arXiv:2304.06700  [pdf, other]

    cs.CV cs.LG

    Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

    Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind

    Abstract: Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets…

    Submitted 26 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted by 3DV24

  29. arXiv:2303.06296  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Stabilizing Transformer Training by Preventing Attention Entropy Collapse

    Authors: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

    Abstract: Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which is a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low at…

    Submitted 25 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Journal ref: In International Conference on Machine Learning (pp. 40770-40803). PMLR. 2023

  30. arXiv:2303.04248  [pdf, other]

    cs.LG cs.CV

    TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

    Authors: David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu

    Abstract: Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends…

    Submitted 7 March, 2023; originally announced March 2023.

  31. arXiv:2303.01742  [pdf, other]

    cs.CR cs.CL

    NCL: Textual Backdoor Defense Using Noise-augmented Contrastive Learning

    Authors: Shengfang Zhai, Qingni Shen, Xiaoyi Chen, Weilong Wang, Cong Li, Yuejian Fang, Zhonghai Wu

    Abstract: At present, backdoor attacks attract attention as they do great harm to deep learning models. The adversary poisons the training data making the model being injected with a backdoor after being trained unconsciously by victims using the poisoned dataset. In the field of text, however, existing works do not provide sufficient defense against backdoor attacks. In this paper, we propose a Noise-augme…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 6 pages, 5 figures. To appear in ICASSP 2023

  32. arXiv:2302.11165  [pdf, other]

    cs.AI

    DNG: Taxonomy Expansion by Exploring the Intrinsic Directed Structure on Non-gaussian Space

    Authors: Songlin Zhai, Weiqing Wang, Yuanfang Li, Yuan Meng

    Abstract: Taxonomy expansion is the process of incorporating a large number of additional nodes (i.e., "queries") into an existing taxonomy (i.e., "seed"), with the most important step being the selection of appropriate positions for each query. Enormous efforts have been made by exploring the seed's structure. However, existing approaches are deficient in their mining of structural information in two ways:…

    Submitted 21 March, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: 7 figures

  33. arXiv:2211.14247  [pdf, other]

    cs.IR

    Group Buying Recommendation Model Based on Multi-task Learning

    Authors: Shuoyao Zhai, Baichuan Liu, Deqing Yang, Yanghua Xiao

    Abstract: In recent years, group buying has become one popular kind of online shopping activity, thanks to its larger sales and lower unit price. Unfortunately, research seldom focuses on recommendations specifically for group buying by now. Although some recommendation models have been proposed for group recommendation, they can not be directly used to achieve real-world group buying recommendation, due to…

    Submitted 25 November, 2022; originally announced November 2022.

  34. arXiv:2210.11082  [pdf, other]

    cs.CL cs.CR

    Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

    Authors: Xiaoyi Chen, Baisong Xin, Shengfang Zhai, Shiqing Ma, Qingni Shen, Zhonghai Wu

    Abstract: This paper finds that contrastive learning can produce superior sentence embeddings for pre-trained models but is also vulnerable to backdoor attacks. We present the first backdoor attack framework, BadCSE, for state-of-the-art sentence embeddings under supervised and unsupervised learning settings. The attack manipulates the construction of positive and negative pairs so that the backdoored sampl…

    Submitted 20 October, 2022; originally announced October 2022.

  35. arXiv:2210.04955  [pdf, other]

    cs.CV cs.LG

    f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

    Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind

    Abstract: Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains. Standard DMs can be viewed as an instantiation of hierarchical variational autoencoders (VAEs) where the latent variables are inferred from input-centered Gaussian distributions with fixed scales and variances. Unlike VAEs, this formulation limits DMs from changing the latent spaces and learning…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 28 pages, 21 figures, work in progress

  36. PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation

    Authors: Guotai Wang, Xiangde Luo, Ran Gu, Shuojue Yang, Yijie Qu, Shuwei Zhai, Qianfei Zhao, Kang Li, Shaoting Zhang

    Abstract: Background and Objective: Open-source deep learning toolkits are one of the driving forces for developing medical image segmentation models. Existing toolkits mainly focus on fully supervised segmentation and require full and accurate pixel-level annotations that are time-consuming and difficult to acquire for segmentation tasks, which makes learning from imperfect labels highly desired for reduci…

    Submitted 4 February, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: 12 pages, 6 figures

    Journal ref: Computer Methods and Programs in Biomedicine, Volume 231, April 2023, 107398

  37. arXiv:2208.05669  [pdf, other]

    cs.CV

    PA-Seg: Learning from Point Annotations for 3D Medical Image Segmentation using Contextual Regularization and Cross Knowledge Distillation

    Authors: Shuwei Zhai, Guotai Wang, Xiangde Luo, Qiang Yue, Kang Li, Shaoting Zhang

    Abstract: The success of Convolutional Neural Networks (CNNs) in 3D medical image segmentation relies on massive fully annotated 3D volumes for training that are time-consuming and labor-intensive to acquire. In this paper, we propose to annotate a segmentation target with only seven points in 3D medical images, and design a two-stage weakly supervised learning framework PA-Seg. In the first stage, we emplo…

    Submitted 13 February, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: 12 pages, 10 figures, 4 tables; Accepted by IEEE TMI

  38. arXiv:2207.13751  [pdf, other]

    cs.CV cs.GR cs.LG

    GAUDI: A Neural Architect for Immersive 3D Scene Generation

    Authors: Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

    Abstract: We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generati…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Project webpage: https://github.com/apple/ml-gaudi

  39. arXiv:2207.07611  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    Position Prediction as an Effective Pretraining Strategy

    Authors: Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

    Abstract: Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Tr…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML 2022

  40. arXiv:2207.01158  [pdf, other]

    cs.RO cs.CV

    VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

    Authors: Danpeng Chen, Shuai Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang

    Abstract: In this paper, we propose a tightly-coupled SLAM system fused with RGB, Depth, IMU and structured plane information. Traditional sparse points based SLAM systems always maintain a mass of map points to model the environment. Huge number of map points bring us a high computational complexity, making it difficult to be deployed on mobile devices. On the other hand, planes are common structures in ma…

    Submitted 3 July, 2022; originally announced July 2022.

  41. arXiv:2206.04817  [pdf, other]

    cs.LG math.OC

    The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

    Authors: Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind

    Abstract: The grokking phenomenon as reported by Power et al. (arXiv:2201.02177) refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of Grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stag…

    Submitted 13 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Removed TeX formatting commands in Title and Abstract

  42. arXiv:2206.01832  [pdf, other]

    cs.CR cs.CL

    Kallima: A Clean-label Framework for Textual Backdoor Attacks

    Authors: Xiaoyi Chen, Yinpeng Dong, Zeyu Sun, Shengfang Zhai, Qingni Shen, Zhonghai Wu

    Abstract: Although Deep Neural Network (DNN) has led to unprecedented progress in various natural language processing (NLP) tasks, research shows that deep models are extremely vulnerable to backdoor attacks. The existing backdoor attacks mainly inject a small number of poisoned samples into the training dataset with the labels changed to the target one. Such mislabeled samples would raise suspicion upon hu…

    Submitted 3 June, 2022; originally announced June 2022.

  43. arXiv:2205.04230  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    RCMNet: A deep learning model assists CAR-T therapy for leukemia

    Authors: Ruitao Zhang, Xueying Han, Ijaz Gul, Shiyao Zhai, Ying Liu, Yongbing Zhang, Yuhan Dong, Lan Ma, Dongmei Yu, Jin Zhou, Peiwu Qin

    Abstract: Acute leukemia is a type of blood cancer with a high mortality rate. Current therapeutic methods include bone marrow transplantation, supportive therapy, and chemotherapy. Although a satisfactory remission of the disease can be achieved, the risk of recurrence is still high. Therefore, novel treatments are demanding. Chimeric antigen receptor-T (CAR-T) therapy has emerged as a promising approach t…

    Submitted 6 May, 2022; originally announced May 2022.

  44. arXiv:2203.02106  [pdf, other]

    eess.IV cs.CV

    Scribble-Supervised Medical Image Segmentation via Dual-Branch Network and Dynamically Mixed Pseudo Labels Supervision

    Authors: Xiangde Luo, Minhao Hu, Wenjun Liao, Shuwei Zhai, Tao Song, Guotai Wang, Shaoting Zhang

    Abstract: Medical image segmentation plays an irreplaceable role in computer-assisted diagnosis, treatment planning, and following-up. Collecting and annotating a large-scale dataset is crucial to training a powerful segmentation model, but producing high-quality segmentation masks is an expensive and time-consuming procedure. Recently, weakly-supervised learning that uses sparse annotations (points, scribb…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: 11 pages, 4 figures, code is available: https://github.com/HiLab-git/WSL4MIS. This is a comprehensive study about scribble-supervised medical image segmentation based on the ACDC dataset

  45. arXiv:2202.08812  [pdf, other]

    cs.IR cs.LG

    Should I send this notification? Optimizing push notifications decision making by modeling the future

    Authors: Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J Hunt

    Abstract: Most recommender systems are myopic, that is they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long term user satisfaction. In this work we focus on mobile push notifications, where the long term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may…

    Submitted 17 February, 2022; originally announced February 2022.

  46. arXiv:2202.01944  [pdf, other]

    cs.LG

    Learning Representation from Neural Fisher Kernel with Low-rank Approximation

    Authors: Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind

    Abstract: In this paper, we study the representation of neural networks from the view of kernels. We first define the Neural Fisher Kernel (NFK), which is the Fisher Kernel applied to neural networks. We show that NFK can be computed for both supervised and unsupervised learning models, which can serve as a unified tool for representation extraction. Furthermore, we show that practical NFKs exhibit low-rank…

    Submitted 3 February, 2022; originally announced February 2022.

  47. arXiv:2201.07681  [pdf, ps, other]

    cs.IR cs.LG

    Learning to Rank For Push Notifications Using Pairwise Expected Regret

    Authors: Yuguang Yue, Yuanpu Xie, Huasen Wu, Haofeng Jia, Shaodan Zhai, Wenzhe Shi, Jonathan J Hunt

    Abstract: Listwise ranking losses have been widely studied in recommender systems. However, new paradigms of content consumption present new challenges for ranking methods. In this work we contribute an analysis of learning to rank for personalized mobile push notifications and discuss the unique challenges this presents compared to traditional ranking problems. To address these challenges, we introduce a n…

    Submitted 19 January, 2022; originally announced January 2022.

  48. arXiv:2201.03186  [pdf, other]

    eess.IV cs.CV

    MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance Images

    Authors: Lei Li, Fuping Wu, Sihan Wang, Xinzhe Luo, Carlos Martin-Isla, Shuwei Zhai, Jianpeng Zhang, Yanfei Liu, Zhen Zhang, Markus J. Ankenbrand, Haochuan Jiang, Xiaoran Zhang, Linhong Wang, Tewodros Weldebirhan Arega, Elif Altunok, Zhou Zhao, Feiyan Li, Jun Ma, Xiaoping Yang, Elodie Puybareau, Ilkay Oksuz, Stephanie Bricq, Weisheng Li, Kumaradevan Punithakumar, Sotirios A. Tsaftaris, et al. (7 additional authors not shown)

    Abstract: Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which…

    Submitted 10 January, 2022; originally announced January 2022.

  49. arXiv:2112.01163  [pdf, other]

    cs.LG cs.AI cs.RO

    Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

    Authors: Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind

    Abstract: Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space. However, learning world models in unconstrained environments over high-dimensional observation spaces such as images is challenging. One source of difficulty is the presence of irrelevant but hard-to-model background distractions, and unimportant visual details of task-relev…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: NeurIPS Deep Reinforcement Learning Workshop 2021. Code can be found at https://github.com/apple/ml-core

  50. arXiv:2111.04264  [pdf, other]

    cs.CV

    Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

    Authors: Chenglong Li, Tianhao Zhu, Lei Liu, Xiaonan Si, Zilin Fan, Sulan Zhai

    Abstract: In many visual systems, visual tracking often bases on RGB image sequences, in which some targets are invalid in low-light conditions, and tracking performance is thus affected significantly. Introducing other modalities such as depth and infrared data is an effective way to handle imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and ca…

    Submitted 11 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: In Submission