Showing 1–50 of 163 results for author: Lan, L

  1. arXiv:2407.03648  [pdf, other]

    eess.AS cs.SD

    High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

    Authors: Gael Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra

    Abstract: We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective the model…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2406.15007  [pdf, other]

    cs.AI

    RouteFinder: Towards Foundation Models for Vehicle Routing Problems

    Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Kevin Tierney, Jinkyoo Park

    Abstract: Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder,…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.11933  [pdf, other]

    cs.CV

    Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

    Authors: Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Masked Image Modeling (MIM) has emerged as a pivotal approach for developing foundational visual models in the field of remote sensing (RS). However, current RS datasets are limited in volume and diversity, which significantly constrains the capacity of MIM methods to learn generalizable representations. In this study, we introduce RS-4M, a large-scale dataset designed to enable highly ef…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2405.15056  [pdf, other]

    cs.LG cs.CV cs.GR

    ElastoGen: 4D Generative Elastodynamics

    Authors: Yutao Feng, Yintong Shang, Xiang Feng, Lei Lan, Shandian Zhe, Tianjia Shao, Hongzhi Wu, Kun Zhou, Hao Su, Chenfanfu Jiang, Yin Yang

    Abstract: We present ElastoGen, a knowledge-driven model that generates physically accurate and coherent 4D elastodynamics. Instead of relying on petabyte-scale data-driven learning, ElastoGen leverages the principles of physics-in-the-loop and learns from established physical knowledge, such as partial differential equations and their numerical solutions. The core idea of ElastoGen is converting the global…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.12484  [pdf, other]

    cs.GR

    Meta-Homogenization for Knitwear Simulation

    Authors: Chun Yuan, Kui Wu, Haoyang Shi, Lei Lan, Yuxing Qiu, Cem Yuksel, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: This paper presents meta-homogenization, a spatially varying homogenization scheme for knitwear simulation. We are motivated by the observation that macro-scale fabric dynamics is strongly correlated with its underlying knitting patterns. Therefore, homogenization towards a single material is less effective when the knitting is complex and non-repetitive. Our method tackles this challenge by homog…

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  6. arXiv:2405.11694  [pdf, other]

    cs.GR

    PBI: Position-Based Dynamics Handles Updated Lagrangian Inelasticity

    Authors: Chang Yu, Xuan Li, Lei Lan, Yin Yang, Chenfanfu Jiang

    Abstract: Position-based Dynamics (PBD) and its extension, eXtended Position-based Dynamics (XPBD), have been predominantly applied to compliant constrained dynamics, with their potential in finite strain inelasticity remaining underexplored. XPBD stands in contrast to other meshless methods, such as the Material Point Method (MPM). MPM is based on discretizing the weak form of governing partial differentia…

    Submitted 19 May, 2024; originally announced May 2024.

  7. arXiv:2405.07186  [pdf, other]

    stat.ME stat.ML

    Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data

    Authors: Mark van der Laan, Sky Qiu, Lars van der Laan

    Abstract: We consider the problem of estimating the average treatment effect (ATE) when both randomized control trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum l…

    Submitted 12 May, 2024; originally announced May 2024.

  8. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  9. arXiv:2403.19272  [pdf, other]

    cs.GR

    Mil2: Efficient Cloth Simulation Using Non-distance Barriers and Subspace Reuse

    Authors: Lei Lan, Zixuan Lu, Jingyi Long, Chun Yuan, Xuan Li, Xiaowei He, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: Mil2 pushes the performance of high-resolution cloth simulation, making the simulation interactive (in milliseconds) for models with one million degrees of freedom (DOFs) while keeping every triangle untangled. The guarantee of being penetration-free is inspired by the interior-point method, which converts the inequality constraints to barrier potentials. Nevertheless, we propose a major overhaul…

    Submitted 23 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  10. PyVRP: a high-performance VRP solver package

    Authors: Niels A. Wouda, Leon Lan, Wouter Kool

    Abstract: We introduce PyVRP, a Python package that implements hybrid genetic search in a state-of-the-art vehicle routing problem (VRP) solver. The package is designed for the VRP with time windows (VRPTW), but can be easily extended to support other VRP variants. PyVRP combines the flexibility of Python with the performance of C++, by implementing (only) performance critical parts of the algorithm in C++,…

    Submitted 21 March, 2024; v1 submitted 22 November, 2023; originally announced March 2024.

    Comments: Pre-print of accepted paper in INFORMS Journal on Computing. 24 pages, 1 figure, 2 listings

  11. arXiv:2403.13241  [pdf, other]

    cs.LG

    Tackling Noisy Labels with Network Parameter Additive Decomposition

    Authors: Jingyi Wang, Xiaobo Xia, Long Lan, Xinghao Wu, Jun Yu, Wenjing Yang, Bo Han, Tongliang Liu

    Abstract: Given data with noisy labels, over-parameterized deep networks suffer overfitting mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks have the ability to memorize all noisy data, they would first memorize clean training data, and then gradually memorize mislabeled training data. A simple and effective method that exploits the…

    Submitted 28 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE T-PAMI

  12. arXiv:2403.08635  [pdf, other]

    cs.LG cs.AI stat.ML

    Human Alignment of Large Language Models through Online Preference Optimisation

    Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

    Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio…

    Submitted 13 March, 2024; originally announced March 2024.

  13. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  14. arXiv:2403.08271  [pdf, other]

    cs.CV cs.AI

    Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification

    Authors: Long Lan, Fengxiang Wang, Shuyan Li, Xiangtao Zheng, Zengmao Wang, Xinwang Liu

    Abstract: Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data, limiting the effectiveness of traditional supervised classification methods. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learn…

    Submitted 13 March, 2024; originally announced March 2024.

  15. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  16. arXiv:2402.07307  [pdf, other]

    stat.ML cs.LG stat.ME

    Self-Consistent Conformal Prediction

    Authors: Lars van der Laan, Ahmed M. Alaa

    Abstract: In decision-making guided by machine learning, decision-makers may take identical actions in contexts with identical predicted outcomes. Conformal prediction helps decision-makers quantify uncertainty in point predictions of outcomes, allowing for better risk management for actions. Motivated by this perspective, we introduce Self-Consistent Conformal Prediction for regression, which comb…

    Submitted 22 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  17. arXiv:2402.01972  [pdf, other]

    stat.ML cs.LG stat.ME

    Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke

    Abstract: We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that…

    Submitted 2 February, 2024; originally announced February 2024.

  18. arXiv:2402.01057  [pdf, other]

    cs.LG

    Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

    Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun, Chien Feng, Cho-Jui Hsieh, Chun-Yi Lee

    Abstract: In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory.…

    Submitted 7 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. Code: https://github.com/stanl1y/tdil

  19. arXiv:2401.04999  [pdf, ps, other]

    astro-ph.HE hep-ph

    Can fallback accretion on magnetar model power the X-ray flares simultaneously observed with gamma-rays of Gamma-ray bursts?

    Authors: Wen-Yuan Yu, Hou-Jun Lü, Xing Yang, Lin Lan, Zhe Yang

    Abstract: The prompt emission, X-ray plateau, and X-ray flares of Gamma-ray bursts (GRB) are thought to be from internal dissipation, and the magnetar as the central engine with propeller fallback accretion is proposed to interpret the observed phenomena of GRBs. In this paper, by systematically searching for X-ray emission observed by Swift/X-ray Telescope, we find that seven robust GRBs include both X-ray f…

    Submitted 16 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: 13 pages, 5 figures, 1 table, updated references, match with published version

  20. arXiv:2401.04577  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Masked Audio Generation using a Single Non-Autoregressive Transformer

    Authors: Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. T…

    Submitted 5 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  21. arXiv:2401.00722  [pdf, other]

    cs.CV

    BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation

    Authors: Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang

    Abstract: Accurate medical image segmentation is essential for clinical quantification, disease diagnosis, treatment planning and many other applications. Both convolution-based and transformer-based u-shaped architectures have made significant success in various medical image segmentation tasks. The former can efficiently learn local information of images while requiring much more image-specific inductive…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, 9 tables code: https://github.com/Caipengzhou/BRAU-Netplusplus

  22. arXiv:2312.15136  [pdf, other]

    physics.comp-ph cs.AI cs.CV

    Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning

    Authors: Gabe Guo, Judah Goldfeder, Ling Lan, Aniv Ray, Albert Hanming Yang, Boyuan Chen, Simon JL Billinge, Hod Lipson

    Abstract: The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to o…

    Submitted 22 December, 2023; originally announced December 2023.

  23. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  24. arXiv:2312.07919  [pdf, ps, other]

    astro-ph.HE

    Double Neutron Star Mergers: Are Late-time Radio Signals Overestimated?

    Authors: Shao-Ze Li, Yun-Wei Yu, He Gao, Lin Lan

    Abstract: The coalescence of binary neutron stars can yield the expulsion of a fast-moving, quasi-isotropic material, which may induce thermal radiation and give rise to kilonova emission. Moreover, the interaction between the ejected material and the surrounding environment generates an external shock, which can result in a long-lasting radio signal that persists for several decades following the merger. I…

    Submitted 28 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 8 pages, 7 figures, accepted for publication in ApJ

  25. arXiv:2312.02715  [pdf, other]

    math.OC math.PR

    A queueing-based approach for integrated routing and appointment scheduling

    Authors: René Bekker, Bharti Bharti, Leon Lan, Michel Mandjes

    Abstract: This paper aims to address the integrated routing and appointment scheduling (RAS) problem for a single service provider. The RAS problem is an operational challenge faced by operators that provide services requiring home attendance, such as grocery delivery, home healthcare, or maintenance services. While considering the inherently random nature of service and travel times, the goal is to minimiz…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 25 pages, 10 figures

  26. arXiv:2311.15540  [pdf]

    cs.CV cs.MM

    EAFP-Med: An Efficient Adaptive Feature Processing Module Based on Prompts for Medical Image Detection

    Authors: Xiang Li, Long Lan, Husam Lahza, Shaowu Yang, Shuihua Wang, Wenjing Yang, Hengzhu Liu, Yudong Zhang

    Abstract: In the face of rapid advances in medical imaging, cross-domain adaptive medical image detection is challenging due to the differences in lesion representations across various medical imaging technologies. To address this issue, we draw inspiration from large language models to propose EAFP-Med, an efficient adaptive feature processing module based on prompts for medical image detection. EAFP-Med c…

    Submitted 27 November, 2023; originally announced November 2023.

  27. arXiv:2311.15539  [pdf]

    stat.CO

    A Novel Human-Based Meta-Heuristic Algorithm: Dragon Boat Optimization

    Authors: Xiang Li, Long Lan, Husam Lahza, Shaowu Yang, Shuihua Wang, Wenjing Yang, Hengzhu Liu, Yudong Zhang

    Abstract: (Aim) Dragon Boat Racing, a popular aquatic folklore team sport, is traditionally held during the Dragon Boat Festival. Inspired by this event, we propose a novel human-based meta-heuristic algorithm called dragon boat optimization (DBO) in this paper. (Method) It models the unique behaviors of each crew member on the dragon boat during the race by introducing social psychology mechanisms (social…

    Submitted 27 November, 2023; originally announced November 2023.

  28. arXiv:2311.15173  [pdf, other]

    cond-mat.mtrl-sci

    Stretched Non-negative Matrix Factorization

    Authors: Ran Gu, Yevgeny Rakita, Ling Lan, Zach Thatcher, Gabrielle E. Kamm, Daniel O'Nolan, Brennan Mcbride, Allison Wustrow, James R. Neilson, Karena W. Chapman, Qiang Du, Simon J. L. Billinge

    Abstract: An algorithm is described and tested that carries out a non-negative matrix factorization (NMF) ignoring any stretching of the signal along the axis of the independent variable. This extended NMF model is called StretchedNMF. Variability in a set of signals due to this stretching is then ignored in the decomposition. This can be used, for example, to study sets of powder diffraction data collected…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 39 pages, 16 figures

  29. arXiv:2311.09654  [pdf, other]

    gr-qc astro-ph.HE

    On the possibility to detect gravitational waves from post-merger super-massive neutron stars with a kilohertz detector

    Authors: Yikang Chen, Bin Liu, Shunke Ai, Lin Lan, He Gao, Yong Yuan, Zong-Hong Zhu

    Abstract: The detection of a secular post-merger gravitational wave (GW) signal in a binary neutron star (BNS) merger serves as strong evidence for the formation of a long-lived post-merger neutron star (NS), which can help constrain the maximum mass of NSs and differentiate NS equation of states. We specifically focus on the detection of GW emissions from rigidly rotating NSs formed through BNS mergers, us…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 6 figures, 4 tables, accepted for publication on MNRAS

  30. arXiv:2311.00895  [pdf, other]

    cs.SD cs.CL eess.AS

    In-Context Prompt Editing For Conditional Audio Generation

    Authors: Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional au…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 2 tables

  31. arXiv:2310.11093  [pdf, other]

    cs.LG cs.CV

    SODA: Robust Training of Test-Time Data Adaptors

    Authors: Zige Wang, Yonggang Zhang, Zhen Fang, Long Lan, Wenjing Yang, Bo Han

    Abstract: Adapting models deployed to test distributions can mitigate the performance degradation caused by distribution shifts. However, privacy concerns may render model parameters inaccessible. One promising approach involves utilizing zeroth-order optimization (ZOO) to train a data adaptor to adapt the test data to fit the deployed models. Nevertheless, the data adaptor trained with ZOO typically brings…

    Submitted 17 October, 2023; originally announced October 2023.

  32. arXiv:2310.09926  [pdf, other]

    cs.AI

    Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

    Authors: Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa

    Abstract: Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a h…

    Submitted 26 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  33. arXiv:2310.07133  [pdf, other]

    astro-ph.HE

    What constraints can one pose on the maximum mass of neutron stars from multi-messenger observations?

    Authors: Shunke Ai, He Gao, Yong Yuan, Bing Zhang, Lin Lan

    Abstract: The maximum mass of neutron stars ($M_{\rm TOV}$) plays a crucial role in understanding their equation of state (EoS). Previous studies have used the measurements for the compactness of massive pulsars and the tidal deformability of neutron stars in binary neutron star (BNS) mergers to constrain the EoS and thus the $M_{\rm TOV}$. The discovery of the most massive pulsar, PSR J0952-0607, with a ma…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 14 pages, 10 figures, accepted for publication on MNRAS

  34. arXiv:2310.00289  [pdf, other]

    eess.IV cs.CV

    Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention

    Authors: Pengzhou Cai, Jiang Lu, Yanxin Li, Libin Lan

    Abstract: In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task. The method adopts a U-Net-like pure Transformer architecture with bi-level routing attention and skip connections, which effectively learns local-global semantic information. The proposed BRAU-Net was evaluated on transperineal Ultrasound images dataset from the pubic symphysis-fetal head…

    Submitted 7 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  35. arXiv:2309.10795  [pdf, other]

    eess.AS

    Exploring Speech Enhancement for Low-resource Speech Synthesis

    Authors: Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

    Abstract: High-quality and intelligible speech is essential to text-to-speech (TTS) model training, however, obtaining high-quality data for low-resource languages is challenging and expensive. Applying speech enhancement on Automatic Speech Recognition (ASR) corpus mitigates the issue by augmenting the training data, while how the nonlinear speech distortion brought by speech enhancement models affects TTS…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  36. arXiv:2309.10537  [pdf, other]

    eess.AS cs.MM cs.SD

    FoleyGen: Visually-Guided Audio Generation

    Authors: Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

    Abstract: Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intricate relationship between the high-dimensional visual and auditory data, and the challenges associated with temporal synchronization. In this study, we…

    Submitted 19 September, 2023; originally announced September 2023.

  37. arXiv:2309.08804  [pdf, other]

    eess.AS cs.SD

    Stack-and-Delay: a new codebook pattern for music generation

    Authors: Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, flattening the codebooks represents the highest quality decoding strategy, while being notoriously slow. To this end, we propose a novel stack-and-delay…

    Submitted 15 September, 2023; originally announced September 2023.

  38. arXiv:2309.08773  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Enhance audio generation controllability through representation similarity regularization

    Authors: Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra

    Abstract: This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regula…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages

  39. An iterative sample scenario approach for the dynamic dispatch waves problem

    Authors: Leon Lan, Jasper van Doorn, Niels A. Wouda, Arpan Rijal, Sandjai Bhulai

    Abstract: A challenge in same-day delivery operations is that delivery requests are typically not known beforehand, but are instead revealed dynamically during the day. This uncertainty introduces a trade-off between dispatching vehicles to serve requests as soon as they are revealed to ensure timely delivery, and delaying the dispatching decision to consolidate routing decisions with future, currently unkn…

    Submitted 21 March, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

  40. arXiv:2307.13981  [pdf, other]

    cs.CV cs.MM eess.IV

    Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

    Authors: Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

    Abstract: Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to proper…

    Submitted 3 April, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

  41. arXiv:2307.12544  [pdf, other]

    stat.ME math.ST stat.ML

    Adaptive debiased machine learning using data-driven model selection techniques

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

    Abstract: Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecif…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 32 pages + appendix

  42. arXiv:2307.00997  [pdf, other]

    cs.CV cs.AI

    RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

    Authors: Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

    Abstract: The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which explores the potential of…

    Submitted 1 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: The code and models will be made publicly available at https://github.com/LancasterLi/RefSAM

  43. arXiv:2306.16885  [pdf, other]

    physics.bio-ph

    Optimality in superselective surface binding by multivalent DNA nanostars

    Authors: Christine Linne, Eva Heemskerk, Jos Zwanikken, Daniela J. Kraft, Liedewij Laan

    Abstract: Weak multivalent interactions govern a large variety of biological processes like cell-cell adhesion and virus-host interactions. These systems distinguish sharply between surfaces based on receptor density, known as superselectivity. Earlier experimental and theoretical work provided insights into the control of selectivity: Weak interactions and a high number of ligands facilitate superselectivi…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 14 pages, 4 figures

  44. arXiv:2306.10171  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  45. arXiv:2306.04979  [pdf, other]

    cs.LG cs.AI

    CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

    Authors: Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo

    Abstract: Although graph neural networks (GNNs) have achieved impressive achievements in graph classification, they often need abundant task-specific labels, which could be extensively costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient…

    Submitted 10 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

  46. arXiv:2306.02585  [pdf, other]

    cs.CV

    MotionTrack: Learning Motion Predictor for Multiple Object Tracking

    Authors: Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao

    Abstract: Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predomi…

    Submitted 11 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  47. arXiv:2305.09090  [pdf, other]

    stat.ME

    BOSS -- Biomarker Optimal Segmentation System

    Authors: Liuyi Lan, Xuanjin Cheng, Li Xing, Xuekui Zhang

    Abstract: Motivation: Precision medicine is a major trend in the future of medicine. It aims to provide tailored medical treatment and prevention strategies based on an individual's unique characteristics and needs. Biomarker is the primary source of patients' unique features used in precision medicine. We often need to investigate many cutoff values of a continuous biomarker to find the optimal one and tes…

    Submitted 15 May, 2023; originally announced May 2023.

  48. arXiv:2305.06710  [pdf, other]

    cs.CV cs.AI

    Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

    Authors: Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang

    Abstract: Classifier-free guidance is an effective sampling technique in diffusion models that has been widely adopted. The main idea is to extrapolate the model in the direction of text guidance and away from null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., the generated images can be efficiently transformed into cartoo…

    Submitted 3 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by ACM MM 2023

    Journal ref: ACM MM 2023

  49. arXiv:2304.13424  [pdf, other]

    cs.LG cs.AI cs.RO

    Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

    Authors: Li-Cheng Lan, Huan Zhang, Cho-Jui Hsieh

    Abstract: In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on the out-of-distribution "controllable" states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of trajectories. For example, a self-driving system should…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: ICLR 2023

  50. arXiv:2304.12567  [pdf, other]

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn (22 pages, 8 figures)