-
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
Authors:
Biqing Qi,
Kaiyan Zhang,
Kai Tian,
Haoxiang Li,
Zhang-Ren Chen,
Sihang Zeng,
Ermo Hua,
Hu Jinfang,
Bowen Zhou
Abstract:
The rapid growth of biomedical knowledge has outpaced our ability to efficiently extract insights and generate novel hypotheses. Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction and potentially accelerate biomedical discovery. In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators. We construct a dataset…
▽ More
The rapid growth of biomedical knowledge has outpaced our ability to efficiently extract insights and generate novel hypotheses. Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction and potentially accelerate biomedical discovery. In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators. We construct a dataset of background-hypothesis pairs from biomedical literature, carefully partitioned into training, seen, and unseen test sets based on publication date to mitigate data contamination. Using this dataset, we assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings. To enhance the exploration of uncertainty, a crucial aspect of scientific discovery, we incorporate tool use and multi-agent interactions in our evaluation framework. Furthermore, we propose four novel metrics grounded in extensive literature review to evaluate the quality of generated hypotheses, considering both LLM-based and human assessments. Our experiments yield two key findings: 1) LLMs can generate novel and validated hypotheses, even when tested on literature unseen during training, and 2) Increasing uncertainty through multi-agent interactions and tool use can facilitate diverse candidate generation and improve zero-shot hypothesis generation performance. However, we also observe that the integration of additional knowledge through few-shot learning and tool use may not always lead to performance gains, highlighting the need for careful consideration of the type and scope of external knowledge incorporated. These findings underscore the potential of LLMs as powerful aids in biomedical hypothesis generation and provide valuable insights to guide further research in this area.
△ Less
Submitted 15 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance
Authors:
Wei Lyu,
Yaqin Cao,
Yi Ding,
Jingyu Li,
Kai Tian,
Hui Zhang
Abstract:
Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with…
▽ More
Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with 61 participants who responded to video stimuli depicting a manipulated vehicle approaching a predefined road-crossing location on an unsignalized, two-way road. The vehicle's kinematic pattern was manipulated into yielding and non-yielding, and its external appearances were varied across five types: with a human driver (as a conventional vehicle), with no driver (as an AV), with text-based identity indications, with roof radar sensors, with dynamic eHMIs adjusted to vehicle kinematics. Participants' perceived clarity, crossing initiation distance (CID), crossing decision time (CDT), and gaze behaviour, during interactions were recorded and reported. The results indicated that AVs' kinematic profiles play a dominant role in pedestrians' road-crossing decisions, supported by their subjective evaluations, CID, CDT, and gaze patterns during interactions. Moreover, the use of clear eHMI, such as dynamic pedestrian icons, reduced pedestrians' visual load, enhanced their perceptual clarity, expedited road-crossing decisions, and thereby improved overall crossing efficiency. However, it was found that both textual identity indications and roof radar sensors have no significant effect on pedestrians' decisions but negatively impact pedestrians' visual attention, as evidenced by heightened fixation counts and prolonged fixation durations, particularly under yielding conditions. Excessive visual and cognitive resource occupation suggests that not all explicit cues facilitate human-vehicle communication.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach
Authors:
Kexin Tian,
Haotian Shi,
Yang Zhou,
Sixu Li
Abstract:
Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based…
▽ More
Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based model usually lacks sufficient modeling accuracy and potential function mismatches and the pure-AI-based method lacks analyzability, this paper innovatively proposes an AI-based Koopman approach to model the unknown nonlinear platoon dynamics harnessing the power of AI and simultaneously maintain physical analyzability, with a particular focus on periods of traffic oscillation. Specifically, this research first employs a deep learning framework to generate the embedding function that lifts the original space into the embedding space. Given the embedding space descriptiveness, the platoon dynamics can be expressed as a linear dynamical system founded by the Koopman theory. Based on that, the routine of linear dynamical system analysis can be conducted on the learned traffic linear dynamics in the embedding space. By that, the physical interpretability and analyzability of model-based methods with the heightened precision inherent in data-driven approaches can be synergized. Comparative experiments have been conducted with existing modeling approaches, which suggests our method's superiority in accuracy. Additionally, a phase plane analysis is performed, further evidencing our approach's effectiveness in replicating the complex dynamic patterns. Moreover, the proposed methodology is proven to feature the capability of analyzing the stability, attesting to the physical analyzability.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization
Authors:
Arun Jambulapati,
Aaron Sidford,
Kevin Tian
Abstract:
We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state…
▽ More
We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state-of-the-art methods our result closes a gap between the best-known query depth and the best-known computational depth of parallel algorithms.
Our method starts with a ball acceleration framework of previous parallel methods, i.e., [CJJJLST20, ACJJS21], which reduce the problem to minimizing a regularized Gaussian convolution of the function constrained to Euclidean balls. By developing and leveraging new stability properties of the Hessian of this induced function, we depart from prior parallel algorithms and reduce these ball-constrained optimization problems to stochastic unconstrained quadratic minimization problems. Although we are unable to prove concentration of the asymmetric matrices that we use to approximate this Hessian, we nevertheless develop an efficient parallel method for solving these quadratics. Interestingly, our algorithms can be improved using fast matrix multiplication and use nearly-linear work if the matrix multiplication exponent is 2.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Exploring Adversarial Robustness of Deep State Space Models
Authors:
Biqing Qi,
Yang Luo,
Junqi Gao,
Pengfei Li,
Kai Tian,
Zhiyuan Ma,
Bowen Zhou
Abstract:
Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs r…
▽ More
Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions
Authors:
Hilal Asi,
Daogao Liu,
Kevin Tian
Abstract:
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving er…
▽ More
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac 1 {\sqrt n} + G_k \cdot (\frac{\sqrt d}{nε})^{1 - \frac 1 k}$ under $(ε, δ)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{1}δ)$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly-matching a lower bound of [Lowy and Razaviyayn 2023].
We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
Authors:
Cong Wang,
Kuan Tian,
Jun Zhang,
Yonghang Guan,
Feng Luo,
Fei Shen,
Zhiwei Jiang,
Qing Gu,
Xiao Han,
Wei Yang
Abstract:
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map, etc.) can vary in strength. Among these, weaker conditions often struggle to be effecti…
▽ More
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map, etc.) can vary in strength. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. However, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through the progressive training and the conditional dropout operation. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account the facial pose, reference image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, a potential solution is provided for the simultaneous and effective use of conditions of varying strengths.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Ensembling Diffusion Models via Adaptive Feature Aggregation
Authors:
Cong Wang,
Kuan Tian,
Yonghang Guan,
Jun Zhang,
Zhiwei Jiang,
Fei Shen,
Xiao Han,
Qing Gu,
Wei Yang
Abstract:
The success of the text-guided diffusion model has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. E…
▽ More
The success of the text-guided diffusion model has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. Existing methods primarily adopt parameter merging strategies to produce a new static model. However, they overlook the fact that the divergent denoising capabilities of the models may dynamically change across different states, such as when experiencing different prompts, initial noises, denoising steps, and spatial locations. In this paper, we propose a novel ensembling method, Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states (i.e., prompts, initial noises, denoising steps, and spatial locations), thereby keeping the advantages of multiple diffusion models, while suppressing their disadvantages. Specifically, we design a lightweight Spatial-Aware Block-Wise (SABW) feature aggregator that adaptive aggregates the block-wise intermediate features from multiple U-Net denoisers into a unified one. The core idea lies in dynamically producing an individual attention map for each model's features by comprehensively considering various states. It is worth noting that only SABW is trainable with about 50 million parameters, while other models are frozen. Both the quantitative and qualitative experiments demonstrate the effectiveness of our proposed Adaptive Feature Aggregation method. The code is available at https://github.com/tenvence/afa/.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
Authors:
Ermo Hua,
Biqing Qi,
Kaiyan Zhang,
Yue Yu,
Ning Ding,
Xingtai Lv,
Kai Tian,
Bowen Zhou
Abstract:
Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, PO delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without integrating their…
▽ More
Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, PO delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without integrating their optimization objectives, ignoring the opportunities to bridge their paradigm gap and take the strengths from both. To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of PO with inferior estimation and optimization. PO evaluates the quality of model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the ability of model, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and Preference Optimization into a single process. IFT captures LMs' intuitive sense of the entire answers through a temporal residual connection, but it solely relies on a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show that IFT performs comparably or even superiorly to sequential recipes of SFT and some typical Preference Optimization methods across several tasks, particularly those requires generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT for getting competitive policy.
△ Less
Submitted 28 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Authors:
Keyu Tian,
Yi Jiang,
Zehuan Yuan,
Bingyue Peng,
Liwei Wang
Abstract:
We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: V…
▽ More
We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: VAR, for the first time, makes GPT-like AR models surpass diffusion transformers in image generation. On ImageNet 256x256 benchmark, VAR significantly improve AR baseline by improving Frechet inception distance (FID) from 18.65 to 1.73, inception score (IS) from 80.4 to 350.2, with around 20x faster inference speed. It is also empirically verified that VAR outperforms the Diffusion Transformer (DiT) in multiple dimensions including image quality, inference speed, data efficiency, and scalability. Scaling up VAR models exhibits clear power-law scaling laws similar to those observed in LLMs, with linear correlation coefficients near -0.998 as solid evidence. VAR further showcases zero-shot generalization ability in downstream tasks including image in-painting, out-painting, and editing. These results suggest VAR has initially emulated the two important properties of LLMs: Scaling Laws and zero-shot task generalization. We have released all models and codes to promote the exploration of AR/VAR models for visual generation and unified learning.
△ Less
Submitted 10 June, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Reusable Architecture Growth for Continual Stereo Matching
Authors:
Chenghao Zhang,
Gaofeng Meng,
Bin Fan,
Kun Tian,
Zhaoxiang Zhang,
Shiming Xiang,
Chunhong Pan
Abstract:
The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. Akin to most tasks, this needs gathering training data that covers a number of heterogeneous scenes at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to lear…
▽ More
The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. Akin to most tasks, this needs gathering training data that covers a number of heterogeneous scenes at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose to perform continual stereo matching where a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners. It can maintain high reusability during growth by reusing previous units while obtaining good performance. Additionally, we present a Scene Router module to adaptively select the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively in various weather, road, and city circumstances and surpasses the state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications
Authors:
Arun Jambulapati,
Syamantak Kumar,
Jerry Li,
Shourya Pandey,
Ankit Pensia,
Kevin Tian
Abstract:
The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box def…
▽ More
The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly-understood.
Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor.
△ Less
Submitted 11 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Testing Calibration in Nearly-Linear Time
Authors:
Lunjia Hu,
Arun Jambulapati,
Kevin Tian,
Chutong Yang
Abstract:
In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we…
▽ More
In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing. We define the problem of calibration testing from samples where given $n$ draws from a distribution $\mathcal{D}$ on $(predictions, binary outcomes)$, our goal is to distinguish between the case where $\mathcal{D}$ is perfectly calibrated, and the case where $\mathcal{D}$ is $\varepsilon$-far from calibration.
We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph, and design an exact dynamic programming-based solver for it which runs in time $O(n\log^2(n))$, and solves the calibration testing problem information-theoretically optimally in the same time. This improves upon state-of-the-art black-box linear program solvers requiring $Ω(n^ω)$ time, where $ω> 2$ is the exponent of matrix multiplication. We also develop algorithms for tolerant variants of our testing problem improving upon black-box linear program solvers, and give sample complexity lower bounds for alternative calibration measures to the one considered in this work. Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.
△ Less
Submitted 21 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Authors:
Zhaokun Zhou,
Kaiwei Che,
Wei Fang,
Keyu Tian,
Yuesheng Zhu,
Shuicheng Yan,
Yonghong Tian,
Li Yuan
Abstract:
Spiking Neural Networks (SNNs), known for their biologically plausible architecture, face the challenge of limited performance. The self-attention mechanism, which is the cornerstone of the high-performance Transformer and also a biologically inspired structure, is absent in existing SNNs. To this end, we explore the potential of leveraging both self-attention capability and biological properties…
▽ More
Spiking Neural Networks (SNNs), known for their biologically plausible architecture, face the challenge of limited performance. The self-attention mechanism, which is the cornerstone of the high-performance Transformer and also a biologically inspired structure, is absent in existing SNNs. To this end, we explore the potential of leveraging both self-attention capability and biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA) and Spiking Transformer (Spikformer). The SSA mechanism eliminates the need for softmax and captures the sparse visual feature employing spike-based Query, Key, and Value. This sparse computation without multiplication makes SSA efficient and energy-saving. Further, we develop a Spiking Convolutional Stem (SCS) with supplementary convolutional layers to enhance the architecture of Spikformer. The Spikformer enhanced with the SCS is referred to as Spikformer V2. To train larger and deeper Spikformer V2, we introduce a pioneering exploration of Self-Supervised Learning (SSL) within the SNN. Specifically, we pre-train Spikformer V2 with masking and reconstruction style inspired by the mainstream self-supervised Transformer, and then finetune the Spikformer V2 on the image classification on ImageNet. Extensive experiments show that Spikformer V2 outperforms other previous surrogate training and ANN2SNN methods. An 8-layer Spikformer V2 achieves an accuracy of 80.38% using 4 time steps, and after SSL, a 172M 16-layer Spikformer V2 reaches an accuracy of 81.10% with just 1 time step. To the best of our knowledge, this is the first time that the SNN achieves 80+% accuracy on ImageNet. The code will be available at Spikformer V2.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Authors:
Kaibin Tian,
Yanhua Cheng,
Yi Liu,
Xinglin Hou,
Quan Chen,
Han Li
Abstract:
In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation com…
▽ More
In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation complexity. Nevertheless, these approaches are not optimal in terms of feature utilization and retrieval efficiency. To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase. To better leverage the multi-granularity features, we devise a two-stage retrieval architecture in the retrieval phase. This solution ingeniously balances the coarse and fine granularity of retrieval content. Moreover, it also strikes a harmonious equilibrium between retrieval effectiveness and efficiency. Specifically, in training phase, we design a parameter-free text-gated interaction block (TIB) for fine-grained video representation learning and embed an extra Pearson Constraint to optimize cross-modal representation learning. In retrieval phase, we use coarse-grained video representations for fast recall of top-k candidates, which are then reranked by fine-grained video representations. Extensive experiments on four benchmarks demonstrate the efficiency and effectiveness. Notably, our method achieves comparable performance with the current state-of-the-art methods while being nearly 50 times faster.
△ Less
Submitted 1 January, 2024;
originally announced January 2024.
-
Fine-tuning Language Models for Factuality
Authors:
Katherine Tian,
Eric Mitchell,
Huaxiu Yao,
Christopher D. Manning,
Chelsea Finn
Abstract:
The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual…
▽ More
The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Large Language Models are Zero Shot Hypothesis Proposers
Authors:
Biqing Qi,
Kaiyan Zhang,
Haoxiang Li,
Kai Tian,
Sihang Zeng,
Zhang-Ren Chen,
Bowen Zhou
Abstract:
Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of s…
▽ More
Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we start from investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consist of background knowledge and hypothesis pairs from biomedical literature. The dataset is divided into training, seen, and unseen test sets based on the publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance the capabilities related to generating hypotheses. We also design four metrics through a comprehensive review to evaluate the generated hypotheses for both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from testing literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration.
△ Less
Submitted 10 November, 2023;
originally announced November 2023.
-
Social Interaction-Aware Dynamical Models and Decision Making for Autonomous Vehicles
Authors:
Luca Crosato,
Kai Tian,
Hubert P. H Shum,
Edmond S. L. Ho,
Yafei Wang,
Chongfeng Wei
Abstract:
Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) that are capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to be able to understand and predict the behaviour of human road users. In this literature review, the current s…
▽ More
Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) that are capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to be able to understand and predict the behaviour of human road users. In this literature review, the current state of IAAD research is surveyed in this work. Commencing with an examination of terminology, attention is drawn to challenges and existing models employed for modelling the behaviour of drivers and pedestrians. Next, a comprehensive review is conducted on various techniques proposed for interaction modelling, encompassing cognitive methods, machine learning approaches, and game-theoretic methods. The conclusion is reached through a discussion of potential advantages and risks associated with IAAD, along with the illumination of pivotal research inquiries necessitating future exploration.
△ Less
Submitted 30 October, 2023; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Structured Semidefinite Programming for Recovering Structured Preconditioners
Authors:
Arun Jambulapati,
Jerry Li,
Christopher Musco,
Kirankumar Shiragur,
Aaron Sidford,
Kevin Tian
Abstract:
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, com…
▽ More
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $ε$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(κ^\star,ε^{-1}))$, where $κ^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $Ω(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $Ω(d^ω)$ where $ω> 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Effortless Cross-Platform Video Codec: A Codebook-Based Method
Authors:
Kuan Tian,
Yonghang Guan,
Jinxi Xiang,
Jun Zhang,
Xiao Han,
Wei Yang
Abstract:
Under certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance. One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model, which can provide more accurate probability distribution estimations for compressing the latents. This also implies the rigorous requirem…
▽ More
Under certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance. One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model, which can provide more accurate probability distribution estimations for compressing the latents. This also implies the rigorous requirement that entropy models running on different platforms should use consistent distribution estimations. However, in cross-platform scenarios, entropy models running on different platforms usually yield inconsistent probability distribution estimations due to floating point computation errors that are platform-dependent, which can cause the decoding side to fail in correctly decoding the compressed bitstream sent by the encoding side. In this paper, we propose a cross-platform video compression framework based on codebooks, which avoids autoregressive entropy modeling and achieves video compression by transmitting the index sequence of the codebooks. Moreover, instead of using optical flow for context alignment, we propose to use the conditional cross-attention module to obtain the context between frames. Due to the absence of autoregressive modeling and optical flow alignment, we can design an extremely minimalist framework that can greatly benefit computational efficiency. Importantly, our framework no longer contains any distribution estimation modules for entropy modeling, and thus computations across platforms are not necessarily consistent. Experimental results show that our method can outperform the traditional H.265 (medium) even without any entropy constraints, while achieving the cross-platform property intrinsically.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information
Authors:
Kuan Tian,
Yonghang Guan,
Jinxi Xiang,
Jun Zhang,
Xiao Han,
Wei Yang
Abstract:
The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream. 2) The high computational com…
▽ More
The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream. 2) The high computational complexity of the encoding and decoding process poses a challenge in achieving real-time performance. In this paper, we propose a real-time cross-platform neural video codec, which is capable of efficiently decoding of 720P video bitstream from other encoding platforms on a consumer-grade GPU. First, to solve the problem of inconsistency of codec caused by the uncertainty of floating point calculations across platforms, we design a calibration transmitting system to guarantee the consistent quantization of entropy parameters between the encoding and decoding stages. The parameters that may have transboundary quantization between encoding and decoding are identified in the encoding stage, and their coordinates will be delivered by auxiliary transmitted bitstream. By doing so, these inconsistent parameters can be processed properly in the decoding stage. Furthermore, to reduce the bitrate of the auxiliary bitstream, we rectify the distribution of entropy parameters using a piecewise Gaussian constraint. Second, to match the computational limitations on the decoding side for real-time video codec, we design a lightweight model. A series of efficiency techniques enable our model to achieve 25 FPS decoding speed on NVIDIA RTX 2080 GPU. Experimental results demonstrate that our model can achieve real-time decoding of 720P videos while encoding on another platform. Furthermore, the real-time model brings up to a maximum of 24.2\% BD-rate improvement from the perspective of PSNR with the anchor H.265.
△ Less
Submitted 20 September, 2023;
originally announced September 2023.
-
Matrix Completion in Almost-Verification Time
Authors:
Jonathan A. Kelner,
Jerry Li,
Allen Liu,
Aaron Sidford,
Kevin Tian
Abstract:
We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^2$…
▽ More
We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^2$ time. Then, assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, we show how to boost this partial completion guarantee to a full matrix completion algorithm by aggregating solutions to regression problems involving the observations.
In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3 + o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^5$ samples and $\approx mr^7$ time. Under an assumption on the row and column spans of $\mathbf{M}$ we introduce (which is satisfied by random subspaces with high probability), our sample complexity improves to an almost information-theoretically optimal $mr^{1 + o(1)}$, and our runtime improves to $mr^{2 + o(1)}$. Our runtimes have the appealing property of matching the best known runtime to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled observations. We also provide robust variants of our algorithms that, given random observations from $\mathbf{M} + \mathbf{N}$ with $\|\mathbf{N}\|_{F} \le Δ$, complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}Δ$ in the same runtimes as the noiseless setting. Prior noisy matrix completion algorithms [CP10] only guaranteed a distance of $\approx \sqrt{n}Δ$.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Authors:
Kaibin Tian,
Ruixiang Zhao,
Hu Hu,
Runquan Xie,
Fengzong Lian,
Zhanhui Kang,
Xirong Li
Abstract:
For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, w…
▽ More
For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, we propose TeachCLIP with multi-grained teaching to let a CLIP4Clip based student network learn from more advanced yet computationally heavy models such as X-CLIP, TS2-Net and X-Pool . To improve the student's learning capability, we add an Attentional frame-Feature Aggregation (AFA) block, which by design adds no extra storage/computation overhead at the retrieval stage. While attentive weights produced by AFA are commonly used for combining frame-level features, we propose a novel use of the weights to let them imitate frame-text relevance estimated by the teacher network. As such, AFA provides a fine-grained learning (teaching) channel for the student (teacher). Extensive experiments on multiple public datasets justify the viability of the proposed method.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Authors:
Katherine Tian,
Eric Mitchell,
Allan Zhou,
Archit Sharma,
Rafael Rafailov,
Huaxiu Yao,
Chelsea Finn,
Christopher D. Manning
Abstract:
A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are re…
▽ More
A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are remarkably well-calibrated. However, the most widely-used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities on the TriviaQA, SciQ, and TruthfulQA benchmarks, often reducing the expected calibration error by a relative 50%.
△ Less
Submitted 24 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Linear-Sized Sparsifiers via Near-Linear Time Discrepancy Theory
Authors:
Arun Jambulapati,
Victor Reis,
Kevin Tian
Abstract:
Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our fr…
▽ More
Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our framework repeatedly solves regularized optimization problems to low accuracy to approximate the partial coloring method of [Rot17], and simplifies and generalizes recent work of [JSS23] on fast algorithms for Spencer's theorem. In particular, our framework only requires that the discrepancy body of interest has exponentially large Gaussian measure and is expressible as a sublevel set of a symmetric, convex function. We combine this framework with new tools for proving Gaussian measure lower bounds to give improved algorithms for a variety of sparsification and coloring problems.
As a first application, we use our framework to obtain an $\widetilde{O}(m \cdot ε^{-3.5})$ time algorithm for constructing an $ε$-approximate spectral sparsifier of an $m$-edge graph, matching the sparsity of [BSS14] up to constant factors and improving upon the $\widetilde{O}(m \cdot ε^{-6.5})$ runtime of [LeeS17]. We further give a state-of-the-art algorithm for constructing graph ultrasparsifiers and an almost-linear time algorithm for constructing linear-sized degree-preserving sparsifiers via discrepancy theory; in the latter case, such sparsifiers were not known to exist previously. We generalize these results to their analogs in sparsifying isotropic sums of positive semidefinite matrices. Finally, to demonstrate the versatility of our technique, we obtain a nearly-input-sparsity time constructive algorithm for Spencer's theorem (where we recover a recent result of [JSS23]).
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
ChinaOpen: A Dataset for Open-world Multimodal Learning
Authors:
Aozhu Chen,
Ziyuan Wang,
Chengbo Dong,
Kaibin Tian,
Ruixiang Zhao,
Xun Liang,
Zhanhui Kang,
Xirong Li
Abstract:
This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-sharing website, for open-world multimodal learning. While the state-of-the-art multimodal learning networks have shown impressive performance in automated video annotation and cross-modal video retrieval, their training and evaluation are primarily conducted on YouTube videos with English text. Their effecti…
▽ More
This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-sharing website, for open-world multimodal learning. While the state-of-the-art multimodal learning networks have shown impressive performance in automated video annotation and cross-modal video retrieval, their training and evaluation are primarily conducted on YouTube videos with English text. Their effectiveness on Chinese data remains to be verified. In order to support multimodal learning in the new context, we construct ChinaOpen-50k, a webly annotated training set of 50k Bilibili videos associated with user-generated titles and tags. Both text-based and content-based data cleaning are performed to remove low-quality videos in advance. For a multi-faceted evaluation, we build ChinaOpen-1k, a manually labeled test set of 1k videos. Each test video is accompanied with a manually checked user title and a manually written caption. Besides, each video is manually tagged to describe objects / actions / scenes shown in the visual content. The original user tags are also manually checked. Moreover, with all the Chinese text translated into English, ChinaOpen-1k is also suited for evaluating models trained on English data. In addition to ChinaOpen, we propose Generative Video-to-text Transformer (GVT) for Chinese video captioning. We conduct an extensive evaluation of the state-of-the-art single-task / multi-task models on the new dataset, resulting in a number of novel findings and insights.
△ Less
Submitted 6 August, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation
Authors:
Jaehwan Jeong,
Katherine Tian,
Andrew Li,
Sina Hartung,
Fardad Behzadi,
Juan Calle,
David Osayande,
Michael Pohlen,
Subathra Adithan,
Pranav Rajpurkar
Abstract:
Automated generation of clinically accurate radiology reports can improve patient care. Previous report generation methods that rely on image captioning models often generate incoherent and incorrect text due to their lack of relevant domain knowledge, while retrieval-based attempts frequently retrieve reports that are irrelevant to the input image. In this work, we propose Contrastive X-Ray REpor…
▽ More
Automated generation of clinically accurate radiology reports can improve patient care. Previous report generation methods that rely on image captioning models often generate incoherent and incorrect text due to their lack of relevant domain knowledge, while retrieval-based attempts frequently retrieve reports that are irrelevant to the input image. In this work, we propose Contrastive X-Ray REport Match (X-REM), a novel retrieval-based radiology report generation module that uses an image-text matching score to measure the similarity of a chest X-ray image and radiology report for report retrieval. We observe that computing the image-text matching score with a language-image model can effectively capture the fine-grained interaction between image and text that is often lost when using cosine similarity. X-REM outperforms multiple prior radiology report generation modules in terms of both natural language and clinical metrics. Human evaluation of the generated reports suggests that X-REM increased the number of zero-error reports and decreased the average error severity compared to the baseline retrieval approach. Our code is available at: https://github.com/rajpurkarlab/X-REM
△ Less
Submitted 2 May, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
A CS guide to the quantum singular value transformation
Authors:
Ewin Tang,
Kevin Tian
Abstract:
We present a simplified exposition of some pieces of [Gilyén, Su, Low, and Wiebe, STOC'19, arXiv:1806.01838], which introduced a quantum singular value transformation (QSVT) framework for applying polynomial functions to block-encoded matrices. The QSVT framework has garnered substantial recent interest from the quantum algorithms community, as it was demonstrated by [GSLW19] to encapsulate many e…
▽ More
We present a simplified exposition of some pieces of [Gilyén, Su, Low, and Wiebe, STOC'19, arXiv:1806.01838], which introduced a quantum singular value transformation (QSVT) framework for applying polynomial functions to block-encoded matrices. The QSVT framework has garnered substantial recent interest from the quantum algorithms community, as it was demonstrated by [GSLW19] to encapsulate many existing algorithms naturally phrased as an application of a matrix function. First, we posit that the lifting of quantum singular processing (QSP) to QSVT is better viewed not through Jordan's lemma (as was suggested by [GSLW19]) but as an application of the cosine-sine decomposition, which can be thought of as a more explicit and stronger version of Jordan's lemma. Second, we demonstrate that the constructions of bounded polynomial approximations given in [GSLW19], which use a variety of ad hoc approaches drawing from Fourier analysis, Chebyshev series, and Taylor series, can be unified under the framework of truncation of Chebyshev series, and indeed, can in large part be matched via a bounded variant of a standard meta-theorem from [Trefethen, 2013]. We hope this work finds use to the community as a companion guide for understanding and applying the powerful framework of [GSLW19].
△ Less
Submitted 29 October, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler
Authors:
Sivakanth Gopi,
Yin Tat Lee,
Daogao Liu,
Ruoqi Shen,
Kevin Tian
Abstract:
The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transfo…
▽ More
The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transform (LLT) of a density. We prove new mathematical properties (with an algorithmic flavor) of the LLT, such as strong convexity-smoothness duality and an isoperimetric inequality, which are used to prove a mixing time on our proximal sampler matching [LST21] under a warm start. As our main application, we show our warm-started sampler improves the value oracle complexity of differentially private convex optimization in $\ell_p$ and Schatten-$p$ norms for $p \in [1, 2]$ to match the Euclidean setting [GLL22], while retaining state-of-the-art excess risk bounds [GLLST23]. We find our investigation of the LLT to be a promising proof-of-concept of its utility as a tool for designing samplers, and outline directions for future exploration.
△ Less
Submitted 22 February, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
An Unbounded Fully Homomorphic Encryption Scheme Based on Ideal Lattices and Chinese Remainder Theorem
Authors:
Zhiyong Zheng,
Fengxia Liu,
Kun Tian
Abstract:
We propose an unbounded fully homomorphic encryption scheme, i.e. a scheme that allows one to compute on encrypted data for any desired functions without needing to decrypt the data or knowing the decryption keys. This is a rational solution to an old problem proposed by Rivest, Adleman, and Dertouzos \cite{32} in 1978, and to some new problems appeared in Peikert \cite{28} as open questions 10 an…
▽ More
We propose an unbounded fully homomorphic encryption scheme, i.e. a scheme that allows one to compute on encrypted data for any desired functions without needing to decrypt the data or knowing the decryption keys. This is a rational solution to an old problem proposed by Rivest, Adleman, and Dertouzos \cite{32} in 1978, and to some new problems appeared in Peikert \cite{28} as open questions 10 and open questions 11 a few years ago. Our scheme is completely different from the breakthrough work \cite{14,15} of Gentry in 2009. Gentry's bootstrapping technique constructs a fully homomorphic encryption (FHE) scheme from a somewhat homomorphic one that is powerful enough to evaluate its own decryption function. To date, it remains the only known way of obtaining unbounded FHE. Our construction of unbounded FHE scheme is straightforward and noise-free that can handle unbounded homomorphic computation on any refreshed ciphertexts without bootstrapping transformation technique.
△ Less
Submitted 27 January, 2023;
originally announced January 2023.
-
Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling
Authors:
Adam Bouland,
Yosheb Getachew,
Yujia Jin,
Aaron Sidford,
Kevin Tian
Abstract:
We give a quantum algorithm for computing an $ε$-approximate Nash equilibrium of a zero-sum game in a $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot ε^{-2.5} + ε^{-3})$ and outputs a classical representation of the $ε$-approximate Nash equilibrium. This improves upon the be…
▽ More
We give a quantum algorithm for computing an $ε$-approximate Nash equilibrium of a zero-sum game in a $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot ε^{-2.5} + ε^{-3})$ and outputs a classical representation of the $ε$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m + n} \cdot ε^{-3})$ obtained by [vAG19] and the classic $\widetilde{O}((m + n) \cdot ε^{-2})$ runtime due to [GK95] whenever $ε= Ω((m +n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Authors:
Keyu Tian,
Yi Jiang,
Qishuai Diao,
Chen Lin,
Liwei Wang,
Zehuan Yuan
Abstract:
We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or the masked image modeling, to convolutional networks (convnets): (i) convolution operation cannot handle irregular, random-masked input images; (ii) the single-scale nature of BERT pre-training is inconsistent with convnet's hierarchical structure. For (i), we treat unmasked pixels as sparse voxels o…
▽ More
We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or the masked image modeling, to convolutional networks (convnets): (i) convolution operation cannot handle irregular, random-masked input images; (ii) the single-scale nature of BERT pre-training is inconsistent with convnet's hierarchical structure. For (i), we treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode. This is the first use of sparse convolution for 2D masked modeling. For (ii), we develop a hierarchical decoder to reconstruct images from multi-scale encoded features. Our method called Sparse masKed modeling (SparK) is general: it can be used directly on any convolutional model without backbone modifications. We validate it on both classical (ResNet) and modern (ConvNeXt) models: on three downstream tasks, it surpasses both state-of-the-art contrastive learning and transformer-based masked modeling by similarly large margins (around +1.0%). Improvements on object detection and instance segmentation are more substantial (up to +3.5%), verifying the strong transferability of features learned. We also find its favorable scaling behavior by observing more gains on larger models. All this evidence reveals a promising future of generative pre-training on convnets. Codes and models are released at https://github.com/keyu-tian/SparK.
△ Less
Submitted 10 January, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
ReSQueing Parallel and Private Stochastic Convex Optimization
Authors:
Yair Carmon,
Arun Jambulapati,
Yujia Jin,
Yin Tat Lee,
Daogao Liu,
Aaron Sidford,
Kevin Tian
Abstract:
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj…
▽ More
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $ε_{\text{opt}}$ with $d^{1/3}ε_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}ε_{\text{opt}}^{-2/3} + ε_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $ε_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d ε_{\text{dp}}^{-2}$, $(ε_{\text{dp}}, δ)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 ε_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.
△ Less
Submitted 27 October, 2023; v1 submitted 1 January, 2023;
originally announced January 2023.
-
TED: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database
Authors:
Kai Huang,
Haibo Hu,
Qingqing Ye,
Kai Tian,
Bolong Zheng,
Xiaofang Zhou
Abstract:
With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze a graph database containing an extensive collection of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database by a set of subgraph structures, they often end up with similar…
▽ More
With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze a graph database containing an extensive collection of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database by a set of subgraph structures, they often end up with similar or homogenous topologies, which is undesirable in many graph applications. To address this limitation, we propose the Top-k Edge-Diversified Patterns Discovery problem to retrieve a set of subgraphs that cover the maximum number of edges in a database. To efficiently process such query, we present a generic and extensible framework called Ted which achieves a guaranteed approximation ratio to the optimal result. Two optimization strategies are further developed to improve the performance. Experimental studies on real-world datasets demonstrate the superiority of Ted to traditional techniques.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding
Authors:
Xirong Li,
Aozhu Chen,
Ziyue Wang,
Fan Hu,
Kaibin Tian,
Xinru Chen,
Chengbo Dong
Abstract:
We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text…
▽ More
We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text and video ends to exploit diverse (off-the-shelf) features. Compared to multi-head self attention, LAFF is much more compact yet more effective. Its attentional weights can also be used for selecting fewer features, with the retrieval performance mostly preserved. BNL trains a negation-aware video retrieval model by minimizing a bidirectionally constrained loss per triplet, where a triplet consists of a given training video, its original description and a partially negated description. For video feature extraction, we use pre-trained CLIP, BLIP, BEiT, ResNeXt-101 and irCSN. As for text features, we adopt bag-of-words, word2vec, CLIP and BLIP. Our training data consists of MSR-VTT, TGIF and VATEX that were used in our previous participation. In addition, we automatically caption the V3C1 collection for pre-training. The 2022 edition of the TRECVID benchmark has again been a fruitful participation for the RUCMM team. Our best run, with an infAP of 0.262, is ranked at the second place teamwise.
△ Less
Submitted 27 November, 2022;
originally announced November 2022.
-
Doubly robust nearest neighbors in factor models
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for esti…
▽ More
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ relies on existence of other rows $j$ with $u_j \approx u_i$. Similarly, time-time NN strategy relies on existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN.
△ Less
Submitted 29 January, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
A Scalable Graph Neural Network Decoder for Short Block Codes
Authors:
Kou Tian,
Chentao Yue,
Changyang She,
Yonghui Li,
Branka Vucetic
Abstract:
In this work, we propose a novel decoding algorithm for short block codes based on an edge-weighted graph neural network (EW-GNN). The EW-GNN decoder operates on the Tanner graph with an iterative message-passing structure, which algorithmically aligns with the conventional belief propagation (BP) decoding method. In each iteration, the "weight" on the message passed along each edge is obtained fr…
▽ More
In this work, we propose a novel decoding algorithm for short block codes based on an edge-weighted graph neural network (EW-GNN). The EW-GNN decoder operates on the Tanner graph with an iterative message-passing structure, which algorithmically aligns with the conventional belief propagation (BP) decoding method. In each iteration, the "weight" on the message passed along each edge is obtained from a fully connected neural network that has the reliability information from nodes/edges as its input. Compared to existing deep-learning-based decoding schemes, the EW-GNN decoder is characterised by its scalability, meaning that 1) the number of trainable parameters is independent of the codeword length, and 2) an EW-GNN decoder trained with shorter/simple codes can be directly used for longer/sophisticated codes of different code rates. Furthermore, simulation results show that the EW-GNN decoder outperforms the BP and deep-learning-based BP methods from the literature in terms of the decoding error rate.
△ Less
Submitted 13 November, 2022;
originally announced November 2022.
-
Private Convex Optimization in General Norms
Authors:
Sivakanth Gopi,
Yin Tat Lee,
Daogao Liu,
Ruoqi Shen,
Kevin Tian
Abstract:
We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+μr))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work o…
▽ More
We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+μr))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work of [Gopi, Lee, Liu '22] to non-Euclidean settings. We show that this mechanism satisfies Gaussian differential privacy and solves both DP-ERM (empirical risk minimization) and DP-SCO (stochastic convex optimization) by using localization tools from convex geometry. Our framework is the first to apply to private convex optimization in general normed spaces and directly recovers non-private SCO rates achieved by mirror descent as the privacy parameter $ε\to \infty$. As applications, for Lipschitz optimization in $\ell_p$ norms for all $p \in (1, 2)$, we obtain the first optimal privacy-utility tradeoffs; for $p = 1$, we improve tradeoffs obtained by the recent works [Asi, Feldman, Koren, Talwar '21, Bassily, Guzman, Nandi '21] by at least a logarithmic factor. Our $\ell_p$ norm and Schatten-$p$ norm optimization frameworks are complemented with polynomial-time samplers whose query complexity we explicitly bound.
△ Less
Submitted 10 November, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
MVP: Robust Multi-View Practice for Driving Action Localization
Authors:
Jingjie Shang,
Kunchang Li,
Kaibin Tian,
Haisheng Su,
Yangguang Li
Abstract:
Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with densely action annotations. Due to the small data scale and unclear action boundary, the dataset presents a unique challenge to precisely localize all the…
▽ More
Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with densely action annotations. Due to the small data scale and unclear action boundary, the dataset presents a unique challenge to precisely localize all the different actions and classify their categories. In this paper, we make good use of the multi-view synchronization among videos, and conduct robust Multi-View Practice (MVP) for driving action localization. To avoid overfitting, we fine-tune SlowFast with Kinetics-700 pre-training as the feature extractor. Then the features of different views are passed to ActionFormer to generate candidate action proposals. For precisely localizing all the actions, we design elaborate post-processing, including model voting, threshold filtering and duplication removal. The results show that our MVP is robust for driving action localization, which achieves 28.49% F1-score in the Track3 test set.
△ Less
Submitted 5 July, 2022;
originally announced July 2022.
-
Regularized Box-Simplex Games and Dynamic Decremental Bipartite Matching
Authors:
Arun Jambulapati,
Yujia Jin,
Aaron Sidford,
Kevin Tian
Abstract:
Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool i…
▽ More
Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool in machine learning, we show that these solvers can be used to obtain improved running times for maintaining a (fractional) $ε$-approximate maximum matching in a dynamic decremental bipartite graph against an adaptive adversary. We give a generic framework which reduces this dynamic matching problem to solving regularized graph-structured optimization problems to high accuracy. Through our reduction framework, our regularized box-simplex game solver implies a new algorithm for dynamic decremental bipartite matching in total time $\tilde{O}(m \cdot ε^{-3})$, from an initial graph with $m$ edges and $n$ nodes. We further show how to use recent advances in flow optimization [CKL+22] to improve our runtime to $m^{1 + o(1)} \cdot ε^{-2}$, thereby demonstrating the versatility of our reduction-based approach. These results improve upon the previous best runtime of $\tilde{O}(m \cdot ε^{-4})$ [BGS20] and illustrate the utility of using regularized optimization problem solvers for designing dynamic algorithms.
△ Less
Submitted 13 June, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Co-Teaching for Unsupervised Domain Adaptation and Expansion
Authors:
Kaibin Tian,
Qijie Wei,
Xirong Li
Abstract:
Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improving its performance on a target domain. To resolve the issue, Unsupervised Domain Expansion (UDE) has been proposed recently. UDE tries to adapt the model for the target domain as UDA does, and in the meantime maintains its source-domain performance. In both UDA and UDE settings, a model tail…
▽ More
Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improving its performance on a target domain. To resolve the issue, Unsupervised Domain Expansion (UDE) has been proposed recently. UDE tries to adapt the model for the target domain as UDA does, and in the meantime maintains its source-domain performance. In both UDA and UDE settings, a model tailored to a given domain, let it be the source or the target domain, is assumed to well handle samples from the given domain. We question the assumption by reporting the existence of cross-domain visual ambiguity: Given the lack of a crystally clear boundary between the two domains, samples from one domain can be visually close to the other domain. Such sorts of samples are typically in minority in their host domain, so they tend to be overlooked by the domain-specific model, but can be better handled by a model from the other domain. We exploit this finding, and accordingly propose Co-Teaching (CT). The CT method is instantiated with knowledge distillation based CT (kdCT) plus mixup based CT (miCT). Specifically, kdCT transfers knowledge from a leading-teacher network and an assistant-teacher network to a student network, so the cross-domain ambiguity will be better handled by the student. Meanwhile, miCT further enhances the generalization ability of the student. Extensive experiments on two image classification datasets and two driving-scene segmentation datasets justify the viability of CT for UDA and UDE.
△ Less
Submitted 13 September, 2023; v1 submitted 3 April, 2022;
originally announced April 2022.
-
Quadratic Speedups in Parallel Sampling from Determinantal Distributions
Authors:
Nima Anari,
Callum Burgess,
Kevin Tian,
Thuy-Duong Vuong
Abstract:
We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel…
▽ More
We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size $k$ of a ground set, we show how to approximately produce a sample in $\widetilde{O}(k^{\frac{1}{2} + c})$ time with polynomially many processors for any constant $c>0$. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to $\widetilde{O}(\sqrt k)$ and we show how to sample exactly in these cases.
As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only $O(1)$ steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, $\widetildeΩ(k^{\frac{1}{2}-c})$ steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
△ Less
Submitted 28 April, 2023; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Semi-Random Sparse Recovery in Nearly-Linear Time
Authors:
Jonathan A. Kelner,
Jerry Li,
Allen Liu,
Aaron Sidford,
Kevin Tian
Abstract:
Sparse recovery is one of the most fundamental and well-studied inverse problems. Standard statistical formulations of the problem are provably solved by general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. However, these latter "fast algorithms" have previously been observed to be brittle in various real-world settings.
We investigate the brittl…
▽ More
Sparse recovery is one of the most fundamental and well-studied inverse problems. Standard statistical formulations of the problem are provably solved by general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. However, these latter "fast algorithms" have previously been observed to be brittle in various real-world settings.
We investigate the brittleness of fast sparse recovery algorithms to generative model changes through the lens of studying their robustness to a "helpful" semi-random adversary, a framework which tests whether an algorithm overfits to input assumptions. We consider the following basic model: let $\mathbf{A} \in \mathbb{R}^{n \times d}$ be a measurement matrix which contains an unknown subset of rows $\mathbf{G} \in \mathbb{R}^{m \times d}$ which are bounded and satisfy the restricted isometry property (RIP), but is otherwise arbitrary. Letting $x^\star \in \mathbb{R}^d$ be $s$-sparse, and given either exact measurements $b = \mathbf{A} x^\star$ or noisy measurements $b = \mathbf{A} x^\star + ξ$, we design algorithms recovering $x^\star$ information-theoretically optimally in nearly-linear time. We extend our algorithm to hold for weaker generative models relaxing our planted RIP assumption to a natural weighted variant, and show that our method's guarantees naturally interpolate the quality of the measurement matrix to, in some parameter regimes, run in sublinear time.
Our approach differs from prior fast iterative methods with provable guarantees under semi-random generative models: natural conditions on a submatrix which make sparse recovery tractable are NP-hard to verify. We design a new iterative method tailored to the geometry of sparse recovery which is provably robust to our semi-random model. We hope our approach opens the door to new robust, efficient algorithms for natural statistical inverse problems.
△ Less
Submitted 8 March, 2022;
originally announced March 2022.
-
Counterfactual inference for sequential experiments
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumpt…
▽ More
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
△ Less
Submitted 16 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods
Authors:
Yujia Jin,
Aaron Sidford,
Kevin Tian
Abstract:
We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21].
(1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, wher…
▽ More
We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21].
(1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, μ^x)$, $(L^y, μ^y)$, and $h$ is convex-concave with a $(Λ^{xx}, Λ^{xy}, Λ^{yy})$-blockwise operator norm bounded Hessian. We provide an algorithm with gradient query complexity $\tilde{O}\left(\sqrt{\frac{L^{x}}{μ^{x}}} + \sqrt{\frac{L^{y}}{μ^{y}}} + \frac{Λ^{xx}}{μ^{x}} + \frac{Λ^{xy}}{\sqrt{μ^{x}μ^{y}}} + \frac{Λ^{yy}}{μ^{y}}\right)$. Notably, for convex-concave minimax problems with bilinear coupling (e.g.\ quadratics), where $Λ^{xx} = Λ^{yy} = 0$, our rate matches a lower bound of [ZHZ19].
(2) Finite sum optimization. We study finite sum optimization problems $\min_x \frac{1}{n}\sum_{i\in[n]} f_i(x)$, where each $f_i$ is $L_i$-smooth and the overall problem is $μ$-strongly convex. We provide an algorithm with gradient query complexity $\tilde{O}\left(n + \sum_{i\in[n]} \sqrt{\frac{L_i}{nμ}} \right)$. Notably, when the smoothness bounds $\{L_i\}_{i\in[n]}$ are non-uniform, our rate improves upon accelerated SVRG [LMH15, FGKS15] and Katyusha [All17] by up to a $\sqrt{n}$ factor.
(3) Minimax finite sums. We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, encapsulating both above results up to a logarithmic factor.
△ Less
Submitted 9 February, 2022;
originally announced February 2022.
-
A Generalization of Cyclic Code and Applications to Public Key Cryptosystems
Authors:
Zhiyong Zheng,
Wenlin Huang,
Jie Xu,
Kun Tian
Abstract:
In this paper, we define and discuss φ-cyclic code, which may be regarded as a general form of the ordinary cyclic code. As applications, we explain how to extend two public key encryption schemes, one is McEliece and Niederriter's cryptosystem of which is based on error-correcting code theory. Another one is NTRU public key cryptosystem of which is based on polynomial ring theory. The main purpos…
▽ More
In this paper, we define and discuss φ-cyclic code, which may be regarded as a general form of the ordinary cyclic code. As applications, we explain how to extend two public key encryption schemes, one is McEliece and Niederriter's cryptosystem of which is based on error-correcting code theory. Another one is NTRU public key cryptosystem of which is based on polynomial ring theory. The main purpose of this paper is to give a more general construction of NTRU based on ideal matrices and q-ary lattice theory.
△ Less
Submitted 28 December, 2021;
originally announced December 2021.
-
Cyclic Lattices, Ideal Lattices and Bounds for the Smoothing Parameter
Authors:
Zhiyong Zheng,
Fengxia Liu,
Yunfan Lu,
Kun Tian
Abstract:
Cyclic lattices and ideal lattices were introduced by Micciancio in \cite{D2}, Lyubashevsky and Micciancio in \cite{L1} respectively, which play an efficient role in Ajtai's construction of a collision resistant Hash function (see \cite{M1} and \cite{M2}) and in Gentry's construction of fully homomorphic encryption (see \cite{G}). Let $R=Z[x]/\langle φ(x)\rangle$ be a quotient ring of the integer…
▽ More
Cyclic lattices and ideal lattices were introduced by Micciancio in \cite{D2}, Lyubashevsky and Micciancio in \cite{L1} respectively, which play an efficient role in Ajtai's construction of a collision resistant Hash function (see \cite{M1} and \cite{M2}) and in Gentry's construction of fully homomorphic encryption (see \cite{G}). Let $R=Z[x]/\langle φ(x)\rangle$ be a quotient ring of the integer coefficients polynomials ring, Lyubashevsky and Micciancio regarded an ideal lattice as the correspondence of an ideal of $R$, but they neither explain how to extend this definition to whole Euclidean space $\mathbb{R}^n$, nor exhibit the relationship of cyclic lattices and ideal lattices.
In this paper, we regard the cyclic lattices and ideal lattices as the correspondences of finitely generated $R$-modules, so that we may show that ideal lattices are actually a special subclass of cyclic lattices, namely, cyclic integer lattices. In fact, there is a one to one correspondence between cyclic lattices in $\mathbb{R}^n$ and finitely generated $R$-modules (see Theorem \ref{th4} below). On the other hand, since $R$ is a Noether ring, each ideal of $R$ is a finitely generated $R$-module, so it is natural and reasonable to regard ideal lattices as a special subclass of cyclic lattices (see corollary \ref{co3.4} below). It is worth noting that we use more general rotation matrix here, so our definition and results on cyclic lattices and ideal lattices are more general forms. As application, we provide cyclic lattice with an explicit and countable upper bound for the smoothing parameter (see Theorem \ref{th5} below). It is an open problem that is the shortest vector problem on cyclic lattice NP-hard? (see \cite{D2}). Our results may be viewed as a substantial progress in this direction.
△ Less
Submitted 25 December, 2021;
originally announced December 2021.
-
Living-Off-The-Land Command Detection Using Active Learning
Authors:
Talha Ongun,
Jack W. Stokes,
Jonathan Bar Or,
Ke Tian,
Farid Tajaddodianfar,
Joshua Neil,
Christian Seifert,
Alina Oprea,
John C. Platt
Abstract:
In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distrib…
▽ More
In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary, therefore this type of attack is called "Living-Off-The-Land". Detecting these attacks is challenging, as adversaries may not create malicious files on the victim computers and anti-virus scans fail to detect them. We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-the-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when a limited number of labeled samples are available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 0.96 at classifying different attack classes. We show that our active learning method consistently improves the classifier performance, as more training data is labeled, and converges in less than 30 iterations when starting with a small number of labeled instances.
△ Less
Submitted 29 November, 2021;
originally announced November 2021.
-
Robust Regression Revisited: Acceleration and Improved Estimation Rates
Authors:
Arun Jambulapati,
Jerry Li,
Tselil Schramm,
Kevin Tian
Abstract:
We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et. al., a first-order method using biased gradient queries, or the Sever framework of…
▽ More
We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et. al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et. al., an iterative outlier-removal method calling a stationary point finder.
We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et. al. can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature.
For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions.
△ Less
Submitted 22 June, 2021;
originally announced June 2021.
-
Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Daniel Kongsgaard,
Jerry Li,
Kevin Tian
Abstract:
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< α<\frac 1 2$ such that an $α$-fraction of the points in $T$ are i.i.d. samples from a well-behaved distribution $\mathcal{D}$ and the remaining $(1-α)$-fraction are arbitrary. The goal is to output…
▽ More
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< α<\frac 1 2$ such that an $α$-fraction of the points in $T$ are i.i.d. samples from a well-behaved distribution $\mathcal{D}$ and the remaining $(1-α)$-fraction are arbitrary. The goal is to output a small list of vectors, at least one of which is close to the mean of $\mathcal{D}$. We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees, with running time $O(n^{1 + ε_0} d)$, for any fixed $ε_0 > 0$. All prior algorithms for this problem had additional polynomial factors in $\frac 1 α$. We leverage this result, together with additional techniques, to obtain the first almost-linear time algorithms for clustering mixtures of $k$ separated well-behaved distributions, nearly-matching the statistical guarantees of spectral methods. Prior clustering algorithms inherently relied on an application of $k$-PCA, thereby incurring runtimes of $Ω(n d k)$. This marks the first runtime improvement for this basic statistical problem in nearly two decades.
The starting point of our approach is a novel and simpler near-linear time robust mean estimation algorithm in the $α\to 1$ regime, based on a one-shot matrix multiplicative weights-inspired potential decrease. We crucially leverage this new algorithmic framework in the context of the iterative multi-filtering technique of Diakonikolas et al. '18, '20, providing a method to simultaneously cluster and downsample points using one-dimensional projections -- thus, bypassing the $k$-PCA subroutines required by prior algorithms.
△ Less
Submitted 12 November, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.