Skip to main content

Showing 1–50 of 86 results for author: Tian, K

  1. arXiv:2407.08940  [pdf, other

    cs.CL

    Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

    Authors: Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, Bowen Zhou

    Abstract: The rapid growth of biomedical knowledge has outpaced our ability to efficiently extract insights and generate novel hypotheses. Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction and potentially accelerate biomedical discovery. In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators. We construct a dataset… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to COLM 2024. This is an extended version of the paper at arXiv:2311.05965

  2. arXiv:2407.06505  [pdf

    cs.HC

    Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance

    Authors: Wei Lyu, Yaqin Cao, Yi Ding, Jingyu Li, Kai Tian, Hui Zhang

    Abstract: Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 37 pages, 13 figures, 4 tables

  3. arXiv:2406.14696  [pdf, other

    eess.SY cs.AI

    Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach

    Authors: Kexin Tian, Haotian Shi, Yang Zhou, Sixu Li

    Abstract: Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.07373  [pdf, other

    math.OC cs.DS cs.LG

    Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization

    Authors: Arun Jambulapati, Aaron Sidford, Kevin Tian

    Abstract: We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.05532  [pdf, other

    cs.LG cs.AI

    Exploring Adversarial Robustness of Deep State Space Models

    Authors: Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

    Abstract: Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs r… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2406.02789  [pdf, other

    cs.DS cs.CR cs.LG stat.ML

    Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions

    Authors: Hilal Asi, Daogao Liu, Kevin Tian

    Abstract: We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving er… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  7. arXiv:2406.02511  [pdf, other

    cs.CV cs.AI

    V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

    Authors: Cong Wang, Kuan Tian, Jun Zhang, Yonghang Guan, Feng Luo, Fei Shen, Zhiwei Jiang, Qing Gu, Xiao Han, Wei Yang

    Abstract: In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map, etc.) can vary in strength. Among these, weaker conditions often struggle to be effecti… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  8. arXiv:2405.17082  [pdf, other

    cs.CV

    Ensembling Diffusion Models via Adaptive Feature Aggregation

    Authors: Cong Wang, Kuan Tian, Yonghang Guan, Jun Zhang, Zhiwei Jiang, Fei Shen, Xiao Han, Qing Gu, Wei Yang

    Abstract: The success of the text-guided diffusion model has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. E… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  9. arXiv:2405.11870  [pdf, other

    cs.CL cs.AI

    Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

    Authors: Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou

    Abstract: Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, PO delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without integrating their… ▽ More

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  10. arXiv:2404.02905  [pdf, other

    cs.CV cs.AI

    Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

    Authors: Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

    Abstract: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: V… ▽ More

    Submitted 10 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Demo website: https://var.vision/

  11. arXiv:2404.00360  [pdf, other

    cs.CV

    Reusable Architecture Growth for Continual Stereo Matching

    Authors: Chenghao Zhang, Gaofeng Meng, Bin Fan, Kun Tian, Zhaoxiang Zhang, Shiming Xiang, Chunhong Pan

    Abstract: The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. Akin to most tasks, this needs gathering training data that covers a number of heterogeneous scenes at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to lear… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Extended version of CVPR 2022 paper "Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture" - Accepted to TPAMI in 2024

  12. arXiv:2403.03905  [pdf, other

    math.NA cs.DS cs.LG stat.ML

    Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications

    Authors: Arun Jambulapati, Syamantak Kumar, Jerry Li, Shourya Pandey, Ankit Pensia, Kevin Tian

    Abstract: The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box def… ▽ More

    Submitted 11 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  13. arXiv:2402.13187  [pdf, other

    cs.LG cs.DS stat.CO stat.ML

    Testing Calibration in Nearly-Linear Time

    Authors: Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

    Abstract: In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  14. arXiv:2401.02020  [pdf, other

    cs.NE cs.CV cs.LG

    Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket

    Authors: Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Li Yuan

    Abstract: Spiking Neural Networks (SNNs), known for their biologically plausible architecture, face the challenge of limited performance. The self-attention mechanism, which is the cornerstone of the high-performance Transformer and also a biologically inspired structure, is absent in existing SNNs. To this end, we explore the potential of leveraging both self-attention capability and biological properties… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  15. arXiv:2401.00701  [pdf, other

    cs.CV

    Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

    Authors: Kaibin Tian, Yanhua Cheng, Yi Liu, Xinglin Hou, Quan Chen, Han Li

    Abstract: In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation com… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

  16. arXiv:2311.08401  [pdf, other

    cs.CL cs.AI cs.LG

    Fine-tuning Language Models for Factuality

    Authors: Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn

    Abstract: The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  17. arXiv:2311.05965  [pdf, other

    cs.CL

    Large Language Models are Zero Shot Hypothesis Proposers

    Authors: Biqing Qi, Kaiyan Zhang, Haoxiang Li, Kai Tian, Sihang Zeng, Zhang-Ren Chen, Bowen Zhou

    Abstract: Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of s… ▽ More

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Instruction Workshop @ NeurIPS 2023

  18. arXiv:2310.18891  [pdf, other

    cs.HC cs.CY cs.RO eess.SY

    Social Interaction-Aware Dynamical Models and Decision Making for Autonomous Vehicles

    Authors: Luca Crosato, Kai Tian, Hubert P. H Shum, Edmond S. L. Ho, Yafei Wang, Chongfeng Wei

    Abstract: Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) that are capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to be able to understand and predict the behaviour of human road users. In this literature review, the current s… ▽ More

    Submitted 30 October, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

  19. arXiv:2310.18265  [pdf, other

    cs.DS cs.LG math.OC stat.ML

    Structured Semidefinite Programming for Recovering Structured Preconditioners

    Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian

    Abstract: We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, com… ▽ More

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Merge of arXiv:1812.06295 and arXiv:2008.01722

  20. arXiv:2310.10292  [pdf, other

    cs.CV cs.MM

    Effortless Cross-Platform Video Codec: A Codebook-Based Method

    Authors: Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang

    Abstract: Under certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance. One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model, which can provide more accurate probability distribution estimations for compressing the latents. This also implies the rigorous requirem… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 15 pages, 11 figures

  21. Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information

    Authors: Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang

    Abstract: The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream. 2) The high computational com… ▽ More

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 14 pages

  22. arXiv:2308.03661  [pdf, other

    cs.LG cs.DS math.OC math.ST

    Matrix Completion in Almost-Verification Time

    Authors: Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian

    Abstract: We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^2$… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: FOCS 2023

  23. arXiv:2308.01217  [pdf, other

    cs.CV

    TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval

    Authors: Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li

    Abstract: For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, w… ▽ More

    Submitted 2 August, 2023; originally announced August 2023.

  24. arXiv:2305.14975  [pdf, other

    cs.CL

    Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

    Authors: Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

    Abstract: A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are re… ▽ More

    Submitted 24 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Camera Ready

  25. arXiv:2305.08434  [pdf, other

    cs.DS

    Linear-Sized Sparsifiers via Near-Linear Time Discrepancy Theory

    Authors: Arun Jambulapati, Victor Reis, Kevin Tian

    Abstract: Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our fr… ▽ More

    Submitted 15 May, 2023; originally announced May 2023.

  26. ChinaOpen: A Dataset for Open-world Multimodal Learning

    Authors: Aozhu Chen, Ziyuan Wang, Chengbo Dong, Kaibin Tian, Ruixiang Zhao, Xun Liang, Zhanhui Kang, Xirong Li

    Abstract: This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-sharing website, for open-world multimodal learning. While the state-of-the-art multimodal learning networks have shown impressive performance in automated video annotation and cross-modal video retrieval, their training and evaluation are primarily conducted on YouTube videos with English text. Their effecti… ▽ More

    Submitted 6 August, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted by ACMMM 2023

  27. arXiv:2303.17579  [pdf, other

    cs.CL cs.AI cs.CV

    Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation

    Authors: Jaehwan Jeong, Katherine Tian, Andrew Li, Sina Hartung, Fardad Behzadi, Juan Calle, David Osayande, Michael Pohlen, Subathra Adithan, Pranav Rajpurkar

    Abstract: Automated generation of clinically accurate radiology reports can improve patient care. Previous report generation methods that rely on image captioning models often generate incoherent and incorrect text due to their lack of relevant domain knowledge, while retrieval-based attempts frequently retrieve reports that are irrelevant to the input image. In this work, we propose Contrastive X-Ray REpor… ▽ More

    Submitted 2 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Journal ref: Medical Imaging with Deep Learning 2023

  28. arXiv:2302.14324  [pdf, other

    quant-ph cs.DS

    A CS guide to the quantum singular value transformation

    Authors: Ewin Tang, Kevin Tian

    Abstract: We present a simplified exposition of some pieces of [Gilyén, Su, Low, and Wiebe, STOC'19, arXiv:1806.01838], which introduced a quantum singular value transformation (QSVT) framework for applying polynomial functions to block-encoded matrices. The QSVT framework has garnered substantial recent interest from the quantum algorithms community, as it was demonstrated by [GSLW19] to encapsulate many e… ▽ More

    Submitted 29 October, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 32 pages; v2 QSVT proofs more self-contained, additional result separating bounded and unbounded polynomial approximations

  29. arXiv:2302.06085  [pdf, ps, other

    cs.DS cs.CR cs.LG math.PR stat.CO

    Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler

    Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

    Abstract: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transfo… ▽ More

    Submitted 22 February, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: Comments welcome! v2 improves constant in duality result, adds citations

  30. arXiv:2301.12060  [pdf, other

    cs.CR math.RA

    An Unbounded Fully Homomorphic Encryption Scheme Based on Ideal Lattices and Chinese Remainder Theorem

    Authors: Zhiyong Zheng, Fengxia Liu, Kun Tian

    Abstract: We propose an unbounded fully homomorphic encryption scheme, i.e. a scheme that allows one to compute on encrypted data for any desired functions without needing to decrypt the data or knowing the decryption keys. This is a rational solution to an old problem proposed by Rivest, Adleman, and Dertouzos \cite{32} in 1978, and to some new problems appeared in Peikert \cite{28} as open questions 10 an… ▽ More

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: NO

    MSC Class: Security and privacy - Cryptography and other attacks; Theory of computation - Computational complexity and cryptography

  31. arXiv:2301.03763  [pdf, ps, other

    quant-ph cs.DS math.OC

    Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling

    Authors: Adam Bouland, Yosheb Getachew, Yujia Jin, Aaron Sidford, Kevin Tian

    Abstract: We give a quantum algorithm for computing an $ε$-approximate Nash equilibrium of a zero-sum game in a $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot ε^{-2.5} + ε^{-3})$ and outputs a classical representation of the $ε$-approximate Nash equilibrium. This improves upon the be… ▽ More

    Submitted 9 January, 2023; originally announced January 2023.

  32. arXiv:2301.03580  [pdf, other

    cs.CV cs.AI cs.LG

    Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling

    Authors: Keyu Tian, Yi Jiang, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan

    Abstract: We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or the masked image modeling, to convolutional networks (convnets): (i) convolution operation cannot handle irregular, random-masked input images; (ii) the single-scale nature of BERT pre-training is inconsistent with convnet's hierarchical structure. For (i), we treat unmasked pixels as sparse voxels o… ▽ More

    Submitted 10 January, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: v2: fixed some formatting errors

  33. arXiv:2301.00457  [pdf, other

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj… ▽ More

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  34. arXiv:2212.07612  [pdf

    cs.DB

    TED: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database

    Authors: Kai Huang, Haibo Hu, Qingqing Ye, Kai Tian, Bolong Zheng, Xiaofang Zhou

    Abstract: With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze a graph database containing an extensive collection of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database by a set of subgraph structures, they often end up with similar… ▽ More

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: This paper is accepted by SIGMOD 2023

  35. arXiv:2211.15039  [pdf, other

    cs.CV cs.MM

    Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding

    Authors: Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong

    Abstract: We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text… ▽ More

    Submitted 27 November, 2022; originally announced November 2022.

  36. arXiv:2211.14297  [pdf, ps, other

    stat.ML cs.LG

    Doubly robust nearest neighbors in factor models

    Authors: Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

    Abstract: We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for esti… ▽ More

    Submitted 29 January, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

  37. arXiv:2211.06962  [pdf, other

    cs.IT cs.AI

    A Scalable Graph Neural Network Decoder for Short Block Codes

    Authors: Kou Tian, Chentao Yue, Changyang She, Yonghui Li, Branka Vucetic

    Abstract: In this work, we propose a novel decoding algorithm for short block codes based on an edge-weighted graph neural network (EW-GNN). The EW-GNN decoder operates on the Tanner graph with an iterative message-passing structure, which algorithmically aligns with the conventional belief propagation (BP) decoding method. In each iteration, the "weight" on the message passed along each edge is obtained fr… ▽ More

    Submitted 13 November, 2022; originally announced November 2022.

    Comments: Submitted to IEEE conference for possible publication

  38. arXiv:2207.08347  [pdf, ps, other

    cs.LG cs.CR math.OC stat.ML

    Private Convex Optimization in General Norms

    Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

    Abstract: We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+μr))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work o… ▽ More

    Submitted 10 November, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: SODA 2023

  39. arXiv:2207.02042  [pdf, other

    cs.CV

    MVP: Robust Multi-View Practice for Driving Action Localization

    Authors: Jingjie Shang, Kunchang Li, Kaibin Tian, Haisheng Su, Yangguang Li

    Abstract: Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with densely action annotations. Due to the small data scale and unclear action boundary, the dataset presents a unique challenge to precisely localize all the… ▽ More

    Submitted 5 July, 2022; originally announced July 2022.

  40. arXiv:2204.12721  [pdf, other

    cs.DS math.OC

    Regularized Box-Simplex Games and Dynamic Decremental Bipartite Matching

    Authors: Arun Jambulapati, Yujia Jin, Aaron Sidford, Kevin Tian

    Abstract: Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool i… ▽ More

    Submitted 13 June, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted at ICALP'22

  41. arXiv:2204.01210  [pdf, other

    cs.CV

    Co-Teaching for Unsupervised Domain Adaptation and Expansion

    Authors: Kaibin Tian, Qijie Wei, Xirong Li

    Abstract: Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improving its performance on a target domain. To resolve the issue, Unsupervised Domain Expansion (UDE) has been proposed recently. UDE tries to adapt the model for the target domain as UDA does, and in the meantime maintains its source-domain performance. In both UDA and UDE settings, a model tail… ▽ More

    Submitted 13 September, 2023; v1 submitted 3 April, 2022; originally announced April 2022.

  42. arXiv:2203.11190  [pdf, ps, other

    cs.DS

    Quadratic Speedups in Parallel Sampling from Determinantal Distributions

    Authors: Nima Anari, Callum Burgess, Kevin Tian, Thuy-Duong Vuong

    Abstract: We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel… ▽ More

    Submitted 28 April, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 33 pages, SPAA 2023

  43. arXiv:2203.04002  [pdf, ps, other

    cs.DS cs.LG math.OC stat.ML

    Semi-Random Sparse Recovery in Nearly-Linear Time

    Authors: Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian

    Abstract: Sparse recovery is one of the most fundamental and well-studied inverse problems. Standard statistical formulations of the problem are provably solved by general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. However, these latter "fast algorithms" have previously been observed to be brittle in various real-world settings. We investigate the brittl… ▽ More

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 42 pages, comments welcome!

  44. arXiv:2202.06891  [pdf, other

    stat.ML cs.LG

    Counterfactual inference for sequential experiments

    Authors: Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

    Abstract: We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumpt… ▽ More

    Submitted 16 April, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

  45. arXiv:2202.04640  [pdf, ps, other

    math.OC cs.DS cs.LG

    Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods

    Authors: Yujia Jin, Aaron Sidford, Kevin Tian

    Abstract: We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21]. (1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, wher… ▽ More

    Submitted 9 February, 2022; originally announced February 2022.

  46. arXiv:2112.14115  [pdf, ps, other

    cs.IT

    A Generalization of Cyclic Code and Applications to Public Key Cryptosystems

    Authors: Zhiyong Zheng, Wenlin Huang, Jie Xu, Kun Tian

    Abstract: In this paper, we define and discuss φ-cyclic code, which may be regarded as a general form of the ordinary cyclic code. As applications, we explain how to extend two public key encryption schemes, one is McEliece and Niederriter's cryptosystem of which is based on error-correcting code theory. Another one is NTRU public key cryptosystem of which is based on polynomial ring theory. The main purpos… ▽ More

    Submitted 28 December, 2021; originally announced December 2021.

  47. arXiv:2112.13185  [pdf, other

    cs.IT

    Cyclic Lattices, Ideal Lattices and Bounds for the Smoothing Parameter

    Authors: Zhiyong Zheng, Fengxia Liu, Yunfan Lu, Kun Tian

    Abstract: Cyclic lattices and ideal lattices were introduced by Micciancio in \cite{D2}, Lyubashevsky and Micciancio in \cite{L1} respectively, which play an efficient role in Ajtai's construction of a collision resistant Hash function (see \cite{M1} and \cite{M2}) and in Gentry's construction of fully homomorphic encryption (see \cite{G}). Let $R=Z[x]/\langle φ(x)\rangle$ be a quotient ring of the integer… ▽ More

    Submitted 25 December, 2021; originally announced December 2021.

    Comments: 27pages

    MSC Class: H.4

  48. Living-Off-The-Land Command Detection Using Active Learning

    Authors: Talha Ongun, Jack W. Stokes, Jonathan Bar Or, Ke Tian, Farid Tajaddodianfar, Joshua Neil, Christian Seifert, Alina Oprea, John C. Platt

    Abstract: In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distrib… ▽ More

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: 14 pages, published in RAID 2021

  49. arXiv:2106.11938  [pdf, ps, other

    cs.DS cs.LG math.OC stat.ML

    Robust Regression Revisited: Acceleration and Improved Estimation Rates

    Authors: Arun Jambulapati, Jerry Li, Tselil Schramm, Kevin Tian

    Abstract: We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et. al., a first-order method using biased gradient queries, or the Sever framework of… ▽ More

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: 47 pages

  50. arXiv:2106.08537  [pdf, ps, other

    cs.DS cs.LG stat.ML

    Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

    Authors: Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

    Abstract: We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< α<\frac 1 2$ such that an $α$-fraction of the points in $T$ are i.i.d. samples from a well-behaved distribution $\mathcal{D}$ and the remaining $(1-α)$-fraction are arbitrary. The goal is to output… ▽ More

    Submitted 12 November, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 64 pages, 1 figure. v2 improves results on bounded-covariance clustering, polishes exposition