-
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Authors:
Junying Chen,
Ruyi Ouyang,
Anningzhe Gao,
Shunian Chen,
Guiming Hardy Chen,
Xidong Wang,
Ruifei Zhang,
Zhenyang Cai,
Ke Ji,
Guangjun Yu,
Xiang Wan,
Benyou Wang
Abstract:
The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed's large-scale, de-identified medical image-text pairs to address these limitations, they still fall short due to inherent data noise. To tackle this, we refined medical image-text pairs from PubMed and employed MLLMs (GPT-4V) in an 'unblinded' capacity to denoise and reformat the data, resulting in the creation of the PubMedVision dataset with 1.3 million medical VQA samples. Our validation demonstrates that: (1) PubMedVision can significantly enhance the medical multimodal capabilities of current MLLMs, showing significant improvement in benchmarks including the MMMU Health & Medicine track; (2) manual checks by medical experts and empirical results validate the superior data quality of our dataset compared to other data construction methods. Using PubMedVision, we train a 34B medical MLLM HuatuoGPT-Vision, which shows superior performance in medical multimodal scenarios among open-source MLLMs.
Submitted 27 June, 2024;
originally announced June 2024.
-
MileBench: Benchmarking MLLMs in Long Context
Authors:
Dingjie Song,
Shunian Chen,
Guiming Hardy Chen,
Fei Yu,
Xiang Wan,
Benyou Wang
Abstract:
Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific tasks (e.g., time-series captioning), potentially obscuring the performance challenges of MLLMs. To address these limitations, we introduce MileBench, a pioneering benchmark designed to test the MultImodal Long-contExt capabilities of MLLMs. This benchmark comprises not only multimodal long contexts, but also multiple tasks requiring both comprehension and generation. We establish two distinct evaluation sets, diagnostic and realistic, to systematically assess MLLMs' long-context adaptation capacity and their ability to complete tasks in long-context scenarios. Our experimental results, obtained from testing 22 models, revealed that while the closed-source GPT-4o outperforms others, most open-source MLLMs struggle in long-context situations. Interestingly, the performance gap tends to widen with an increase in the number of images. We strongly encourage an intensification of research efforts towards enhancing MLLMs' long-context capabilities, especially in scenarios involving multiple images.
Submitted 15 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Authors:
Guiming Hardy Chen,
Shunian Chen,
Ruifei Zhang,
Junying Chen,
Xiangbo Wu,
Zhiyi Zhang,
Zhihong Chen,
Jianquan Li,
Xiang Wan,
Benyou Wang
Abstract:
Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and deployment. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To this end, we propose a comprehensive pipeline for generating a synthetic dataset. The key idea is to leverage strong proprietary models to generate (i) fine-grained image annotations for vision-language alignment and (ii) complex reasoning visual question-answering pairs for visual instruction fine-tuning, yielding 1.3M samples in total. We train a series of lite VLMs on the synthetic dataset, and experimental results demonstrate the effectiveness of the proposed scheme: the models achieve competitive performance on 17 benchmarks among 4B LVLMs, and even perform on par with 7B/13B-scale models on various benchmarks. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. We name our dataset ALLaVA and open-source it to the research community for developing better resource-efficient LVLMs for wider usage.
Submitted 17 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Humans or LLMs as the Judge? A Study on Judgement Biases
Authors:
Guiming Hardy Chen,
Shunian Chen,
Ziche Liu,
Feng Jiang,
Benyou Wang
Abstract:
Adopting humans and large language models (LLMs) as judges (a.k.a. human- and LLM-as-a-judge) for evaluating the performance of LLMs has recently gained attention. Nonetheless, this approach concurrently introduces potential biases from humans and LLMs, calling into question the reliability of the evaluation results. In this paper, we propose a novel framework that is free from referencing ground-truth annotations for investigating Misinformation Oversight Bias, Gender Bias, Authority Bias and Beauty Bias on LLM and human judges. We curate a dataset referring to the revised Bloom's Taxonomy and conduct thousands of evaluations. Results show that human and LLM judges are vulnerable to perturbations to various degrees, and that even the cutting-edge judges possess considerable biases. We further exploit these biases to conduct attacks on LLM judges. We hope that our work can notify the community of the bias and vulnerability of human- and LLM-as-a-judge, as well as the urgency of developing robust evaluation systems.
Submitted 16 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Temporal Supervised Contrastive Learning for Modeling Patient Risk Progression
Authors:
Shahriar Noroozizadeh,
Jeremy C. Weiss,
George H. Chen
Abstract:
We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to data augmentation, a key ingredient of contrastive learning, which lacks a standard procedure that is adequately realistic for clinical tabular data, to our knowledge. We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.
Submitted 10 December, 2023;
originally announced December 2023.
-
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
Authors:
Wentao Ge,
Shunian Chen,
Guiming Hardy Chen,
Zhihong Chen,
Junying Chen,
Shuo Yan,
Chenghao Zhu,
Ziyue Lin,
Wenya Xie,
Xinyi Zhang,
Yichen Chai,
Xiaoyu Liu,
Dingjie Song,
Xidong Wang,
Anningzhe Gao,
Zhiyi Zhang,
Jianquan Li,
Xiang Wan,
Benyou Wang
Abstract:
Multimodal large language models (MLLMs) (e.g., GPT-4V, LLaVA, and Claude-3) have broadened the scope of AI applications. Yet, evaluating their performance presents a significant challenge owing to the inherently subjective nature of tasks that do not yield clear-cut solutions, especially open-ended queries. Existing automatic evaluation methodologies are mainly limited to evaluating objective queries without considering real-world user experiences, inadequately addressing the nuances of creative and associative multimodal tasks. In our paper, we propose a new evaluation paradigm for MLLMs: evaluating MLLMs with per-sample criteria using a potent MLLM as the judge. To validate the feasibility and effectiveness of this paradigm, we design a benchmark, dubbed MLLM-Bench, with evaluation samples across six critical levels following the revised Bloom's Taxonomy, together with ethical considerations. We benchmark 21 popular MLLMs in a pairwise-comparison fashion, showing diverse performance across models. Moreover, the validity of our benchmark manifests itself in reaching 88.02% agreement with human evaluation. We contend that the proposed paradigm explores the potential of MLLMs as effective evaluation tools with the help of per-sample criteria, and that MLLM-Bench will serve as a catalyst for encouraging the development of user-centric MLLMs tailored to real-world applications. Our benchmark data, online leaderboard and submission entry are at https://mllm-bench.llmzoo.com.
Submitted 27 April, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Authors:
Hongcheng Wang,
Andy Guan Hong Chen,
Xiaoqi Li,
Mingdong Wu,
Hao Dong
Abstract:
The task of Visual Object Navigation (VON) involves an agent's ability to locate a particular object within a given scene. In order to successfully accomplish the VON task, two essential conditions must be fulfilled: 1) the user must know the name of the desired object; and 2) the user-specified object must actually be present within the scene. To meet these conditions, a simulator can incorporate pre-defined object names and positions into the metadata of the scene. However, in real-world scenarios, it is often challenging to ensure that these conditions are always met. Humans in an unfamiliar environment may not know which objects are present in the scene, or they may mistakenly specify an object that is not actually present. Despite these challenges, a human may still have a demand for an object, which could potentially be fulfilled by other objects present within the scene in an equivalent manner. Hence, we propose Demand-driven Navigation (DDN), which leverages the user's demand as the task instruction and prompts the agent to find an object that matches the specified demand. DDN aims to relax the stringent conditions of VON by focusing on fulfilling the user's demand rather than relying solely on predefined object categories or names. We propose a method that first acquires textual attribute features of objects by extracting common knowledge from a large language model. These textual attribute features are subsequently aligned with visual attribute features using Contrastive Language-Image Pre-training (CLIP). By incorporating the visual attribute features as prior knowledge, we enhance the navigation process. Experiments on AI2Thor with the ProcThor dataset demonstrate that the visual attribute features improve the agent's navigation performance and outperform the baseline methods commonly used in VON.
Submitted 6 November, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks
Authors:
Xiaobin Shen,
Jonathan Elmer,
George H. Chen
Abstract:
Patients resuscitated from cardiac arrest who enter a coma are at high risk of death. Forecasting neurological outcomes of these patients (the task of neurological prognostication) could help with treatment decisions. In this paper, we propose, to the best of our knowledge, the first dynamic framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data: our framework makes predictions for a patient over time as more EEG data become available, and different training patients' available EEG time series could vary in length. Predictions are phrased in terms of either time-to-event outcomes (time-to-awakening or time-to-death) or as the patient's probability of awakening or of dying across multiple time horizons. Our framework uses any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions. We consider three competing risks as to what happens first to a patient: awakening, being withdrawn from life-sustaining therapies (and thus deterministically dying), or dying (by other causes). We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 patients. Our main experimental findings are that: (1) the classical Fine and Gray model which only uses a patient's static features and summary statistics from the patient's latest hour's worth of EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model that uses substantially more of the patient's EEG data; and (2) in an ablation study, we show that our choice of modeling three competing risks results in a model that is at least as accurate while learning more information than simpler models (using two competing risks or a standard survival analysis setup with no competing risks).
Submitted 30 November, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
CMB: A Comprehensive Medical Benchmark in Chinese
Authors:
Xidong Wang,
Guiming Hardy Chen,
Dingjie Song,
Zhiyi Zhang,
Zhihong Chen,
Qingying Xiao,
Feng Jiang,
Jianquan Li,
Xiang Wan,
Benyou Wang,
Haizhou Li
Abstract:
Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progress. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in contextual incongruities in a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provides first-hand experience with existing LLMs for medicine and also facilitates the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.
Submitted 4 April, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Improving Fairness in Deepfake Detection
Authors:
Yan Ju,
Shu Hu,
Shan Jia,
George H. Chen,
Siwei Lyu
Abstract:
Despite the development of effective deepfake detectors in recent years, recent studies have demonstrated that biases in the data used to train these detectors can lead to disparities in detection accuracy across different races and genders. This can result in different groups being unfairly targeted or excluded from detection, allowing undetected deepfakes to manipulate public opinion and erode trust in a deepfake detection model. While existing studies have focused on evaluating fairness of deepfake detectors, to the best of our knowledge, no method has been developed to encourage fairness in deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions that handle both the setting where demographic information (e.g., annotations of race and gender) is available as well as the case where this information is absent. Fundamentally, both approaches can be used to convert many existing deepfake detectors into ones that encourage fairness. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness. Our code is available at https://github.com/littlejuyan/DF_Fairness.
Submitted 8 November, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
On the Difference of BERT-style and CLIP-style Text Encoders
Authors:
Zhihong Chen,
Guiming Hardy Chen,
Shizhe Diao,
Xiang Wan,
Benyou Wang
Abstract:
Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, e.g., BERT, one of the representative models. Recently, contrastive language-image pretraining (CLIP) has also attracted attention, especially its vision models that achieve excellent performance on a broad range of vision tasks. However, few studies are dedicated to studying the text encoders learned by CLIP. In this paper, we analyze the difference between BERT-style and CLIP-style text encoders from three experiments: (i) general text understanding, (ii) vision-centric text understanding, and (iii) text-to-image generation. Experimental analyses show that although CLIP-style text encoders underperform BERT-style ones for general text understanding tasks, they are equipped with a unique ability, i.e., synesthesia, for the cross-modal association, which is more similar to the senses of humans.
Submitted 6 June, 2023;
originally announced June 2023.
-
A General Framework for Visualizing Embedding Spaces of Neural Survival Analysis Models Based on Angular Information
Authors:
George H. Chen
Abstract:
We propose a general framework for visualizing any intermediate embedding representation used by any neural survival analysis model. Our framework is based on so-called anchor directions in an embedding space. We show how to estimate these anchor directions using clustering or, alternatively, using user-supplied "concepts" defined by collections of raw inputs (e.g., feature vectors all from female patients could encode the concept "female"). For tabular data, we present visualization strategies that reveal how anchor directions relate to raw clinical features and to survival time distributions. We then show how these visualization ideas extend to handling raw inputs that are images. Our framework is built on looking at angles between vectors in an embedding space, where there could be "information loss" by ignoring magnitude information. We show how this loss results in a "clumping" artifact that appears in our visualizations, and how to reduce this information loss in practice.
Submitted 11 May, 2023;
originally announced May 2023.
-
Distributionally Robust Survival Analysis: A Novel Fairness Loss Without Demographics
Authors:
Shu Hu,
George H. Chen
Abstract:
We propose a general approach for training survival analysis models that minimizes a worst-case error across all subpopulations that are large enough (occurring with at least a user-specified minimum probability). This approach uses a training loss function that does not know any demographic information to treat as sensitive. Despite this, we demonstrate that our proposed approach often scores better on recently established fairness metrics (without a significant drop in prediction accuracy) compared to various baselines, including ones which directly use sensitive demographic information in their training loss. Our code is available at: https://github.com/discovershu/DRO_COX
Submitted 18 November, 2022;
originally announced November 2022.
-
Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
Authors:
George H. Chen
Abstract:
Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
Submitted 19 February, 2024; v1 submitted 21 June, 2022;
originally announced June 2022.
-
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs
Authors:
Kay Liu,
Yingtong Dou,
Yue Zhao,
Xueying Ding,
Xiyang Hu,
Ruitong Zhang,
Kaize Ding,
Canyu Chen,
Hao Peng,
Kai Shu,
Lichao Sun,
Jundong Li,
George H. Chen,
Zhihao Jia,
Philip S. Yu
Abstract:
Detecting which nodes in graphs are outliers is a relatively new machine learning task with numerous applications. Despite the proliferation of algorithms developed in recent years for this task, there has been no standard comprehensive setting for performance evaluation. Consequently, it has been difficult to understand which methods work well and when under a broad range of settings. To bridge this gap, we present--to the best of our knowledge--the first comprehensive benchmark for unsupervised outlier node detection on static attributed graphs called BOND, with the following highlights. (1) We benchmark the outlier detection performance of 14 methods ranging from classical matrix factorization to the latest graph neural networks. (2) Using nine real datasets, our benchmark assesses how the different detection methods respond to two major types of synthetic outliers and separately to "organic" (real non-synthetic) outliers. (3) Using an existing random graph generation technique, we produce a family of synthetically generated datasets of different graph sizes that enable us to compare the running time and memory usage of the different outlier detection algorithms. Based on our experimental results, we discuss the pros and cons of existing graph outlier detection algorithms, and we highlight opportunities for future research. Importantly, our code is freely available and meant to be easily extendable: https://github.com/pygod-team/pygod/tree/main/benchmark
Submitted 15 October, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022
Authors:
Gerardo Flores,
George H. Chen,
Tom Pollard,
Joyce C. Ho,
Tristan Naumann
Abstract:
A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022. This index is incomplete as some authors of invited non-archival presentations opted not to include their papers in this index.
Submitted 28 March, 2022;
originally announced May 2022.
-
ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions
Authors:
Zheng Li,
Yue Zhao,
Xiyang Hu,
Nicola Botta,
Cezar Ionescu,
George H. Chen
Abstract:
Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
Submitted 24 August, 2022; v1 submitted 2 January, 2022;
originally announced January 2022.
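The three steps named in the ECOD abstract (per-dimension empirical CDFs, per-dimension tail probabilities, aggregation across dimensions) can be illustrated with a minimal sketch. This is not the authors' released implementation: the function name `ecod_scores`, the choice to sum negative log tail probabilities, and taking the larger of the left-tail and right-tail sums are assumptions made for illustration, and any skewness handling is omitted.

```python
import math
from bisect import bisect_left, bisect_right


def ecod_scores(rows):
    """Return one outlier score per row; a higher score means more outlying.

    rows: list of equal-length lists of floats (n points, d dimensions).
    Hypothetical simplified scoring, not the reference implementation.
    """
    n = len(rows)
    d = len(rows[0])
    left_sum = [0.0] * n   # aggregated left-tail evidence per point
    right_sum = [0.0] * n  # aggregated right-tail evidence per point
    for j in range(d):
        column = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            # Empirical left-tail probability: fraction of points <= x.
            left = bisect_right(column, row[j]) / n
            # Empirical right-tail probability: fraction of points >= x.
            right = (n - bisect_left(column, row[j])) / n
            # Both fractions are at least 1/n, so the logs are finite.
            left_sum[i] += -math.log(left)
            right_sum[i] += -math.log(right)
    # Assumed aggregation: keep the more extreme of the two tails.
    return [max(l, r) for l, r in zip(left_sum, right_sum)]


if __name__ == "__main__":
    data = [[1.0, 2.1], [1.1, 2.0], [0.9, 2.2], [1.2, 1.9], [10.0, 20.0]]
    for point, score in zip(data, ecod_scores(data)):
        print(point, round(score, 3))
```

Because the score depends only on ranks within each dimension, the sketch has no hyperparameters to tune, which is in line with the abstract's description of the method as parameter-free.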
-
TOD: GPU-accelerated Outlier Detection via Tensor Operations
Authors:
Yue Zhao,
George H. Chen,
Zhihao Jia
Abstract:
Outlier detection (OD) is a key learning task for finding rare and deviant data samples, with many time-critical applications such as fraud detection and intrusion detection. In this work, we propose TOD, the first tensor-based system for efficient and scalable outlier detection on distributed multi-GPU machines. A key idea behind TOD is decomposing complex OD applications into a small collection of basic tensor algebra operators. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy memory-intensive OD applications on modern GPUs with limited on-device memory, we introduce two key techniques. First, provable quantization speeds up OD computations and reduces its memory footprint by automatically performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching, which decomposes OD computations into small batches for both sequential execution on a single GPU and parallel execution on multiple GPUs.
TOD supports a diverse set of OD algorithms. Extensive evaluation on 11 real and 3 synthetic OD datasets shows that TOD is on average 10.9x faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9x), and can handle much larger datasets than existing GPU-based OD systems. In addition, TOD allows easy integration of new OD operators, enabling fast prototyping of emerging and yet-to-be-discovered OD algorithms.
Submitted 16 September, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Path planning model of mobile robots in the context of crowds
Authors:
W. Z. Wang,
R. Q. Wang,
G. H. Chen
Abstract:
A robot path planning model based on RNN and visual quality evaluation in the context of crowds is analyzed in this paper. Mobile robot path planning is the key to robot navigation and an important field in robot research. Let the motion space of the robot be a two-dimensional plane; when the artificial potential field method is used for path planning, the motion of the robot is regarded as motion under a virtual artificial potential field force. Compared to simple image acquisition, image acquisition in a complex crowd environment requires image pre-processing first. We mainly use OpenCV calibration tools to pre-process the acquired images. In the methodology design, RNN-based visual quality evaluation is conducted to filter background noise. After calibration, Gaussian noise and other redundant information affecting subsequent operations still exist in the image. Based on RNN, a new image quality evaluation algorithm is developed, and denoising is performed on this basis. Furthermore, a novel path planning model is designed and simulated. Experiments comparing against state-of-the-art models show the robustness of the model.
Submitted 9 September, 2020;
originally announced September 2020.
-
Deep Kernel Survival Analysis and Subject-Specific Survival Time Prediction Intervals
Authors:
George H. Chen
Abstract:
Kernel survival analysis methods predict subject-specific survival curves and times using information about which training subjects are most similar to a test subject. These most similar training subjects could serve as forecast evidence. How similar any two subjects are is given by the kernel function. In this paper, we present the first neural network framework that learns which kernel functions to use in kernel survival analysis. We also show how to use kernel functions to construct prediction intervals of survival time estimates that are statistically valid for individuals similar to a test subject. These prediction intervals can use any kernel function, such as ones learned using our neural kernel learning framework or using random survival forests. Our experiments show that our neural kernel survival estimators are competitive with a variety of existing survival analysis methods, and that our prediction intervals can help compare different methods' uncertainties, even for estimators that do not use kernels. In particular, these prediction interval widths can be used as a new performance metric for survival analysis methods.
Submitted 25 July, 2020;
originally announced July 2020.
-
Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate
Authors:
George H. Chen,
Linhong Li,
Ren Zuo,
Amanda Coston,
Jeremy C. Weiss
Abstract:
We present a neural network framework for learning a survival model to predict a time-to-event outcome while simultaneously learning a topic model that reveals feature relationships. In particular, we model each subject as a distribution over "topics", where a topic could, for instance, correspond to an age group, a disorder, or a disease. The presence of a topic in a subject means that specific clinical features are more likely to appear for the subject. Topics encode information about related features and are learned in a supervised manner to predict a time-to-event outcome. Our framework supports combining many different topic and survival models; training the resulting joint survival-topic model readily scales to large datasets using standard neural net optimizers with minibatch gradient descent. For example, a special case is to combine LDA with a Cox model, in which case a subject's distribution over topics serves as the input feature vector to the Cox model. We explain how to address practical implementation issues that arise when applying these neural survival-supervised topic models to clinical data, including how to visualize results to assist clinical interpretation. We study the effectiveness of our proposed framework on seven clinical datasets on predicting time until death as well as hospital ICU length of stay, where we find that neural survival-supervised topic models achieve competitive accuracy with existing approaches while yielding interpretable clinical topics that explain feature relationships. Our code is available at: https://github.com/georgehc/survival-topics
Submitted 4 June, 2024; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Predicting Mortality Risk in Viral and Unspecified Pneumonia to Assist Clinicians with COVID-19 ECMO Planning
Authors:
Helen Zhou,
Cheng Cheng,
Zachary C. Lipton,
George H. Chen,
Jeremy C. Weiss
Abstract:
Respiratory complications due to coronavirus disease COVID-19 have claimed tens of thousands of lives in 2020. Many cases of COVID-19 escalate from Severe Acute Respiratory Syndrome (SARS-CoV-2) to viral pneumonia to acute respiratory distress syndrome (ARDS) to death. Extracorporeal membranous oxygenation (ECMO) is a life-sustaining oxygenation and ventilation therapy that may be used for patients with severe ARDS when mechanical ventilation is insufficient to sustain life. While early planning and surgical cannulation for ECMO can increase survival, clinicians report the lack of a risk score hinders these efforts. In this work, we leverage machine learning techniques to develop the PEER score, used to highlight critically ill patients with viral or unspecified pneumonia at high risk of mortality or decompensation in a subpopulation eligible for ECMO. The PEER score is validated on two large, publicly available critical care databases and predicts mortality at least as well as other existing risk scores. Stratifying our cohorts into low-risk and high-risk groups, we find that the high-risk group also has a higher proportion of decompensation indicators such as vasopressor and ventilator use. Finally, the PEER score is provided in the form of a nomogram for direct calculation of patient risk, and can be used to highlight at-risk patients among critical care patients eligible for ECMO.
Submitted 2 June, 2020;
originally announced June 2020.
-
Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online
Authors:
Emaad Manzoor,
George H. Chen,
Dokyun Lee,
Michael D. Smith
Abstract:
Deliberation among individuals online plays a key role in shaping the opinions that drive votes, purchases, donations and other critical offline behavior. Yet, the determinants of opinion-change via persuasion in deliberation online remain largely unexplored. Our research examines the persuasive power of $\textit{ethos}$ -- an individual's "reputation" -- using a 7-year panel of over a million debates from an argumentation platform containing explicit indicators of successful persuasion. We identify the causal effect of reputation on persuasion by constructing an instrument for reputation from a measure of past debate competition, and by controlling for unstructured argument text using neural models of language in the double machine-learning framework. We find that an individual's reputation significantly impacts their persuasion rate above and beyond the validity, strength and presentation of their arguments. In our setting, we find that having 10 additional reputation points causes a 31% increase in the probability of successful persuasion over the platform average. We also find that the impact of reputation is moderated by characteristics of the argument content, in a manner consistent with a theoretical model that attributes the persuasive power of reputation to heuristic information-processing under cognitive overload. We discuss managerial implications for platforms that facilitate deliberative decision-making for public and private organizations online.
Submitted 1 June, 2020;
originally announced June 2020.
-
Scene recognition based on DNN and game theory with its applications in human-robot interaction
Authors:
R. Q. Wang,
W. Z. Wang,
D. Z. Zhao,
G. H. Chen,
D. S. Luo
Abstract:
A scene recognition model based on DNN and game theory, with applications in human-robot interaction, is proposed in this paper. The use of deep learning methods in the field of scene recognition is still in its infancy, but is becoming an important trend. The paper proposes the following novelties. (1) The image registration problem is transformed into a minimum-energy problem in a Markov Random Field to complete the image pre-processing task, and game theory is used to find the optimum. (2) For each extracted sample feature, we select neighboring homogeneous sample features and neighboring heterogeneous sample features to build a triple, and we modify the traditional neural network to obtain a novel DNN for scene understanding. (3) Robot control is combined with the model to guide the robot vision for multiple tasks. An experiment is then conducted to validate the overall performance.
Submitted 10 January, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
Simplified_edition_Multi-robot SLAM Multi-view Target Tracking based on Panoramic Vision in Irregular Environment
Authors:
R. Q. Wang,
Z. Q. Yuan,
G. H. Chen
Abstract:
In order to improve the precision of the multi-robot SLAM multi-view target tracking process, an improved multi-robot SLAM multi-view target tracking algorithm based on panoramic vision in irregular environments was put forward, adding a correction factor to update the existing Extended Kalman Filter (EKF) model and obtaining new coordinates X and Y after two iterations. The paper has been accepted by Computing and Visualization in Science and this is a simplified version.
Submitted 22 November, 2019;
originally announced November 2019.
-
Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption
Authors:
Wei Ma,
George H. Chen
Abstract:
Matrix completion is often applied to data with entries missing not at random (MNAR). For example, consider a recommendation system where users tend to only reveal ratings for items they like. In this case, a matrix completion method that relies on entries being revealed at uniformly sampled row and column indices can yield overly optimistic predictions of unseen user ratings. Recently, various papers have shown that we can reduce this bias in MNAR matrix completion if we know the probabilities of different matrix entries being missing. These probabilities are typically modeled using logistic regression or naive Bayes, which make strong assumptions and lack guarantees on the accuracy of the estimated probabilities. In this paper, we suggest a simple approach to estimating these probabilities that avoids these shortcomings. Our approach follows from the observation that missingness patterns in real data often exhibit low nuclear norm structure. We can then estimate the missingness probabilities by feeding the (always fully-observed) binary matrix specifying which entries are revealed or missing to an existing nuclear-norm-constrained matrix completion algorithm by Davenport et al. [2014]. Thus, we tackle MNAR matrix completion by solving a different matrix completion problem first that recovers missingness probabilities. We establish finite-sample error bounds for how accurate these probability estimates are and how well these estimates debias standard matrix completion losses for the original matrix to be completed. Our experiments show that the proposed debiasing strategy can improve a variety of existing matrix completion algorithms, and achieves downstream matrix completion accuracy at least as good as logistic regression and naive Bayes debiasing baselines that require additional auxiliary information.
Submitted 29 October, 2019; v1 submitted 28 October, 2019;
originally announced October 2019.
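The idea of debiasing a matrix completion loss once missingness probabilities are known can be illustrated with a small sketch. It assumes inverse-propensity weighting of a squared-error loss, which is one common way to use such probabilities; the abstract does not spell out the exact loss, and the function name `ipw_squared_error` and its arguments are hypothetical. The propensities are taken as given here, whereas the paper estimates them by completing the binary revealed/missing matrix.

```python
def ipw_squared_error(ratings, revealed, predictions, propensities):
    """Inverse-propensity-weighted mean squared error over a full matrix.

    ratings:      observed values (entries where revealed is 0 are ignored)
    revealed:     1 if the entry was observed, 0 if it is missing
    predictions:  model predictions for every entry
    propensities: assumed probability that each entry is revealed (> 0)
    All arguments are lists of equal-shaped rows.
    """
    total = 0.0
    n_entries = 0
    for i, row in enumerate(ratings):
        for j, value in enumerate(row):
            n_entries += 1
            if revealed[i][j]:
                # Entries that are rarely revealed receive a larger weight,
                # so the observed sum stands in for the sum over all entries.
                error = (value - predictions[i][j]) ** 2
                total += error / propensities[i][j]
    return total / n_entries


if __name__ == "__main__":
    ratings = [[5, 0], [3, 1]]
    revealed = [[1, 0], [1, 1]]
    predictions = [[4, 2], [3, 3]]
    propensities = [[0.5, 0.25], [1.0, 0.5]]
    print(ipw_squared_error(ratings, revealed, predictions, propensities))
```

Dividing by the total number of entries rather than the number of revealed entries is what makes the weighted sum an estimate of the loss over the whole matrix.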
-
Truck Traffic Monitoring with Satellite Images
Authors:
Lynn H. Kaack,
George H. Chen,
M. Granger Morgan
Abstract:
The road freight sector is responsible for a large and growing share of greenhouse gas emissions, but reliable data on the amount of freight that is moved on roads in many parts of the world are scarce. Many low- and middle-income countries have limited ground-based traffic monitoring and freight surveying activities. In this proof of concept, we show that we can use an object detection network to count trucks in satellite images and predict average annual daily truck traffic from those counts. We describe a complete model, test the uncertainty of the estimation, and discuss the transfer to developing countries.
Submitted 17 July, 2019;
originally announced July 2019.
-
Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates
Authors:
George H. Chen
Abstract:
We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.
Submitted 14 September, 2022; v1 submitted 13 May, 2019;
originally announced May 2019.
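A Kaplan-Meier-based kernel survival estimator of the kind this abstract analyzes can be sketched as a Kaplan-Meier product in which each training subject counts with a kernel weight instead of a weight of one. The sketch below is illustrative only: the Gaussian kernel, the `bandwidth` parameter, and the function name `kernel_survival_curve` are assumptions, not details taken from the paper.

```python
import math


def kernel_survival_curve(train_x, train_time, train_event, test_x, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier estimate of S(t | test_x).

    train_x:     list of feature vectors
    train_time:  observed time for each training subject
    train_event: 1 if the event occurred, 0 if the subject was censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    # Hypothetical Gaussian kernel on Euclidean distance to the test subject.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, test_x)) / (2 * bandwidth ** 2))
        for x in train_x
    ]
    event_times = sorted({t for t, e in zip(train_time, train_event) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Weighted count of subjects still at risk just before time t.
        at_risk = sum(w for w, ti in zip(weights, train_time) if ti >= t)
        # Weighted count of subjects whose event happens exactly at time t.
        died = sum(
            w for w, ti, e in zip(weights, train_time, train_event) if ti == t and e
        )
        if at_risk > 0:
            survival *= 1.0 - died / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    xs = [[0.0], [0.1], [2.0], [2.1]]
    times = [1, 2, 3, 4]
    events = [1, 0, 1, 1]
    print(kernel_survival_curve(xs, times, events, test_x=[0.0], bandwidth=0.5))
```

When every training subject receives the same weight, the estimate reduces to the ordinary Kaplan-Meier curve, which gives a quick sanity check on the implementation.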
-
An Interpretable Produce Price Forecasting System for Small and Marginal Farmers in India using Collaborative Filtering and Adaptive Nearest Neighbors
Authors:
Wei Ma,
Kendall Nowocin,
Niraj Marathe,
George H. Chen
Abstract:
Small and marginal farmers, who account for over 80% of India's agricultural population, often sell their harvest at low, unfavorable prices before spoilage. These farmers often lack access to either cold storage or market forecasts. In particular, by having access to cold storage, farmers can store their produce for longer and thus have more flexibility as to when they should sell their harvest by. Meanwhile, by having access to market forecasts, farmers can more easily identify which markets to sell at and when. While affordable cold storage solutions have become more widely available, there has been less work on produce price forecasting. A key challenge is that in many regions of India, predominantly in rural and remote areas, we have either very limited or no produce pricing data available from public online sources.
In this paper, we present a produce price forecasting system that pulls data from the Indian Ministry of Agriculture and Farmers Welfare's website Agmarknet, trains a model of prices using over a thousand markets, and displays interpretable price forecasts in a web application viewable from a mobile phone. Due to the pricing data being extremely sparse, our method first imputes missing entries using collaborative filtering to obtain a dense dataset. Using this imputed dense dataset, we then train a decision-tree-based classifier to predict produce prices at different markets. In terms of interpretability, we display the most relevant historical pricing data that drive each forecasted price, where we take advantage of the fact that a wide family of decision-tree-based ensemble learning methods are adaptive nearest neighbor methods. We show how to construct heuristic price uncertainty intervals based on nearest neighbors. We validate forecast accuracy on data from Agmarknet and a small field survey of a few markets in Odisha.
Submitted 12 December, 2018;
originally announced December 2018.
-
Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution
Authors:
George H. Chen,
Kendall Nowocin,
Niraj Marathe
Abstract:
India's agricultural system has been facing a severe problem of crop wastage. A key contributing factor to this problem is that many small farmers lack access to reliable cold storage that extends crop shelf-life. To avoid having leftover crops that spoil, these farmers often sell their crops at unfavorable low prices. Inevitably, not all crops are sold before spoilage. Even if the farmers have access to cold storage, the farmers may not know how long to hold different crops in cold storage for, which hinges on strategizing over when and where to sell their harvest. In this note, we present progress toward a simultaneous hardware and software solution that aims to help farmers reduce crop spoilage and increase their profits. The hardware is a cost-effective solar-powered refrigerator and control unit. The software refers to a produce price forecasting system, for which we have tested a number of machine learning methods. Note that unlike standard price forecasting tasks such as for stock market data, the produce price data from predominantly rural Indian markets have a large amount of missing values. In developing our two-pronged solution, we are actively working with farmers at two pilot sites in Karnataka and Odisha.
Submitted 7 December, 2017; v1 submitted 28 October, 2017;
originally announced October 2017.
-
A Latent Source Model for Online Collaborative Filtering
Authors:
Guy Bresler,
George H. Chen,
Devavrat Shah
Abstract:
Despite the prevalence of collaborative filtering in recommendation systems, there has been little theoretical development on why and how well it works, especially in the "online" setting, where items are recommended to users over time. We address this theoretical gap by introducing a model for online recommendation systems, cast item recommendation under the model as a learning problem, and analyze the performance of a cosine-similarity collaborative filtering method. In our model, each of $n$ users either likes or dislikes each of $m$ items. We assume there to be $k$ types of users, and all the users of a given type share a common string of probabilities determining the chance of liking each item. At each time step, we recommend an item to each user, where a key distinction from related bandit literature is that once a user consumes an item (e.g., watches a movie), then that item cannot be recommended to the same user again. The goal is to maximize the number of likable items recommended to users over time. Our main result establishes that after nearly $\log(km)$ initial learning time steps, a simple collaborative filtering algorithm achieves essentially optimal performance without knowing $k$. The algorithm has an exploitation step that uses cosine similarity and two types of exploration steps, one to explore the space of items (standard in the literature) and the other to explore similarity between users (novel to this work).
Submitted 31 October, 2014;
originally announced November 2014.
-
Sparse Projections of Medical Images onto Manifolds
Authors:
George H. Chen,
Christian Wachinger,
Polina Golland
Abstract:
Manifold learning has been successfully applied to a variety of medical imaging problems. Its use in real-time applications requires fast projection onto the low-dimensional space. To this end, out-of-sample extensions are applied by constructing an interpolation function that maps from the input space to the low-dimensional manifold. Commonly used approaches such as the Nyström extension and kernel ridge regression require using all training points. We propose an interpolation function that only depends on a small subset of the input training data. Consequently, in the testing phase each new point only needs to be compared against a small number of input training data in order to project the point onto the low-dimensional space. We interpret our method as an out-of-sample extension that approximates kernel ridge regression. Our method involves solving a simple convex optimization problem and has the attractive property of guaranteeing an upper bound on the approximation error, which is crucial for medical applications. Tuning this error bound controls the sparsity of the resulting interpolation function. We illustrate our method in two clinical applications that require fast mapping of input images onto a low-dimensional space.
Submitted 28 March, 2013; v1 submitted 21 March, 2013;
originally announced March 2013.
-
A Latent Source Model for Nonparametric Time Series Classification
Authors:
George H. Chen,
Stanislav Nikolov,
Devavrat Shah
Abstract:
For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.
Submitted 12 December, 2013; v1 submitted 14 February, 2013;
originally announced February 2013.