Skip to main content

Showing 1–50 of 89 results for author: Phillips, M

  1. arXiv:2402.05280  [pdf, ps, other

    cs.LG cs.CG

    No Dimensional Sampling Coresets for Classification

    Authors: Meysam Alishahi, Jeff M. Phillips

    Abstract: We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 47 Pages

  2. arXiv:2311.09394  [pdf, other

    cs.SE cs.PL

    GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production

    Authors: Kostya Serebryany, Chris Kennelly, Mitch Phillips, Matt Denton, Marco Elver, Alexander Potapenko, Matt Morehouse, Vlad Tsyrklevich, Christian Holler, Julian Lettner, David Kilzer, Lander Brandt

    Abstract: Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C+… ▽ More

    Submitted 13 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  3. arXiv:2311.05651  [pdf, other

    cs.CG cs.DS cs.LG

    On Mergable Coresets for Polytope Distance

    Authors: Benwei Shi, Aditya Bhaskara, Wai Ming Tai, Jeff M. Phillips

    Abstract: We show that a constant-size constant-error coreset for polytope distance is simple to maintain under merges of coresets. However, increasing the size cannot improve the error bound significantly beyond that constant.

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Presented in SoCG'19 Young Researchers Forum (CG:YRF)

    ACM Class: I.3.5

  4. arXiv:2311.03393  [pdf, other

    cs.DB cs.AI

    Sketching Multidimensional Time Series for Fast Discord Mining

    Authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh

    Abstract: Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discord effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with… ▽ More

    Submitted 7 December, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

  5. arXiv:2310.03919  [pdf, other

    cs.IR cs.AI cs.LG

    An Efficient Content-based Time Series Retrieval System

    Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips

    Abstract: A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series emerged from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated met… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  6. arXiv:2308.07418  [pdf, other

    cs.LG stat.ML

    Locally Adaptive and Differentiable Regression

    Authors: Mingxuan Han, Varun Shankar, Jeff M Phillips, Chenglong Ye

    Abstract: Over-parameterized models like deep nets and random forests have become very popular in machine learning. However, the natural goals of continuity and differentiability, common in regression models, are now often ignored in modern overparametrized, locally-adaptive models. We propose a general framework to construct a global continuous and differentiable model based on a weighted average of locall… ▽ More

    Submitted 12 October, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Journal ref: Journal of Machine Learning for Modeling and Computing 2023

  7. arXiv:2306.16516  [pdf, ps, other

    cs.CG cs.LG

    For Kernel Range Spaces a Constant Number of Queries Are Sufficient

    Authors: Jeff M. Phillips, Hasan Pourmahmood-Aghababa

    Abstract: We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 27 pages

  8. arXiv:2306.03173  [pdf, other

    cs.LG

    Linear Distance Metric Learning with Noisy Labels

    Authors: Meysam Alishahi, Anna Little, Jeff M. Phillips

    Abstract: In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive… ▽ More

    Submitted 20 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 52 pages

  9. arXiv:2305.18774  [pdf, other

    cs.LG cs.NE

    Bayesian Decision Trees Inspired from Evolutionary Algorithms

    Authors: Efthyvoulos Drousiotis, Alexander M. Phillips, Paul G. Spirakis, Simon Maskell

    Abstract: Bayesian Decision Trees (DTs) are generally considered a more advanced and accurate model than a regular Decision Tree (DT) because they can handle complex and uncertain data. Existing work on Bayesian DTs uses Markov Chain Monte Carlo (MCMC) with an accept-reject mechanism and sample using naive proposals to proceed to the next iteration, which can be slow because of the burn-in time needed. We c… ▽ More

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.09090

  10. arXiv:2305.16606  [pdf, other

    cs.IR

    Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

    Authors: Tao Yang, Cuize Han, Chen Luo, Parth Gupta, Jeff M. Phillips, Qingyao Ai

    Abstract: Ranking is at the core of many artificial intelligence (AI) applications, including search engines, recommender systems, etc. Modern ranking systems are often constructed with learning-to-rank (LTR) models built from user behavior signals. While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of LTR algorithms, we argue… ▽ More

    Submitted 25 May, 2023; originally announced May 2023.

  11. arXiv:2210.12704  [pdf, other

    cs.LG

    Batch Multi-Fidelity Active Learning with Budget Constraints

    Authors: Shibo Li, Jeff M. Phillips, Xin Yu, Robert M. Kirby, Shandian Zhe

    Abstract: Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at… ▽ More

    Submitted 23 October, 2022; originally announced October 2022.

  12. arXiv:2209.01322  [pdf, other

    cs.CG cs.LG

    Classifying Spatial Trajectories

    Authors: Hasan Pourmahmood-Aghababa, Jeff M. Phillips

    Abstract: We provide the first comprehensive study on how to classify trajectories using only their spatial representations, measured on 5 real-world data sets. Our comparison considers 20 distinct classifiers arising either as a KNN classifier of a popular distance, or as a more general type of classifier using a vectorized representation of each trajectory. We additionally develop new methods for how to v… ▽ More

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 21 pages, 15 figures

  13. arXiv:2112.10931  [pdf, other

    cs.NI

    Hiding Signal Strength Interference from Outside Adversaries

    Authors: Mingxuan Han, Jeff M. Phillips, Sneha Kumar Kasera

    Abstract: The presence of people can be detected by passively observing the signal strength of Wifi and related forms of communication. This paper tackles the question of how and when can this be prevented by adjustments to the transmitted signal strength, and other similar measures. The main contribution of this paper is a formal framework to analyze this problem, and the identification of several scenario… ▽ More

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 6 pages, 2 figures

  14. arXiv:2112.01537  [pdf, other

    cs.HC cs.AI cs.LG

    Improving mathematical questioning in teacher training

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Qua… ▽ More

    Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Human Centered AI Workshop (HCAI). Data collection process for this data is described here arXiv:2112.00985

  15. arXiv:2112.00985  [pdf, other

    cs.AI cs.HC cs.LG

    Evaluation of mathematical questioning strategies using data collected through weak supervision

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: A large body of research demonstrates how teachers' questioning strategies can improve student learning outcomes. However, developing new scenarios is challenging because of the lack of training data for a specific scenario and the costs associated with labeling. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skil… ▽ More

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Workshop on Math AI for Education (MATHAI4ED)

  16. arXiv:2110.07663  [pdf, other

    math.NA cs.PF

    Tuning Spectral Element Preconditioners for Parallel Scalability on GPUs

    Authors: Malachi Phillips, Stefan Kerkemeier, Paul Fischer

    Abstract: The Poisson pressure solve resulting from the spectral element discretization of the incompressible Navier-Stokes equation requires fast, robust, and scalable preconditioning. In the current work, a parallel scaling study of Chebyshev-accelerated Schwarz and Jacobi preconditioning schemes is presented, with special focus on GPU architectures, such as OLCF's Summit. Convergence properties of the Ch… ▽ More

    Submitted 12 December, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 12 pages, 8 figures, submitted/accepted to SIAM Conference on Parallel Processing for Scientific Computing (PP22) proceedings

    MSC Class: 35-04 ACM Class: D.0; F.2; G.1.3; G.4; I.6

  17. arXiv:2109.05072  [pdf, other

    cs.DC cs.MS math.NA

    GPU Algorithms for Efficient Exascale Discretizations

    Authors: Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Tomov, Arturo Vargas , et al. (2 additional authors not shown)

    Abstract: In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM,… ▽ More

    Submitted 10 September, 2021; originally announced September 2021.

  18. arXiv:2108.12084  [pdf, other

    cs.CL cs.AI cs.LG

    Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

    Authors: Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff M Phillips, Kai-Wei Chang

    Abstract: Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understandin… ▽ More

    Submitted 10 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Journal ref: EMNLP 2021

  19. Practical and Configurable Network Traffic Classification Using Probabilistic Machine Learning

    Authors: Jiahui Chen, Joe Breen, Jeff M. Phillips, Jacobus Van der Merwe

    Abstract: Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks. A flexible and easily configurable classification framework is ideal, as it can be customized for use in a wide variety of networks. In this paper, we propose a highly configurable and flexible machine learning traffic classification method that relies only on st… ▽ More

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Published in the Springer Cluster Computing journal

  20. arXiv:2106.13851  [pdf, other

    cs.CG cs.LG

    Approximate Maximum Halfspace Discrepancy

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: Consider the geometric range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$-dimensional halfspaces. In this setting we consider that $X$ is the disjoint union of a red and blue set. For each halfspace $h \in \mathcal{H}_d$ define a function $Φ(h)$ that measures the "difference" between the fraction of red and fraction of blue point… ▽ More

    Submitted 25 June, 2021; originally announced June 2021.

  21. arXiv:2104.05829  [pdf, other

    cs.PF cs.DC

    NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver

    Authors: Paul Fischer, Stefan Kerkemeier, Misun Min, Yu-Hsiang Lan, Malachi Phillips, Thilina Rathnayake, Elia Merzari, Ananias Tomboulides, Ali Karakus, Noel Chalmers, Tim Warburton

    Abstract: The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM) is described. For performance portability, the code is based on the open concurrent compute abstraction and leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps… ▽ More

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 14 pages, 8 figures

    MSC Class: 35-04 ACM Class: D.0; F.2; G.2; G.4; I.6

  22. arXiv:2104.02797  [pdf, other

    cs.CL cs.HC

    VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

    Authors: Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

    Abstract: Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this,… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 11 pages

  23. arXiv:2010.12710  [pdf, other

    cs.CL cs.CY cs.LG

    Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education

    Authors: Debajyoti Datta, Maria Phillips, Jennifer Chiu, Ginger S. Watson, James P. Bywater, Laura Barnes, Donald Brown

    Abstract: Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data s… ▽ More

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Preprint: Under Review

    ACM Class: I.2.7

  24. arXiv:2010.06536  [pdf, other

    cs.CV cs.AI

    Kartta Labs: Collaborative Time Travel

    Authors: Sasan Tavakkol, Feng Han, Brandon Mayer, Mark Phillips, Cyrus Shahabi, Yao-Yi Chiang, Raimondas Kiveris

    Abstract: We introduce the modular and scalable design of Kartta Labs, an open source, open data, and scalable system for virtually reconstructing cities from historical maps and photos. Kartta Labs relies on crowdsourcing and artificial intelligence consisting of two major modules: Maps and 3D models. Each module, in turn, consists of sub-modules that enable the system to reconstruct a city from historical… ▽ More

    Submitted 6 October, 2020; originally announced October 2020.

  25. arXiv:2009.02947  [pdf, other

    cs.HC

    Towards a Practical Virtual Office for Mobile Knowledge Workers

    Authors: Eyal Ofek, Jens Grubert, Michel Pahud, Mark Phillips, Per Ola Kristensson

    Abstract: As more people work from home or during travel, new opportunities and challenges arise around mobile office work. On one hand, people may work at flexible hours, independent of traffic limitations, but on the other hand, they may need to work at makeshift spaces, with less than optimal working conditions and decoupled from co-workers. Virtual Reality (VR) has the potential to change the way inform… ▽ More

    Submitted 7 September, 2020; originally announced September 2020.

    Comments: https://www.microsoft.com/en-us/research/event/new-future-of-work/#!publications

    ACM Class: H.5.2

    Journal ref: Microsoft New Future of Work 2020 Symposium

  26. arXiv:2009.00611  [pdf, other

    cs.IR cs.CL cs.DL cs.LG

    Identifying Documents In-Scope of a Collection from Web Archives

    Authors: Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel Fox

    Abstract: Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving inst… ▽ More

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: 10 pages

    Journal ref: In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020)

  27. arXiv:2007.15924  [pdf, other

    cs.CG

    Orientation-Preserving Vectorized Distance Between Curves

    Authors: Jeff M. Phillips, Hasan Pourmahmood-Aghababa

    Abstract: We introduce an orientation-preserving landmark-based distance for continuous curves, which can be viewed as an alternative to the \Frechet or Dynamic Time Warping distances. This measure retains many of the properties of those measures, and we prove some relations, but can be interpreted as a Euclidean distance in a particular vector space. Hence it is significantly easier to use, faster for gene… ▽ More

    Submitted 24 May, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

    Comments: 23 pages, 10 figures, 1 table, accepted for MSML21, this paper is in Primary: CG (Computational Geometry) and Secondary: LG (Machine Learning) categories

    ACM Class: I.3.5

  28. arXiv:2007.00049  [pdf, other

    cs.CL cs.AI cs.LG

    OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

    Authors: Sunipa Dev, Tao Li, Jeff M Phillips, Vivek Srikumar

    Abstract: Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention t… ▽ More

    Submitted 10 September, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

    Journal ref: EMNLP 2021

  29. arXiv:2002.02013  [pdf, other

    cs.LG cs.DS stat.ML

    A Deterministic Streaming Sketch for Ridge Regression

    Authors: Benwei Shi, Jeff M. Phillips

    Abstract: We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space deterministic streaming algorithm with guaranteed solution error and risk bound for this classic problem.… ▽ More

    Submitted 10 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: Fix a few typos. To be published in AISTATS 2021

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:586-594, 2021

  30. arXiv:1912.07673  [pdf, ps, other

    cs.DS cs.CG

    Finding the Mode of a Kernel Density Estimate

    Authors: Jasper C. H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, Wai Ming Tai

    Abstract: Given points $p_1, \dots, p_n$ in $\mathbb{R}^d$, how do we find a point $x$ which maximizes $\frac{1}{n} \sum_{i=1}^n e^{-\|p_i - x\|^2}$? In other words, how do we find the maximizing point, or mode of a Gaussian kernel density estimation (KDE) centered at $p_1, \dots, p_n$? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding p… ▽ More

    Submitted 16 December, 2019; originally announced December 2019.

  31. arXiv:1910.05862  [pdf, other

    cs.LG stat.ML

    Constrained Non-Affine Alignment of Embeddings

    Authors: Yuwei Wang, Yan Zheng, Yanqing Peng, Chin-Chia Michael Yeh, Zhongfang Zhuang, Das Mahashweta, Bendre Mangesh, Feifei Li, Wei Zhang, Jeff M. Phillips

    Abstract: Embeddings are one of the fundamental building blocks for data analysis tasks. Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains. The generation of these distributed representations is often a data- and computation-expensive process; yet the holistic analysis and adjustment of them after they have bee… ▽ More

    Submitted 19 November, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

  32. arXiv:1907.02171  [pdf, other

    cs.CG

    Sketched MinDist

    Authors: Jeff M. Phillips, Pingfan Tang

    Abstract: We consider sketch vectors of geometric objects $J$ through the \mindist function \[ v_i(J) = \inf_{p \in J} \|p-q_i\| \] for $q_i \in Q$ from a point set $Q$. Collecting the vector of these sketch values induces a simple, effective, and powerful distance: the Euclidean distance between these sketched vectors. This paper shows how large this set $Q$ needs to be under a variety of shapes and scenar… ▽ More

    Submitted 7 July, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

  33. arXiv:1906.09381  [pdf, other

    stat.ML cs.LG

    The Kernel Spatial Scan Statistic

    Authors: Mingxuan Han, Michael Matheny, Jeff M. Phillips

    Abstract: Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations while also having numerous applications in epidemiology, environmental monitoring, and homeland security. SSS provides a way to rigorously test for the existence of an anomaly and provide statistical guarante… ▽ More

    Submitted 9 August, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: 13 pages, 13 figures

  34. arXiv:1906.04261  [pdf, other

    cs.SI cs.LG

    Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations

    Authors: Arunkumar Bagavathi, Pedram Bashiri, Shannon Reid, Matthew Phillips, Siddharth Krishnan

    Abstract: Online social media, periodically serves as a platform for cascading polarizing topics of conversation. The inherent community structure present in online social networks (homophily) and the advent of fringe outlets like Gab have created online "echo chambers" that amplify the effects of polarization, which fuels detrimental behavior. Recently, in October 2018, Gab made headlines when it was revea… ▽ More

    Submitted 10 June, 2019; originally announced June 2019.

  35. arXiv:1906.01693  [pdf, other

    cs.DS

    Scalable Spatial Scan Statistics for Trajectories

    Authors: Michael Matheny, Dong Xie, Jeff M. Phillips

    Abstract: We define several new models for how to define anomalous regions among enormous sets of trajectories. These are based on spatial scan statistics, and identify a geometric region which captures a subset of trajectories which are significantly different in a measured characteristic from the background population. The model definition depends on how much a geometric region is contributed to by some o… ▽ More

    Submitted 4 June, 2019; originally announced June 2019.

  36. arXiv:1903.08014  [pdf, other

    cs.DS cs.CG

    Independent Range Sampling, Revisited Again

    Authors: Peyman Afshani, Jeff M. Phillips

    Abstract: We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that given a query range and an integer $k$, we can extract $k$ independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight. This line of work was… ▽ More

    Submitted 19 March, 2019; originally announced March 2019.

  37. arXiv:1903.03211  [pdf, other

    cs.CG

    The VC Dimension of Metric Balls under Fréchet and Hausdorff Distances

    Authors: Anne Driemel, André Nusser, Jeff M. Phillips, Ioannis Psarros

    Abstract: The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is small, then knowing this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set $X$ is a set of polygonal curves in $\mathbb{R}^d$ and the sets… ▽ More

    Submitted 15 November, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: 24 pages, 5 figures

  38. arXiv:1811.04136  [pdf, ps, other

    cs.LG cs.CG stat.ML

    The GaussianSketch for Almost Relative Error Kernel Distance

    Authors: Jeff M. Phillips, Wai Ming Tai

    Abstract: We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance betwee… ▽ More

    Submitted 19 June, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

  39. Improved Bounds on Information Dissemination by Manhattan Random Waypoint Model

    Authors: Aria Rezaei, Jie Gao, Jeff M. Phillips, Csaba D. Tóth

    Abstract: With the popularity of portable wireless devices it is important to model and predict how information or contagions spread by natural human mobility -- for understanding the spreading of deadly infectious diseases and for improving delay tolerant communication schemes. Formally, we model this problem by considering $M$ moving agents, where each agent initially carries a \emph{distinct} bit of info… ▽ More

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 10 pages, ACM SIGSPATIAL 2018, Seattle, US

    Journal ref: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 18), 2018, Seattle, WA, USA

  40. arXiv:1806.01330  [pdf, other

    cs.CL stat.ML

    Closed Form Word Embedding Alignment

    Authors: Sunipa Dev, Safia Hassan, Jeff M. Phillips

    Abstract: We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same… ▽ More

    Submitted 17 November, 2020; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted ICDM 2019 and KAIS Special Issue

  41. arXiv:1804.11307  [pdf, other

    cs.CG

    Practical Low-Dimensional Halfspace Range Space Sampling

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: We develop, analyze, implement, and compare new algorithms for creating $\varepsilon$-samples of range spaces defined by halfspaces which have size sub-quadratic in $1/\varepsilon$, and have runtime linear in the input size and near-quadratic in $1/\varepsilon$. The key to our solution is an efficient construction of partition trees. Despite not requiring any techniques developed after the early 1… ▽ More

    Submitted 17 July, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

  42. arXiv:1804.11287  [pdf, other

    cs.CG

    Computing Approximate Statistical Discrepancy

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: Consider a geometric range space $(X,cA)$ where each data point $x \in X$ has two or more values (say $r(x)$ and $b(x)$). Also consider a function $Φ(A)$ defined on any subset $A \in (X,cA)$ on the sum of values in that range e.g., $r_A = \sum_{x \in A} r(x)$ and $b_A = \sum_{x \in A} b(x)$. The $Φ$-maximum range is $A^* = \arg \max_{A \in (X,cA)} Φ(A)$. Our goal is to find some $\hat{A}$ such tha… ▽ More

    Submitted 27 September, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

  43. arXiv:1804.11284  [pdf, other

    cs.CG cs.LG

    Simple Distances for Trajectories via Landmarks

    Authors: Jeff M. Phillips, Pingfan Tang

    Abstract: We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks. These distances easily and interpretably map objects to a Euclidean space, are simple to compute, and perform well in data analysis tasks. For trajectories, they match and in some cases significantly out-perform all state-of-the-art other metrics, can effortl… ▽ More

    Submitted 11 June, 2019; v1 submitted 30 April, 2018; originally announced April 2018.

    ACM Class: I.3.5; G.3

  44. arXiv:1803.08884  [pdf, other

    cs.NE cs.AI cs.GT cs.MA q-bio.PE

    Inequity aversion improves cooperation in intertemporal social dilemmas

    Authors: Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel

    Abstract: Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However… ▽ More

    Submitted 27 September, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 15 pages, 8 figures

  45. arXiv:1802.01751  [pdf, other

    cs.LG cs.CG stat.ML

    Near-Optimal Coresets of Kernel Density Estimates

    Authors: Jeff M. Phillips, Wai Ming Tai

    Abstract: We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size $O(\sqrt{d}/\varepsilon\cdot \sqrt{\log 1/\varepsilon} )$, and we show a near-matching lower bound of size $Ω(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$. When $d\geq 1/\varepsilon^2$, it is… ▽ More

    Submitted 11 April, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: This paper is combined with arXiv:1710.04325

  46. arXiv:1710.06925  [pdf, other

    cs.HC eess.SP

    Visualizing Sensor Network Coverage with Location Uncertainty

    Authors: Tim Sodergren, Jessica Hair, Jeff M. Phillips, Bei Wang

    Abstract: We present an interactive visualization system for exploring the coverage in sensor networks with uncertain sensor locations. We consider a simple case of uncertainty where the location of each sensor is confined to a discrete number of points sampled uniformly at random from a region with a fixed radius. Employing techniques from topological data analysis, we model and visualize network coverage… ▽ More

    Submitted 18 October, 2017; originally announced October 2017.

  47. arXiv:1710.04325  [pdf, other

    cs.LG cs.CG stat.ML

    Improved Coresets for Kernel Density Estimates

    Authors: Jeff M. Phillips, Wai Ming Tai

    Abstract: We study the construction of coresets for kernel density estimates. That is we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the $L_\infty$ error between kernel density estimates within error $ε$, w… ▽ More

    Submitted 11 October, 2017; originally announced October 2017.

  48. arXiv:1709.04453  [pdf, other

    cs.HC cs.CG

    Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

    Authors: Yan Zheng, Yi Ou, Alexander Lex, Jeff M. Phillips

    Abstract: The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling of spatial data suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampling… ▽ More

    Submitted 13 September, 2017; originally announced September 2017.

  49. arXiv:1702.03644  [pdf, other

    cs.LG cs.DS

    Coresets for Kernel Regression

    Authors: Yan Zheng, Jeff M. Phillips

    Abstract: Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular among time series and spatial data. However, the central operation which is performed many times, evaluating a kernel on the data set, takes linear time. This is impractical for modern large data sets. In this paper we describe coresets for kernel regression: compressed data sets which ca… ▽ More

    Submitted 31 May, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: 11 pages, 20 figures

  50. arXiv:1701.07500  [pdf, other

    cs.DC

    Scalable Architecture for Anomaly Detection and Visualization in Power Generating Assets

    Authors: Paras Jain, Chirag Tailor, Sam Ford, Liexiao Ding, Michael Phillips, Fang Liu, Nagi Gebraeel, Duen Horng Chau

    Abstract: Power-generating assets (e.g., jet engines, gas turbines) are often instrumented with tens to hundreds of sensors for monitoring physical and performance degradation. Anomaly detection algorithms highlight deviations from predetermined benchmarks with the goal of detecting incipient faults. We are developing an integrated system to address three key challenges within analyzing sensor data from pow… ▽ More

    Submitted 25 January, 2017; originally announced January 2017.