Showing 1–50 of 106 results for author: Chawla, S

  1. arXiv:2405.17130  [pdf, other]

    cs.LG cs.CL

    Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training

    Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla

    Abstract: Despite being a heavily researched topic, Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons: (i) the gained robustness is frequently accompanied by a drop in generalization and (ii) generating adversarial examples (AEs) is computationally prohibitively expensive. To address these limitations, we propose SMAAT, a new AT algorithm that leverages t…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.00987  [pdf, other]

    cs.LG

    S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

    Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

    Abstract: Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open probl…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at ICLR 2024

  3. arXiv:2404.14679  [pdf, ps, other]

    cs.GT

    A Multi-Dimensional Online Contention Resolution Scheme for Revenue Maximization

    Authors: Shuchi Chawla, Dimitris Christou, Trung Dang, Zhiyi Huang, Gregory Kehne, Rojin Rezvan

    Abstract: We study multi-buyer multi-item sequential item pricing mechanisms for revenue maximization with the goal of approximating a natural fractional relaxation -- the ex ante optimal revenue. We assume that buyers' values are subadditive but make no assumptions on the value distributions. While the optimal revenue, and therefore also the ex ante benchmark, is inapproximable by any simple mechanism in t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 39 pages

  4. arXiv:2404.05219  [pdf, other]

    cs.LG

    Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey

    Authors: Naveen Karunanayake, Ravin Gunawardena, Suranga Seneviratne, Sanjay Chawla

    Abstract: Deep neural networks (DNNs) deployed in real-world applications can encounter out-of-distribution (OOD) data and adversarial examples. These represent distinct forms of distributional shifts that can significantly impact DNNs' reliability and robustness. Traditionally, research has addressed OOD detection and adversarial robustness as separate challenges. This survey focuses on the intersection of…

    Submitted 8 April, 2024; originally announced April 2024.

  5. arXiv:2402.08789  [pdf, other]

    eess.AS cs.AI cs.LG q-bio.QM

    Leveraging cough sounds to optimize chest x-ray usage in low-resource settings

    Authors: Alexander Philip, Sanya Chawla, Lola Jover, George P. Kafentzis, Joe Brew, Vishakh Saraf, Shibu Vijayan, Peter Small, Carlos Chaccour

    Abstract: Chest X-ray is a commonly used tool during triage, diagnosis and management of respiratory diseases. In resource-constricted settings, optimizing this resource can lead to valuable cost savings for the health care system and the patients as well as to an improvement in consult time. We used prospectively-collected data from 137 patients referred for chest X-ray at the Christian Medical Center and…

    Submitted 13 February, 2024; originally announced February 2024.

  6. arXiv:2402.07483  [pdf, other]

    cs.AI cs.CL

    T-RAG: Lessons from the LLM Trenches

    Authors: Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla

    Abstract: Large Language Models (LLM) have shown remarkable language capabilities fueling attempts to integrate them into applications across a wide range of domains. An important application area is question answering over private enterprise documents where the main considerations are data security, which necessitates applications that can be deployed on-prem, limited computational resources and the need f…

    Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Added Needle in a Haystack analysis for T-RAG

  7. arXiv:2311.14754  [pdf, other]

    cs.LG

    ExCeL : Combined Extreme and Collective Logit Information for Enhancing Out-of-Distribution Detection

    Authors: Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla

    Abstract: Deep learning models often exhibit overconfidence in predicting out-of-distribution (OOD) data, underscoring the crucial role of OOD detection in ensuring reliability in predictions. Among various OOD detection approaches, post-hoc detectors have gained significant popularity, primarily due to their ease of use and implementation. However, the effectiveness of most post-hoc OOD detectors has been…

    Submitted 23 November, 2023; originally announced November 2023.

  8. arXiv:2307.05717  [pdf, other]

    cs.OH

    Towards Mobility Data Science (Vision Paper)

    Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, et al. (23 additional authors not shown)

    Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences…

    Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

    Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

  9. arXiv:2306.11604  [pdf, ps, other]

    cs.DS

    Composition of nested embeddings with an application to outlier removal

    Authors: Shuchi Chawla, Kristin Sheridan

    Abstract: We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points in $X$ (called the ``outliers") into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$, or alternately the smallest $k$ for a given target distortion $c$ are both NP-hard…

    Submitted 6 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 28 pages (including 2 appendices), 5 figures

  10. arXiv:2304.01958  [pdf, other]

    cs.DS

    Online Time-Windows TSP with Predictions

    Authors: Shuchi Chawla, Dimitris Christou

    Abstract: In the Time-Windows TSP (TW-TSP) we are given requests at different locations on a network; each request is endowed with a reward and an interval of time; the goal is to find a tour that visits as much reward as possible during the corresponding time window. For the online version of this problem, where each request is revealed at the start of its time window, no finite competitive ratio can be ob…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 31 pages, 1 figure

  11. arXiv:2211.16316  [pdf, other]

    cs.LG

    A3T: Accuracy Aware Adversarial Training

    Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Sanjay Chawla

    Abstract: Adversarial training has been empirically shown to be more prone to overfitting than standard training. The exact underlying reasons still need to be fully understood. In this paper, we identify one cause of overfitting related to current practices of generating adversarial samples from misclassified samples. To address this, we propose an alternative approach that leverages the misclassified samp…

    Submitted 29 November, 2022; originally announced November 2022.

  12. arXiv:2211.10873  [pdf, other]

    cs.LG cs.AI hep-ph

    Interpretable Scientific Discovery with Symbolic Regression: A Review

    Authors: Nour Makke, Sanjay Chawla

    Abstract: Symbolic regression is emerging as a promising machine learning method for learning succinct underlying interpretable mathematical expressions directly from data. Whereas it has been traditionally tackled with genetic programming, it has recently gained a growing interest in deep learning as a data-driven model discovery method, achieving significant advances in various application domains ranging…

    Submitted 2 May, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  13. arXiv:2211.05523  [pdf, other]

    cs.CL cs.AI

    Impact of Adversarial Training on Robustness and Generalizability of Language Models

    Authors: Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla

    Abstract: Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in depth comparison of different approaches for adversarial training in language models. Specifically, we study the e…

    Submitted 10 December, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  14. arXiv:2210.01797  [pdf, other]

    cs.LG cs.AI cs.IR

    Ten Years after ImageNet: A 360° Perspective on AI

    Authors: Sanjay Chawla, Preslav Nakov, Ahmed Ali, Wendy Hall, Issa Khalil, Xiaosong Ma, Husrev Taha Sencar, Ingmar Weber, Michael Wooldridge, Ting Yu

    Abstract: It is ten years since neural networks made their spectacular comeback. Prompted by this anniversary, we take a holistic perspective on Artificial Intelligence (AI). Supervised Learning for cognitive tasks is effectively solved - provided we have enough high-quality labeled data. However, deep neural network models are not easily interpretable, and thus the debate between blackbox and whitebox mode…

    Submitted 30 September, 2022; originally announced October 2022.

  15. arXiv:2204.04136  [pdf, ps, other]

    cs.GT

    Individually-Fair Auctions for Multi-Slot Sponsored Search

    Authors: Shuchi Chawla, Rojin Rezvan, Nathaniel Sauerberg

    Abstract: We design fair sponsored search auctions that achieve a near-optimal tradeoff between fairness and quality. Our work builds upon the model and auction design of Chawla and Jagadeesan \cite{CJ22}, who considered the special case of a single slot. We consider sponsored search settings with multiple slots and the standard model of click through rates that are multiplicatively separable into an advert…

    Submitted 8 April, 2022; originally announced April 2022.

  16. arXiv:2204.01962  [pdf, ps, other]

    cs.GT

    Buy-Many Mechanisms for Many Unit-Demand Buyers

    Authors: Shuchi Chawla, Rojin Rezvan, Yifeng Teng, Christos Tzamos

    Abstract: A recent line of research has established a novel desideratum for designing approximately-revenue-optimal multi-item mechanisms, namely the buy-many constraint. Under this constraint, prices for different allocations made by the mechanism must be subadditive, implying that the price of a bundle cannot exceed the sum of prices of individual items it contains. This natural constraint has enabled sev…

    Submitted 16 May, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

  17. arXiv:2203.17259  [pdf, other]

    cs.DL stat.AP

    To ArXiv or not to ArXiv: A Study Quantifying Pros and Cons of Posting Preprints Online

    Authors: Charvi Rastogi, Ivan Stelmakh, Xinwei Shen, Marina Meila, Federico Echenique, Shuchi Chawla, Nihar B. Shah

    Abstract: Double-blind conferences have engaged in debates over whether to allow authors to post their papers online on arXiv or elsewhere during the review process. Independently, some authors of research papers face the dilemma of whether to put their papers on arXiv due to its pros and cons. We conduct a study to substantiate this debate and dilemma via quantitative measurements. Specifically, we conduct…

    Submitted 11 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 17 pages, 3 figures

  18. Cite-seeing and Reviewing: A Study on Citation Bias in Peer Review

    Authors: Ivan Stelmakh, Charvi Rastogi, Ryan Liu, Shuchi Chawla, Federico Echenique, Nihar B. Shah

    Abstract: Citations play an important role in researchers' careers as a key factor in evaluation of scientific impact. Many anecdotes advise authors to exploit this fact and cite prospective reviewers to try obtaining a more positive evaluation for their submission. In this work, we investigate if such a citation bias actually exists: Does the citation of a reviewer's own work in a submission cause them to…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: 19 pages, 3 figures

  19. arXiv:2201.02381  [pdf, other]

    cs.AI cs.LG

    Offline Reinforcement Learning for Road Traffic Control

    Authors: Mayuresh Kunjir, Sanjay Chawla

    Abstract: Traffic signal control is an important problem in urban mobility with a significant potential of economic and environmental impact. While there is a growing interest in Reinforcement Learning (RL) for traffic signal control, the work so far has focussed on learning through simulations which could lead to inaccuracies due to simplifying assumptions. Instead, real experience data on traffic is avail…

    Submitted 11 December, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: 30 pages

    ACM Class: I.2.1

  20. Attack of the Knights: A Non Uniform Cache Side-Channel Attack

    Authors: Farabi Mahmud, Sungkeun Kim, Harpreet Singh Chawla, Chia-Che Tsai, Eun Jung Kim, Abdullah Muzahid

    Abstract: For a distributed last-level cache (LLC) in a large multicore chip, the access time to one LLC bank can significantly differ from that to another due to the difference in physical distance. In this paper, we successfully demonstrated a new distance-based side-channel attack by timing the AES decryption operation and extracting part of an AES secret key on an Intel Knights Landing CPU. We introduce…

    Submitted 31 May, 2023; v1 submitted 18 December, 2021; originally announced December 2021.

    Journal ref: Annual Computer Security Applications Conference ACSAC 2023

  21. Updating Street Maps using Changes Detected in Satellite Imagery

    Authors: Favyen Bastani, Songtao He, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

    Abstract: Accurately maintaining digital street maps is labor-intensive. To address this challenge, much work has studied automatically processing geospatial data sources such as GPS trajectories and satellite images to reduce the cost of maintaining digital maps. An end-to-end map update system would first process geospatial data sources to extract insights, and second leverage those insights to update and…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: SIGSPATIAL 2021

  22. arXiv:2108.12976  [pdf, ps, other]

    cs.DS cs.LG

    Approximating Pandora's Box with Correlations

    Authors: Shuchi Chawla, Evangelia Gergatsouli, Jeremy McMahan, Christos Tzamos

    Abstract: We revisit the classic Pandora's Box (PB) problem under correlated distributions on the box values. Recent work of arXiv:1911.01632 obtained constant approximate algorithms for a restricted class of policies for the problem that visit boxes in a fixed order. In this work, we study the complexity of approximating the optimal policy which may adaptively choose which box to visit next based on the va…

    Submitted 21 July, 2023; v1 submitted 29 August, 2021; originally announced August 2021.

  23. arXiv:2107.02846  [pdf]

    cs.CY

    Visions in Theoretical Computer Science: A Report on the TCS Visioning Workshop 2020

    Authors: Shuchi Chawla, Jelani Nelson, Chris Umans, David Woodruff

    Abstract: Theoretical computer science (TCS) is a subdiscipline of computer science that studies the mathematical foundations of computational and algorithmic processes and interactions. Work in this field is often recognized by its emphasis on mathematical technique and rigor. At the heart of the field are questions surrounding the nature of computation: What does it mean to compute? What is computable? An…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: A Computing Community Consortium (CCC) workshop report, 36 pages

    Report number: ccc2021report_2

  24. arXiv:2106.04704  [pdf, ps, other]

    cs.GT cs.DS

    Pricing Ordered Items

    Authors: Shuchi Chawla, Rojin Rezvan, Yifeng Teng, Christos Tzamos

    Abstract: We study the revenue guarantees and approximability of item pricing. Recent work shows that with $n$ heterogeneous items, item-pricing guarantees an $O(\log n)$ approximation to the optimal revenue achievable by any (buy-many) mechanism, even when buyers have arbitrarily combinatorial valuations. However, finding good item prices is challenging -- it is known that even under unit-demand valuations…

    Submitted 4 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  25. arXiv:2104.01063  [pdf, other]

    cs.AI cs.LG

    Permutation-Invariant Subgraph Discovery

    Authors: Raghvendra Mall, Shameem A. Parambath, Han Yufei, Ting Yu, Sanjay Chawla

    Abstract: We introduce Permutation and Structured Perturbation Inference (PSPI), a new problem formulation that abstracts many graph matching tasks that arise in systems biology. PSPI can be viewed as a robust formulation of the permutation inference or graph matching, where the objective is to find a permutation between two graphs under the assumption that a set of edges may have undergone a perturbation d…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 8 pages, 4 Figures, 2 Tables

  26. arXiv:2012.12394  [pdf, other]

    cs.LG

    Probabilistic Outlier Detection and Generation

    Authors: Stefano Giovanni Rizzo, Linsey Pang, Yixian Chen, Sanjay Chawla

    Abstract: A new method for outlier detection and generation is introduced by lifting data into the space of probability distributions which are not analytically expressible, but from which samples can be drawn using a neural generator. Given a mixture of unknown latent inlier and outlier distributions, a Wasserstein double autoencoder is used to both detect and generate inliers and outliers. The proposed me…

    Submitted 22 December, 2020; originally announced December 2020.

  27. arXiv:2011.09406  [pdf, ps, other]

    cs.DS cs.GT

    Non-Adaptive Matroid Prophet Inequalities

    Authors: Shuchi Chawla, Kira Goldner, Anna R. Karlin, J. Benjamin Miller

    Abstract: We investigate non-adaptive algorithms for matroid prophet inequalities. Matroid prophet inequalities have been considered resolved since 2012 when [KW12] introduced thresholds that guarantee a tight 2-approximation to the prophet; however, this algorithm is adaptive. Other approaches of [CHMS10] and [FSZ16] have used non-adaptive thresholds with a feasibility restriction; however, this translates…

    Submitted 18 November, 2020; originally announced November 2020.

  28. QarSUMO: A Parallel, Congestion-optimized Traffic Simulator

    Authors: Hao Chen, Ke Yang, Stefano Giovanni Rizzo, Giovanna Vantini, Phillip Taylor, Xiaosong Ma, Sanjay Chawla

    Abstract: Traffic simulators are important tools for tasks such as urban planning and transportation management. Microscopic simulators allow per-vehicle movement simulation, but require longer simulation time. The simulation overhead is exacerbated when there is traffic congestion and most vehicles move slowly. This in particular hurts the productivity of emerging urban computing studies based on reinforce…

    Submitted 21 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Fix a typo in Figure 9

    ACM Class: C.1.4; H.4.0

  29. arXiv:2009.08100  [pdf, other]

    cs.SI cs.CL

    How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement

    Authors: Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla

    Abstract: To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite the importance of writing a compelling message when sharing articles, the research community lacks a sufficient understanding of what kinds of editing strategies effectively promote audience engagement. In this study…

    Submitted 21 April, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: ICWSM'21 full paper

  30. arXiv:2008.02467  [pdf]

    cs.LG stat.ML

    Unravelling the Architecture of Membrane Proteins with Conditional Random Fields

    Authors: Lior Lukov, Sanjay Chawla, Wei Liu, Brett Church, Gaurav Pandey

    Abstract: In this paper, we will show that the recently introduced graphical model: Conditional Random Fields (CRF) provides a template to integrate micro-level information about biological entities into a mathematical model to understand their macro-level behavior. More specifically, we will apply the CRF model to an important classification problem in protein science, namely the secondary structure predic…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: See the originally compiled PDF of this paper at: https://drive.google.com/file/d/1IYF52Wk8m96KIlrQHUVtEBdm0Kw3M40c

  31. arXiv:2007.09547  [pdf, other]

    cs.CV

    Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

    Authors: Songtao He, Favyen Bastani, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Mohamed M. Elshrif, Samuel Madden, Amin Sadeghi

    Abstract: Inferring road graphs from satellite imagery is a challenging computer vision task. Prior solutions fall into two categories: (1) pixel-wise segmentation-based approaches, which predict whether each pixel is on a road, and (2) graph-based approaches, which predict the road graph iteratively. We find that these two approaches have complementary strengths while suffering from their own inherent limi…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  32. arXiv:2007.07990  [pdf, other]

    cs.GT cs.DS

    Static pricing for multi-unit prophet inequalities

    Authors: Shuchi Chawla, Nikhil Devanur, Thodoris Lykouris

    Abstract: We study a pricing problem where a seller has $k$ identical copies of a product, buyers arrive sequentially, and the seller prices the items aiming to maximize social welfare. When $k=1$, this is the so called "prophet inequality" problem for which there is a simple pricing scheme achieving a competitive ratio of $1/2$. On the other end of the spectrum, as $k$ goes to infinity, the asymptotic perf…

    Submitted 20 June, 2023; v1 submitted 15 July, 2020; originally announced July 2020.

  33. arXiv:2003.13966  [pdf, other]

    cs.GT cs.LG

    Individual Fairness in Advertising Auctions through Inverse Proportionality

    Authors: Shuchi Chawla, Meena Jagadeesan

    Abstract: Recent empirical work demonstrates that online advertisement can exhibit bias in the delivery of ads across users even when all advertisers bid in a non-discriminatory manner. We study the design of ad auctions that, given fair bids, are guaranteed to produce fair outcomes. Following the works of Dwork and Ilvento (2019) and Chawla et al. (2020), our goal is to design a truthful auction that satis…

    Submitted 30 November, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: To appear at ITCS 2022; this is the full version

  34. arXiv:2003.10636  [pdf, ps, other]

    cs.GT

    Menu-size Complexity and Revenue Continuity of Buy-many Mechanisms

    Authors: Shuchi Chawla, Yifeng Teng, Christos Tzamos

    Abstract: We study the multi-item mechanism design problem where a monopolist sells $n$ heterogeneous items to a single buyer. We focus on buy-many mechanisms, a natural class of mechanisms frequently used in practice. The buy-many property allows the buyer to interact with the mechanism multiple times instead of once as in the more commonly studied buy-one mechanisms. This imposes additional incentive cons…

    Submitted 23 March, 2020; originally announced March 2020.

  35. arXiv:1912.12408  [pdf, other]

    cs.CV cs.LG

    RoadTagger: Robust Road Attribute Inference with Graph Neural Networks

    Authors: Songtao He, Favyen Bastani, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden, Mohammad Amin Sadeghi

    Abstract: Inferring road attributes such as lane count and road type from satellite imagery is challenging. Often, due to the occlusion in satellite imagery and the spatial correlation of road attributes, a road attribute at one position on a road may only be apparent when considering far-away segments of the road. Thus, to robustly infer road attributes, the model must integrate scattered information and c…

    Submitted 28 December, 2019; originally announced December 2019.

  36. arXiv:1911.01632  [pdf, ps, other]

    cs.DS

    Pandora's Box with Correlations: Learning and Approximation

    Authors: Shuchi Chawla, Evangelia Gergatsouli, Yifeng Teng, Christos Tzamos, Ruimin Zhang

    Abstract: The Pandora's Box problem and its extensions capture optimization problems with stochastic input where the algorithm can obtain instantiations of input random variables at some cost. To our knowledge, all previous work on this class of problems assumes that different random variables in the input are distributed independently. As such it does not capture many real-world settings. In this paper, we…

    Submitted 16 April, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

  37. arXiv:1910.04869  [pdf, other]

    cs.CV

    Inferring and Improving Street Maps with Data-Driven Automation

    Authors: Favyen Bastani, Songtao He, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

    Abstract: Street maps are a crucial data source that help to inform a wide range of decisions, from navigating a city to disaster relief and urban planning. However, in many parts of the world, street maps are incomplete or lag behind new construction. Editing maps today involves a tedious process of manually tracing and annotating roads, buildings, and other map features. Over the past decade, many autom…

    Submitted 6 November, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

  38. arXiv:1909.00845  [pdf, other]

    cs.DB

    Revenue Maximization for Query Pricing

    Authors: Shuchi Chawla, Shaleen Deep, Paraschos Koutris, Yifeng Teng

    Abstract: Buying and selling of data online has increased substantially over the last few years. Several frameworks have already been proposed that study query pricing in theory and practice. The key guiding principle in these works is the notion of arbitrage-freeness where the broker can set different prices for different queries made to the dataset, but must ensure that the pricing function does not…

    Submitted 9 September, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: To appear in PVLDB; version 2 with some cosmetic changes

  39. arXiv:1907.01484  [pdf, other]

    cs.DC

    Themis: Fair and Efficient GPU Cluster Scheduling

    Authors: Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla

    Abstract: Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs h…

    Submitted 29 October, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

  40. arXiv:1906.08732  [pdf, other]

    cs.GT cs.LG

    Multi-Category Fairness in Sponsored Search Auctions

    Authors: Shuchi Chawla, Christina Ilvento, Meena Jagadeesan

    Abstract: Fairness in advertising is a topic of particular concern motivated by theoretical and empirical observations in both the computer science and economics literature. We examine the problem of fairness in advertising for general purpose platforms that service advertisers from many different categories. First, we propose inter-category and intra-category fairness desiderata that take inspiration from…

    Submitted 29 August, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: Updated version with revised and expanded content

  41. Machine-Assisted Map Editing

    Authors: Favyen Bastani, Songtao He, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden

    Abstract: Mapping road networks today is labor-intensive. As a result, road maps have poor coverage outside urban centers in many countries. Systems to automatically infer road network graphs from aerial imagery and GPS trajectories have been proposed to improve coverage of road maps. However, because of high error rates, these systems have not been adopted by mapping communities. We propose machine-assiste…

    Submitted 17 June, 2019; originally announced June 2019.

    Journal ref: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pg 23-32, 2018

  42. arXiv:1905.09130  [pdf, other]

    cs.AI cs.LG

    AI-CARGO: A Data-Driven Air-Cargo Revenue Management System

    Authors: Stefano Giovanni Rizzo, Ji Lucas, Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Sanjay Chawla

    Abstract: We propose AI-CARGO, a revenue management system for air-cargo that combines machine learning prediction with decision-making using mathematical optimization methods. AI-CARGO addresses a problem that is unique to the air-cargo business, namely the wide discrepancy between the quantity (weight or volume) that a shipper will book and the actual received amount at departure time by the airline. The…

    Submitted 22 May, 2019; originally announced May 2019.

    Comments: 9 pages, 8 figures

  43. arXiv:1904.05325  [pdf, other]

    cs.SI cs.LG stat.ML

    Risk Aware Ranking for Top-$k$ Recommendations

    Authors: Shameem A Puthiya Parambath, Nishant Vijayakumar, Sanjay Chawla

    Abstract: Given an incomplete ratings data over a set of users and items, the preference completion problem aims to estimate a personalized total preference order over a subset of the items. In practical settings, a ranked list of top-$k$ items from the estimated preference order is recommended to the end user in the decreasing order of preference for final consumption. We analyze this model and observe tha…

    Submitted 12 April, 2019; v1 submitted 17 March, 2019; originally announced April 2019.

  44. arXiv:1902.10315  [pdf, other]

    cs.GT

    Buy-many mechanisms are not much better than item pricing

    Authors: Shuchi Chawla, Yifeng Teng, Christos Tzamos

    Abstract: Multi-item mechanisms can be very complex, offering many different bundles to the buyer that could even be randomized. Such complexity is thought to be necessary, as the revenue gaps between randomized and deterministic mechanisms, or between deterministic and simple mechanisms, are huge even for additive valuations. We challenge this conventional belief by showing that these large gaps can only happen in…

    Submitted 1 July, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  45. arXiv:1901.03407  [pdf, other]

    cs.LG stat.ML

    Deep Learning for Anomaly Detection: A Survey

    Authors: Raghavendra Chalapathy, Sanjay Chawla

    Abstract: Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold. First, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Second, we review the adoption of these methods for anomaly detection across various application domains and assess their ef…

    Submitted 23 January, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

  46. arXiv:1805.11723  [pdf, other]

    cs.DB

    Building your Cross-Platform Application with RHEEM

    Authors: Sanjay Chawla, Bertty Contreras-Rojas, Zoi Kaoudi, Sebastian Kruse, Jorge-Arnulfo Quiané-Ruiz

    Abstract: Today, organizations typically perform tedious and costly tasks to juggle their code and data across different data processing platforms. Addressing this pain point and achieving automatic cross-platform data processing is challenging because it requires considerable expertise in all the available data processing platforms. In this report, we present Rheem, a general-purpose cross-platform data pro…

    Submitted 29 May, 2018; originally announced May 2018.

  47. RHEEMix in the Data Jungle: A Cost-based Optimizer for Cross-platform Systems

    Authors: Sebastian Kruse, Zoi Kaoudi, Bertty Contreras, Sanjay Chawla, Felix Naumann, Jorge-Arnulfo Quiané-Ruiz

    Abstract: In pursuit of efficient and scalable data analytics, the insight that "one size does not fit all" has given rise to a plethora of specialized data processing platforms and today's complex data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The opti…

    Submitted 5 September, 2020; v1 submitted 9 May, 2018; originally announced May 2018.

    Journal ref: VLDB Journal 2020

  48. arXiv:1804.04876  [pdf, other]

    cs.CV

    Group Anomaly Detection using Deep Generative Models

    Authors: Raghavendra Chalapathy, Edward Toth, Sanjay Chawla

    Abstract: Unlike conventional anomaly detection research that focuses on point anomalies, our goal is to detect anomalous collections of individual data points. In particular, we perform group anomaly detection (GAD) with an emphasis on irregular group distributions (e.g. irregular mixtures of image pixels). GAD is an important task in detecting unusual and anomalous phenomena in real-world applications suc…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: Under review at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-2018), Dublin, Ireland, 10-14 September 2018

  49. arXiv:1802.06360  [pdf, other]

    cs.LG cs.NE stat.ML

    Anomaly Detection using One-Class Neural Networks

    Authors: Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

    Abstract: We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets. OC-NN combines the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: data representation in the hidden layer is driven by the O…

    Submitted 10 January, 2019; v1 submitted 18 February, 2018; originally announced February 2018.

  50. arXiv:1802.03680  [pdf, other]

    cs.CV

    RoadTracer: Automatic Extraction of Road Networks from Aerial Images

    Authors: Favyen Bastani, Songtao He, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, David DeWitt

    Abstract: Mapping road networks is currently both expensive and labor-intensive. High-resolution aerial imagery provides a promising avenue to automatically infer a road network. Prior work uses convolutional neural networks (CNNs) to detect which pixels belong to a road (segmentation), and then uses complex post-processing heuristics to infer graph connectivity. We show that these segmentation methods have…

    Submitted 26 April, 2018; v1 submitted 10 February, 2018; originally announced February 2018.

    Comments: CVPR 2018