Showing 1–16 of 16 results for author: Khurana, U

  1. Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

    Authors: Qian Zhu, Dakuo Wang, Shuai Ma, April Yi Wang, Zixin Chen, Udayan Khurana, Xiaojuan Ma

    Abstract: As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited und…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Computational Notebooks, Human-AI Collaboration, Feature Recommendation

  2. arXiv:2308.01080 [pdf, other]

    cs.CL

    Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation

    Authors: Lea Krause, Selene Báez Santamaría, Michiel van der Meer, Urja Khurana

    Abstract: This paper discusses our approaches for task-oriented conversational modelling using subjective knowledge, with a particular emphasis on response generation. Our methodology was shaped by an extensive data analysis that evaluated key factors such as response length, sentiment, and dialogue acts present in the provided dataset. We used few-shot learning to augment the data with newly generated subj…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: DSTC11

  3. arXiv:2303.01378 [pdf, other]

    cs.AI cs.DB cs.LG

    A Vision for Semantically Enriched Data Science

    Authors: Udayan Khurana, Kavitha Srinivas, Sainyam Galhotra, Horst Samulowitz

    Abstract: The recent efforts in automation of machine learning or data science have achieved success in various tasks such as hyper-parameter optimization or model selection. However, key areas such as utilizing domain knowledge and data semantics are areas where we have seen little automation. Data Scientists have long leveraged common sense reasoning and domain knowledge to understand and enrich data for b…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.08018

  4. arXiv:2209.08966 [pdf, other]

    cs.CL cs.AI

    Will It Blend? Mixing Training Paradigms & Prompting for Argument Quality Prediction

    Authors: Michiel van der Meer, Myrthe Reuver, Urja Khurana, Lea Krause, Selene Báez Santamaría

    Abstract: This paper describes our contributions to the Shared Task of the 9th Workshop on Argument Mining (2022). Our approach uses Large Language Models for the task of Argument Quality Prediction. We perform prompt engineering using GPT-3, and also investigate the training paradigms multi-task learning, contrastive learning, and intermediate-task training. We find that a mixed prediction setup outperform…

    Submitted 5 October, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted at the 9th Workshop on Argument Mining (2022)

  5. arXiv:2206.15455 [pdf, other]

    cs.CL

    Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions

    Authors: Urja Khurana, Ivar Vermeulen, Eric Nalisnick, Marloes van Noorloos, Antske Fokkens

    Abstract: Offensive Content Warning: This paper contains offensive language only for providing examples that clarify this research and do not reflect the authors' opinions. Please be aware that these examples are offensive and may cause you distress. The subjectivity of recognizing hate speech makes it a complex task. This is also reflected by different and incomplete definitions in NLP.…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted at WOAH 2022, co-located at NAACL 2022. Cite ACL version

  6. arXiv:2205.08018 [pdf, other]

    cs.AI

    A Survey on Semantics in Automated Data Science

    Authors: Udayan Khurana, Kavitha Srinivas, Horst Samulowitz

    Abstract: Data Scientists leverage common sense reasoning and domain knowledge to understand and enrich data for building predictive models. In recent years, we have witnessed a surge in tools and techniques for automated machine learning. While data scientists can employ various such tools to help with model building, many other aspects such as feature engineering that require semantic understa…

    Submitted 16 May, 2022; originally announced May 2022.

  7. arXiv:2111.09612 [pdf, ps, other]

    cs.CL cs.LG

    How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task

    Authors: Urja Khurana, Eric Nalisnick, Antske Fokkens

    Abstract: Despite their success, modern language models are fragile. Even small changes in their training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) in combination with Stochastic Weight Averaging (SWA) (arXiv:1803.05407) -- a cheap way of ensembling -- on a sentiment analysis task (SST-2). In particular, we analyze SWA's stabil…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Accepted at the second workshop on Evaluation & Comparison of NLP Systems, co-located at EMNLP 2021. Cite ACL version

  8. arXiv:2101.03970 [pdf, other]

    cs.LG cs.HC

    How Much Automation Does a Data Scientist Want?

    Authors: Dakuo Wang, Q. Vera Liao, Yunfeng Zhang, Udayan Khurana, Horst Samulowitz, Soya Park, Michael Muller, Lisa Amini

    Abstract: Data science and machine learning (DS/ML) are at the heart of the recent advancements of many Artificial Intelligence (AI) applications. There is an active research thread in AI, AutoAI, that aims to develop systems for automating end-to-end the DS/ML Lifecycle. However, do DS and ML workers really want to automate their DS/ML workflow? To answer this question, we first synthesize a human-centere…

    Submitted 6 January, 2021; originally announced January 2021.

  9. arXiv:2012.08594 [pdf, other]

    cs.AI cs.DB

    Semantic Annotation for Tabular Data

    Authors: Udayan Khurana, Sainyam Galhotra

    Abstract: Detecting semantic concept of columns in tabular data is of particular interest to many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Recently, several works have proposed supervised learning-based or heuristic pattern-based approaches to semantic type annotation. Both have shortcomings that prevent them from generalizin…

    Submitted 15 December, 2020; originally announced December 2020.

  10. arXiv:1910.14436 [pdf, other]

    cs.AI cs.LG

    How can AI Automate End-to-End Data Science?

    Authors: Charu Aggarwal, Djallel Bouneffouf, Horst Samulowitz, Beat Buesser, Thanh Hoang, Udayan Khurana, Sijia Liu, Tejaswini Pedapati, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Alexander Gray

    Abstract: Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emergin…

    Submitted 22 October, 2019; originally announced October 2019.

  11. arXiv:1903.00743 [pdf, other]

    cs.LG cs.AI stat.ML

    Automating Predictive Modeling Process using Reinforcement Learning

    Authors: Udayan Khurana, Horst Samulowitz

    Abstract: Building a good predictive model requires an array of activities such as data imputation, feature transformations, estimator selection, hyper-parameter search and ensemble construction. Given the large, complex and heterogenous space of options, off-the-shelf optimization methods are infeasible for realistic response times. In practice, much of the predictive modeling process is conducted by exper…

    Submitted 2 March, 2019; originally announced March 2019.

  12. arXiv:1709.07150 [pdf, other]

    cs.AI cs.LG stat.ML

    Feature Engineering for Predictive Modeling using Reinforcement Learning

    Authors: Udayan Khurana, Horst Samulowitz, Deepak Turaga

    Abstract: Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy proc…

    Submitted 21 September, 2017; originally announced September 2017.

  13. arXiv:1509.08960 [pdf, other]

    cs.DB

    Storing and Analyzing Historical Graph Data at Scale

    Authors: Udayan Khurana, Amol Deshpande

    Abstract: The work on large-scale graph analytics to date has largely focused on the study of static properties of graph snapshots. However, a static view of interactions between entities is often an oversimplification of several complex phenomena like the spread of epidemics, information diffusion, formation of online communities, and so on. Being able to find temporal interaction patterns, visualize the…

    Submitted 29 September, 2015; originally announced September 2015.

  14. arXiv:1207.5777 [pdf, other]

    cs.DB cs.SI physics.soc-ph

    Efficient Snapshot Retrieval over Historical Graph Data

    Authors: Udayan Khurana, Amol Deshpande

    Abstract: We address the problem of managing historical data for large evolving information networks like social networks or citation networks, with the goal to enable temporal and evolutionary queries and analysis. We present the design and architecture of a distributed graph database system that stores the entire history of a network and provides support for efficient retrieval of multiple graphs from arb…

    Submitted 24 July, 2012; originally announced July 2012.

  15. arXiv:cs/0505066 [pdf]

    cs.DS

    Decision Sort and its Parallel Implementation

    Authors: Udayan Khuarana

    Abstract: In this paper, a sorting technique is presented that takes as input a data set whose primary key domain is known to the sorting algorithm, and works with a time efficiency of O(n+k), where k is the primary key domain. It is shown that the algorithm has applicability over a wide range of data sets. Later, a parallel formulation of the same is proposed and its effectiveness is argued. Though this…

    Submitted 24 May, 2005; originally announced May 2005.

    Comments: 5 pages, 3 tables, 1 figure, National Conference on Bioinformatics Computing'05

  16. arXiv:cs/0505056 [pdf]

    cs.IR cs.IT

    Text Compression and Superfast Searching

    Authors: Udayan Khurana, Anirudh Koul

    Abstract: In this paper, a new compression scheme for text is presented. The same is efficient in giving high compression ratios and enables super fast searching within the compressed text. Typical compression ratios of 70-80% and reducing the search time by 80-85% are the features of this paper. Till now, a trade-off between high ratios and searchability within compressed text has been seen. In this pape…

    Submitted 23 May, 2005; originally announced May 2005.

    Comments: 11 pages, 5 tables

    ACM Class: E.4