Showing 1–16 of 16 results for author: Amer-Yahia, S

  1. arXiv:2403.13286  [pdf, other]

    stat.ML cs.DB cs.LG

    A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs

    Authors: Yun Wang, Chrysanthi Kosyfaki, Sihem Amer-Yahia, Reynold Cheng

    Abstract: Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing in graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses in attributed graphs. We develop a sampling-based hypothesis testing framework,…

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2306.01388  [pdf, other]

    cs.DB

    From Large Language Models to Databases and Back: A discussion on research and education

    Authors: Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang

    Abstract: This discussion was conducted at a recent panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023), held April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM (ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was moderated by Lei Chen and Xiaochun Yang. The discussion raised several questions on how lar…

    Submitted 7 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 7 pages, 2 figures, the Panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023)

  3. arXiv:2206.02845  [pdf, other]

    cs.DB cs.LG

    On Efficient Approximate Queries over Machine Learning Models

    Authors: Dujian Ding, Sihem Amer-Yahia, Laks VS Lakshmanan

    Abstract: The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because the cost of finding high quality answers corresponds to invoking an oracle such as a human expert or an expensive deep neural network model on every single item in the DB and then applying the query. We develop a novel unified framework for approximate qu…

    Submitted 17 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted by PVLDB 2023, 16 pages, 15 figures

  4. arXiv:2205.13956  [pdf, other]

    cs.LG cs.DB

    Guided Exploration of Data Summaries

    Authors: Brit Youngmann, Sihem Amer-Yahia, Aurélien Personnaz

    Abstract: Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. A useful summary contains k individually uniform sets that are collectively diverse to be representative. Uniformity addresses interpretability and diversity addresses representativity. Findin…

    Submitted 27 May, 2022; originally announced May 2022.

  5. arXiv:2104.04194  [pdf, other]

    cs.LG cs.AI cs.DB

    INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

    Authors: Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Martin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, Aurélien Personnaz, Guillem Rull, Ana Sima, Ellery Smith, Dimitrios Skoutas, et al. (3 additional authors not shown)

    Abstract: A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data expl…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 8 pages, 5 figures

    ACM Class: I.2; H.2

  6. arXiv:2009.00453  [pdf, other]

    cs.CV

    DropLeaf: a precision farming smartphone application for measuring pesticide spraying methods

    Authors: Bruno Brandoli, Gabriel Spadon, Travis Esau, Patrick Hennessy, Andre C. P. L. Carvalho, Jose F. Rodrigues-Jr, Sihem Amer-Yahia

    Abstract: Pesticide application has been heavily used in the cultivation of major crops, contributing to the increase of crop production over the past decades. However, their appropriate use and calibration of machines rely upon evaluation methodologies that can precisely estimate how well the pesticides' spraying covered the crops. A few strategies have been proposed in former works, yet their elevated cos…

    Submitted 31 August, 2020; originally announced September 2020.

    Comments: Submitted to Computers and Electronics in Agriculture. arXiv admin note: text overlap with arXiv:1711.07828

  7. arXiv:2003.06875  [pdf, other]

    cs.DB cs.DS cs.HC

    Recommending Deployment Strategies for Collaborative Tasks

    Authors: Dong Wei, Senjuti Basu Roy, Sihem Amer-Yahia

    Abstract: Our work contributes to aiding requesters in deploying collaborative tasks in crowdsourcing. We initiate the study of recommending deployment strategies for collaborative tasks to requesters that are consistent with deployment parameters they desire: a lower-bound on the quality of the crowd contribution, an upper-bound on the latency of task completion, and an upper-bound on the cost incurred by…

    Submitted 15 March, 2020; originally announced March 2020.

  8. arXiv:1909.04605  [pdf, other]

    cs.LG cs.NE stat.ML

    Patient trajectory prediction in the Mimic-III dataset, challenges and pitfalls

    Authors: Jose F Rodrigues-Jr, Gabriel Spadon, Bruno Brandoli, Sihem Amer-Yahia

    Abstract: Automated medical prognosis has gained interest as artificial intelligence evolves and the potential for computer-aided medicine becomes evident. Nevertheless, it is challenging to design an effective system that, given a patient's medical history, is able to predict probable future conditions. Previous works, mostly carried out over private datasets, have tackled the problem by using artificial n…

    Submitted 28 November, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

  9. arXiv:1801.03233  [pdf, other]

    cs.DB cs.AI

    Eliciting Worker Preference for Task Completion

    Authors: Mohammadreza Esfandiari, Senjuti Basu Roy, Sihem Amer-Yahia

    Abstract: Current crowdsourcing platforms provide little support for worker feedback. Workers are sometimes invited to post free text describing their experience and preferences in completing tasks. They can also use forums such as Turker Nation to exchange preferences on tasks and requesters. In fact, crowdsourcing platforms rely heavily on observing workers and inferring their preferences implicitly. In…

    Submitted 9 January, 2018; originally announced January 2018.

  10. arXiv:1712.03529  [pdf, other]

    cs.DB

    Exploration of User Groups in VEXUS

    Authors: Sihem Amer-Yahia, Behrooz Omidvar-Tehrani, Joao Comba, Viviane Moreira, Fabian Colque Zegarra

    Abstract: We introduce VEXUS, an interactive visualization framework for exploring user data to fulfill tasks such as finding a set of experts, forming discussion groups and analyzing collective behaviors. User data is characterized by a combination of demographics like age and occupation, and actions such as rating a movie, writing a paper, following a medical treatment or buying groceries. The ubiquity of…

    Submitted 10 December, 2017; originally announced December 2017.

  11. arXiv:1605.02772  [pdf, ps, other]

    cs.DS

    Querying Temporal Drifts at Multiple Granularities (Technical Report)

    Authors: Sofia Kleisarchaki, Sihem Amer-Yahia, Ahlame Douzal-Chouakria, Vassilis Christophides

    Abstract: There exists a large body of work on online drift detection with the goal of dynamically finding and maintaining changes in data streams. In this paper, we adopt a query-based approach to drift detection. Our approach relies on a drift index, a structure that captures drift at different time granularities and enables flexible drift queries. We formalize different drift queries that rep…

    Submitted 13 May, 2016; v1 submitted 9 May, 2016; originally announced May 2016.

  12. arXiv:1603.04792  [pdf, other]

    cs.DB

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Authors: Martin Kirchgessner, Vincent Leroy, Sihem Amer-Yahia, Shashwat Mishra

    Abstract: Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy paté also buy salted butter and sour bread." Unfortunat…

    Submitted 15 March, 2016; originally announced March 2016.

  13. arXiv:1502.05106  [pdf, other]

    cs.DB

    "The Whole Is Greater Than the Sum of Its Parts": Optimization in Collaborative Crowdsourcing

    Authors: Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das

    Abstract: In this work, we initiate the investigation of optimization opportunities in collaborative crowdsourcing. Many popular applications, such as collaborative document editing, sentence translation, or citizen science, resort to this special form of human-based computing, where crowd workers with appropriate skills and expertise are required to form groups to solve complex tasks. Central to any collab…

    Submitted 12 April, 2015; v1 submitted 17 February, 2015; originally announced February 2015.

  14. arXiv:1401.1302  [pdf, other]

    cs.DB cs.SI

    Optimization in Knowledge-Intensive Crowdsourcing

    Authors: Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das

    Abstract: We present SmartCrowd, a framework for optimizing collaborative knowledge-intensive crowdsourcing. SmartCrowd distinguishes itself by accounting for human factors in the process of assigning tasks to workers. Human factors designate workers' expertise in different skills, their expected minimum wage, and their availability. In SmartCrowd, we formulate task assignment as an optimization problem, an…

    Submitted 7 January, 2014; originally announced January 2014.

    Comments: 12 pages

  15. arXiv:1208.0285  [pdf, other]

    cs.DB

    Who Tags What? An Analysis Framework

    Authors: Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, Cong Yu

    Abstract: The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components, user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. I…

    Submitted 1 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1567-1578 (2012)

  16. arXiv:0909.2058  [pdf]

    cs.DB cs.HC cs.IR cs.PL

    SocialScope: Enabling Information Discovery on Social Content Sites

    Authors: Sihem Amer-Yahia, Laks Lakshmanan, Cong Yu

    Abstract: Recently, many content sites have started encouraging their users to engage in social activities such as adding buddies on Yahoo! Travel and sharing articles with their friends on New York Times. This has led to the emergence of social content sites, which is being facilitated by initiatives like OpenID (http://www.openid.net/) and OpenSocial (http://www.opensocial.org/). These community s…

    Submitted 10 September, 2009; originally announced September 2009.

    Comments: CIDR 2009