FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes

Authors: Rajat Kumar Jenamani, Priya Sundaresan, Maram Sakr, Tapomayukh Bhattacharjee, Dorsa Sadigh

Abstract: Robot-assisted feeding has the potential to improve the quality of life for individuals with mobility limitations who are unable to feed themselves independently. However, there exists a large gap between the homogeneous, curated plates existing feeding systems can handle, and truly in-the-wild meals. Feeding realistic plates is immensely challenging due to the sheer range of food items that a robot may encounter, each requiring specialized manipulation strategies which must be sequenced over a long horizon to feed an entire meal. An assistive feeding system should not only be able to sequence different strategies efficiently in order to feed an entire meal, but also be mindful of user preferences given the personalized nature of the task. We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences. In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup, while adhering to the diverse preferences of 42 participants without mobility limitations as evaluated in a user study. We demonstrate the seamless integration of FLAIR with existing bite transfer methods [19, 28], and deploy it across 2 institutions and 3 robots, illustrating its adaptability. Finally, we illustrate the real-world efficacy of our system by successfully feeding a care recipient with severe mobility limitations. Supplementary materials and videos can be found at: https://emprise.cs.cornell.edu/flair .

Submitted 10 July, 2024; originally announced July 2024.

Comments: RSS 2024

arXiv:2406.19880 [pdf, other]

Parameterized Verification of Round-based Distributed Algorithms via Extended Threshold Automata

Authors: Tom Baumeister, Paul Eichler, Swen Jacobs, Mouhammad Sakr, Marcus Völp

Abstract: Threshold automata are a computational model that has proven to be versatile in modeling threshold-based distributed algorithms and enabling their completely automatic parameterized verification. We present novel techniques for the verification of threshold automata, based on well-structured transition systems, that allow us to extend the expressiveness of both the computational model and the specifications that can be verified. In particular, we extend the model to allow decrements and resets of shared variables, possibly on cycles, and the specifications to general coverability. While these extensions of the model in general lead to undecidability, our algorithms provide a semi-decision procedure. We demonstrate the benefit of our extensions by showing that we can model complex round-based algorithms such as the phase king consensus algorithm and the Red Belly Blockchain protocol (published in 2019), and verify them fully automatically for the first time.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.05327 [pdf, other]

Multi-Entry Generalized Search Trees for Indexing Trajectories

Authors: Maxime Schoemans, Walid G. Aref, Esteban Zimányi, Mahmoud Sakr

Abstract: The idea of generalized indices is one of the success stories of database systems research. It has found its way to implementation in common database systems. GiST (Generalized Search Tree) and SP-GiST (Space-Partitioned Generalized Search Tree) are two widely-used generalized indices that are typically used for multidimensional data. Currently, the generalized indices GiST and SP-GiST represent one database object using one index entry, e.g., a bounding box for each spatio-temporal object. However, when dealing with complex objects, e.g., moving object trajectories, a single entry per object is inadequate for creating efficient indices. Previous research has highlighted that splitting trajectories into multiple bounding boxes prior to indexing can enhance query performance as it leads to a higher index filter. In this paper, we introduce MGiST and MSP-GiST, the multi-entry generalized search tree counterparts of GiST and SP-GiST, respectively, that are designed to enable the partitioning of objects into multiple entries during insertion. The methods for decomposing a complex object into multiple sub-objects differ from one data type to another, and may depend on some domain-specific parameters. Thus, MGiST and MSP-GiST are designed to allow for pluggable modules that aid in optimizing the split of an object into multiple sub-objects. We demonstrate the usefulness of MGiST and MSP-GiST using a trajectory indexing scenario, where we realize several trajectory indexes using MGiST and MSP-GiST and instantiate these search trees with trajectory-specific splitting algorithms. We create and test the performance of several multi-entry versions of widely-used spatial index structures, e.g., R-Tree, Quad-Tree, and KD-Tree. We conduct evaluations using both synthetic and real-world data, and observe up to an order of magnitude enhancement in performance of point, range, and KNN queries.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2404.18262 [pdf, other]

Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning

Authors: Atharva Naik, Jessica Ruhan Yin, Anusha Kamath, Qianou Ma, Sherry Tongshuang Wu, Charles Murray, Christopher Bogart, Majd Sakr, Carolyn P. Rose

Abstract: An advantage of Large Language Models (LLMs) is their contextualization capability - providing different responses based on student inputs like solution strategy or prior discussion, to potentially better engage students than standard feedback. We present a design and evaluation of a proof-of-concept LLM application to offer students dynamic and contextualized feedback. Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT, which offers students contextualized reflection triggers during a collaborative query optimization task in database design. We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context. We discuss in depth the exploration of the design space of the triggers and their correspondence with the learning objectives as well as the impact on student learning in a pilot study with 34 students.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.09366 [pdf, ps, other]

doi 10.1145/3626253.3635608

Understanding the Role of Temperature in Diverse Question Generation by GPT-4

Authors: Arav Agarwal, Karthik Mittal, Aidan Doyle, Pragnya Sridhar, Zipiao Wan, Jacob Arthur Doughty, Jaromir Savelka, Majd Sakr

Abstract: We conduct a preliminary study of the effect of GPT's temperature parameter on the diversity of GPT4-generated questions. We find that using higher temperature values leads to significantly higher diversity, with different temperatures exposing different types of similarity between generated sets of questions. We also demonstrate that diverse question generation is especially difficult for questions targeting lower levels of Bloom's Taxonomy.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2403.04821 [pdf, other]

New algorithms for the simplification of multiple trajectories under bandwidth constraints

Authors: Gilles Dejaegere, Mahmoud Sakr

Abstract: This study introduces time-windowed variations of three established trajectory simplification algorithms. These new algorithms are specifically designed to be used in contexts with bandwidth limitations. We present the details of these algorithms and highlight the differences compared to their classical counterparts. To evaluate their performance, we conduct accuracy assessments for varying sizes of time windows, utilizing two different datasets and exploring different compression ratios. The accuracies of the proposed algorithms are compared with those of existing methods. Our findings demonstrate that, for larger time windows, the enhanced version of the bandwidth-constrained STTrace outperforms other algorithms, with the bandwidth-constrained improved version of SQUISH also yielding satisfactory results at a lower computational cost. Conversely, for short time windows, only the bandwidth-constrained version of Dead Reckoning remains satisfactory.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Preprint, To be published as a proceeding of Workshop on Big Mobility Data Analytics (BMDA) co-located with EDBT/ICDT 2024 Joint Conference

arXiv:2401.16791 [pdf, other]

Accelerated Cloud for Artificial Intelligence (ACAI)

Authors: Dachi Chen, Weitian Ding, Chen Liang, Chang Xu, Junwei Zhang, Majd Sakr

Abstract: Training an effective Machine learning (ML) model is an iterative process that requires effort in multiple dimensions. Vertically, a single pipeline typically includes an initial ETL (Extract, Transform, Load) of raw datasets, a model training stage, and an evaluation stage where the practitioners obtain statistics of the model performance. Horizontally, many such pipelines may be required to find the best model within a search space of model configurations. Many practitioners resort to maintaining logs manually and writing simple glue code to automate the workflow. However, carrying out this process on the cloud is not a trivial task in terms of resource provisioning, data management, and bookkeeping of job histories to make sure the results are reproducible. We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI), to help improve the productivity of ML practitioners. ACAI achieves this goal by enabling cloud-based storage of indexed, labeled, and searchable data, as well as automatic resource provisioning, job scheduling, and experiment tracking. Specifically, ACAI provides practitioners (1) a data lake for storing versioned datasets and their corresponding metadata, and (2) an execution engine for executing ML jobs on the cloud with automatic resource provisioning (auto-provision), logging and provenance tracking. To evaluate ACAI, we test the efficacy of our auto-provisioner on the MNIST handwritten digit classification task, and we study the usability of our system using experiments and interviews. We show that our auto-provisioner produces a 1.7x speed-up and 39% cost reduction, and our system reduces experiment time for ML scientists by 20% on typical ML use cases.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2312.03173 [pdf, other]

doi 10.1145/3636243.3636256

A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education

Authors: Jacob Doughty, Zipiao Wan, Anishka Bompelli, Jubahed Qayum, Taozhi Wang, Juran Zhang, Yujia Zheng, Aidan Doyle, Pragnya Sridhar, Arav Agarwal, Christopher Bogart, Eric Keylor, Can Kultur, Jaromir Savelka, Majd Sakr

Abstract: There is a constant need for educators to develop and maintain effective up-to-date assessments. While there is a growing body of research in computing education on utilizing large language models (LLMs) in generation and engagement with coding exercises, the use of LLMs for generating programming MCQs has not been extensively explored. We analyzed the capability of GPT-4 to produce multiple-choice questions (MCQs) aligned with specific learning objectives (LOs) from Python programming classes in higher education. Specifically, we developed an LLM-powered (GPT-4) system for generation of MCQs from high-level course context and module-level LOs. We evaluated 651 LLM-generated and 449 human-crafted MCQs aligned to 246 LOs from 6 Python courses. We found that GPT-4 was capable of producing MCQs with clear language, a single correct choice, and high-quality distractors. We also observed that the generated MCQs appeared to be well-aligned with the LOs. Our findings can be leveraged by educators wishing to take advantage of the state-of-the-art generative models to support MCQ authoring efforts.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.09518 [pdf, other]

From GPT-3 to GPT-4: On the Evolving Efficacy of LLMs to Answer Multiple-choice Questions for Programming Classes in Higher Education

Authors: Jaromir Savelka, Arav Agarwal, Christopher Bogart, Majd Sakr

Abstract: We explore the evolving efficacy of three generative pre-trained transformer (GPT) models in generating answers for multiple-choice questions (MCQ) from introductory and intermediate Python programming courses in higher education. We focus on the differences in capabilities of the models prior to the release of ChatGPT (Nov '22), at the time of the release, and today (i.e., Aug '23). Recent studies have established that the abilities of the OpenAI's GPT models to handle assessments originally designed for humans keep increasing as the newer more capable models are released. However, the qualitative differences in the capabilities and limitations of these models to reason about and/or analyze programming MCQs have been under-explored. We evaluated three OpenAI's GPT models on formative and summative MCQ assessments from three Python courses (530 questions) focusing on the qualitative differences in the evolving efficacy of the subsequent models. This study provides further evidence and insight into the trajectory of the current developments where there already exists a technology that can be utilized by students to collect passing scores, with no effort whatsoever, on what today counts as viable programming knowledge and skills assessments. This study could be leveraged by educators and institutions to better understand the recent technological developments in order to adapt the design of programming assessments as well as to fuel the necessary discussions into how assessments in future programming classes should be updated.

Submitted 15 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.08033, arXiv:2306.10073

arXiv:2310.13083 [pdf, other]

How Can Everyday Users Efficiently Teach Robots by Demonstrations?

Authors: Maram Sakr, Zhikai Zhang, Benjamin Li, Haomiao Zhang, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth Croft

Abstract: Learning from Demonstration (LfD) is a framework that allows lay users to easily program robots. However, the efficiency of robot learning and the robot's ability to generalize to task variations hinges upon the quality and quantity of the provided demonstrations. Our objective is to guide human teachers to furnish more effective demonstrations, thus facilitating efficient robot learning. To achieve this, we propose to use a measure of uncertainty, namely task-related information entropy, as a criterion for suggesting informative demonstration examples to human teachers to improve their teaching skills. In a conducted experiment (N=24), an augmented reality (AR)-based guidance system was employed to train novice users to produce additional demonstrations from areas with the highest entropy within the workspace. These novice users were trained for a few trials to teach the robot a generalizable task using a limited number of demonstrations. Subsequently, the users' performance after training was assessed first on the same task (retention) and then on a novel task (transfer) without guidance. The results indicated a substantial improvement in robot learning efficiency from the teacher's demonstrations, with an improvement of up to 198% observed on the novel task. Furthermore, the proposed approach was compared to a state-of-the-art heuristic rule and found to improve robot learning efficiency by 210% compared to the heuristic rule.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2307.05717 [pdf, other]

Towards Mobility Data Science (Vision Paper)

Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, et al. (23 additional authors not shown)

Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.

Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

arXiv:2306.17459 [pdf, other]

Harnessing LLMs in Curricular Design: Using GPT-4 to Support Authoring of Learning Objectives

Authors: Pragnya Sridhar, Aidan Doyle, Arav Agarwal, Christopher Bogart, Jaromir Savelka, Majd Sakr

Abstract: We evaluated the capability of a generative pre-trained transformer (GPT-4) to automatically generate high-quality learning objectives (LOs) in the context of a practically oriented university course on Artificial Intelligence. Discussions of opportunities (e.g., content generation, explanation) and risks (e.g., cheating) of this emerging technology in education have intensified, but to date there has not been a study of the models' capabilities in supporting the course design and authoring of LOs. LOs articulate the knowledge and skills learners are intended to acquire by engaging with a course. To be effective, LOs must focus on what students are intended to achieve, focus on specific cognitive processes, and be measurable. Thus, authoring high-quality LOs is a challenging and time consuming (i.e., expensive) effort. We evaluated 127 LOs that were automatically generated based on a carefully crafted prompt (detailed guidelines on high-quality LOs authoring) submitted to GPT-4 for conceptual modules and projects of an AI Practitioner course. We analyzed the generated LOs if they follow certain best practices such as beginning with action verbs from Bloom's taxonomy in regards to the level of sophistication intended. Our analysis showed that the generated LOs are sensible, properly expressed (e.g., starting with an action verb), and that they largely operate at the appropriate level of Bloom's taxonomy, respecting the different nature of the conceptual modules (lower levels) and projects (higher levels). Our results can be leveraged by instructors and curricular designers wishing to take advantage of the state-of-the-art generative models to support their curricular and course design efforts.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.10073 [pdf, other]

doi 10.1145/3568813.3600142

Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses

Authors: Jaromir Savelka, Arav Agarwal, Marshall An, Chris Bogart, Majd Sakr

Abstract: This paper studies recent developments in large language models' (LLM) abilities to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. The emergence of ChatGPT resulted in heated debates of its potential uses (e.g., exercise generation, code explanation) as well as misuses in programming classes (e.g., cheating). Recent studies show that while the technology performs surprisingly well on diverse sets of assessment instruments employed in typical programming classes the performance is usually not sufficient to pass the courses. The release of GPT-4 largely emphasized notable improvements in the capabilities related to handling assessments originally designed for human test-takers. This study is the necessary analysis in the context of this ongoing transition towards mature generative AI systems. Specifically, we report the performance of GPT-4, comparing it to the previous generations of GPT models, on three Python courses with assessments ranging from simple multiple-choice questions (no code involved) to complex programming projects with code bases distributed into multiple files (599 exercises overall). Additionally, we analyze the assessments that were not handled well by GPT-4 to understand the current limitations of the model, as well as its capabilities to leverage feedback provided by an auto-grader. We found that the GPT models evolved from completely failing the typical programming class' assessments (the original GPT-3) to confidently passing the courses with no human involvement (GPT-4). While we identified certain limitations in GPT-4's handling of MCQs and coding exercises, the rate of improvement across the recent generations of GPT models strongly suggests their potential to handle almost any type of assessment widely used in higher education programming courses. These findings could be leveraged by educators and institutions to adapt the design of programming assessments as well as to fuel the necessary discussions into how programming classes should be updated to reflect the recent technological developments. This study provides evidence that programming instructors need to prepare for a world in which there is an easy-to-use widely accessible technology that can be utilized by learners to collect passing scores, with no effort whatsoever, on what today counts as viable programming knowledge and skills assessments.

Submitted 15 June, 2023; originally announced June 2023.

Journal ref: ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1. August 2023. Pages 78 - 92

arXiv:2303.09325 [pdf, other]

doi 10.1145/3587102.3588792

Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?

Authors: Jaromir Savelka, Arav Agarwal, Christopher Bogart, Yifan Song, Majd Sakr

Abstract: We evaluated the capability of generative pre-trained transformers (GPT), to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. Discussions of potential uses (e.g., exercise generation, code explanation) and misuses (e.g., cheating) of this emerging technology in programming education have intensified, but to date there has not been a rigorous analysis of the models' capabilities in the realistic context of a full-fledged programming course with diverse set of assessment instruments. We evaluated GPT on three Python courses that employ assessments ranging from simple multiple-choice questions (no code involved) to complex programming projects with code bases distributed into multiple files (599 exercises overall). Further, we studied if and how successfully GPT models leverage feedback provided by an auto-grader. We found that the current models are not capable of passing the full spectrum of assessments typically involved in a Python programming course (<70% on even entry-level modules). Yet, it is clear that a straightforward application of these easily accessible models could enable a learner to obtain a non-trivial portion of the overall available score (>55%) in introductory and intermediate courses alike. While the models exhibit remarkable capabilities, including correcting solutions based on auto-grader's feedback, some limitations exist (e.g., poor handling of exercises requiring complex chains of reasoning steps). These findings can be leveraged by instructors wishing to adapt their assessments so that GPT becomes a valuable assistant for a learner as opposed to an end-to-end solution.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 7 pages. arXiv admin note: text overlap with arXiv:2303.08033

Journal ref: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V.1 (ITiCSE 2023) 117-123

arXiv:2303.08033 [pdf, other]

Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code

Authors: Jaromir Savelka, Arav Agarwal, Christopher Bogart, Majd Sakr

Abstract: We analyzed effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments, often involving short snippets of code, from introductory and intermediate programming courses at the postsecondary level. This emerging technology stirs countless discussions of its potential uses (e.g., exercise generation, code explanation) as well as misuses in programming education (e.g., cheating). However, the capabilities of GPT models and their limitations to reason about and/or analyze code in educational settings have been under-explored. We evaluated several OpenAI's GPT models on formative and summative MCQ assessments from three Python courses (530 questions). We found that MCQs containing code snippets are not answered as successfully as those that only contain natural language. While questions requiring to fill-in a blank in the code or completing a natural language statement about the snippet are handled rather successfully, MCQs that require analysis and/or reasoning about the code (e.g., what is true/false about the snippet, or what is its output) appear to be the most challenging. These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses, so that GPT becomes a valuable assistant for a learner as opposed to a source of confusion and/or potential hindrance in the learning process.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 12 pages

arXiv:2206.07396 [pdf, other]

Selectivity Estimation of Inequality Joins In Databases

Authors: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr

Abstract: Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the results of a predicate in the query. It is the main calculation, based on which the optimizer can select the cheapest plan to execute. While the problem has been known since the mid 70s, we were surprised that there are no solutions in the literature for the selectivity estimation of inequality joins. By testing four common database systems: Oracle, SQL-Server, PostgreSQL, and MySQL, we found that the open-source systems PostgreSQL and MySQL lack this estimation. Oracle and SQL-Server make fairly accurate estimations, yet their algorithms are secret. This paper thus proposes an algorithm for inequality join selectivity estimation. The proposed algorithm has been implemented in PostgreSQL and sent as a patch to be included in the next releases.

Submitted 15 June, 2022; originally announced June 2022.

ACM Class: H.2

arXiv:2206.00251 [pdf, other]

The Reactive Synthesis Competition (SYNTCOMP): 2018-2021

Authors: Swen Jacobs, Guillermo A. Perez, Remco Abraham, Veronique Bruyere, Michael Cadilhac, Maximilien Colange, Charly Delfosse, Tom van Dijk, Alexandre Duret-Lutz, Peter Faymonville, Bernd Finkbeiner, Ayrat Khalimov, Felix Klein, Michael Luttenberger, Klara Meyer, Thibaud Michaud, Adrien Pommellet, Florian Renkin, Philipp Schlehuber-Caissier, Mouhammad Sakr, Salomon Sickert, Gaetan Staquet, Clement Tamines, Leander Tentrup, Adam Walker

Abstract: We report on the last four editions of the reactive synthesis competition (SYNTCOMP 2018-2021). We briefly describe the evaluation scheme and the experimental setup of SYNTCOMP. Then, we introduce new benchmark classes that have been added to the SYNTCOMP library and give an overview of the participants of SYNTCOMP. Finally, we present and analyze the results of our experimental evaluations, including a ranking of tools with respect to quantity and quality - that is, the total size in terms of logic and memory elements - of solutions.

Submitted 6 May, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: accepted for publication in STTT

arXiv:2203.13843 [pdf, other]

doi 10.1109/LRA.2022.3191950

Quantifying Demonstration Quality for Robot Learning and Generalization

Authors: Maram Sakr, Zexi Jesse Li, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth A. Croft

Abstract: Learning from Demonstration (LfD) seeks to democratize robotics by enabling diverse end-users to teach robots to perform a task by providing demonstrations. However, most LfD techniques assume users provide optimal demonstrations. This is not always the case in real applications where users are likely to provide demonstrations of varying quality, that may change with expertise and other factors. Demonstration quality plays a crucial role in robot learning and generalization. Hence, it is important to quantify the quality of the provided demonstrations before using them for robot learning. In this paper, we propose quantifying the quality of the demonstrations based on how well they perform in the learned task. We hypothesize that task performance can give an indication of the generalization performance on similar tasks. The proposed approach is validated in a user study (N = 27). Users with different robotics expertise levels were recruited to teach a PR2 robot a generic task (pressing a button) under different task constraints. They taught the robot in two sessions on two different days to capture their teaching behaviour across sessions. The task performance was utilized to classify the provided demonstrations into high-quality and low-quality sets. The results show a significant Pearson correlation coefficient (R = 0.85, p < 0.0001) between the task performance and generalization performance across all participants. We also found that users clustered into two groups: Users who provided high-quality demonstrations from the first session, assigned to the fast-adapters group, and users who provided low-quality demonstrations in the first session and then improved with practice, assigned to the slow-adapters group. These results highlight the importance of quantifying demonstration quality, which can be indicative of the adaptation level of the user to the task.

Submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.08343 [pdf, other]

Design and Evaluation of an Augmented Reality Head-Mounted Display Interface for Human Robot Teams Collaborating in Physically Shared Manufacturing Tasks

Authors: Wesley P Chan, Geoffrey Hanks, Maram Sakr, Haomiao Zhang, Tiger Zuo, H F Machiel Van der Loos, Elizabeth Croft

Abstract: We provide an experimental evaluation of a wearable augmented reality (AR) system we have developed for human-robot teams working on tasks requiring collaboration in shared physical workspace. Recent advances in AR technology have facilitated the development of more intuitive user interfaces for many human-robot interaction applications. While it has been anticipated that AR can provide a more intuitive interface to robot assistants helping human workers in various manufacturing scenarios, existing studies in robotics have been largely limited to teleoperation and programming. Industry 5.0 envisions cooperation between human and robot working in teams. Indeed, there exist many industrial tasks that can benefit from human-robot collaboration. A prime example is high-value composite manufacturing. Working with our industry partner towards this example application, we evaluated our AR interface design for shared physical workspace collaboration in human-robot teams. We conducted a multi-dimensional analysis of our interface using established metrics. Results from our user study (n=26) show that subjectively, the AR interface feels more novel and a standard joystick interface feels more dependable to users. However, the AR interface was found to reduce physical demand and task completion time, while increasing robot utilization. Furthermore, users' freedom of choice to collaborate with the robot may also affect the perceived usability of the system.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2111.03322 [pdf, ps, other]

Automatic Repair and Deadlock Detection for Parameterized Systems

Authors: Swen Jacobs, Mouhammad Sakr, Marcus Völp

Abstract: We present an algorithm for the repair of parameterized systems. The repair problem is, for a given process implementation, to find a refinement such that a given safety property is satisfied by the resulting parameterized system, and deadlocks are avoided. Our algorithm uses a parameterized model checker to determine the correctness of candidate solutions and employs a constraint system to rule out candidates. We apply this algorithm on systems that can be represented as well-structured transition systems (WSTS), including disjunctive systems, pairwise rendezvous systems, and broadcast protocols. Moreover, we show that parameterized deadlock detection can be decided in EXPTIME for disjunctive systems, and that deadlock detection is in general undecidable for broadcast protocols.

Submitted 28 July, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

arXiv:2107.00218 [pdf]

Comparing Example-Based Collaborative Reflection to Problem Solving Practice for Learning during Team-Based Software Engineering Projects

Authors: Sreecharan Sankaranarayanan, Siddharth Reddy Kandimalla, Christopher Bogart, R. Charles Murray, Haokang An, Michael Hilton, Majd Sakr, Carolyn Rosé

Abstract: Contributing to the literature on aptitude-treatment interactions between worked examples and problem-solving, this paper addresses differential learning from the two approaches when students are positioned as domain experts learning new concepts. Our evaluation is situated in a team project that is part of an advanced software engineering course. In this course, students who possess foundational domain knowledge but are learning new concepts engage alternatively in programming followed by worked example-based reflection. They are either allowed to finish programming or are curtailed after a pre-specified time to participate in a longer worked example-based reflection. We find significant pre- to post-test learning gains in both conditions. Then, we not only find significantly more learning when students participated in longer worked example-based reflections but also a significant performance improvement on a problem-solving transfer task. These findings suggest that domain experts learning new concepts benefit more from worked example-based reflections than from problem-solving.

Submitted 1 July, 2021; originally announced July 2021.

Comments: 4 pages, 1 image, 1 table, 14th Computer Supported Collaborative Learning (CSCL) Proceedings at the Annual Meeting of the International Society of the Learning Sciences (ISLS)

Journal ref: 14th Computer-Supported Collaborative Learning Proceedings at the Annual Meeting of the International Society of the Learning Sciences 2021, pp. 213-216

arXiv:2012.02836 [pdf, other]

Orientation Matters: 6-DoF Autonomous Camera Movement for Minimally Invasive Surgery

Authors: Alaa Eldin Abdelaal, Nancy Hong, Apeksha Avinash, Divya Budihal, Maram Sakr, Gregory D. Hager, Septimiu E. Salcudean

Abstract: We propose a new method for six-degree-of-freedom (6-DoF) autonomous camera movement for minimally invasive surgery, which, unlike previous methods, takes into account both the position and orientation information from structures in the surgical scene. In addition to locating the camera for a good view of the manipulated object, our autonomous camera takes into account workspace constraints, including the horizon and safety constraints. We developed a simulation environment to test our method on the "wire chaser" surgical training task from validated training curricula in conventional laparoscopy and robot-assisted surgery. Furthermore, we propose, for the first time, the application of the proposed autonomous camera method in video-based surgical skill assessment, an area where videos are typically recorded using fixed cameras. In a study with N=30 human subjects, we show that video examination of the autonomous camera view as it tracks the ring motion over the wire leads to more accurate user error (ring touching the wire) detection than when using a fixed camera view, or camera movement with a fixed orientation. Our preliminary work suggests that there are potential benefits to autonomous camera positioning informed by scene orientation, and this can direct designers of automated endoscopes and surgical robotic systems, especially when using chip-on-tip cameras that can be wristed for 6-DoF motion.

Submitted 4 December, 2020; originally announced December 2020.

arXiv:1911.03122 [pdf, ps, other]

Promptness and Bounded Fairness in Concurrent and Parameterized Systems

Authors: Swen Jacobs, Mouhammad Sakr, Martin Zimmermann

Abstract: We investigate the satisfaction of specifications in Prompt Linear Temporal Logic (Prompt-LTL) by concurrent systems. Prompt-LTL is an extension of LTL that allows to specify parametric bounds on the satisfaction of eventualities, thus adding a quantitative aspect to the specification language. We establish a connection between bounded fairness, bounded stutter equivalence, and the satisfaction of Prompt-LTL\X formulas. Based on this connection, we prove the first cutoff results for different classes of systems with a parametric number of components and quantitative specifications, thereby identifying previously unknown decidable fragments of the parameterized model checking problem.

Submitted 15 November, 2019; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Accepted for publication in VMCAI 2020

arXiv:1904.07736 [pdf, other]

The 5th Reactive Synthesis Competition (SYNTCOMP 2018): Benchmarks, Participants & Results

Authors: Swen Jacobs, Roderick Bloem, Maximilien Colange, Peter Faymonville, Bernd Finkbeiner, Ayrat Khalimov, Felix Klein, Michael Luttenberger, Philipp J. Meyer, Thibaud Michaud, Mouhammad Sakr, Salomon Sickert, Leander Tentrup, Adam Walker

Abstract: We report on the fifth reactive synthesis competition (SYNTCOMP 2018). We introduce four new benchmark classes that have been added to the SYNTCOMP library, and briefly describe the evaluation scheme and the experimental setup of SYNTCOMP 2018. We give an overview of the participants of SYNTCOMP 2018 and highlight changes compared to previous years. Finally, we present and analyze the results of our experimental evaluation, including a ranking of tools with respect to quantity and quality of solutions.

Submitted 15 April, 2019; originally announced April 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.11439, arXiv:1609.00507

arXiv:1707.01369 [pdf, ps, other]

Analyzing Guarded Protocols: Better Cutoffs, More Systems, More Expressivity

Authors: Swen Jacobs, Mouhammad Sakr

Abstract: We study cutoff results for parameterized verification and synthesis of guarded protocols, as introduced by Emerson and Kahlon (2000). Guarded protocols describe systems of processes whose transitions are enabled or disabled depending on the existence of other processes in certain local states. Cutoff results reduce reasoning about systems with an arbitrary number of processes to systems of a determined, fixed size. Our work is based on the observation that existing cutoff results for guarded protocols are often impractical, since they scale linearly in the number of local states of processes in the system. We provide new cutoffs that scale not with the number of local states, but with the number of guards in the system, which is in many cases much smaller. Furthermore, we consider natural extensions of the classes of systems and specifications under consideration, and present results for problems that have not been known to admit cutoffs before.

Submitted 5 July, 2017; originally announced July 2017.

arXiv:1303.3592 [pdf, other]

doi 10.1109/HRI.2013.6483610

Expressing Ethnicity through Behaviors of a Robot Character

Authors: Maxim Makatchev, Reid Simmons, Majd Sakr, Micheline Ziadee

Abstract: Achieving homophily, or association based on similarity, between a human user and a robot holds a promise of improved perception and task performance. However, no previous studies that address homophily via ethnic similarity with robots exist. In this paper, we discuss the difficulties of evoking ethnic cues in a robot, as opposed to a virtual agent, and an approach to overcome those difficulties based on using ethnically salient behaviors. We outline our methodology for selecting and evaluating such behaviors, and culminate with a study that evaluates our hypotheses of the possibility of ethnic attribution of a robot character through verbal and nonverbal behaviors and of achieving the homophily effect.

Submitted 14 March, 2013; originally announced March 2013.

Comments: 10 pages, 4 figures

Journal ref: Proceedings of the 8th ACM/IEEE international conference on Human-robot interaction (HRI), Tokyo, Japan, 2013, pages 357-364

arXiv:1203.2299 [pdf, other]

A Cross-cultural Corpus of Annotated Verbal and Nonverbal Behaviors in Receptionist Encounters

Authors: Maxim Makatchev, Reid Simmons, Majd Sakr

Abstract: We present the first annotated corpus of nonverbal behaviors in receptionist interactions, and the first nonverbal corpus (excluding the original video and audio data) of service encounters freely available online. Native speakers of American English and Arabic participated in a naturalistic role play at reception desks of university buildings in Doha, Qatar and Pittsburgh, USA. Their manually annotated nonverbal behaviors include gaze direction, hand and head gestures, torso positions, and facial expressions. We discuss possible uses of the corpus and envision it to become a useful tool for the human-robot interaction community.

Submitted 10 March, 2012; originally announced March 2012.

Comments: 7 pages, 3 figures, presented at the Workshop on Gaze in HRI: From Modeling to Communication (a Workshop of International Conference on Human-Robot Interaction), March 5, 2012

Showing 1–27 of 27 results for author: Sakr, M