subscribe to arXiv mailings

Prompts First, Finally

Authors: Brent N. Reeves, James Prather, Paul Denny, Juho Leinonen, Stephen MacNeil, Brett A. Becker, Andrew Luxton-Reilly

Abstract: Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain unsolved, it may be useful in the present moment t… ▽ More Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain unsolved, it may be useful in the present moment to step back and examine the overall trajectory of Computer Science writ large. Since the very beginning, our discipline has sought to increase the level of abstraction in each new representation. We have progressed from hardware dip switches, through special purpose languages and visual representations like flow charts, all the way now to ``natural language.'' With the advent of GenAI, students can finally change the abstraction level of a problem to the ``language'' they've been ``problem solving'' with all their lives. In this paper, we argue that our programming abstractions were always headed here -- to natural language. Now is the time to adopt a ``Prompts First'' approach to Computer Science Education. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 4 pages

arXiv:2407.04873 [pdf, ps, other]

Evaluating Language Models for Generating and Judging Programming Feedback

Authors: Charles Koutcheme, Nicola Dainese, Arto Hellas, Sami Sarsa, Juho Leinonen, Syed Ashraf, Paul Denny

Abstract: The emergence of large language models (LLMs) has transformed research and practice in a wide range of domains. Within the computing education research (CER) domain, LLMs have received plenty of attention especially in the context of learning programming. Much of the work on LLMs in CER has however focused on applying and evaluating proprietary models. In this article, we evaluate the efficiency o… ▽ More The emergence of large language models (LLMs) has transformed research and practice in a wide range of domains. Within the computing education research (CER) domain, LLMs have received plenty of attention especially in the context of learning programming. Much of the work on LLMs in CER has however focused on applying and evaluating proprietary models. In this article, we evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments, and in judging the quality of the programming feedback, contrasting the results against proprietary models. Our evaluations on a dataset of students' submissions to Python introductory programming exercises suggest that the state-of-the-art open-source LLMs (Meta's Llama3) are almost on-par with proprietary models (GPT-4o) in both the generation and assessment of programming feedback. We further demonstrate the efficiency of smaller LLMs in the tasks, and highlight that there are a wide range of LLMs that are accessible even for free for educators and practitioners. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2405.14178 [pdf, other]

doi 10.1145/3649217.3653574

Desirable Characteristics for AI Teaching Assistants in Programming Education

Authors: Paul Denny, Stephen MacNeil, Jaromir Savelka, Leo Porter, Andrew Luxton-Reilly

Abstract: Providing timely and personalized feedback to large numbers of students is a long-standing challenge in programming courses. Relying on human teaching assistants (TAs) has been extensively studied, revealing a number of potential shortcomings. These include inequitable access for students with low confidence when needing support, as well as situations where TAs provide direct solutions without hel… ▽ More Providing timely and personalized feedback to large numbers of students is a long-standing challenge in programming courses. Relying on human teaching assistants (TAs) has been extensively studied, revealing a number of potential shortcomings. These include inequitable access for students with low confidence when needing support, as well as situations where TAs provide direct solutions without helping students to develop their own problem-solving skills. With the advent of powerful large language models (LLMs), digital teaching assistants configured for programming contexts have emerged as an appealing and scalable way to provide instant, equitable, round-the-clock support. Although digital TAs can provide a variety of help for programming tasks, from high-level problem solving advice to direct solution generation, the effectiveness of such tools depends on their ability to promote meaningful learning experiences. If students find the guardrails implemented in digital TAs too constraining, or if other expectations are not met, they may seek assistance in ways that do not help them learn. Thus, it is essential to identify the features that students believe make digital teaching assistants valuable. We deployed an LLM-powered digital assistant in an introductory programming course and collected student feedback ($n=813$) on the characteristics of the tool they perceived to be most important. Our results highlight that students value such tools for their ability to provide instant, engaging support, particularly during peak times such as before assessment deadlines. They also expressed a strong preference for features that enable them to retain autonomy in their learning journey, such as scaffolding that helps to guide them through problem-solving steps rather than simply being shown direct solutions. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted to ITiCSE 2024

arXiv:2405.05993 [pdf]

Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning

Authors: Fengyi Gao, Xingyu Zhang, Sonish Sivarajkumar, Parker Denny, Bayan Aldhahwani, Shyam Visweswaran, Ryan Shi, William Hogan, Allyn Bove, Yanshan Wang

Abstract: In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients post-stroke functional abilities, as well as forecast the improvement in functional abilities. Our dataset is patients' rehabilitation exercises and demographic information recorded in the unstructured electronic health records (EHRs) data and free-text reha… ▽ More In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients post-stroke functional abilities, as well as forecast the improvement in functional abilities. Our dataset is patients' rehabilitation exercises and demographic information recorded in the unstructured electronic health records (EHRs) data and free-text rehabilitation procedure notes. We collected data for 265 stroke patients from the University of Pittsburgh Medical Center. We employed a pre-existing natural language processing (NLP) algorithm to extract data on rehabilitation exercises and developed a rule-based NLP algorithm to extract Activity Measure for Post-Acute Care (AM-PAC) scores, covering basic mobility (BM) and applied cognitive (AC) domains, from procedure notes. Changes in AM-PAC scores were classified based on the minimal clinically important difference (MCID), and significance was assessed using Friedman and Wilcoxon tests. To identify impactful exercises, we used Chi-square tests, Fisher's exact tests, and logistic regression for odds ratios. Additionally, we developed five machine learning models-logistic regression (LR), Adaboost (ADB), support vector machine (SVM), gradient boosting (GB), and random forest (RF)-to predict outcomes in functional ability. Statistical analyses revealed significant associations between functional improvements and specific exercises. The RF model achieved the best performance in predicting functional outcomes. In this study, we identified three rehabilitation exercises that significantly contributed to patient post-stroke functional ability improvement in the first two months. Additionally, the successful application of a machine learning model to predict patient-specific functional outcomes underscores the potential for precision rehabilitation. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05347 [pdf, other]

Benchmarking Educational Program Repair

Authors: Charles Koutcheme, Nicola Dainese, Sami Sarsa, Juho Leinonen, Arto Hellas, Paul Denny

Abstract: The emergence of large language models (LLMs) has sparked enormous interest due to their potential application across a range of educational tasks. For example, recent work in programming education has used LLMs to generate learning resources, improve error messages, and provide feedback on code. However, one factor that limits progress within the field is that much of the research uses bespoke da… ▽ More The emergence of large language models (LLMs) has sparked enormous interest due to their potential application across a range of educational tasks. For example, recent work in programming education has used LLMs to generate learning resources, improve error messages, and provide feedback on code. However, one factor that limits progress within the field is that much of the research uses bespoke datasets and different evaluation metrics, making direct comparisons between results unreliable. Thus, there is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches. One task where LLMs show great promise is program repair, which can be used to provide debugging support and next-step hints to students. In this article, we propose a novel educational program repair benchmark. We curate two high-quality publicly available programming datasets, present a unified evaluation procedure introducing a novel evaluation metric rouge@k for approximating the quality of repairs, and evaluate a set of five recent models to establish baseline performance. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 15 pages, 2 figures, 3 tables. Non-archival report presented at the NeurIPS'23 Workshop on Generative AI for Education (GAIED)

arXiv:2405.05253 [pdf, other]

Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge

Authors: Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Paul Denny

Abstract: Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models… ▽ More Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as an automated evaluator by comparing its evaluations with those of a human expert. We observe that GPT-4 demonstrates a bias toward positively rating feedback while exhibiting moderate agreement with human raters, showcasing its potential as a feedback evaluator. Second, we explore the quality of feedback generated by several leading open-source LLMs by using GPT-4 to evaluate the feedback. We find that some models offer competitive performance with popular proprietary LLMs, such as ChatGPT, indicating opportunities for their responsible use in educational settings. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures, 2 tables. Accepted for publication at the 29th annual ACM conference on Innovation and Technology in Computer Science Education (ITiCSE 2024)

arXiv:2405.01477 [pdf, other]

"Sometimes You Just Gotta Risk It for the Biscuit": A Portrait of Student Risk-Taking

Authors: Juho Leinonen, Paul Denny

Abstract: Understanding how individuals, including students, make decisions involving risk is a fundamental aspect of behavioral research. Despite the ubiquity of risk in various aspects of life, limited empirical work has explored student risk-taking behavior in computing education. This study aims to partially replicate prior research on risk-taking behavior in software engineers while focusing on student… ▽ More Understanding how individuals, including students, make decisions involving risk is a fundamental aspect of behavioral research. Despite the ubiquity of risk in various aspects of life, limited empirical work has explored student risk-taking behavior in computing education. This study aims to partially replicate prior research on risk-taking behavior in software engineers while focusing on students, shedding light on the factors that affect their risk-taking choices. In our work, students were presented with a hypothetical scenario related to meeting a course project deadline, where they had to choose between a risky option and a safer alternative. We examined several factors that might influence these choices, including the framing of the decision (as a potential gain or loss), students' enjoyment of programming, perceived difficulty of programming, and their academic performance in the course. Our findings reveal intriguing insights into student risk-taking behavior. First, similar to software engineers in prior work, the framing of the decision significantly impacted the choices students made, with the loss framing leading to a higher likelihood for risky choices. Surprisingly, students displayed a greater inclination towards risk-taking compared to their professional counterparts in prior research. Furthermore, we observed that students' prior academic performance in the course and their enjoyment of programming had a subtle influence on their risk-taking tendencies, with better-performing students and those who enjoyed programming being marginally more prone to taking risks. Notably, we did not find statistically significant correlations between perceived difficulty of programming and risk-taking behavior among students. △ Less

Submitted 2 May, 2024; v1 submitted 1 April, 2024; originally announced May 2024.

Comments: 7 pages, 1 figure, 4 tables

arXiv:2404.11072 [pdf, other]

Large Language Models Meet User Interfaces: The Case of Provisioning Feedback

Authors: Stanislav Pozdniakov, Jonathan Brazil, Solmaz Abdi, Aneesha Bakharia, Shazia Sadiq, Dragan Gasevic, Paul Denny, Hassan Khosravi

Abstract: Incorporating Generative AI (GenAI) and Large Language Models (LLMs) in education can enhance teaching efficiency and enrich student learning. Current LLM usage involves conversational user interfaces (CUIs) for tasks like generating materials or providing feedback. However, this presents challenges including the need for educator expertise in AI and CUIs, ethical concerns with high-stakes decisio… ▽ More Incorporating Generative AI (GenAI) and Large Language Models (LLMs) in education can enhance teaching efficiency and enrich student learning. Current LLM usage involves conversational user interfaces (CUIs) for tasks like generating materials or providing feedback. However, this presents challenges including the need for educator expertise in AI and CUIs, ethical concerns with high-stakes decisions, and privacy risks. CUIs also struggle with complex tasks. To address these, we propose transitioning from CUIs to user-friendly applications leveraging LLMs via API calls. We present a framework for ethically incorporating GenAI into educational tools and demonstrate its application in our tool, Feedback Copilot, which provides personalized feedback on student assignments. Our evaluation shows the effectiveness of this approach, with implications for GenAI researchers, educators, and technologists. This work charts a course for the future of GenAI in education. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: submission to C&E AI

arXiv:2404.10990 [pdf, other]

doi 10.1145/3649217.3653568

Automating Personalized Parsons Problems with Customized Contexts and Concepts

Authors: Andre del Carpio Gutierrez, Paul Denny, Andrew Luxton-Reilly

Abstract: Parsons problems provide useful scaffolding for introductory programming students learning to write code. However, generating large numbers of high-quality Parsons problems that appeal to the diverse range of interests in a typical introductory course is a significant challenge for educators. Large language models (LLMs) may offer a solution, by allowing students to produce on-demand Parsons probl… ▽ More Parsons problems provide useful scaffolding for introductory programming students learning to write code. However, generating large numbers of high-quality Parsons problems that appeal to the diverse range of interests in a typical introductory course is a significant challenge for educators. Large language models (LLMs) may offer a solution, by allowing students to produce on-demand Parsons problems for topics covering the breadth of the introductory programming curriculum, and targeting thematic contexts that align with their personal interests. In this paper, we introduce PuzzleMakerPy, an educational tool that uses an LLM to generate unlimited contextualized drag-and-drop programming exercises in the form of Parsons Problems, which introductory programmers can use as a supplemental learning resource. We evaluated PuzzleMakerPy by deploying it in a large introductory programming course, and found that the ability to personalize the contextual framing used in problem descriptions was highly engaging for students, and being able to customize the programming topics was reported as being useful for their learning. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted to ITiCSE 2024

arXiv:2404.03120 [pdf, other]

doi 10.1145/3649217.3653580

Enhancing Student Engagement in Large-Scale Capstone Courses: An Experience Report

Authors: Asma Shakil, Paul Denny

Abstract: Computer science (CS) capstone courses offer students a valuable opportunity to gain hands-on experience in software development, practice essential soft skills, and enhance their employability prospects. They are a core component in many CS undergraduate degrees and address the ACM curricula requirements of inculcating professional dispositions in students and making them aware of the broader soc… ▽ More Computer science (CS) capstone courses offer students a valuable opportunity to gain hands-on experience in software development, practice essential soft skills, and enhance their employability prospects. They are a core component in many CS undergraduate degrees and address the ACM curricula requirements of inculcating professional dispositions in students and making them aware of the broader societal implications of computing. However, coordinating a capstone course, especially for a large student cohort, can be a daunting task for academic staff. It demands considerable time and energy for planning and coordinating activities between students, academic staff, and any external stakeholders. In this experience report, we outline the iterative development and refinement of our capstone course as it grew substantially in size over a span of six consecutive sessions. We outline the pedagogies that helped us to enhance student engagement and motivation in the course as assessed by end-of-course surveys and students' written reflections. We share the lessons that we have learnt and provide recommendations to educators who are designing new capstone courses or looking to scale existing ones. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to ITiCSE 2024

arXiv:2403.09409 [pdf, ps, other]

"Like a Nesting Doll": Analyzing Recursion Analogies Generated by CS Students using Large Language Models

Authors: Seth Bernstein, Paul Denny, Juho Leinonen, Lauren Kan, Arto Hellas, Matt Littlefield, Sami Sarsa, Stephen MacNeil

Abstract: Grasping complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analogies is difficult even for experienced instructors.… ▽ More Grasping complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analogies is difficult even for experienced instructors. We investigate to what extent large language models (LLMs), specifically ChatGPT, can provide access to personally relevant analogies on demand. Focusing on recursion, a challenging threshold concept, we conducted an investigation analyzing the analogies generated by more than 350 first-year computing students. They were provided with a code snippet and tasked to generate their own recursion-based analogies using ChatGPT, optionally including personally relevant topics in their prompts. We observed a great deal of diversity in the analogies produced with student-prescribed topics, in contrast to the otherwise generic analogies, highlighting the value of student creativity when working with LLMs. Not only did students enjoy the activity and report an improved understanding of recursion, but they described more easily remembering analogies that were personally and culturally relevant. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 7 pages, 2 figures, ITiCSE 2024 preprint

arXiv:2403.08396 [pdf, other]

A Picture Is Worth a Thousand Words: Exploring Diagram and Video-Based OOP Exercises to Counter LLM Over-Reliance

Authors: Bruno Pereira Cipriano, Pedro Alves, Paul Denny

Abstract: Much research has highlighted the impressive capabilities of large language models (LLMs), like GPT and Bard, for solving introductory programming exercises. Recent work has shown that LLMs can effectively solve a range of more complex object-oriented programming (OOP) exercises with text-based specifications. This raises concerns about academic integrity, as students might use these models to com… ▽ More Much research has highlighted the impressive capabilities of large language models (LLMs), like GPT and Bard, for solving introductory programming exercises. Recent work has shown that LLMs can effectively solve a range of more complex object-oriented programming (OOP) exercises with text-based specifications. This raises concerns about academic integrity, as students might use these models to complete assignments unethically, neglecting the development of important skills such as program design, problem-solving, and computational thinking. To address this, we propose an innovative approach to formulating OOP tasks using diagrams and videos, as a way to foster problem-solving and deter students from a copy-and-prompt approach in OOP courses. We introduce a novel notation system for specifying OOP assignments, encompassing structural and behavioral requirements, and assess its use in a classroom setting over a semester. Student perceptions of this approach are explored through a survey (n=56). Generally, students responded positively to diagrams and videos, with video-based projects being better received than diagram-based exercises. This notation appears to have several benefits, with students investing more effort in understanding the diagrams and feeling more motivated to engage with the video-based projects. Furthermore, students reported being less inclined to rely on LLM-based code generation tools for these diagram and video-based exercises. Experiments with GPT-4 and Bard's vision abilities revealed that they currently fall short in interpreting these diagrams to generate accurate code solutions. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: This is the author's draft of this paper

arXiv:2403.06050 [pdf, other]

Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills

Authors: Paul Denny, David H. Smith IV, Max Fowler, James Prather, Brett A. Becker, Juho Leinonen

Abstract: Reading, understanding and explaining code have traditionally been important skills for novices learning programming. As large language models (LLMs) become prevalent, these foundational skills are more important than ever given the increasing need to understand and evaluate model-generated code. Brand new skills are also needed, such as the ability to formulate clear prompts that can elicit inten… ▽ More Reading, understanding and explaining code have traditionally been important skills for novices learning programming. As large language models (LLMs) become prevalent, these foundational skills are more important than ever given the increasing need to understand and evaluate model-generated code. Brand new skills are also needed, such as the ability to formulate clear prompts that can elicit intended code from an LLM. Thus, there is great interest in integrating pedagogical approaches for the development of both traditional coding competencies and the novel skills required to interact with LLMs. One effective way to develop and assess code comprehension ability is with ``Explain in plain English'' (EiPE) questions, where students succinctly explain the purpose of a fragment of code. However, grading EiPE questions has always been difficult given the subjective nature of evaluating written explanations and this has stifled their uptake. In this paper, we explore a natural synergy between EiPE questions and code-generating LLMs to overcome this limitation. We propose using an LLM to generate code based on students' responses to EiPE questions -- not only enabling EiPE responses to be assessed automatically, but helping students develop essential code comprehension and prompt crafting skills in parallel. We investigate this idea in an introductory programming course and report student success in creating effective prompts for solving EiPE questions. We also examine student perceptions of this activity and how it influences their views on the use of LLMs for aiding and assessing learning. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted to ITiCSE 2024

arXiv:2402.01580 [pdf, other]

Generative AI for Education (GAIED): Advances, Opportunities, and Challenges

Authors: Paul Denny, Sumit Gulwani, Neil T. Heffernan, Tanja Käser, Steven Moore, Anna N. Rafferty, Adish Singla

Abstract: This survey article has grown out of the GAIED (pronounced "guide") workshop organized by the authors at the NeurIPS 2023 conference. We organized the GAIED workshop as part of a community-building effort to bring together researchers, educators, and practitioners to explore the potential of generative AI for enhancing education. This article aims to provide an overview of the workshop activities… ▽ More This survey article has grown out of the GAIED (pronounced "guide") workshop organized by the authors at the NeurIPS 2023 conference. We organized the GAIED workshop as part of a community-building effort to bring together researchers, educators, and practitioners to explore the potential of generative AI for enhancing education. This article aims to provide an overview of the workshop activities and highlight several future research directions in the area of GAIED. △ Less

Submitted 6 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.11314 [pdf, other]

doi 10.1145/3613904.3642773

CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs

Authors: Majeed Kazemitabaar, Runlong Ye, Xiaoning Wang, Austin Z. Henley, Paul Denny, Michelle Craig, Tovi Grossman

Abstract: Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, gene… ▽ More Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates student's incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. A thematic analysis of 8,000 usages of CodeAid was performed, further enriched by weekly surveys, and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's unique benefits; D2) simplifying query formulation while promoting cognitive engagement; D3) avoiding direct responses while encouraging motivated learning; and D4) maintaining transparency and control for students to asses and steer AI responses. △ Less

Submitted 25 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

Comments: CHI 2024 Paper - The paper includes 17 pages, 8 figures, 2 tables, along with a 2-page appendix

arXiv:2401.10759 [pdf, other]

Interactions with Prompt Problems: A New Way to Teach Programming with Large Language Models

Authors: James Prather, Paul Denny, Juho Leinonen, David H. Smith IV, Brent N. Reeves, Stephen MacNeil, Brett A. Becker, Andrew Luxton-Reilly, Thezyrie Amarouche, Bailey Kimmel

Abstract: Large Language Models (LLMs) have upended decades of pedagogy in computing education. Students previously learned to code through \textit{writing} many small problems with less emphasis on code reading and comprehension. Recent research has shown that free code generation tools powered by LLMs can solve introductory programming problems presented in natural language with ease. In this paper, we pr… ▽ More Large Language Models (LLMs) have upended decades of pedagogy in computing education. Students previously learned to code through \textit{writing} many small problems with less emphasis on code reading and comprehension. Recent research has shown that free code generation tools powered by LLMs can solve introductory programming problems presented in natural language with ease. In this paper, we propose a new way to teach programming with Prompt Problems. Students receive a problem visually, indicating how input should be transformed to output, and must translate that to a prompt for an LLM to decipher. The problem is considered correct when the code that is generated by the student prompt can pass all test cases. In this paper we present the design of this tool, discuss student interactions with it as they learn, and provide insights into this new class of programming problems as well as the design tools that integrate LLMs. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: accepted for CHI 2024

arXiv:2311.16017 [pdf, other]

Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models

Authors: Stephen MacNeil, Paul Denny, Andrew Tran, Juho Leinonen, Seth Bernstein, Arto Hellas, Sami Sarsa, Joanne Kim

Abstract: Identifying and resolving logic errors can be one of the most frustrating challenges for novices programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior -- in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard t… ▽ More Identifying and resolving logic errors can be one of the most frustrating challenges for novices programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior -- in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard to spot when reading the code, and they can also at times be missed by automated tests. There is great educational potential in automatically detecting logic errors, especially when paired with suitable feedback for novices. Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks, including generating and explaining code. These capabilities are closely linked to code syntax, which aligns with the next token prediction behavior of LLMs. On the other hand, logic errors relate to the runtime performance of code and thus may not be as well suited to analysis by LLMs. To explore this, we investigate the performance of two popular LLMs, GPT-3 and GPT-4, for detecting and providing a novice-friendly explanation of logic errors. We compare LLM performance with a large cohort of introductory computing students $(n=964)$ solving the same error detection task. Through a mixed-methods analysis of student and model responses, we observe significant improvement in logic error identification between the previous and current generation of LLMs, and find that both LLM generations significantly outperform students. We outline how such models could be integrated into computing education tools, and discuss their potential for supporting students when learning programming. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.10777 [pdf, other]

A Systematic Review of Aspect-based Sentiment Analysis (ABSA): Domains, Methods, and Trends

Authors: Yan Cathy Hua, Paul Denny, Katerina Taskova, Jörg Wicker

Abstract: Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment analysis that identifies aspects and their associated opinions from a given text. With the surge of digital opinionated text data, ABSA gained increasing popularity for its ability to mine more detailed and targeted insights. Many review papers on ABSA subtasks and solution methodologies exist, however, few focus on trends… ▽ More Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment analysis that identifies aspects and their associated opinions from a given text. With the surge of digital opinionated text data, ABSA gained increasing popularity for its ability to mine more detailed and targeted insights. Many review papers on ABSA subtasks and solution methodologies exist, however, few focus on trends over time or systemic issues relating to research application domains, datasets, and solution approaches. To fill the gap, this paper presents a Systematic Literature Review (SLR) of ABSA studies with a focus on trends and high-level relationships among these fundamental components. This review is one of the largest SLRs on ABSA, and also, to our knowledge, the first that systematically examines the trends and inter-relations among ABSA research and data distribution across domains and solution paradigms and approaches. Our sample includes 519 primary studies screened from 4191 search results without time constraints via an innovative automatic filtering process. Our quantitative analysis not only identifies trends in nearly two decades of ABSA research development but also unveils a systemic lack of dataset and domain diversity as well as domain mismatch that may hinder the development of future ABSA research. We discuss these findings and their implications and propose suggestions for future research. △ Less

Submitted 16 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.07321 [pdf, other]

Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images

Authors: Aryan Singh, Pepijn Van de Ven, Ciarán Eising, Patrick Denny

Abstract: Deep learning models have demonstrated remarkable results for various computer vision tasks, including the realm of medical imaging. However, their application in the medical domain is limited due to the requirement for large amounts of training data, which can be both challenging and expensive to obtain. To mitigate this, pre-trained models have been fine-tuned on domain-specific data, but such a… ▽ More Deep learning models have demonstrated remarkable results for various computer vision tasks, including the realm of medical imaging. However, their application in the medical domain is limited due to the requirement for large amounts of training data, which can be both challenging and expensive to obtain. To mitigate this, pre-trained models have been fine-tuned on domain-specific data, but such an approach can suffer from inductive biases. Furthermore, deep learning models struggle to learn the relationship between spatially distant features and their importance, as convolution operations treat all pixels equally. Pioneering a novel solution to this challenge, we employ the Image Foresting Transform to optimally segment images into superpixels. These superpixels are subsequently transformed into graph-structured data, enabling the proficient extraction of features and modeling of relationships using Graph Neural Networks (GNNs). Our method harnesses an ensemble of three distinct GNN architectures to boost its robustness. In our evaluations targeting pneumonia classification, our methodology surpassed prevailing Deep Neural Networks (DNNs) in performance, all while drastically cutting down on the parameter count. This not only trims down the expenses tied to data but also accelerates training and minimizes bias. Consequently, our proposition offers a sturdy, economically viable, and scalable strategy for medical image classification, significantly diminishing dependency on extensive training data sets. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: Our code is available at https://github.com/aryan-at-ul/AICS_2023_submission

Journal ref: AICS 2023

arXiv:2311.05943 [pdf, other]

Prompt Problems: A New Programming Exercise for the Generative AI Era

Authors: Paul Denny, Juho Leinonen, James Prather, Andrew Luxton-Reilly, Thezyrie Amarouche, Brett A. Becker, Brent N. Reeves

Abstract: Large Language Models (LLMs) are revolutionizing the field of computing education with their powerful code-generating capabilities. Traditional pedagogical practices have focused on code writing tasks, but there is now a shift in importance towards code reading, comprehension and evaluation of LLM-generated code. Alongside this shift, an important new skill is emerging -- the ability to solve prog… ▽ More Large Language Models (LLMs) are revolutionizing the field of computing education with their powerful code-generating capabilities. Traditional pedagogical practices have focused on code writing tasks, but there is now a shift in importance towards code reading, comprehension and evaluation of LLM-generated code. Alongside this shift, an important new skill is emerging -- the ability to solve programming tasks by constructing good prompts for code-generating models. In this work we introduce a new type of programming exercise to hone this nascent skill: 'Prompt Problems'. Prompt Problems are designed to help students learn how to write effective prompts for AI code generators. A student solves a Prompt Problem by crafting a natural language prompt which, when provided as input to an LLM, outputs code that successfully solves a specified programming task. We also present a new web-based tool called Promptly which hosts a repository of Prompt Problems and supports the automated evaluation of prompt-generated code. We deploy Promptly for the first time in one CS1 and one CS2 course and describe our experiences, which include student perceptions of this new type of activity and their interactions with the tool. We find that students are enthusiastic about Prompt Problems, and appreciate how the problems engage their computational thinking skills and expose them to new programming constructs. We discuss ideas for the future development of new variations of Prompt Problems, and the need to carefully study their integration into classroom practice. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Comments: Accepted to SIGCSE'24. arXiv admin note: substantial text overlap with arXiv:2307.16364

arXiv:2311.02775 [pdf, other]

AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

Authors: Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny

Abstract: Responding to the thousands of student questions on online QA platforms each semester has a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data priva… ▽ More Responding to the thousands of student questions on online QA platforms each semester has a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data privacy. Our approach combines augmentation techniques such as retrieval augmented generation (RAG), supervised fine-tuning (SFT), and learning from human preferences data using Direct Preference Optimization (DPO). Through extensive experimentation on a Piazza dataset from an introductory CS course, comprising 10,000 QA pairs and 1,500 pairs of preference data, we demonstrate a significant 30% improvement in the quality of answers, with RAG being a particularly impactful addition. Our contributions include the development of a novel architecture for educational QA, extensive evaluations of LLM performance utilizing both human assessments and LLM-based metrics, and insights into the challenges and future directions of educational data processing. This work paves the way for the development of AI-TA, an intelligent QA assistant customizable for courses with an online QA platform △ Less

Submitted 18 December, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: Updates for camera-ready submission

Journal ref: NeurIPS Workshop on Generative AI for Education (GAIED), 2023

arXiv:2310.20105 [pdf, other]

Efficient Classification of Student Help Requests in Programming Courses Using Large Language Models

Authors: Jaromir Savelka, Paul Denny, Mark Liffiton, Brad Sheese

Abstract: The accurate classification of student help requests with respect to the type of help being sought can enable the tailoring of effective responses. Automatically classifying such requests is non-trivial, but large language models (LLMs) appear to offer an accessible, cost-effective solution. This study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying help requests from stu… ▽ More The accurate classification of student help requests with respect to the type of help being sought can enable the tailoring of effective responses. Automatically classifying such requests is non-trivial, but large language models (LLMs) appear to offer an accessible, cost-effective solution. This study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying help requests from students in an introductory programming class. In zero-shot trials, GPT-3.5 and GPT-4 exhibited comparable performance on most categories, while GPT-4 outperformed GPT-3.5 in classifying sub-categories for requests related to debugging. Fine-tuning the GPT-3.5 model improved its performance to such an extent that it approximated the accuracy and consistency across categories observed between two human raters. Overall, this study demonstrates the feasibility of using LLMs to enhance educational systems through the automated classification of student needs. △ Less

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.16984 [pdf, other]

doi 10.1145/3636243.3636249

Patterns of Student Help-Seeking When Using a Large Language Model-Powered Programming Assistant

Authors: Brad Sheese, Mark Liffiton, Jaromir Savelka, Paul Denny

Abstract: Providing personalized assistance at scale is a long-standing challenge for computing educators, but a new generation of tools powered by large language models (LLMs) offers immense promise. Such tools can, in theory, provide on-demand help in large class settings and be configured with appropriate guardrails to prevent misuse and mitigate common concerns around learner over-reliance. However, the… ▽ More Providing personalized assistance at scale is a long-standing challenge for computing educators, but a new generation of tools powered by large language models (LLMs) offers immense promise. Such tools can, in theory, provide on-demand help in large class settings and be configured with appropriate guardrails to prevent misuse and mitigate common concerns around learner over-reliance. However, the deployment of LLM-powered tools in authentic classroom settings is still rare, and very little is currently known about how students will use them in practice and what type of help they will seek. To address this, we examine students' use of an innovative LLM-powered tool that provides on-demand programming assistance without revealing solutions directly. We deployed the tool for 12 weeks in an introductory computer and data science course ($n = 52$), collecting more than 2,500 queries submitted by students throughout the term. We manually categorized all student queries based on the type of assistance sought, and we automatically analyzed several additional query characteristics. We found that most queries requested immediate help with programming assignments, whereas fewer requests asked for help on related concepts or for deepening conceptual understanding. Furthermore, students often provided minimal information to the tool, suggesting this is an area in which targeted instruction would be beneficial. We also found that students who achieved more success in the course tended to have used the tool more frequently overall. Lessons from this research can be leveraged by programming educators and institutions who plan to augment their teaching with emerging LLM-powered tools. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.00658 [pdf, other]

doi 10.1145/3623762.3633499

The Robots are Here: Navigating the Generative AI Revolution in Computing Education

Authors: James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton-Reilly, Stephen MacNeil, Andrew Peterson, Raymond Pettit, Brent N. Reeves, Jaromir Savelka

Abstract: Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and t… ▽ More Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms. △ Less

Submitted 1 October, 2023; originally announced October 2023.

Comments: 39 pages of content + 12 pages of references and appendices

arXiv:2309.13500 [pdf, other]

Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy

Authors: Lin Ni, Sijie Wang, Zeyu Zhang, Xiaoxuan Li, Xianda Zheng, Paul Denny, Jiamou Liu

Abstract: Learnersourcing offers great potential for scalable education through student content creation. However, predicting student performance on learnersourced questions, which is essential for personalizing the learning experience, is challenging due to the inherent noise in student-generated data. Moreover, while conventional graph-based methods can capture the complex network of student and question… ▽ More Learnersourcing offers great potential for scalable education through student content creation. However, predicting student performance on learnersourced questions, which is essential for personalizing the learning experience, is challenging due to the inherent noise in student-generated data. Moreover, while conventional graph-based methods can capture the complex network of student and question interactions, they often fall short under cold start conditions where limited student engagement with questions yields sparse data. To address both challenges, we introduce an innovative strategy that synergizes the potential of integrating Signed Graph Neural Networks (SGNNs) and Large Language Model (LLM) embeddings. Our methodology employs a signed bipartite graph to comprehensively model student answers, complemented by a contrastive learning framework that enhances noise resilience. Furthermore, LLM's contribution lies in generating foundational question embeddings, proving especially advantageous in addressing cold start scenarios characterized by limited graph data. Validation across five real-world datasets sourced from the PeerWise platform underscores our approach's effectiveness. Our method outperforms baselines, showcasing enhanced predictive accuracy and robustness. △ Less

Submitted 28 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

MSC Class: 97P80

arXiv:2309.10444 [pdf, other]

Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

Authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other stud… ▽ More Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other students understand the solution and promotes a deeper understanding of related concepts. However, it is often difficult for students to craft effective solution explanations, due to limited subject understanding. To help scaffold the task of automated explanation generation, we present and evaluate a framework called "ILearner-LLM", that iteratively enhances the generated explanations for the given questions with large language models. Comprising an explanation generation model and an explanation evaluation model, the framework generates high-quality student-aligned explanations by iteratively feeding the quality rating score from the evaluation model back into the instruction prompt of the explanation generation model. Experimental results demonstrate the effectiveness of our ILearner-LLM on LLaMA2-13B and GPT-4 to generate higher quality explanations that are closer to those written by students on five PeerWise datasets. Our findings represent a promising path to enrich the learnersourcing experience for students and to enhance the capabilities of large language models for educational applications. △ Less

Submitted 10 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: The short version (v4) was accepted as a non-archival workshop paper at AGI@ICLR 2024; the full version is under review

arXiv:2308.06921 [pdf, other]

CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Authors: Mark Liffiton, Brad Sheese, Jaromir Savelka, Paul Denny

Abstract: Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool… ▽ More Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students' usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2307.16364 [pdf, other]

Promptly: Using Prompt Problems to Teach Learners How to Effectively Utilize AI Code Generators

Authors: Paul Denny, Juho Leinonen, James Prather, Andrew Luxton-Reilly, Thezyrie Amarouche, Brett A. Becker, Brent N. Reeves

Abstract: With their remarkable ability to generate code, large language models (LLMs) are a transformative technology for computing education practice. They have created an urgent need for educators to rethink pedagogical approaches and teaching strategies for newly emerging skill sets. Traditional approaches to learning programming have focused on frequent and repeated practice at writing code. The ease w… ▽ More With their remarkable ability to generate code, large language models (LLMs) are a transformative technology for computing education practice. They have created an urgent need for educators to rethink pedagogical approaches and teaching strategies for newly emerging skill sets. Traditional approaches to learning programming have focused on frequent and repeated practice at writing code. The ease with which code can now be generated has resulted in a shift in focus towards reading, understanding and evaluating LLM-generated code. In parallel with this shift, a new essential skill is emerging -- the ability to construct good prompts for code-generating models. This paper introduces a novel pedagogical concept known as a `Prompt Problem', designed to help students learn how to craft effective prompts for LLMs. A Prompt Problem challenges a student to create a natural language prompt that leads an LLM to produce the correct code for a specific problem. To support the delivery of Prompt Problems at scale, in this paper we also present a novel tool called Promptly which hosts a repository of Prompt Problems and automates the evaluation of prompt-generated code. We report empirical findings from a field study in which Promptly was deployed in a first-year Python programming course (n=54). We explore student interactions with the tool and their perceptions of the Prompt Problem concept. We found that Promptly was largely well-received by students for its ability to engage their computational thinking skills and expose them to new programming constructs. We also discuss avenues for future work, including variations on the design of Prompt Problems and the need to study their integration into the curriculum and teaching practice. △ Less

Submitted 30 July, 2023; originally announced July 2023.

arXiv:2307.12790 [pdf, other]

Compact & Capable: Harnessing Graph Neural Networks and Edge Convolution for Medical Image Classification

Authors: Aryan Singh, Pepijn Van de Ven, Ciarán Eising, Patrick Denny

Abstract: Graph-based neural network models are gaining traction in the field of representation learning due to their ability to uncover latent topological relationships between entities that are otherwise challenging to identify. These models have been employed across a diverse range of domains, encompassing drug discovery, protein interactions, semantic segmentation, and fluid dynamics research. In this s… ▽ More Graph-based neural network models are gaining traction in the field of representation learning due to their ability to uncover latent topological relationships between entities that are otherwise challenging to identify. These models have been employed across a diverse range of domains, encompassing drug discovery, protein interactions, semantic segmentation, and fluid dynamics research. In this study, we investigate the potential of Graph Neural Networks (GNNs) for medical image classification. We introduce a novel model that combines GNNs and edge convolution, leveraging the interconnectedness of RGB channel feature values to strongly represent connections between crucial graph nodes. Our proposed model not only performs on par with state-of-the-art Deep Neural Networks (DNNs) but does so with 1000 times fewer parameters, resulting in reduced training time and data requirements. We compare our Graph Convolutional Neural Network (GCNN) to pre-trained DNNs for classifying MedMNIST dataset classes, revealing promising prospects for GNNs in medical image analysis. Our results also encourage further exploration of advanced graph-based models such as Graph Attention Networks (GAT) and Graph Auto-Encoders in the medical imaging domain. The proposed model yields more reliable, interpretable, and accurate outcomes for tasks like semantic segmentation and image classification compared to simpler GCNNs △ Less

Submitted 24 July, 2023; originally announced July 2023.

Journal ref: Proceedings of the Irish Machine Vision and Image Processing Conference 2023

arXiv:2306.10509 [pdf, other]

Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources

Authors: Paul Denny, Hassan Khosravi, Arto Hellas, Juho Leinonen, Sami Sarsa

Abstract: As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors. In this study, we investigated the potential for… ▽ More As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors. In this study, we investigated the potential for LLMs to produce learning resources in an introductory programming context, by comparing the quality of the resources generated by an LLM with those created by students as part of a learnersourcing activity. Using a blind evaluation, students rated the correctness and helpfulness of resources generated by AI and their peers, after both were initially provided with identical exemplars. Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to the quality of resources generated by their peers. This suggests that AI-generated resources may serve as viable supplementary material in certain contexts. Resources generated by LLMs tend to closely mirror the given exemplars, whereas student-generated resources exhibit greater variety in terms of content length and specific syntax features used. The study highlights the need for further research exploring different types of learning resources and a broader range of subject areas, and understanding the long-term impact of AI-generated resources on learning outcomes. △ Less

Submitted 3 July, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

arXiv:2306.06386 [pdf, other]

doi 10.1016/j.caeai.2023.100151

Learnersourcing in the Age of AI: Student, Educator and Machine Partnerships for Content Creation

Authors: Hassan Khosravi, Paul Denny, Steven Moore, John Stamper

Abstract: Engaging students in creating novel content, also referred to as learnersourcing, is increasingly recognised as an effective approach to promoting higher-order learning, deeply engaging students with course material and developing large repositories of content suitable for personalized learning. Despite these benefits, some common concerns and criticisms are associated with learnersourcing (e.g.,… ▽ More Engaging students in creating novel content, also referred to as learnersourcing, is increasingly recognised as an effective approach to promoting higher-order learning, deeply engaging students with course material and developing large repositories of content suitable for personalized learning. Despite these benefits, some common concerns and criticisms are associated with learnersourcing (e.g., the quality of resources created by students, challenges in incentivising engagement and lack of availability of reliable learnersourcing systems), which have limited its adoption. This paper presents a framework that considers the existing learnersourcing literature, the latest insights from the learning sciences and advances in AI to offer promising future directions for developing learnersourcing systems. The framework is designed around important questions and human-AI partnerships relating to four key aspects: (1) creating novel content, (2) evaluating the quality of the created content, (3) utilising learnersourced contributions of students and (4) enabling instructors to support students in the learnersourcing process. We then present two comprehensive case studies that illustrate the application of the proposed framework in relation to two existing popular learnersourcing systems. △ Less

Submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.02608 [pdf, other]

Computing Education in the Era of Generative AI

Authors: Paul Denny, James Prather, Brett A. Becker, James Finnie-Ansley, Arto Hellas, Juho Leinonen, Andrew Luxton-Reilly, Brent N. Reeves, Eddie Antonio Santos, Sami Sarsa

Abstract: The computing education community has a rich history of pedagogical innovation designed to support students in introductory courses, and to support teachers in facilitating student learning. Very recent advances in artificial intelligence have resulted in code generation models that can produce source code from natural language problem descriptions -- with impressive accuracy in many cases. The wi… ▽ More The computing education community has a rich history of pedagogical innovation designed to support students in introductory courses, and to support teachers in facilitating student learning. Very recent advances in artificial intelligence have resulted in code generation models that can produce source code from natural language problem descriptions -- with impressive accuracy in many cases. The wide availability of these models and their ease of use has raised concerns about potential impacts on many aspects of society, including the future of computing education. In this paper, we discuss the challenges and opportunities such models present to computing educators, with a focus on introductory programming classrooms. We summarize the results of two recent articles, the first evaluating the performance of code generation models on typical introductory-level programming problems, and the second exploring the quality and novelty of learning resources generated by these models. We consider likely impacts of such models upon pedagogical practice in the context of the most recent advances at the time of writing. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted for publication as a Contributed Article in Communications of the ACM (CACM)

arXiv:2305.12599 [pdf, other]

Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Authors: Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data… ▽ More Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning. △ Less

Submitted 6 June, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures, the Findings of ACL 2024

arXiv:2304.06279 [pdf, other]

doi 10.1145/3587102.3588854

Using Sensor-Based Programming to Improve Self-Efficacy and Outcome Expectancy for Students from Underrepresented Groups

Authors: Hussel Suriyaarachchi, Alaeddin Nassani, Paul Denny, Suranga Nanayakkara

Abstract: Knowledge of programming and computing is becoming increasingly valuable in today's world, and thus it is crucial that students from all backgrounds have the opportunity to learn. As the teaching of computing at high-school becomes more common, there is a growing need for approaches and tools that are effective and engaging for all students. Especially for students from groups that are traditional… ▽ More Knowledge of programming and computing is becoming increasingly valuable in today's world, and thus it is crucial that students from all backgrounds have the opportunity to learn. As the teaching of computing at high-school becomes more common, there is a growing need for approaches and tools that are effective and engaging for all students. Especially for students from groups that are traditionally underrepresented at university level, positive experiences at high-school can be an important factor for their future academic choices. In this paper we report on a hands-on programming workshop that we ran over multiple sessions for Maori and Pasifika high-school students who are underrepresented in computer science at the tertiary level in New Zealand. In the workshop, participants developed Scratch programs starting from a simple template we provided. In order to control the action in their programs, half of the participants used standard mouse and keyboard inputs, and the other half had access to plug-and-play sensors that provided real-time environmental data. We explore how students' perceptions of self-efficacy and outcome expectancy -- both key constructs driving academic career choices -- changed during the workshop and how these were impacted by the availability of the sensor toolkit. We found that participants enjoyed the workshop and reported improved self-efficacy with or without use of the toolkit, but outcome expectancy improved only for students who used the sensor toolkit. △ Less

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2304.03938 [pdf, other]

doi 10.1145/3587102.3588785

Comparing Code Explanations Created by Students and Large Language Models

Authors: Juho Leinonen, Paul Denny, Stephen MacNeil, Sami Sarsa, Seth Bernstein, Joanne Kim, Andrew Tran, Arto Hellas

Abstract: Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs cor… ▽ More Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course ($n \approx 1000$) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education. △ Less

Submitted 8 April, 2023; originally announced April 2023.

Comments: 8 pages, 3 figures. To be published in Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1

arXiv:2304.03450 [pdf, other]

Striving for Authentic and Sustained Technology Use In the Classroom: Lessons Learned from a Longitudinal Evaluation of a Sensor-based Science Education Platform

Authors: Yvonne Chua, Sankha Cooray, Juan Pablo Forero Cortes, Paul Denny, Sonia Dupuch, Dawn L Garbett, Alaeddin Nassani, Jiashuo Cao, Hannah Qiao, Andrew Reis, Deviana Reis, Philipp M. Scholl, Priyashri Kamlesh Sridhar, Hussel Suriyaarachchi, Fiona Taimana, Vanessa Tanga, Chamod Weerasinghe, Elliott Wen, Michelle Wu, Qin Wu, Haimo Zhang, Suranga Nanayakkara

Abstract: Technology integration in educational settings has led to the development of novel sensor-based tools that enable students to measure and interact with their environment. Although reports from using such tools can be positive, evaluations are often conducted under controlled conditions and short timeframes. There is a need for longitudinal data collected in realistic classroom settings. However, s… ▽ More Technology integration in educational settings has led to the development of novel sensor-based tools that enable students to measure and interact with their environment. Although reports from using such tools can be positive, evaluations are often conducted under controlled conditions and short timeframes. There is a need for longitudinal data collected in realistic classroom settings. However, sustained and authentic classroom use requires technology platforms to be seen by teachers as both easy to use and of value. We describe our development of a sensor-based platform to support science teaching that followed a 14-month user-centered design process. We share insights from this design and development approach, and report findings from a 6-month large-scale evaluation involving 35 schools and 1245 students. We share lessons learnt, including that technology integration is not an educational goal per se and that technology should be a transparent tool to enable students to achieve their learning goals. △ Less

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2304.02491 [pdf, other]

doi 10.1145/3617367

"It's Weird That it Knows What I Want": Usability and Interactions with Copilot for Novice Programmers

Authors: James Prather, Brent N. Reeves, Paul Denny, Brett A. Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, Eddie Antonio Santos

Abstract: Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignments. However, little is currently known about how… ▽ More Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignments. However, little is currently known about how novices interact with these tools in practice. We present the first study that observes students at the introductory level using one such code auto-generating tool, Github Copilot, on a typical introductory programming (CS1) assignment. Through observations and interviews we explore student perceptions of the benefits and pitfalls of this technology for learning, present new observed interaction patterns, and discuss cognitive and metacognitive difficulties faced by students. We consider design implications of these findings, specifically in terms of how tools like Copilot can better support and scaffold the novice programming experience. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: 26 pages, 2 figures, TOCHI

arXiv:2303.13528 [pdf]

doi 10.1371/journal.pcbi.1011511

Many bioinformatics programming tasks can be automated with ChatGPT

Authors: Stephen R. Piccolo, Paul Denny, Andrew Luxton-Reilly, Samuel Payne, Perry G. Ridge

Abstract: Computer programming is a fundamental tool for life scientists, allowing them to carry out many essential research tasks. However, despite a variety of educational efforts, learning to write code can be a challenging endeavor for both researchers and students in life science disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functio… ▽ More Computer programming is a fundamental tool for life scientists, allowing them to carry out many essential research tasks. However, despite a variety of educational efforts, learning to write code can be a challenging endeavor for both researchers and students in life science disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such model -- OpenAI's ChatGPT -- can successfully complete basic- to moderate-level programming tasks. On its first attempt, ChatGPT solved 139 (75.5%) of the exercises. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have important implications for life-sciences research and education. For many programming tasks, researchers no longer need to write code from scratch. Instead, machine-learning models may produce usable solutions. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: 13 pages, 4 figures, to be submitted for publication

arXiv:2303.13466 [pdf]

Mining Clinical Notes for Physical Rehabilitation Exercise Information: Natural Language Processing Algorithm Development and Validation Study

Authors: Sonish Sivarajkumar, Fengyi Gao, Parker E. Denny, Bayan M. Aldhahwani, Shyam Visweswaran, Allyn Bove, Yanshan Wang

Abstract: Post-stroke patient rehabilitation requires precise, personalized treatment plans. Natural Language Processing (NLP) offers potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. Objective: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exer… ▽ More Post-stroke patient rehabilitation requires precise, personalized treatment plans. Natural Language Processing (NLP) offers potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. Objective: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exercise information from the clinical notes of post-stroke patients treated at the University of Pittsburgh Medical Center. A cohort of 13,605 patients diagnosed with stroke was identified, and their clinical notes containing rehabilitation therapy notes were retrieved. A comprehensive clinical ontology was created to represent various aspects of physical rehabilitation exercises. State-of-the-art NLP algorithms were then developed and compared, including rule-based, machine learning-based algorithms, and large language model (LLM)-based algorithms (ChatGPT). Analysis was conducted on a dataset comprising 23,724 notes with detailed demographic and clinical characteristics. The rule-based NLP algorithm demonstrated superior performance in most areas, particularly in detecting the 'Right Side' location with an F1 score of 0.975, outperforming Gradient Boosting by 0.063. Gradient Boosting excelled in 'Lower Extremity' location detection (F1 score: 0.978), surpassing rule-based NLP by 0.023. It also showed notable performance in 'Passive Range of Motion' with an F1 score of 0.970, a 0.032 improvement over rule-based NLP. The rule-based algorithm efficiently handled 'Duration', 'Sets', and 'Reps' with F1 scores up to 0.65. LLM-based NLP, particularly ChatGPT with few-shot prompts, achieved high recall but generally lower precision and F1 scores. However, it notably excelled in 'Backward Plane' motion detection, achieving an F1 score of 0.846, surpassing the rule-based algorithm's 0.720. △ Less

Submitted 15 March, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2212.05113 [pdf, other]

doi 10.1145/3545947.3569630

Automatically Generating CS Learning Materials with Large Language Models

Authors: Stephen MacNeil, Andrew Tran, Juho Leinonen, Paul Denny, Joanne Kim, Arto Hellas, Seth Bernstein, Sami Sarsa

Abstract: Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advances may enable students to interact with code in ne… ▽ More Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advances may enable students to interact with code in new ways while helping instructors scale their learning materials. However, LLMs also introduce new implications for academic integrity, curriculum design, and software engineering careers. This workshop will demonstrate the capabilities of LLMs to help attendees evaluate whether and how LLMs might be integrated into their pedagogy and research. We will also engage attendees in brainstorming to consider how LLMs will impact our field. △ Less

Submitted 9 December, 2022; originally announced December 2022.

Comments: In Proceedings of the 54th ACM Technical Symposium on Computing Science Education

arXiv:2212.01020 [pdf, other]

Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation

Authors: Brett A. Becker, Paul Denny, James Finnie-Ansley, Andrew Luxton-Reilly, James Prather, Eddie Antonio Santos

Abstract: The introductory programming sequence has been the focus of much research in computing education. The recent advent of several viable and freely-available AI-driven code generation tools present several immediate opportunities and challenges in this domain. In this position paper we argue that the community needs to act quickly in deciding what possible opportunities can and should be leveraged an… ▽ More The introductory programming sequence has been the focus of much research in computing education. The recent advent of several viable and freely-available AI-driven code generation tools present several immediate opportunities and challenges in this domain. In this position paper we argue that the community needs to act quickly in deciding what possible opportunities can and should be leveraged and how, while also working on how to overcome or otherwise mitigate the possible challenges. Assuming that the effectiveness and proliferation of these tools will continue to progress rapidly, without quick, deliberate, and concerted efforts, educators will lose advantage in helping shape what opportunities come to be, and what challenges will endure. With this paper we aim to seed this discussion within the computing education community. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 7 pages

ACM Class: K.3.2; I.2; H.5.2

arXiv:2211.04715 [pdf, other]

Robosourcing Educational Resources -- Leveraging Large Language Models for Learnersourcing

Authors: Paul Denny, Sami Sarsa, Arto Hellas, Juho Leinonen

Abstract: In this article, we introduce and evaluate the concept of robosourcing for creating educational content. Robosourcing lies in the intersection of crowdsourcing and large language models, where instead of a crowd of humans, requests to large language models replace some of the work traditionally performed by the crowd. Robosourcing includes a human-in-the-loop to provide priming (input) as well as… ▽ More In this article, we introduce and evaluate the concept of robosourcing for creating educational content. Robosourcing lies in the intersection of crowdsourcing and large language models, where instead of a crowd of humans, requests to large language models replace some of the work traditionally performed by the crowd. Robosourcing includes a human-in-the-loop to provide priming (input) as well as to evaluate and potentially adjust the generated artefacts; these evaluations could also be used to improve the large language models. We propose a system to outline the robosourcing process. We further study the feasibility of robosourcing in the context of education by conducting an evaluation of robosourced and programming exercises, generated using OpenAI Codex. Our results suggest that robosourcing could significantly reduce human effort in creating diverse educational content while maintaining quality similar to human-created content. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.02265 [pdf, other]

Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book

Authors: Stephen MacNeil, Andrew Tran, Arto Hellas, Joanne Kim, Sami Sarsa, Paul Denny, Seth Bernstein, Juho Leinonen

Abstract: Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations. In this paper, we report on our experiences genera… ▽ More Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations. In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. We modified the e-book to make LLM-generated code explanations accessible through buttons next to code snippets in the materials, which allowed us to track the use of the explanations as well as to ask for feedback on their utility. Three different types of explanations were available for students for each explainable code snippet; a line-by-line explanation, a list of important concepts, and a high-level summary of the code. Our preliminary results show that all varieties of explanations were viewed by students and that the majority of students perceived the code explanations as helpful to them. However, student engagement appeared to vary by code snippet complexity, explanation type, and code snippet length. Drawing on our experiences, we discuss future directions for integrating explanations generated by LLMs into existing computer science classrooms. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.15157 [pdf, other]

Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language

Authors: Paul Denny, Viraj Kumar, Nasser Giacaman

Abstract: GitHub Copilot is an artificial intelligence model for automatically generating source code from natural language problem descriptions. Since June 2022, Copilot has officially been available for free to all students as a plug-in to development environments like Visual Studio Code. Prior work exploring OpenAI Codex, the underlying model that powers Copilot, has shown it performs well on typical CS1… ▽ More GitHub Copilot is an artificial intelligence model for automatically generating source code from natural language problem descriptions. Since June 2022, Copilot has officially been available for free to all students as a plug-in to development environments like Visual Studio Code. Prior work exploring OpenAI Codex, the underlying model that powers Copilot, has shown it performs well on typical CS1 problems thus raising concerns about the impact it will have on how introductory programming courses are taught. However, little is known about the types of problems for which Copilot does not perform well, or about the natural language interactions that a student might have with Copilot when resolving errors. We explore these questions by evaluating the performance of Copilot on a publicly available dataset of 166 programming problems. We find that it successfully solves around half of these problems on its very first attempt, and that it solves 60\% of the remaining problems using only natural language changes to the problem description. We argue that this type of prompt engineering, which we believe will become a standard interaction between human and Copilot when it initially fails, is a potentially useful learning activity that promotes computational thinking skills, and is likely to change the nature of code writing skill development. △ Less

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.11630 [pdf, ps, other]

doi 10.1145/3545945.3569770

Using Large Language Models to Enhance Programming Error Messages

Authors: Juho Leinonen, Arto Hellas, Sami Sarsa, Brent Reeves, Paul Denny, James Prather, Brett A. Becker

Abstract: A key part of learning to program is learning to understand programming error messages. They can be hard to interpret and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Rese… ▽ More A key part of learning to program is learning to understand programming error messages. They can be hard to interpret and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Researchers have been working on making these errors more novice friendly since the 1960s, however progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix the error. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 7 pages, accepted for publication at SIGCSE TS 2023

arXiv:2206.11861 [pdf, other]

doi 10.1145/3501385.3543957

Automatic Generation of Programming Exercises and Code Explanations using Large Language Models

Authors: Sami Sarsa, Paul Denny, Arto Hellas, Juho Leinonen

Abstract: This article explores the natural language generation capabilities of large language models with application to the production of two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our result… ▽ More This article explores the natural language generation capabilities of large language models with application to the production of two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our results suggest that the majority of the automatically generated content is both novel and sensible, and in some cases ready to use as is. When creating exercises we find that it is remarkably easy to influence both the programming concepts and the contextual themes they contain, simply by supplying keywords as input to the model. Our analysis suggests that there is significant value in massive generative machine learning models as a tool for instructors, although there remains a need for some oversight to ensure the quality of the generated content before it is delivered to students. We further discuss the implications of OpenAI Codex and similar tools for introductory programming education and highlight future research streams that have the potential to improve the quality of the educational experience for both teachers and students alike. △ Less

Submitted 26 June, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: 18 pages, 1 figure, accepted in ICER

arXiv:2111.10058 [pdf, other]

DeepQR: Neural-based Quality Ratings for Learnersourced Multiple-Choice Questions

Authors: Lin Ni, Qiming Bao, Xiaoxuan Li, Qianqian Qi, Paul Denny, Jim Warren, Michael Witbrock, Jiamou Liu

Abstract: Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-art deep-learning techniques. We propose DeepQR, a n… ▽ More Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-art deep-learning techniques. We propose DeepQR, a novel neural-network model for AQQR that is trained using multiple-choice-question (MCQ) datasets collected from PeerWise, a widely-used learnersourcing platform. Along with designing DeepQR, we investigate models based on explicitly-defined features, or semantic features, or both. We also introduce a self-attention mechanism to capture semantic correlations between MCQ components, and a contrastive-learning approach to acquire question representations using quality ratings. Extensive experiments on datasets collected from eight university-level courses illustrate that DeepQR has superior performance over six comparative models. △ Less

Submitted 19 November, 2021; originally announced November 2021.

Comments: EAAI 22

arXiv:1902.03442 [pdf, other]

doi 10.2352/ISSN.2470-1173.2019.15.AVM-048

Yes, we GAN: Applying Adversarial Techniques for Autonomous Driving

Authors: Michal Uricar, Pavel Krizek, David Hurych, Ibrahim Sobh, Senthil Yogamani, Patrick Denny

Abstract: Generative Adversarial Networks (GAN) have gained a lot of popularity from their introduction in 2014 till present. Research on GAN is rapidly growing and there are many variants of the original GAN focusing on various aspects of deep learning. GAN are perceived as the most impactful direction of machine learning in the last decade. This paper focuses on the application of GAN in autonomous drivin… ▽ More Generative Adversarial Networks (GAN) have gained a lot of popularity from their introduction in 2014 till present. Research on GAN is rapidly growing and there are many variants of the original GAN focusing on various aspects of deep learning. GAN are perceived as the most impactful direction of machine learning in the last decade. This paper focuses on the application of GAN in autonomous driving including topics such as advanced data augmentation, loss function learning, semi-supervised learning, etc. We formalize and review key applications of adversarial techniques and discuss challenges and open problems to be addressed. △ Less

Submitted 2 February, 2020; v1 submitted 9 February, 2019; originally announced February 2019.

Comments: Accepted for publication in Electronic Imaging, Autonomous Vehicles and Machines 2019. arXiv admin note: text overlap with arXiv:1606.05908 by other authors

Showing 1–48 of 48 results for author: Denny, P