-
Gaining Insights into Group-Level Course Difficulty via Differential Course Functioning
Authors:
Frederik Baucks,
Robin Schmucker,
Conrad Borchers,
Zachary A. Pardos,
Laurenz Wiskott
Abstract:
Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations ar…
▽ More
Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations are scarce, and existing approaches do not adequately decouple course-specific difficulty from students' general performance levels. The present study introduces Differential Course Functioning (DCF) as an Item Response Theory (IRT)-based CA methodology. DCF controls for student performance levels and examines whether significant differences exist in how distinct student groups succeed in a given course. Leveraging data from over 20,000 students at a large public university, we demonstrate DCF's ability to detect inequities in undergraduate course difficulty across student groups described by grade achievement. We compare major pairs with high co-enrollment and transfer students to their non-transfer peers. For the former, our findings suggest a link between DCF effect sizes and the alignment of course content to student home department motivating interventions targeted towards improving course preparedness. For the latter, results suggest minor variations in course-specific difficulty between transfer and non-transfer students. While this is desirable, it also suggests that interventions targeted toward mitigating grade achievement gaps in transfer students should encompass comprehensive support beyond enhancing preparedness for individual courses. By providing more nuanced and equitable assessments of academic performance and difficulties experienced by diverse student populations, DCF could support policymakers, course articulation officers, and student advisors.
△ Less
Submitted 7 May, 2024;
originally announced June 2024.
-
Predicting Learning Performance with Large Language Models: A Study in Adult Literacy
Authors:
Liang Zhang,
Jionghao Lin,
Conrad Borchers,
John Sabatini,
John Hollander,
Meng Cao,
Xiangen Hu
Abstract:
Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the po…
▽ More
Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the potential of LLMs to predict learning performance based on its inherent reasoning and computational capabilities. By using reading comprehension datasets from the ITS, AutoTutor, we evaluate the predictive capabilities of GPT-4 versus traditional machine learning methods in predicting learning performance through five-fold cross-validation techniques. Our findings show that the GPT-4 presents the competitive predictive abilities with traditional machine learning methods such as Bayesian Knowledge Tracing, Performance Factor Analysis, Sparse Factor Analysis Lite (SPARFA-Lite), tensor factorization and eXtreme Gradient Boosting (XGBoost). While XGBoost (trained on local machine) outperforms GPT-4 in predictive accuracy, GPT-4-selected XGBoost and its subsequent tuning on the GPT-4 platform demonstrates superior performance compared to local machine execution. Moreover, our investigation into hyper-parameter tuning by GPT-4 versus grid-search suggests comparable performance, albeit with less stability in the automated approach, using XGBoost as the case study. Our study contributes to the field by highlighting the potential of integrating LLMs with traditional machine learning models to enhance predictive accuracy and personalize adult literacy education, setting a foundation for future research in applying LLMs within ITSs.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation
Authors:
Zifei FeiFei Han,
Jionghao Lin,
Ashish Gurung,
Danielle R. Thomas,
Eason Chen,
Conrad Borchers,
Shivang Gupta,
Kenneth R. Koedinger
Abstract:
One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies. Novice math tutors often prioritize content-specific guidance, neglecting aspects such as social-emotional learning. Social-emotional learning promotes equity and inclusion and nurturing relationships with students, which is crucial for holistic student development. Asses…
▽ More
One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies. Novice math tutors often prioritize content-specific guidance, neglecting aspects such as social-emotional learning. Social-emotional learning promotes equity and inclusion and nurturing relationships with students, which is crucial for holistic student development. Assessing the competencies of tutors accurately and efficiently can drive the development of tailored tutor training programs. However, evaluating novice tutor ability during real-time tutoring remains challenging as it typically requires experts-in-the-loop. To address this challenge, this preliminary study aims to harness Generative Pre-trained Transformers (GPT), such as GPT-3.5 and GPT-4 models, to automatically assess tutors' ability of using social-emotional tutoring strategies. Moreover, this study also reports on the financial dimensions and considerations of employing these models in real-time and at scale for automated assessment. The current study examined four prompting strategies: two basic Zero-shot prompt strategies, Tree of Thought prompt, and Retrieval-Augmented Generator (RAG) based prompt. The results indicate that the RAG prompt demonstrated more accurate performance (assessed by the level of hallucination and correctness in the generated assessment texts) and lower financial costs than the other strategies evaluated. These findings inform the development of personalized tutor training interventions to enhance the the educational effectiveness of tutored learning.
△ Less
Submitted 4 February, 2024;
originally announced February 2024.
-
3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
Authors:
Liang Zhang,
Jionghao Lin,
Conrad Borchers,
Meng Cao,
Xiangen Hu
Abstract:
Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor f…
▽ More
Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
△ Less
Submitted 29 January, 2024;
originally announced February 2024.
-
Revealing Networks: Understanding Effective Teacher Practices in AI-Supported Classrooms using Transmodal Ordered Network Analysis
Authors:
Conrad Borchers,
Yeyu Wang,
Shamya Karumbaiah,
Muhammad Ashiq,
David Williamson Shaffer,
Vincent Aleven
Abstract:
Learning analytics research increasingly studies classroom learning with AI-based systems through rich contextual data from outside these systems, especially student-teacher interactions. One key challenge in leveraging such data is generating meaningful insights into effective teacher practices. Quantitative ethnography bears the potential to close this gap by combining multimodal data streams in…
▽ More
Learning analytics research increasingly studies classroom learning with AI-based systems through rich contextual data from outside these systems, especially student-teacher interactions. One key challenge in leveraging such data is generating meaningful insights into effective teacher practices. Quantitative ethnography bears the potential to close this gap by combining multimodal data streams into networks of co-occurring behavior that drive insight into favorable learning conditions. The present study uses transmodal ordered network analysis to understand effective teacher practices in relationship to traditional metrics of in-system learning in a mathematics classroom working with AI tutors. Incorporating teacher practices captured by position tracking and human observation codes into modeling significantly improved the inference of how efficiently students improved in the AI tutor beyond a model with tutor log data features only. Comparing teacher practices by student learning rates, we find that students with low learning rates exhibited more hint use after monitoring. However, after an extended visit, students with low learning rates showed learning behavior similar to their high learning rate peers, achieving repeated correct attempts in the tutor. Observation notes suggest conceptual and procedural support differences can help explain visit effectiveness. Taken together, offering early conceptual support to students with low learning rates could make classroom practice with AI tutors more effective. This study advances the scientific understanding of effective teacher practice in classrooms learning with AI tutors and methodologies to make such practices visible.
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems
Authors:
Conrad Borchers,
Jiayi Zhang,
Ryan S. Baker,
Vincent Aleven
Abstract:
Numerous studies demonstrate the importance of self-regulation during learning by problem-solving. Recent work in learning analytics has largely examined students' use of SRL concerning overall learning gains. Limited research has related SRL to in-the-moment performance differences among learners. The present study investigates SRL behaviors in relationship to learners' moment-by-moment performan…
▽ More
Numerous studies demonstrate the importance of self-regulation during learning by problem-solving. Recent work in learning analytics has largely examined students' use of SRL concerning overall learning gains. Limited research has related SRL to in-the-moment performance differences among learners. The present study investigates SRL behaviors in relationship to learners' moment-by-moment performance while working with intelligent tutoring systems for stoichiometry chemistry. We demonstrate the feasibility of labeling SRL behaviors based on AI-generated think-aloud transcripts, identifying the presence or absence of four SRL categories (processing information, planning, enacting, and realizing errors) in each utterance. Using the SRL codes, we conducted regression analyses to examine how the use of SRL in terms of presence, frequency, cyclical characteristics, and recency relate to student performance on subsequent steps in multi-step problems. A model considering students' SRL cycle characteristics outperformed a model only using in-the-moment SRL assessment. In line with theoretical predictions, students' actions during earlier, process-heavy stages of SRL cycles exhibited lower moment-by-moment correctness during problem-solving than later SRL cycle stages. We discuss system re-design opportunities to add SRL support during stages of processing and paths forward for using machine learning to speed research depending on the assessment of SRL based on transcription of think-aloud data.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
Insights into undergraduate pathways using course load analytics
Authors:
Conrad Borchers,
Zachary A. Pardos
Abstract:
Course load analytics (CLA) inferred from LMS and enrollment features can offer a more accurate representation of course workload to students than credit hours and potentially aid in their course selection decisions. In this study, we produce and evaluate the first machine-learned predictions of student course load ratings and generalize our model to the full 10,000 course catalog of a large publi…
▽ More
Course load analytics (CLA) inferred from LMS and enrollment features can offer a more accurate representation of course workload to students than credit hours and potentially aid in their course selection decisions. In this study, we produce and evaluate the first machine-learned predictions of student course load ratings and generalize our model to the full 10,000 course catalog of a large public university. We then retrospectively analyze longitudinal differences in the semester load of student course selections throughout their degree. CLA by semester shows that a student's first semester at the university is among their highest load semesters, as opposed to a credit hour-based analysis, which would indicate it is among their lowest. Investigating what role predicted course load may play in program retention, we find that students who maintain a semester load that is low as measured by credit hours but high as measured by CLA are more likely to leave their program of study. This discrepancy in course load is particularly pertinent in STEM and associated with high prerequisite courses. Our findings have implications for academic advising, institutional handling of the freshman experience, and student-facing analytics to help students better plan, anticipate, and prepare for their selected courses.
△ Less
Submitted 19 December, 2022;
originally announced December 2022.
-
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Authors:
Conrad Borchers,
Dalia Sara Gala,
Benjamin Gilburt,
Eduard Oravkin,
Wilfried Bounsi,
Yuki M. Asano,
Hannah Rose Kirk
Abstract:
The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal…
▽ More
The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal of writing unbiased and realistic job advertisements. We first assess the bias and realism of zero-shot generated advertisements and compare them to real-world advertisements. We then evaluate prompt-engineering and fine-tuning as debiasing methods. We find that prompt-engineering with diversity-encouraging prompts gives no significant improvement to bias, nor realism. Conversely, fine-tuning, especially on unbiased real advertisements, can improve realism and reduce bias.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.