-
Building Collaborative Learning: Exploring Social Annotation in Introductory Programming
Authors:
Francisco Gomes de Oliveira Neto,
Felix Dobslaw
Abstract:
The increasing demand for software engineering education presents learning challenges in courses due to the diverse range of topics that require practical applications, such as programming or software design, all of which are supported by group work and interaction. Social Annotation (SA) is an approach to teaching that can enhance collaborative learning among students. In SA, both students and teachers utilize platforms like Feedback Fruits, Perusall, and Diigo to collaboratively annotate and discuss course materials. This approach encourages students to share their thoughts and answers with their peers, fostering a more interactive learning environment. We share our experience of implementing social annotation via Perusall as a preparatory tool for lectures in an introductory programming course aimed at undergraduate students in Software Engineering. We report the impact of Perusall on the examination results of 112 students. Our results show that 81% of students engaged in meaningful social annotation successfully passed the course. Notably, the proportion of students passing the exam tends to rise as they complete more Perusall assignments. In contrast, only 56% of students who did not participate in Perusall discussions managed to pass the exam. We did not enforce mandatory Perusall participation in the course. Yet, the feedback from our course evaluation questionnaire reveals that most students ranked Perusall among their favorite components of the course and that their interest in the subject has increased.
Submitted 17 June, 2024;
originally announced July 2024.
-
Experiences with Remote Examination Formats in Light of GPT-4
Authors:
Felix Dobslaw,
Peter Bergh
Abstract:
Sudden access to the rapidly improving large language model GPT by OpenAI forces educational institutions worldwide to revisit their exam procedures. In the pre-GPT era, we successfully applied oral and open-book home exams for two courses in the third year of our predominantly remote Software Engineering BSc program. We ask in this paper whether our current open-book exams are still viable or whether a move back to a legally compliant but less scalable oral exam is the only workable alternative. We further compare work-effort estimates between oral and open-book exams and report on differences in throughput and grade distribution over eight years to better understand the impact of examination format on the outcome. Examining GPT v4 on the most recent open-book exams showed that our current Artificial Intelligence and Reactive Programming exams are not GPT v4 proof. Three potential weaknesses of GPT are outlined. We also found that grade distributions have largely been unaffected by the examination format, opening up for a move to oral examinations only if needed. Throughput was higher for open-book exam course instances (73% vs 64%), but so were fail rates (12% vs 7%), with teacher workload increasing even for smaller classes. We also report on our experience regarding effort. Oral examinations are efficient for smaller groups but come with caveats regarding intensity and stress.
Submitted 27 March, 2023;
originally announced May 2023.
-
The Gap between Higher Education and the Software Industry -- A Case Study on Technology Differences
Authors:
Felix Dobslaw,
Kristian Angelin,
Lena-Maria Öberg,
Awais Ahmad
Abstract:
We see an explosive global labour demand in the Software Industry, and higher education institutions play a crucial role in supplying the industry with professionals with relevant education. Existing literature identifies a gap between what software engineering education teaches students and what the software industry demands. Using our open-sourced Job Market AnalyseR (JMAR) text-analysis tool, we compared keywords from higher education course syllabi and job posts to investigate the knowledge gap from a technology-focused departure point. We present a trend analysis of technology in job posts over the past six years in Sweden. We found that demand for cloud and automation technology such as Kubernetes and Docker is rising in job ads but not that much in higher education syllabi. The language used in higher education syllabi and job ads differs where the former emphasizes concepts and the latter technologies more heavily. We discuss possible remedies to bridge this mismatch to draw further conclusions in future work, including calibrating JMAR to other industry-relevant aspects, including soft skills, software concepts, or new demographics.
Submitted 27 March, 2023;
originally announced March 2023.
-
Automated Black-Box Boundary Value Detection
Authors:
Felix Dobslaw,
Robert Feldt,
Francisco de Oliveira Neto
Abstract:
The input domain of software systems can typically be divided into sub-domains for which the outputs are similar. To ensure high quality it is critical to test the software on the boundaries between these sub-domains. Consequently, boundary value analysis and testing has been part of the toolbox of software testers for long and is typically taught early to students. However, despite its many argued benefits, boundary value analysis for a given specification or piece of software is typically described in abstract terms which allow for variation in how testers apply it.
Here we propose an automated, black-box boundary value detection method to support software testers in systematic boundary value analysis with consistent results. The method builds on a metric to quantify the level of boundariness of test inputs: the program derivative. By coupling it with search algorithms we find and rank pairs of inputs as good boundary candidates, i.e. inputs close together but with outputs far apart. We implement our AutoBVA approach and evaluate it on a curated dataset of example programs. Our results indicate that even with a simple and generic program derivative variant in combination with broad sampling over the input space, interesting boundary candidates can be identified.
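The idea of ranking input pairs that are "close together but with outputs far apart" can be illustrated with a small sketch. It is not the AutoBVA implementation: the function names, the toy `classify` program, and both distance functions are assumptions chosen for illustration, and exhaustive enumeration over a small range stands in for the search algorithms the abstract mentions.

```python
from itertools import combinations


def program_derivative(f, a, b, d_in, d_out):
    """Difference quotient: output distance divided by input distance."""
    din = d_in(a, b)
    if din == 0:
        raise ValueError("inputs must differ")
    return d_out(f(a), f(b)) / din


def rank_boundary_candidates(f, inputs, d_in, d_out, top=3):
    """Score every pair of distinct inputs and return the highest-scoring ones."""
    scored = [
        (program_derivative(f, a, b, d_in, d_out), a, b)
        for a, b in combinations(inputs, 2)
        if d_in(a, b) > 0
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top]


# Hypothetical program under test with one boundary between 17 and 18.
def classify(age):
    return "minor" if age < 18 else "adult"


def abs_diff(a, b):
    return abs(a - b)


def label_diff(x, y):
    # Assumed output distance: 0 for equal labels, 1 otherwise.
    return 0.0 if x == y else 1.0


best = rank_boundary_candidates(classify, range(0, 40), abs_diff, label_diff, top=1)
print(best)
```

In this toy setting the top-ranked pair straddles the single boundary of `classify`. For larger input spaces, enumeration is infeasible, which is where sampling or search would come in.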
Submitted 19 July, 2022;
originally announced July 2022.
-
On the Importance and Shortcomings of Code Readability Metrics: A Case Study on Reactive Programming
Authors:
Gustaf Holst,
Felix Dobslaw
Abstract:
Well structured and readable source code is a pre-requisite for maintainable software and successful collaboration among developers. Static analysis enables the automated extraction of code complexity and readability metrics which can be leveraged to highlight potential improvements in code to both attain software of high quality and reinforce good practices for developers as an educational tool. This assumes reliable readability metrics which are not trivial to obtain since code readability is somewhat subjective. Recent research has resulted in increasingly sophisticated models for predicting readability as perceived by humans primarily with a procedural and object oriented focus, while functional and declarative languages and language extensions advance as they often are said to lead to more concise and readable code. In this paper, we investigate whether the existing complexity and readability metrics reflect that wisdom or whether the notion of readability and its constituents requires overhaul in the light of programming language changes. We therefore compare traditional object oriented and reactive programming in terms of code complexity and readability in a case study. Reactive programming is claimed to increase code quality but few studies have substantiated these claims empirically. We refactored an object oriented open source project into a reactive candidate and compare readability with the original using cyclomatic complexity and two state-of-the-art readability metrics. More elaborate investigations are required, but our findings suggest that both cyclomatic complexity and readability decrease significantly at the same time in the reactive candidate, which seems counter-intuitive. We exemplify and substantiate why readability metrics may require adjustment to better suit popular programming styles other than imperative and object-oriented to better match human expectations.
Submitted 28 October, 2021;
originally announced October 2021.
-
Using mutation testing to measure behavioural test diversity
Authors:
Francisco Gomes de Oliveira Neto,
Felix Dobslaw,
Robert Feldt
Abstract:
Diversity has been proposed as a key criterion to improve testing effectiveness and efficiency. It can be used to optimise large test repositories but also to visualise test maintenance issues and raise practitioners' awareness about waste in test artefacts and processes. Even though these diversity-based testing techniques aim to exercise diverse behaviour in the system under test (SUT), the diversity has mainly been measured on and between artefacts (e.g., inputs, outputs or test scripts). Here, we introduce a family of measures to capture behavioural diversity (b-div) of test cases by comparing their executions and failure outcomes. Using failure information to capture the SUT behaviour has been shown to improve effectiveness of history-based test prioritisation approaches. However, history-based techniques require reliable test execution logs which are often not available or can be difficult to obtain due to flaky tests, scarcity of test executions, etc. To be generally applicable we instead propose to use mutation testing to measure behavioural diversity by running the set of test cases on various mutated versions of the SUT. Concretely, we propose two specific b-div measures (based on accuracy and Matthews correlation coefficient, respectively) and compare them with artefact-based diversity (a-div) for prioritising the test suites of 6 different open-source projects. Our results show that our b-div measures outperform a-div and random selection in all of the studied projects. The improvement is substantial with an average increase in average percentage of faults detected (APFD) of between 19% and 31% depending on the size of the subset of prioritised tests.
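For readers unfamiliar with the APFD metric used to report these results, the following sketch computes it with the standard formula APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the position of the first test that detects fault i. The test names, fault names, and detection matrix below are invented for illustration and do not come from the paper.

```python
def apfd(order, detects, faults):
    """Average percentage of faults detected for a prioritised test order.

    order   -- list of test ids in execution order
    detects -- dict mapping test id to the set of faults it detects
    faults  -- set of all faults; each must be detected by some test
    """
    n, m = len(order), len(faults)
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            # Keep only the earliest position at which each fault is detected.
            first_position.setdefault(fault, position)
    missing = faults - set(first_position)
    if missing:
        raise ValueError(f"undetected faults: {sorted(missing)}")
    return 1 - sum(first_position[f] for f in faults) / (n * m) + 1 / (2 * n)


# Hypothetical detection data: t3 finds both faults, t1 finds none.
detects = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}}
faults = {"f1", "f2"}

good_order = apfd(["t3", "t2", "t1"], detects, faults)
poor_order = apfd(["t1", "t2", "t3"], detects, faults)
print(good_order, poor_order)
```

Running the fault-revealing test first yields the higher score, which is the property a prioritisation technique tries to maximise.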
Submitted 18 October, 2020;
originally announced October 2020.
-
Boundary Value Exploration for Software Analysis
Authors:
Felix Dobslaw,
Francisco Gomes de Oliveira Neto,
Robert Feldt
Abstract:
For software to be reliable and resilient, it is widely accepted that tests must be created and maintained alongside the software itself. One safeguard from vulnerabilities and failures in code is to ensure correct behavior on the boundaries between the input space sub-domains. So-called boundary value analysis (BVA) and boundary value testing (BVT) techniques aim to exercise those boundaries and increase test effectiveness. However, the concepts of BVA and BVT themselves are not generally well defined, and it is not clear how to identify relevant sub-domains, and thus the boundaries delineating them, given a specification. This has limited adoption and hindered automation. We clarify BVA and BVT and introduce Boundary Value Exploration (BVE) to describe techniques that support them by helping to detect and identify boundary inputs. Additionally, we propose two concrete BVE techniques based on information-theoretic distance functions: (i) an algorithm for boundary detection and (ii) the usage of software visualization to explore the behavior of the software under test and identify its boundary behavior. As an initial evaluation, we apply these techniques on a much used and well-tested date handling library. Our results reveal questionable behavior at boundaries highlighted by our techniques. In conclusion, we argue that the boundary value exploration that our techniques enable is a step towards automated boundary value analysis and testing, fostering their wider use and improving test effectiveness and efficiency.
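The abstract does not say which information-theoretic distance functions are used. As one common example of such a function, the sketch below computes the normalized compression distance (NCD) with `zlib` as the compressor; the choice of NCD, the compressor, and the sample byte strings are all assumptions made for illustration.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance using zlib as the compressor."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical samples: a repetitive date-like string and an unrelated byte run.
dates = b"2020-01-31 " * 50
unrelated = bytes(range(256))

same = ncd(dates, dates)
different = ncd(dates, unrelated)
print(same, different)
```

Values near 0 indicate that one string adds little information given the other, while values near 1 indicate little shared structure. Real compressors only approximate the ideal, so results can fall slightly outside the interval from 0 to 1.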
Submitted 12 October, 2020; v1 submitted 18 January, 2020;
originally announced January 2020.
-
Estimating Return on Investment for GUI Test Automation Tools
Authors:
Felix Dobslaw,
Robert Feldt,
David Michaelsson,
Patrick Haar,
Francisco G. de Oliveira Neto,
Richard Torkar
Abstract:
Automated graphical user interface (GUI) tests can reduce manual testing activities and increase test frequency. This motivates the conversion of manual test cases into automated GUI tests. However, it is not clear whether such automation is cost-effective given that GUI automation scripts add to the code base and demand maintenance as a system evolves. In this paper, we introduce a method for estimating maintenance cost and Return on Investment (ROI) for Automated GUI Testing (AGT). The method utilizes the existing source code change history and can be used for evaluation also of other testing or quality assurance automation technologies. We evaluate the method for a real-world, industrial software system and compare two fundamentally different AGT tools, namely Selenium and EyeAutomate, to estimate and compare their ROI. We also report on their defect-finding capabilities and usability. The quantitative data is complemented by interviews with employees at the case company. The method was successfully applied and estimated maintenance cost and ROI for both tools are reported. Overall, the study supports earlier results showing that implementation time is the leading cost for introducing AGT. The findings further suggest that while EyeAutomate tests are significantly faster to implement, Selenium tests require more of a programming background but less maintenance.
Submitted 1 November, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Towards Automated Boundary Value Testing with Program Derivatives and Search
Authors:
Robert Feldt,
Felix Dobslaw
Abstract:
A natural and often used strategy when testing software is to use input values at boundaries, i.e. where behavior is expected to change the most, an approach often called boundary value testing or analysis (BVA). Even though this has been a key testing idea for long it has been hard to clearly define and formalize. Consequently, it has also been hard to automate.
In this research note we propose one such formalization of BVA by, in a similar way as to how the derivative of a function is defined in mathematics, considering (software) program derivatives. Critical to our definition is the notion of distance between inputs and outputs which we can formalize and then quantify based on ideas from Information theory.
However, for our (black-box) approach to be practical one must search for test inputs with specific properties. Coupling it with search-based software engineering is thus required and we discuss how program derivatives can be used as and within fitness functions.
This brief note does not allow a deeper, empirical investigation but we use a simple illustrative example throughout to introduce the main ideas. By combining program derivatives with search, we thus propose a practical as well as theoretically interesting technique for automated boundary value (analysis and) testing.
Submitted 27 May, 2019;
originally announced May 2019.
-
End-to-End Reliability-aware Scheduling for Wireless Sensor Networks
Authors:
Felix Dobslaw,
Tingting Zhang,
Mikael Gidlund
Abstract:
Wireless Sensor Networks (WSN) are gaining popularity as a flexible and economical alternative to field-bus installations for monitoring and control applications. For mission-critical applications, communication networks must provide end-to-end reliability guarantees, posing substantial challenges for WSN. Reliability can be improved by redundancy, and is often addressed on the MAC layer by re-submission of lost packets, usually applying slotted scheduling. Recently, researchers have proposed a strategy to optimally improve the reliability of a given schedule by repeating the most rewarding slots in a schedule incrementally until a deadline. This Incrementer can be used with most scheduling algorithms but has scalability issues which narrows its usability to offline calculations of schedules, for networks that are rather static. In this paper, we introduce SchedEx, a generic heuristic scheduling algorithm extension which guarantees a user-defined end-to-end reliability. SchedEx produces competitive schedules to the existing approach, and it does that consistently more than an order of magnitude faster. The harsher the end-to-end reliability demand of the network, the better SchedEx performs compared to the Incrementer. We further show that SchedEx has a more evenly distributed improvement impact on the scheduling algorithms, whereas the Incrementer favors schedules created by certain scheduling algorithms.
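The arithmetic behind improving reliability through repeated slots can be sketched as follows. This is not the SchedEx algorithm or the Incrementer: it assumes independent packet losses, a single multi-hop path, and an even split of the end-to-end target across hops, and the function names and link qualities are hypothetical.

```python
def hop_reliability(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - p) ** k


def slots_for_path(link_success, target):
    """Slots per hop so that the product of hop reliabilities meets the target.

    Assumed allocation: every hop gets the same per-hop target, target ** (1 / hops).
    """
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    per_hop = target ** (1 / len(link_success))
    slots = []
    for p in link_success:
        if not 0 < p <= 1:
            raise ValueError("link success probability must be in (0, 1]")
        k = 1
        while hop_reliability(p, k) < per_hop:
            k += 1
        slots.append(k)
    return slots


def end_to_end(link_success, slots):
    """End-to-end reliability of a path given per-hop slot counts."""
    result = 1.0
    for p, k in zip(link_success, slots):
        result *= hop_reliability(p, k)
    return result


# Hypothetical two-hop path with 90% and 80% per-attempt success.
links = [0.9, 0.8]
slots = slots_for_path(links, target=0.99)
achieved = end_to_end(links, slots)
print(slots, achieved)
```

The weaker link needs more repeated slots than the stronger one, and the slot count grows as the target approaches 1, which hints at why stricter reliability demands make scheduling more expensive.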
Submitted 8 December, 2014;
originally announced December 2014.