-
On the Abuse and Detection of Polyglot Files
Authors:
Luke Koch,
Sean Oesch,
Amul Chaulagain,
Jared Dixon,
Matthew Dixon,
Mike Huettal,
Amir Sadovnik,
Cory Watson,
Brian Weber,
Jacob Hartman,
Richard Patulski
Abstract:
A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding $30$ polyglot samples and $15$ attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on adversary techniques. We then trained a machine learning detection solution, PolyConv, using this data set. PolyConv achieves a precision-recall area-under-curve score of $0.999$ with an F1 score of $99.20$% for polyglot detection and $99.47$% for file-format identification, significantly outperforming all other tools tested. We developed a content disarmament and reconstruction tool, ImSan, that successfully sanitized $100$% of the tested image-based polyglots, which were the most common type found via the survey. Our work provides concrete tools and suggestions to enable defenders to better defend themselves against polyglot files, as well as directions for future work to create more robust file specifications and methods of disarmament.
Submitted 1 July, 2024;
originally announced July 2024.
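For readers unfamiliar with the F1 metric quoted in the abstract above, the following is a minimal sketch of how precision, recall, and F1 follow from confusion counts. The function name `precision_recall_f1` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true-positive,
    false-positive, and false-negative counts."""
    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, `precision_recall_f1(8, 2, 2)` describes a detector that flagged 10 files, 8 of them correctly, and missed 2 true polyglots.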
-
From Internet of Things Data to Business Processes: Challenges and a Framework
Authors:
Juergen Mangler,
Ronny Seiger,
Janik-Vasily Benzin,
Joscha Grüger,
Yusuf Kirikkayis,
Florian Gallik,
Lukas Malburg,
Matthias Ehrendorfer,
Yannis Bertrand,
Marco Franceschetti,
Barbara Weber,
Stefanie Rinderle-Ma,
Ralph Bergmann,
Estefanía Serral Asensio,
Manfred Reichert
Abstract:
The IoT and Business Process Management (BPM) communities co-exist in many shared application domains, such as manufacturing and healthcare. The IoT community has a strong focus on hardware, connectivity and data; the BPM community focuses mainly on finding, controlling, and enhancing the structured interactions among the IoT devices in processes. While the field of Process Mining deals with the extraction of process models and process analytics from process event logs, the data produced by IoT sensors often is at a lower granularity than these process-level events. The fundamental questions about extracting and abstracting process-related data from streams of IoT sensor values are: (1) Which sensor values can be clustered together as part of process events?, (2) Which sensor values signify the start and end of such events?, (3) Which sensor values are related but not essential? This work proposes a framework to semi-automatically perform a set of structured steps to convert low-level IoT sensor data into higher-level process events that are suitable for process mining. The framework is meant to provide a generic sequence of abstract steps to guide the event extraction, abstraction, and correlation, with variation points for plugging in specific analysis techniques and algorithms for each step. To assess the completeness of the framework, we present a set of challenges, how they can be tackled through the framework, and an example on how to instantiate the framework in a real-world demonstration from the field of smart manufacturing. Based on this framework, future research can be conducted in a structured manner through refining and improving individual steps.
Submitted 22 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Stability of Information in the Heat Flow Clustering
Authors:
Brian Weber
Abstract:
A clustering method must be tailored to the dataset it operates on, as there is no objective or universal definition of ``cluster,'' but arbitrariness in the clustering method must nevertheless be minimized. This paper develops a quantitative ``stability'' method of determining clusters, where stable or persistent clustering signals are used to indicate that real structures have been identified in the underlying dataset. This method is based on modulating clustering methods by controlling a parameter -- through a thermodynamic analogy, the modulation parameter is considered ``time'' and the evolving clustering methodologies can be considered a ``heat flow.'' When the information entropy of the heat flow is stable over a wide range of times -- either globally or in the local sense which we define -- we interpret this stability as an indication that essential features of the data have been found, and create clusters on this basis.
Submitted 2 May, 2024;
originally announced May 2024.
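To make the idea of the information entropy of a clustering concrete, here is a minimal sketch that computes the Shannon entropy of the cluster-size distribution from a list of labels. This is one common choice of entropy and is an assumption of this example; the name `clustering_entropy` is hypothetical, and the paper's heat-flow construction and its local stability criterion are not reproduced here.

```python
import math
from collections import Counter

def clustering_entropy(labels):
    """Shannon entropy, in bits, of the cluster-size distribution.

    `labels` is a sequence with one cluster label per data point.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Each cluster contributes -p * log2(p), where p is its share of points.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this definition, one could evaluate the entropy at each value of the modulation parameter and look for ranges where it barely changes.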
-
The Path To Autonomous Cyber Defense
Authors:
Sean Oesch,
Phillipe Austria,
Amul Chaulagain,
Brian Weber,
Cory Watson,
Matthew Dixson,
Amir Sadovnik
Abstract:
Defenders are overwhelmed by the number and scale of attacks against their networks. This problem will only be exacerbated as attackers leverage artificial intelligence to automate their workflows. We propose a path to autonomous cyber agents able to augment defenders by automating critical steps in the cyber defense life cycle.
Submitted 12 April, 2024;
originally announced April 2024.
-
Data Needs and Challenges of Quantum Dot Devices Automation: Workshop Report
Authors:
Justyna P. Zwolak,
Jacob M. Taylor,
Reed Andrews,
Jared Benson,
Garnett Bryant,
Donovan Buterakos,
Anasua Chatterjee,
Sankar Das Sarma,
Mark A. Eriksson,
Eliška Greplová,
Michael J. Gullans,
Fabian Hader,
Tyler J. Kovach,
Pranav S. Mundada,
Mick Ramsey,
Torbjoern Rasmussen,
Brandon Severin,
Anthony Sigillito,
Brennan Undseth,
Brian Weber
Abstract:
Gate-defined quantum dots are a promising candidate system to realize scalable, coupled qubit systems and serve as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. In this report, we outline current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present ideas put forward by the quantum dot community on how to overcome them.
Submitted 12 May, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
QDA$^2$: A principled approach to automatically annotating charge stability diagrams
Authors:
Brian Weber,
Justyna P. Zwolak
Abstract:
Gate-defined semiconductor quantum dot (QD) arrays are a promising platform for quantum computing. However, presently, the large configuration spaces and inherent noise make tuning of QD devices a nontrivial task and with the increasing number of QD qubits, the human-driven experimental control becomes unfeasible. Recently, researchers working with QD systems have begun putting considerable effort into automating device control, with a particular focus on machine-learning-driven methods. Yet, the reported performance statistics vary substantially in both the meaning and the type of devices used for testing. While systematic benchmarking of the proposed tuning methods is necessary for developing reliable and scalable tuning approaches, the lack of openly available standardized datasets of experimental data makes such testing impossible. The QD auto-annotator -- a classical algorithm for automatic interpretation and labeling of experimentally acquired data -- is a critical step toward rectifying this. QD auto-annotator leverages the principles of geometry to produce state labels for experimental double-QD charge stability diagrams and is a first step towards building a large public repository of labeled QD data.
Submitted 18 December, 2023;
originally announced December 2023.
-
AI ATAC 1: An Evaluation of Prominent Commercial Malware Detectors
Authors:
Robert A. Bridges,
Brian Weber,
Justin M. Beaver,
Jared M. Smith,
Miki E. Verma,
Savannah Norem,
Kevin Spakes,
Cory Watson,
Jeff A. Nichols,
Brian Jewell,
Michael. D. Iannacone,
Chelsey Dunivan Stahl,
Kelly M. T. Huffer,
T. Sean Oesch
Abstract:
This work presents an evaluation of six prominent commercial endpoint malware detectors, a network malware detector, and a file-conviction algorithm from a cyber technology vendor. The evaluation was administered as the first of the Artificial Intelligence Applications to Autonomous Cybersecurity (AI ATAC) prize challenges, funded by / completed in service of the US Navy. The experiment employed 100K files (50/50% benign/malicious) with a stratified distribution of file types, including ~1K zero-day program executables (increasing experiment size two orders of magnitude over previous work). We present an evaluation process of delivering a file to a fresh virtual machine donning the detection technology, waiting 90s to allow static detection, then executing the file and waiting another period for dynamic detection; this allows greater fidelity in the observational data than previous experiments, in particular, resource and time-to-detection statistics. To execute all 800K trials (100K files $\times$ 8 tools), a software framework was designed to choreograph the experiment into a completely automated, time-synced, and reproducible workflow with substantial parallelization. A cost-benefit model was configured to integrate the tools' recall, precision, time to detection, and resource requirements into a single comparable quantity by simulating costs of use. This provides a ranking methodology for cyber competitions and a lens through which to reason about the varied statistical viewpoints of the results. These statistical and cost-model results provide insights on the state of commercial malware detection.
Submitted 28 August, 2023;
originally announced August 2023.
-
Testing SOAR Tools in Use
Authors:
Robert A. Bridges,
Ashley E. Rice,
Sean Oesch,
Jeff A. Nichols,
Cory Watson,
Kevin Spakes,
Savannah Norem,
Mike Huettel,
Brian Jewell,
Brian Weber,
Connor Gannon,
Olivia Bizovi,
Samuel C Hollifield,
Samantha Erwin
Abstract:
Modern security operation centers (SOCs) rely on operators and a tapestry of logging and alerting tools with large scale collection and query abilities. SOC investigations are tedious as they rely on manual efforts to query diverse data sources, overlay related logs, and correlate the data into information and then document results in a ticketing system. Security orchestration, automation, and response (SOAR) tools are a new technology that promise to collect, filter, and display needed data; automate common tasks that require SOC analysts' time; facilitate SOC collaboration; and, improve both efficiency and consistency of SOCs. SOAR tools have never been tested in practice to evaluate their effect and understand them in use. In this paper, we design and administer the first hands-on user study of SOAR tools, involving 24 participants and 6 commercial SOAR tools. Our contributions include the experimental design, itemizing six characteristics of SOAR tools and a methodology for testing them. We describe configuration of the test environment in a cyber range, including network, user, and threat emulation; a full SOC tool suite; and creation of artifacts allowing multiple representative investigation scenarios to permit testing. We present the first research results on SOAR tools. We found that SOAR configuration is critical, as it involves creative design for data display and automation. We found that SOAR tools increased efficiency and reduced context switching during investigations, although ticket accuracy and completeness (indicating investigation quality) decreased with SOAR use. Our findings indicated that user preferences are slightly negatively correlated with their performance with the tool; overautomation was a concern of senior analysts, and SOAR tools that balanced automation with assisting a user to make decisions were preferred.
Submitted 14 February, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Toward the Detection of Polyglot Files
Authors:
Luke Koch,
Sean Oesch,
Mary Adkisson,
Sam Erwin,
Brian Weber,
Amul Chaulagain
Abstract:
Standardized file formats play a key role in the development and use of computer software. However, it is possible to abuse standardized file formats by creating a file that is valid in multiple file formats. The resulting polyglot (many languages) file can confound file format identification, allowing elements of the file to evade analysis. This is especially problematic for malware detection systems that rely on file format identification for feature extraction. File format identification processes that depend on file signatures can be easily evaded thanks to flexibility in the format specifications of certain file formats. Although work has been done to identify file formats using more comprehensive methods than file signatures, accurate identification of polyglot files remains an open problem. Since malware detection systems routinely perform file format-specific feature extraction, polyglot files need to be filtered out prior to ingestion by these systems. Otherwise, malicious content could pass through undetected. To address the problem of polyglot detection we assembled a data set using the mitra tool. We then evaluated the performance of the most commonly used file identification tool, file. Finally, we demonstrated the accuracy, precision, recall and F1 score of a range of machine and deep learning models. Malconv2 and Catboost demonstrated the highest recall on our data set with 95.16% and 95.45%, respectively. These models can be incorporated into a malware detector's file processing pipeline to filter out potentially malicious polyglots before file format-dependent feature extraction takes place.
Submitted 12 April, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
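As a rough illustration of why signature-based identification is easy to confuse, the sketch below scans a byte string for a few well-known magic numbers and reports every format whose signature is present. The function names `candidate_formats` and `looks_like_polyglot` and the small signature table are assumptions made for this example. This is not the method evaluated in the paper, and a naive scan like this produces false positives whenever a signature happens to occur inside unrelated data.

```python
from typing import List

# Well-known magic numbers; the selection is illustrative, not exhaustive.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}

def candidate_formats(data: bytes) -> List[str]:
    """Return, sorted by name, every format whose signature occurs in data."""
    return sorted(name for name, magic in SIGNATURES.items() if magic in data)

def looks_like_polyglot(data: bytes) -> bool:
    """Flag data that carries signatures of two or more formats."""
    return len(candidate_formats(data)) >= 2
```

A tool that inspects only the first bytes would label a JPEG header followed by an appended ZIP archive as a plain image, whereas `candidate_formats` reports both `jpeg` and `zip` for such input.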
-
Wiener Guided DIP for Unsupervised Blind Image Deconvolution
Authors:
Gustav Bredell,
Ertunc Erdil,
Bruno Weber,
Ender Konukoglu
Abstract:
Blind deconvolution is an ill-posed problem arising in various fields ranging from microscopy to astronomy. The ill-posed nature of the problem requires adequate priors to arrive at a desirable solution. Recently, it has been shown that deep learning architectures can serve as an image generation prior during unsupervised blind deconvolution optimization, although they often exhibit performance fluctuations even on a single image. We propose to use Wiener-deconvolution to guide the image generator during optimization by providing it a sharpened version of the blurry image using an auxiliary kernel estimate starting from a Gaussian. We observe that the high-frequency artifacts of deconvolution are reproduced with a delay compared to low-frequency features. In addition, the image generator reproduces low-frequency features of the deconvolved image faster than that of a blurry image. We embed the computational process in a constrained optimization framework and show that the proposed method yields higher stability and performance across multiple datasets. In addition, we provide the code.
Submitted 19 December, 2021;
originally announced December 2021.
-
A Mathematical Framework for Evaluation of SOAR Tools with Limited Survey Data
Authors:
Savannah Norem,
Ashley E Rice,
Samantha Erwin,
Robert A Bridges,
Sean Oesch,
Brian Weber
Abstract:
Security operation centers (SOCs) all over the world are tasked with reacting to cybersecurity alerts ranging in severity. Security Orchestration, Automation, and Response (SOAR) tools streamline cybersecurity alert responses by SOC operators. SOAR tool adoption is expensive both in effort and finances. Hence, it is crucial to limit adoption to those most worthwhile; yet no research evaluating or comparing SOAR tools exists. The goal of this work is to evaluate several SOAR tools using specific criteria pertaining to their usability. SOC operators were asked to first complete a survey about what SOAR tool aspects are most important. Operators were then assigned a set of SOAR tools for which they viewed demonstration and overview videos, and then operators completed a second survey wherein they were tasked with evaluating each of the tools on the aspects from the first survey. In addition, operators provided an overall rating to each of their assigned tools, and provided a ranking of their tools in order of preference. Due to time constraints on SOC operators for thorough testing, we provide a systematic method of downselecting a large pool of SOAR tools to a select few that merit next-step hands-on evaluation by SOC operators. Furthermore, the analyses conducted in this survey help to inform future development of SOAR tools to ensure that the appropriate functions are available for use in a SOC.
Submitted 30 November, 2021;
originally announced December 2021.
-
Theoretical bounds on data requirements for the ray-based classification
Authors:
Brian J. Weber,
Sandesh S. Kalantre,
Thomas McJunkin,
Jacob M. Taylor,
Justyna P. Zwolak
Abstract:
The problem of classifying high-dimensional shapes in real-world data grows in complexity as the dimension of the space increases. For the case of identifying convex shapes of different geometries, a new classification framework has recently been proposed in which the intersections of a set of one-dimensional representations, called rays, with the boundaries of the shape are used to identify the specific geometry. This ray-based classification (RBC) has been empirically verified using a synthetic dataset of two- and three-dimensional shapes (Zwolak et al. in Proceedings of Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada [December 11, 2020], arXiv:2010.00500, 2020) and, more recently, has also been validated experimentally (Zwolak et al., PRX Quantum 2:020335, 2021). Here, we establish a bound on the number of rays necessary for shape classification, defined by key angular metrics, for arbitrary convex shapes. For two dimensions, we derive a lower bound on the number of rays in terms of the shape's length, diameter, and exterior angles. For convex polytopes in $\mathbb{R}^N$, we generalize this result to a similar bound given as a function of the dihedral angle and the geometrical parameters of polygonal faces. This result enables a different approach for estimating high-dimensional shapes using substantially fewer data elements than volumetric or surface-based approaches.
Submitted 26 February, 2022; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection
Authors:
Robert A. Bridges,
Sean Oesch,
Miki E. Verma,
Michael D. Iannacone,
Kelly M. T. Huffer,
Brian Jewell,
Jeff A. Nichols,
Brian Weber,
Justin M. Beaver,
Jared M. Smith,
Daniel Scofield,
Craig Miles,
Thomas Plummer,
Mark Daniell,
Anne M. Tall
Abstract:
In this paper, we present a scientific evaluation of four prominent malware detection tools to assist an organization with two primary questions: To what extent do ML-based tools accurately classify previously- and never-before-seen files? Is it worth purchasing a network-level malware detector? To identify weaknesses, we tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) of a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered on multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone & Bridges. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files -- 37% of malware tested, including all polyglot files, were undetected. Priorities for researchers and takeaways for end users are given.
Submitted 17 August, 2022; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Ray-based classification framework for high-dimensional data
Authors:
Justyna P. Zwolak,
Sandesh S. Kalantre,
Thomas McJunkin,
Brian J. Weber,
Jacob M. Taylor
Abstract:
While classification of arbitrary structures in high dimensions may require complete quantitative information, for simple geometrical structures, low-dimensional qualitative information about the boundaries defining the structures can suffice. Rather than using dense, multi-dimensional data, we propose a deep neural network (DNN) classification framework that utilizes a minimal collection of one-dimensional representations, called rays, to construct the "fingerprint" of the structure(s) based on substantially reduced information. We empirically study this framework using a synthetic dataset of double and triple quantum dot devices and apply it to the classification problem of identifying the device state. We show that the performance of the ray-based classifier is already on par with traditional 2D images for low dimensional systems, while significantly cutting down the data acquisition cost.
Submitted 26 February, 2022; v1 submitted 1 October, 2020;
originally announced October 2020.
-
The SFS Summer Research Study at UMBC: Project-Based Learning Inspires Cybersecurity Students
Authors:
Alan Sherman,
Enis Golaszewski,
Edward LaFemina,
Ethan Goldschen,
Mohammed Khan,
Lauren Mundy,
Mykah Rather,
Bryan Solis,
Wubnyonga Tete,
Edwin Valdez,
Brian Weber,
Damian Doyle,
Casey O'Brien,
Linda Oliva,
Joseph Roundy,
Jack Suess
Abstract:
May 30-June 2, 2017, Scholarship for Service (SFS) scholars at the University of Maryland, Baltimore County (UMBC) analyzed the security of a targeted aspect of the UMBC computer systems. During this hands-on study, with complete access to source code, students identified vulnerabilities, devised and implemented exploits, and suggested mitigations. As part of a pioneering program at UMBC to extend SFS scholarships to community colleges, the study helped initiate six students from two nearby community colleges, who transferred to UMBC in fall 2017 to complete their four-year degrees in computer science and information systems.
The study examined the security of a set of "NetAdmin" custom scripts that enable UMBC faculty and staff to open the UMBC firewall to allow external access to machines they control for research purposes. Students discovered vulnerabilities stemming from weak architectural design, record overflow, and failure to sanitize inputs properly. For example, they implemented a record-overflow and code-injection exploit that exfiltrated the vital API key of the UMBC firewall.
This report summarizes student activities and findings, and reflects on lessons learned for students, educators, and system administrators. Our students found the collaborative experience inspirational, students and educators appreciated the authentic case study, and IT administrators gained access to future employees and received free recommendations for improving the security of their systems. We hope that other universities can benefit from our motivational and educational strategy of teaming educators and system administrators to engage students in active project-based learning centering on focused questions about their university computer systems.
Submitted 12 November, 2018;
originally announced November 2018.
-
DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes
Authors:
Giles Tetteh,
Velizar Efremov,
Nils D. Forkert,
Matthias Schneider,
Jan Kirschke,
Bruno Weber,
Claus Zimmer,
Marie Piraud,
Bjoern H. Menze
Abstract:
We present DeepVesselNet, an architecture tailored to the challenges faced when extracting vessel networks or trees and corresponding features in 3-D angiographic volumes using deep learning. We discuss the problems of low execution speed and high memory requirements associated with full 3-D convolutional networks, high-class imbalance arising from the low percentage of vessel voxels, and unavailability of accurately annotated training data - and offer solutions as the building blocks of DeepVesselNet.
First, we formulate 2-D orthogonal cross-hair filters which make use of 3-D context information at a reduced computational burden. Second, we introduce a class balancing cross-entropy loss function with false positive rate correction to handle the high-class imbalance and high false positive rate problems associated with existing loss functions. Finally, we generate a synthetic dataset using a computational angiogenesis model capable of generating vascular trees under physiological constraints on local network structure and topology and use these data for transfer learning.
DeepVesselNet is optimized for segmenting and analyzing vessels, and we test the performance on a range of angiographic volumes including clinical MRA data of the human brain, as well as X-ray tomographic microscopy scans of the rat brain. Our experiments show that, by replacing 3-D filters with cross-hair filters in our network, we achieve over 23% improvement in speed, lower memory footprint, lower network complexity which prevents overfitting and comparable accuracy (with a Cox-Wilcoxon paired sample significance test p-value of 0.07 when compared to full 3-D filters). Our class balancing metric is crucial for training the network and transfer learning with synthetic data is an efficient, robust, and very generalizable approach leading to a network that excels in a variety of angiography segmentation tasks.
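The cross-hair idea described in this abstract can be illustrated with a minimal sketch. It assumes, as an illustration only, that a cross-hair filter sums the responses of three square 2-D kernels placed on the three orthogonal planes through a voxel; the function names `crosshair_response` and `parameter_counts` are hypothetical and are not taken from the DeepVesselNet code.

```python
def crosshair_response(volume, kernels, z, y, x):
    """Sum three 2-D filter responses on the orthogonal planes through (z, y, x).

    volume  : nested lists indexed as volume[z][y][x]
    kernels : three square 2-D kernels with odd side length (an assumption)
    """
    k_xy, k_xz, k_yz = kernels
    r = len(k_xy) // 2  # kernel radius
    total = 0.0
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            total += k_xy[a + r][b + r] * volume[z][y + a][x + b]  # plane at fixed z
            total += k_xz[a + r][b + r] * volume[z + a][y][x + b]  # plane at fixed y
            total += k_yz[a + r][b + r] * volume[z + a][y + b][x]  # plane at fixed x
    return total


def parameter_counts(k):
    """Weights per filter: a full k*k*k kernel versus three k*k kernels."""
    return {"full_3d": k ** 3, "crosshair": 3 * k ** 2}


if __name__ == "__main__":
    ones_volume = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    ones_kernel = [[1.0] * 3 for _ in range(3)]
    print(crosshair_response(ones_volume, [ones_kernel] * 3, 1, 1, 1))
    print(parameter_counts(5))
```

Under this assumption the per-filter weight count grows as 3k² rather than k³, which is one way to read the reduced computational burden the abstract mentions.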
Submitted 13 August, 2019; v1 submitted 25 March, 2018;
originally announced March 2018.
-
The Internet-of-Things Meets Business Process Management: A Manifesto
Authors:
Christian Janiesch,
Agnes Koschmider,
Massimo Mecella,
Barbara Weber,
Andrea Burattin,
Claudio Di Ciccio,
Giancarlo Fortino,
Avigdor Gal,
Udo Kannengiesser,
Francesco Leotta,
Felix Mannhardt,
Andrea Marrella,
Jan Mendling,
Andreas Oberweis,
Manfred Reichert,
Stefanie Rinderle-Ma,
Estefania Serral,
WenZhan Song,
Jianwen Su,
Victoria Torres,
Matthias Weidlich,
Mathias Weske,
Liang Zhang
Abstract:
The Internet of Things (IoT) refers to a network of connected devices collecting and exchanging data over the Internet. These things can be artificial or natural, and interact as autonomous agents forming a complex system. In turn, Business Process Management (BPM) was established to analyze, discover, design, implement, execute, monitor and evolve collaborative business processes within and across organizations. While the IoT and BPM have been regarded as separate topics in research and practice, we strongly believe that the management of IoT applications will benefit considerably from BPM concepts, methods and technologies on the one hand; on the other hand, the IoT poses challenges that will require enhancements and extensions of the current state-of-the-art in the BPM field. In this paper, we question to what extent these two paradigms can be combined and we discuss the emerging challenges.
Submitted 28 October, 2020; v1 submitted 11 September, 2017;
originally announced September 2017.
-
ChemKED: a human- and machine-readable data standard for chemical kinetics experiments
Authors:
Bryan W. Weber,
Kyle E. Niemeyer
Abstract:
Fundamental experimental measurements of quantities such as ignition delay times, laminar flame speeds, and species profiles (among others) serve important roles in understanding fuel chemistry and validating chemical kinetic models. However, despite both the importance and abundance of such information in the literature, the community lacks a widely adopted standard format for this data. This impedes both sharing and wide use by the community. Here we introduce a new chemical kinetics experimental data format, ChemKED, and the related Python-based package for validating and working with ChemKED-formatted files called PyKED. We also review past and related efforts, and motivate the need for a new solution. ChemKED currently supports the representation of autoignition delay time measurements from shock tubes and rapid compression machines. ChemKED-formatted files contain all of the information needed to simulate experimental data points, including the uncertainty of the data. ChemKED is based on the YAML data serialization language, and is intended as a human- and machine-readable standard for easy creation and automated use. Development of ChemKED and PyKED occurs openly on GitHub under the BSD 3-clause license, and contributions from the community are welcome. Plans for future development include support for experimental data from laminar flame, jet stirred reactor, and speciation measurements.
Submitted 15 November, 2017; v1 submitted 6 June, 2017;
originally announced June 2017.
-
Blockchains for Business Process Management - Challenges and Opportunities
Authors:
Jan Mendling,
Ingo Weber,
Wil van der Aalst,
Jan vom Brocke,
Cristina Cabanillas,
Florian Daniel,
Soren Debois,
Claudio Di Ciccio,
Marlon Dumas,
Schahram Dustdar,
Avigdor Gal,
Luciano Garcia-Banuelos,
Guido Governatori,
Richard Hull,
Marcello La Rosa,
Henrik Leopold,
Frank Leymann,
Jan Recker,
Manfred Reichert,
Hajo A. Reijers,
Stefanie Rinderle-Ma,
Andreas Rogge-Solti,
Michael Rosemann,
Stefan Schulte,
Munindar P. Singh
, et al. (7 additional authors not shown)
Abstract:
Blockchain technology promises a sizable potential for executing inter-organizational business processes without requiring a central party serving as a single point of trust (and failure). This paper analyzes its impact on business process management (BPM). We structure the discussion using two BPM frameworks, namely the six BPM core capabilities and the BPM lifecycle. This paper provides research directions for investigating the application of blockchain technology to BPM.
Submitted 31 January, 2018; v1 submitted 11 April, 2017;
originally announced April 2017.
-
Cheetah Experimental Platform Web 1.0: Cleaning Pupillary Data
Authors:
Stefan Zugal,
Jakob Pinggera,
Manuel Neurauter,
Thomas Maran,
Barbara Weber
Abstract:
Recently, researchers started using cognitive load in various settings, e.g., educational psychology, cognitive load theory, or human-computer interaction. Cognitive load characterizes a task's demand on the limited information processing capacity of the brain. The widespread adoption of eye-tracking devices led to increased attention for objectively measuring cognitive load via pupil dilation. However, this approach requires a standardized data processing routine to reliably measure cognitive load. This technical report presents CEP-Web, an open source platform providing state-of-the-art data processing routines for cleaning pupillary data combined with a graphical user interface, enabling the management of studies and subjects. Future developments will include support for analyzing the cleaned data as well as support for Task-Evoked Pupillary Response (TEPR) studies.
Submitted 21 April, 2018; v1 submitted 28 March, 2017;
originally announced March 2017.
-
Maximum-Likelihood Detection for Energy-Efficient Timing Acquisition in NB-IoT
Authors:
Harald Kroll,
Matthias Korb,
Benjamin Weber,
Samuel Willi,
Qiuting Huang
Abstract:
Initial timing acquisition in narrow-band IoT (NB-IoT) devices is done by detecting a periodically transmitted known sequence. The detection has to be done at the lowest possible latency, because the RF-transceiver, which dominates downlink power consumption of an NB-IoT modem, has to be turned on throughout this time. Auto-correlation detectors show low computational complexity from a signal processing point of view at the price of a higher detection latency. In contrast, a maximum likelihood cross-correlation detector achieves low latency at a higher complexity, as shown in this paper. We present a hardware implementation of the maximum likelihood cross-correlation detection. The detector achieves an average detection latency which is a factor of two below that of an auto-correlation method and is able to reduce the required energy per timing acquisition by up to 34%.
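As a rough illustration of cross-correlation timing detection, the sketch below slides a known sequence over received samples and returns the offset with the largest correlation. It is a simplified, real-valued software example under stated assumptions, not the paper's hardware design: actual NB-IoT synchronization signals are complex-valued, and the function name `cross_correlation_timing` and the sample values are hypothetical.

```python
def cross_correlation_timing(received, known):
    """Return (offset, score) of the best alignment of `known` within `received`.

    Each candidate offset is scored by the inner product of the known
    sequence with the received samples starting at that offset.
    """
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(received) - len(known) + 1):
        # zip stops at the end of `known`, so only len(known) samples are used
        score = sum(r * k for r, k in zip(received[offset:], known))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    known = [1, -1, 1, 1, -1]            # hypothetical known sequence
    received = [0, 0, 0] + known + [0, 0]  # noise-free toy signal
    print(cross_correlation_timing(received, known))
```

The cost per candidate offset is proportional to the sequence length, which reflects the higher complexity the abstract attributes to cross-correlation compared with auto-correlation detectors.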
Submitted 23 November, 2016; v1 submitted 5 August, 2016;
originally announced August 2016.
-
Detection and Quantification of Flow Consistency in Business Process Models
Authors:
Andrea Burattin,
Vered Bernstein,
Manuel Neurauter,
Pnina Soffer,
Barbara Weber
Abstract:
Business process models abstract complex business processes by representing them as graphical models. Their layout, solely determined by the modeler, affects their understandability. To support the construction of understandable models it would be beneficial to systematically study this effect. However, this requires a basic set of measurable key visual features, depicting the layout properties that are meaningful to the human user. The aim of this research is thus twofold. First, to empirically identify key visual features of business process models which are perceived as meaningful to the user. Second, to show how such features can be quantified into computational metrics, which are applicable to business process models. We focus on one particular feature, consistency of flow direction, and show the challenges that arise when transforming it into a precise metric. We propose three different metrics addressing these challenges, each following a different view of flow consistency. We then report the results of an empirical evaluation, which indicates which metric is more effective in predicting the human perception of this feature. Moreover, two other automatic evaluations describing the performance and the computational capabilities of our metrics are reported as well.
Submitted 9 February, 2016;
originally announced February 2016.
-
Investigating the Process of Process Modeling with Eye Movement Analysis
Authors:
Jakob Pinggera,
Marco Furtner,
Markus Martini,
Pierre Sachse,
Katharina Reiter,
Stefan Zugal,
Barbara Weber
Abstract:
Research on quality issues of business process models has recently begun to explore the process of creating process models by analyzing the modeler's interactions with the modeling environment. In this paper we aim to complement previous insights on the modeler's modeling behavior with data gathered by tracking the modeler's eye movements when engaged in the act of modeling. We present preliminary results and outline directions for future research to triangulate toward a more comprehensive understanding of the process of process modeling. We believe that combining different views on the process of process modeling constitutes another building block in understanding this process that will ultimately enable us to support modelers in creating better process models.
Submitted 11 November, 2015;
originally announced November 2015.
-
Change Patterns for Model Creation: Investigating the Role of Nesting Depth
Authors:
Barbara Weber,
Jakob Pinggera,
Victoria Torres,
Manfred Reichert
Abstract:
Process model quality has been an area of considerable research efforts. In this context, the correctness-by-construction principle of change patterns offers a promising perspective. However, using change patterns for model creation imposes a more structured way of modeling. While the process of process modeling (PPM) based on change primitives has been investigated, little is known about this process based on change patterns and factors that impact the cognitive complexity of pattern usage. Insights from the field of cognitive psychology as well as observations from a pilot study suggest that the nesting depth of the model to be created has a significant impact on cognitive complexity. This paper proposes a research design to test the impact of nesting depth on the cognitive complexity of change pattern usage in an experiment.
Submitted 11 November, 2015;
originally announced November 2015.
-
How Advanced Change Patterns Impact the Process of Process Modeling
Authors:
Barbara Weber,
Sarah Zeitelhofer,
Jakob Pinggera,
Victoria Torres,
Manfred Reichert
Abstract:
Process model quality has been an area of considerable research efforts. In this context, correctness-by-construction as enabled by change patterns provides promising perspectives. While the process of process modeling (PPM) based on change primitives has been thoroughly investigated, little is known about the PPM based on change patterns. In particular, it is unclear what set of change patterns should be provided and how the available change pattern set impacts the PPM. To obtain a better understanding of the latter as well as the (subjective) perceptions of process modelers, the arising challenges, and the pros and cons of different change pattern sets, we conduct a controlled experiment. Our results indicate that process modelers face similar challenges irrespective of the change pattern set used (core pattern set versus extended pattern set, which adds two advanced change patterns to the core pattern set). An extended change pattern set, however, is perceived as more difficult to use, yielding a higher mental effort. Moreover, our results indicate that more advanced patterns were only used to a limited extent and frequently applied incorrectly, thus lowering the potential benefits of an extended pattern set.
Submitted 11 November, 2015;
originally announced November 2015.
-
Change Patterns in Use: A Critical Evaluation
Authors:
Barbara Weber,
Jakob Pinggera,
Victoria Torres,
Manfred Reichert
Abstract:
Process model quality has been an area of considerable research efforts. In this context, the correctness-by-construction principle of change patterns provides promising perspectives. However, using change patterns for model creation imposes a more structured way of modeling. While the process of process modeling (PPM) based on change primitives has been investigated, little is known about this process based on change patterns. To obtain a better understanding of the PPM when using change patterns, the arising challenges, and the subjective perceptions of process designers, we conduct an exploratory study. The results indicate that process designers face few problems as long as control-flow is simple, but have considerable problems with the usage of change patterns when complex, nested models have to be created. Finally, we outline how effective tool support for change patterns should be realized.
Submitted 11 November, 2015;
originally announced November 2015.
-
Expressiveness and Understandability Considerations of Hierarchy in Declarative Business Process Models
Authors:
Stefan Zugal,
Pnina Soffer,
Jakob Pinggera,
Barbara Weber
Abstract:
Hierarchy has widely been recognized as a viable approach to deal with the complexity of conceptual models. For instance, in declarative business process models, hierarchy is realized by sub-processes. While technical implementations of declarative sub-processes exist, their application, semantics, and the resulting impact on understandability are not yet well understood; this research gap is addressed in this work. In particular, we discuss the semantics and the application of hierarchy and show how sub-processes enhance the expressiveness of declarative modeling languages. Then, we turn to the impact of hierarchy on the understandability of a declarative process model. To systematically assess this impact, we present a cognitive-psychology based framework for assessing the possible impact of hierarchy on the understandability of the process model.
Submitted 11 November, 2015;
originally announced November 2015.
-
Modeling Styles in Business Process Modeling
Authors:
Jakob Pinggera,
Pnina Soffer,
Stefan Zugal,
Barbara Weber,
Matthias Weidlich,
Dirk Fahland,
Hajo A. Reijers,
Jan Mendling
Abstract:
Research on quality issues of business process models has recently begun to explore the process of creating process models. As a consequence, the question arises whether different ways of creating process models exist. In this vein, we observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool. The recordings of process modeling were subsequently clustered. Results presented in this paper suggest the existence of three distinct modeling styles, exhibiting significantly different characteristics. We believe that this finding constitutes another building block toward a more comprehensive understanding of the process of process modeling that will ultimately enable us to support modelers in creating better business process models.
Submitted 11 November, 2015;
originally announced November 2015.
-
A visual analysis of the process of process modeling
Authors:
Jan Claes,
Irene Vanderfeesten,
Jakob Pinggera,
Hajo A. Reijers,
Barbara Weber,
Geert Poels
Abstract:
The construction of business process models has become an important requisite in the analysis and optimization of processes. The success of the analysis and optimization efforts heavily depends on the quality of the models. Therefore, a research domain emerged that studies the process of process modeling. This paper contributes to this research by presenting a way of visualizing the different steps a modeler undertakes to construct a process model, in a so-called process of process modeling Chart. The graphical representation lowers the cognitive efforts to discover properties of the modeling process, which facilitates the research and the development of theory, training and tool support for improving model quality. The paper contains an extensive overview of applications of the tool that demonstrate its usefulness for research and practice and discusses the observations from the visualization in relation to other work. The visualization was evaluated through a qualitative study that confirmed its usefulness and added value compared to the Dotted Chart by which the visualization was inspired.
Submitted 11 November, 2015;
originally announced November 2015.
-
Visualizing the Process of Process Modeling with PPMCharts
Authors:
Jan Claes,
Irene Vanderfeesten,
Jakob Pinggera,
Hajo A. Reijers,
Barbara Weber,
Geert Poels
Abstract:
In the quest for knowledge about how to make good process models, recent research focus is shifting from studying the quality of process models to studying the process of process modeling (often abbreviated as PPM) itself. This paper reports on our efforts to visualize this specific process in such a way that relevant characteristics of the modeling process can be observed graphically. By recording each modeling operation in a modeling process, one can build an event log that can be used as input for the PPMChart Analysis plug-in we implemented in ProM. The graphical representation this plug-in generates allows for the discovery of different patterns of the process of process modeling. It also provides different views on the process of process modeling (by configuring and filtering the charts).
Submitted 11 November, 2015;
originally announced November 2015.
-
Tying Process Model Quality to the Modeling Process: The Impact of Structuring, Movement, and Speed
Authors:
Jan Claes,
Irene Vanderfeesten,
Hajo A. Reijers,
Jakob Pinggera,
Matthias Weidlich,
Stefan Zugal,
Dirk Fahland,
Barbara Weber,
Jan Mendling,
Geert Poels
Abstract:
In an investigation into the process of process modeling, we examined how modeling behavior relates to the quality of the process model that emerges from it. Specifically, we considered whether (i) a modeler's structured modeling style, (ii) the frequency of moving existing objects over the modeling canvas, and (iii) the overall modeling speed are in any way connected to the ease with which the resulting process model can be understood. In this paper, we describe the exploratory study to build these three conjectures, clarify the experimental set-up and infrastructure that was used to collect data, and explain the metrics used for the various concepts to test the conjectures empirically. We discuss various implications for research and practice from the conjectures, all of which were confirmed by the experiment.
Submitted 11 November, 2015;
originally announced November 2015.
-
Making Sense of Declarative Process Models: Common Strategies and Typical Pitfalls
Authors:
Cornelia Haisjackl,
Stefan Zugal,
Pnina Soffer,
Irit Hadar,
Manfred Reichert,
Jakob Pinggera,
Barbara Weber
Abstract:
Declarative approaches to process modeling are regarded as well suited for highly volatile environments as they provide a high degree of flexibility. However, problems in understanding and maintaining declarative business process models often impede their usage. In particular, how declarative models are understood has not been investigated yet. This paper takes a first step toward addressing this question and reports on an exploratory study investigating how analysts make sense of declarative process models. We have handed out real-world declarative process models to subjects and asked them to describe the illustrated process. Our qualitative analysis shows that subjects tried to describe the processes in a sequential way although the models represent circumstantial information, namely, conditions that produce an outcome, rather than a sequence of activities. Finally, we observed difficulties with single building blocks and combinations of relations between activities.
Submitted 11 November, 2015;
originally announced November 2015.
-
Strategies for Addressing Spreadsheet Compliance Challenges
Authors:
Brandon Weber
Abstract:
Most organizations today use spreadsheets in some form or another to support critical business processes. However, the financial resources and developmental rigor dedicated to them are often minor in comparison to other enterprise technology. The increasing focus on achieving regulatory and other forms of compliance over key technology assets has made it clear that organizations must regard spreadsheets as an enterprise resource and account for them when developing an overall compliance strategy. This paper provides the reader with a set of practical strategies for addressing spreadsheet compliance from an organizational perspective. It then presents capabilities offered in the 2007 Microsoft Office System which can be used to help customers address compliance challenges.
Submitted 28 November, 2007;
originally announced November 2007.