subscribe to arXiv mailings

PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents

Authors: Saber Zerhoudi, Michael Granitzer

Abstract: Large Language Models (LLMs) struggle with generating reliable outputs due to outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by enhancing LLMs with external knowledge, but often fail to personalize the retrieval process. This paper introduces PersonaRAG, a novel framework incorporating user-centric agents to adapt retrieval and generation based on r… ▽ More Large Language Models (LLMs) struggle with generating reliable outputs due to outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by enhancing LLMs with external knowledge, but often fail to personalize the retrieval process. This paper introduces PersonaRAG, a novel framework incorporating user-centric agents to adapt retrieval and generation based on real-time user data and interactions. Evaluated across various question answering datasets, PersonaRAG demonstrates superiority over baseline models, providing tailored answers to user needs. The results suggest promising directions for user-adapted information retrieval systems. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08275 [pdf, other]

Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems

Authors: Laura Caspari, Kanishka Ghosh Dastidar, Saber Zerhoudi, Jelena Mitrovic, Michael Granitzer

Abstract: The choice of embedding model is a crucial step in the design of Retrieval Augmented Generation (RAG) systems. Given the sheer volume of available options, identifying clusters of similar models streamlines this model selection process. Relying solely on benchmark performance scores only allows for a weak assessment of model similarity. Thus, in this study, we evaluate the similarity of embedding… ▽ More The choice of embedding model is a crucial step in the design of Retrieval Augmented Generation (RAG) systems. Given the sheer volume of available options, identifying clusters of similar models streamlines this model selection process. Relying solely on benchmark performance scores only allows for a weak assessment of model similarity. Thus, in this study, we evaluate the similarity of embedding models within the context of RAG systems. Our assessment is two-fold: We use Centered Kernel Alignment to compare embeddings on a pair-wise level. Additionally, as it is especially pertinent to RAG systems, we evaluate the similarity of retrieval results between these models using Jaccard and rank similarity. We compare different families of embedding models, including proprietary ones, across five datasets from the popular Benchmark Information Retrieval (BEIR). Through our experiments we identify clusters of models corresponding to model families, but interestingly, also some inter-family clusters. Furthermore, our analysis of top-k retrieval similarity reveals high-variance at low k values. We also identify possible open-source alternatives to proprietary models, with Mistral exhibiting the highest similarity to OpenAI models. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06543 [pdf, ps, other]

doi 10.1145/3605098.3636098

DriftGAN: Using historical data for Unsupervised Recurring Drift Detection

Authors: Christofer Fellicious, Sahib Julka, Lorenz Wendlinger, Michael Granitzer

Abstract: In real-world applications, input data distributions are rarely static over a period of time, a phenomenon known as concept drift. Such concept drifts degrade the model's prediction performance, and therefore we require methods to overcome these issues. The initial step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detectio… ▽ More In real-world applications, input data distributions are rarely static over a period of time, a phenomenon known as concept drift. Such concept drifts degrade the model's prediction performance, and therefore we require methods to overcome these issues. The initial step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detection methods work on detecting concept drifts and signalling the requirement to retrain the model. However, in real-world cases, there could be concept drifts that recur over a period of time. In this paper, we present an unsupervised method based on Generative Adversarial Networks(GAN) to detect concept drifts and identify whether a specific concept drift occurred in the past. Our method reduces the time and data the model requires to get up to speed for recurring drifts. Our key results indicate that our proposed model can outperform the current state-of-the-art models in most datasets. We also test our method on a real-world use case from astrophysics, where we detect the bow shock and magnetopause crossings with better results than the existing methods in the domain. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Journal ref: In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 368-369. 2024

arXiv:2406.16674 [pdf, other]

Computational Approaches to the Detection of Lesser-Known Rhetorical Figures: A Systematic Survey and Research Challenges

Authors: Ramona Kühn, Jelena Mitrović, Michael Granitzer

Abstract: Rhetorical figures play a major role in our everyday communication as they make text more interesting, more memorable, or more persuasive. Therefore, it is important to computationally detect rhetorical figures to fully understand the meaning of a text. We provide a comprehensive overview of computational approaches to lesser-known rhetorical figures. We explore the linguistic and computational pe… ▽ More Rhetorical figures play a major role in our everyday communication as they make text more interesting, more memorable, or more persuasive. Therefore, it is important to computationally detect rhetorical figures to fully understand the meaning of a text. We provide a comprehensive overview of computational approaches to lesser-known rhetorical figures. We explore the linguistic and computational perspectives on rhetorical figures, emphasizing their significance for the domain of Natural Language Processing. We present different figures in detail, delving into datasets, definitions, rhetorical functions, and detection approaches. We identified challenges such as dataset scarcity, language limitations, and reliance on rule-based methods. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Submitted to ACM Computing Surveys. 35 pages

arXiv:2404.16218 [pdf, other]

doi 10.1007/978-3-031-58553-1_13

Efficient NAS with FaDE on Hierarchical Spaces

Authors: Simon Neumeyer, Julian Stier, Michael Granitzer

Abstract: Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network sub modules to serve as surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE which uses differentiable architecture search to obtain relative performance predictions on finite reg… ▽ More Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network sub modules to serve as surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited on deep hierarchical, respectively multi-cell search spaces, which it can explore by linear instead of exponential cost and therefore eliminates the need for a proxy search space. Our experiments show that firstly, FaDE-ranks on finite regions of the search space correlate with corresponding architecture performances and secondly, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space. △ Less

Submitted 24 April, 2024; originally announced April 2024.

ACM Class: I.2.6

Journal ref: Advances in Intelligent Data Analysis XXII. IDA 2024. Lecture Notes in Computer Science, vol 14642. Springer, Cham

arXiv:2404.02309 [pdf, other]

A Survey of Web Content Control for Generative AI

Authors: Michael Dinzinger, Florian Heß, Michael Granitzer

Abstract: The groundbreaking advancements around generative AI have recently caused a wave of concern culminating in a row of lawsuits, including high-profile actions against Stability AI and OpenAI. This situation of legal uncertainty has sparked a broad discussion on the rights of content creators and publishers to protect their intellectual property on the web. European as well as US law already provides… ▽ More The groundbreaking advancements around generative AI have recently caused a wave of concern culminating in a row of lawsuits, including high-profile actions against Stability AI and OpenAI. This situation of legal uncertainty has sparked a broad discussion on the rights of content creators and publishers to protect their intellectual property on the web. European as well as US law already provides rough guidelines, setting a direction for technical solutions to regulate web data use. In this course, researchers and practitioners have worked on numerous web standards and opt-out formats that empower publishers to keep their data out of the development of generative AI models. The emerging AI/ML opt-out protocols are valuable in regards to data sovereignty, but again, it creates an adverse situation for a site owners who are overwhelmed by the multitude of recent ad hoc standards to consider. In our work, we want to survey the different proposals, ideas and initiatives, and provide a comprehensive legal and technical background in the context of the current discussion on web publishers control. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.02261 [pdf, other]

LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

Authors: Nataliia Kholodna, Sahib Julka, Mohammad Khodadadi, Muhammed Nurullah Gumus, Michael Granitzer

Abstract: Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potent… ▽ More Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potential of LLMs in the active learning loop for data annotation. Initially, we conduct evaluations to assess inter-annotator agreement and consistency, facilitating the selection of a suitable LLM annotator. The chosen annotator is then integrated into a training loop for a classifier using an active learning paradigm, minimizing the amount of queried data required. Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements, as indicated by estimated potential cost savings of at least 42.45 times compared to human annotation. Our proposed solution shows promising potential to substantially reduce both the monetary and computational costs associated with automation in low-resource settings. By bridging the gap between low-resource languages and AI, this approach fosters broader inclusion and shows the potential to enable automation across diverse linguistic landscapes. △ Less

Submitted 23 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 20 pages, 6 tables. The source code related to this paper is available at https://github.com/mkandai/llms-in-the-loop. This paper has been accepted for publication at ECML PKDD 2024

ACM Class: I.2.7; I.2.6

arXiv:2312.02730 [pdf, other]

Towards Measuring Representational Similarity of Large Language Models

Authors: Max Klabunde, Mehdi Ben Amor, Michael Granitzer, Florian Lemmerich

Abstract: Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others.… ▽ More Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others. We identify challenges of using representational similarity measures that suggest the need of careful study of similarity scores to avoid false conclusions. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Extended abstract in UniReps Workshop @ NeurIPS 2023

arXiv:2308.02575 [pdf, other]

doi 10.3389/feduc.2023.1272229

Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings

Authors: Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer

Abstract: This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to le… ▽ More This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to learn more about the interrater reliability, consistency of the ratings across iterations and the correlation between ratings in terms of content and style. The results revealed a high interrater reliability with ICC scores ranging between 0.94 and 0.99 for different timespans, suggesting that GPT-4 is capable of generating consistent ratings across repetitions with a clear prompt. Style and content ratings show a high correlation of 0.87. When applying a non-adequate style the average content ratings remained constant, while style ratings decreased, which indicates that the large language model (LLM) effectively distinguishes between these two criteria during evaluation. The prompt used in this study is furthermore presented and explained. Further research is necessary to assess the robustness and reliability of AI models in various use cases. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: 14 pages, 7 tables, 1 figure

arXiv:2307.06709 [pdf, other]

GRAN is superior to GraphRNN: node orderings, kernel- and graph embeddings-based metrics for graph generators

Authors: Ousmane Touat, Julian Stier, Pierre-Edouard Portier, Michael Granitzer

Abstract: A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs has theoretical challenges, such as isomorphic representations -- evaluating how well a generative model performs is difficult. Which model to choose depending on the application domain? We extensively study kernel… ▽ More A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs has theoretical challenges, such as isomorphic representations -- evaluating how well a generative model performs is difficult. Which model to choose depending on the application domain? We extensively study kernel-based metrics on distributions of graph invariants and manifold-based and kernel-based metrics in graph embedding space. Manifold-based metrics outperform kernel-based metrics in embedding space. We use these metrics to compare GraphRNN and GRAN, two well-known generative models for graphs, and unveil the influence of node orderings. It shows the superiority of GRAN over GraphRNN - further, our proposed adaptation of GraphRNN with a depth-first search ordering is effective for small-sized graphs. A guideline on good practices regarding dataset selection and node feature initialization is provided. Our work is accompanied by open-source code and reproducible experiments. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: Preprint for The 9th International Conference on machine Learning, Optimization and Data science - LOD 2023

arXiv:2305.07586 [pdf, other]

Knowledge distillation with Segment Anything (SAM) model for Planetary Geological Mapping

Authors: Sahib Julka, Michael Granitzer

Abstract: Planetary science research involves analysing vast amounts of remote sensing data, which are often costly and time-consuming to annotate and process. One of the essential tasks in this field is geological mapping, which requires identifying and outlining regions of interest in planetary images, including geological features and landforms. However, manually labelling these images is a complex and c… ▽ More Planetary science research involves analysing vast amounts of remote sensing data, which are often costly and time-consuming to annotate and process. One of the essential tasks in this field is geological mapping, which requires identifying and outlining regions of interest in planetary images, including geological features and landforms. However, manually labelling these images is a complex and challenging task that requires significant domain expertise and effort. To expedite this endeavour, we propose the use of knowledge distillation using the recently introduced cutting-edge Segment Anything (SAM) model. We demonstrate the effectiveness of this prompt-based foundation model for rapid annotation and quick adaptability to a prime use case of mapping planetary skylights. Our work reveals that with a small set of annotations obtained with the right prompts from the model and subsequently training a specialised domain decoder, we can achieve satisfactory semantic segmentation on this task. Key results indicate that the use of knowledge distillation can significantly reduce the effort required by domain experts for manual annotation and improve the efficiency of image segmentation tasks. This approach has the potential to accelerate extra-terrestrial discovery by automatically detecting and segmenting Martian landforms. △ Less

Submitted 15 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

arXiv:2304.13567 [pdf, other]

doi 10.1145/3605098.3636126

Technical Report: Impact of Position Bias on Language Models in Token Classification

Authors: Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović

Abstract: Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models,… ▽ More Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models, specifically the position bias of positive examples in token classification tasks. For completeness, we also include decoders in the evaluation. We evaluate the impact of position bias using different position embedding techniques, focusing on BERT with Absolute Position Embedding (APE), Relative Position Embedding (RPE), and Rotary Position Embedding (RoPE). Therefore, we conduct an in-depth evaluation of the impact of position bias on the performance of LMs when fine-tuned on token classification benchmarks. Our study includes CoNLL03 and OntoNote5.0 for NER, English Tree Bank UD\_en, and TweeBank for POS tagging. We propose an evaluation approach to investigate position bias in transformer models. We show that LMs can suffer from this bias with an average drop ranging from 3\% to 9\% in their performance. To mitigate this effect, we propose two methods: Random Position Shifting and Context Perturbation, that we apply on batches during the training process. The results show an improvement of $\approx$ 2\% in the performance of the model on CoNLL03, UD\_en, and TweeBank. △ Less

Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Updated content of the preprint

arXiv:2303.05388 [pdf, ps, other]

doi 10.5220/0011749400003393

German BERT Model for Legal Named Entity Recognition

Authors: Harshil Darji, Jelena Mitrović, Michael Granitzer

Abstract: The use of BERT, one of the most popular language models, has led to improvements in many Natural Language Processing (NLP) tasks. One such task is Named Entity Recognition (NER) i.e. automatic identification of named entities such as location, person, organization, etc. from a given text. It is also an important base step for many NLP tasks such as information extraction and argumentation mining.… ▽ More The use of BERT, one of the most popular language models, has led to improvements in many Natural Language Processing (NLP) tasks. One such task is Named Entity Recognition (NER) i.e. automatic identification of named entities such as location, person, organization, etc. from a given text. It is also an important base step for many NLP tasks such as information extraction and argumentation mining. Even though there is much research done on NER using BERT and other popular language models, the same is not explored in detail when it comes to Legal NLP or Legal Tech. Legal NLP applies various NLP techniques such as sentence similarity or NER specifically on legal data. There are only a handful of models for NER tasks using BERT language models, however, none of these are aimed at legal documents in German. In this paper, we fine-tune a popular BERT language model trained on German data (German BERT) on a Legal Entity Recognition (LER) dataset. To make sure our model is not overfitting, we performed a stratified 10-fold cross-validation. The results we achieve by fine-tuning German BERT on the LER dataset outperform the BiLSTM-CRF+ model used by the authors of the same LER dataset. Finally, we make the model openly available via HuggingFace. △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: Presented at ICAART 2023

Journal ref: Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART (2023) 723-728

arXiv:2209.05243 [pdf, ps, other]

SmartKex: Machine Learning Assisted SSH Keys Extraction From The Heap Dump

Authors: Christofer Fellicious, Stewart Sentanoe, Michael Granitzer, Hans P. Reiser

Abstract: Digital forensics is the process of extracting, preserving, and documenting evidence in digital devices. A commonly used method in digital forensics is to extract data from the main memory of a digital device. However, the main challenge is identifying the important data to be extracted. Several pieces of crucial information reside in the main memory, like usernames, passwords, and cryptographic k… ▽ More Digital forensics is the process of extracting, preserving, and documenting evidence in digital devices. A commonly used method in digital forensics is to extract data from the main memory of a digital device. However, the main challenge is identifying the important data to be extracted. Several pieces of crucial information reside in the main memory, like usernames, passwords, and cryptographic keys such as SSH session keys. In this paper, we propose SmartKex, a machine-learning assisted method to extract session keys from heap memory snapshots of an OpenSSH process. In addition, we release an openly available dataset and the corresponding toolchain for creating additional data. Finally, we compare SmartKex with naive brute-force methods and empirically show that SmartKex can extract the session keys with high accuracy and high throughput. With the provided resources, we intend to strengthen the research on the intersection between digital forensics, cybersecurity, and machine learning. △ Less

Submitted 13 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

arXiv:2204.05265 [pdf, other]

The Importance of Future Information in Credit Card Fraud Detection

Authors: Van Bach Nguyen, Kanishka Ghosh Dastidar, Michael Granitzer, Wissam Siblini

Abstract: Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection to block the card retrospectively and avoid further frauds. Since human verification is often necessary and the payment processing time is limited, the second task manages the largest volume of transactions. In the literature, fraud detection challenges… ▽ More Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection to block the card retrospectively and avoid further frauds. Since human verification is often necessary and the payment processing time is limited, the second task manages the largest volume of transactions. In the literature, fraud detection challenges and algorithms performance are widely studied but the very formulation of the problem is never disrupted: it aims at predicting if a transaction is fraudulent based on its characteristics and the past transactions of the cardholder. Yet, in posterior detection, verification often takes days, so new payments on the card become available before a decision is taken. This is our motivation to propose a new paradigm: posterior fraud detection with "future" information. We start by providing evidence of the on-time availability of subsequent transactions, usable as extra context to improve detection. We then design a Bidirectional LSTM to make use of these transactions. On a real-world dataset with over 30 million transactions, it achieves higher performance than a regular LSTM, which is the state-of-the-art classifier for fraud detection that only uses the past context. We also introduce new metrics to show that the proposal catches more frauds, more compromised cards, and based on their earliest frauds. We believe that future works on this new paradigm will have a significant impact on the detection of compromised cards. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: 11 pages, 4 figures, to be published at AISTATS 2022

arXiv:2111.06679 [pdf, other]

doi 10.1016/j.simpa.2021.100193

deepstruct -- linking deep learning and graph theory

Authors: Julian Stier, Michael Granitzer

Abstract: deepstruct connects deep learning models and graph theory such that different graph structures can be imposed on neural networks or graph structures can be extracted from trained neural network models. For this, deepstruct provides deep neural network models with different restrictions which can be created based on an initial graph. Further, tools to extract graph structures from trained models ar… ▽ More deepstruct connects deep learning models and graph theory such that different graph structures can be imposed on neural networks or graph structures can be extracted from trained neural network models. For this, deepstruct provides deep neural network models with different restrictions which can be created based on an initial graph. Further, tools to extract graph structures from trained models are available. This step of extracting graphs can be computationally expensive even for models of just a few dozen thousand parameters and poses a challenging problem. deepstruct supports research in pruning, neural architecture search, automated network design and structure analysis of neural networks. △ Less

Submitted 5 December, 2021; v1 submitted 12 November, 2021; originally announced November 2021.

ACM Class: I.2.0; F.0

Journal ref: Software Impacts, 2021

arXiv:2107.12917 [pdf, other]

Experiments on Properties of Hidden Structures of Sparse Neural Networks

Authors: Julian Stier, Harshil Darji, Michael Granitzer

Abstract: Sparsity in the structure of Neural Networks can lead to less energy consumption, less memory usage, faster computation times on convenient hardware, and automated machine learning. If sparsity gives rise to certain kinds of structure, it can explain automatically obtained features during learning. We provide insights into experiments in which we show how sparsity can be achieved through prior i… ▽ More Sparsity in the structure of Neural Networks can lead to less energy consumption, less memory usage, faster computation times on convenient hardware, and automated machine learning. If sparsity gives rise to certain kinds of structure, it can explain automatically obtained features during learning. We provide insights into experiments in which we show how sparsity can be achieved through prior initialization, pruning, and during learning, and answer questions on the relationship between the structure of Neural Networks and their performance. This includes the first work of inducing priors from network theory into Recurrent Neural Networks and an architectural performance prediction during a Neural Architecture Search. Within our experiments, we show how magnitude class blinded pruning achieves 97.5% on MNIST with 80% compression and re-training, which is 0.5 points more than without compression, that magnitude class uniform pruning is significantly inferior to it and how a genetic search enhanced with performance prediction achieves 82.4% on CIFAR10. Further, performance prediction for Recurrent Networks learning the Reber grammar shows an $R^2$ of up to 0.81 given only structural information. △ Less

Submitted 27 July, 2021; originally announced July 2021.

arXiv:2107.06158 [pdf, other]

doi 10.1016/j.procs.2021.09.182

Correlation Analysis between the Robustness of Sparse Neural Networks and their Random Hidden Structural Priors

Authors: M. Ben Amor, J. Stier, M. Granitzer

Abstract: Deep learning models have been shown to be vulnerable to adversarial attacks. This perception led to analyzing deep learning models not only from the perspective of their performance measures but also their robustness to certain types of adversarial attacks. We take another step forward in relating the architectural structure of neural networks from a graph theoretic perspective to their robustnes… ▽ More Deep learning models have been shown to be vulnerable to adversarial attacks. This perception led to analyzing deep learning models not only from the perspective of their performance measures but also their robustness to certain types of adversarial attacks. We take another step forward in relating the architectural structure of neural networks from a graph theoretic perspective to their robustness. We aim to investigate any existing correlations between graph theoretic properties and the robustness of Sparse Neural Networks. Our hypothesis is, that graph theoretic properties as a prior of neural network structures are related to their robustness. To answer to this hypothesis, we designed an empirical study with neural network models obtained through random graphs used as sparse structural priors for the networks. We additionally investigated the evaluation of a randomly pruned fully connected network as a point of reference. We found that robustness measures are independent of initialization methods but show weak correlations with graph properties: higher graph densities correlate with lower robustness, but higher average path lengths and average node eccentricities show negative correlations with robustness measures. We hope to motivate further empirical and analytical research to tightening an answer to our hypothesis. △ Less

Submitted 13 July, 2021; originally announced July 2021.

Journal ref: Procedia Computer Science 192C (2021) pp. 4073-4082

arXiv:2103.11471 [pdf, other]

Conditional Generative Adversarial Networks for Speed Control in Trajectory Simulation

Authors: Sahib Julka, Vishal Sowrirajan, Joerg Schloetterer, Michael Granitzer

Abstract: Motion behaviour is driven by several factors -- goals, presence and actions of neighbouring agents, social relations, physical and social norms, the environment with its variable characteristics, and further. Most factors are not directly observable and must be modelled from context. Trajectory prediction, is thus a hard problem, and has seen increasing attention from researchers in the recent ye… ▽ More Motion behaviour is driven by several factors -- goals, presence and actions of neighbouring agents, social relations, physical and social norms, the environment with its variable characteristics, and further. Most factors are not directly observable and must be modelled from context. Trajectory prediction, is thus a hard problem, and has seen increasing attention from researchers in the recent years. Prediction of motion, in application, must be realistic, diverse and controllable. In spite of increasing focus on multimodal trajectory generation, most methods still lack means for explicitly controlling different modes of the data generation. Further, most endeavours invest heavily in designing special mechanisms to learn the interactions in latent space. We present Conditional Speed GAN (CSG), that allows controlled generation of diverse and socially acceptable trajectories, based on user controlled speed. During prediction, CSG forecasts future speed from latent space and conditions its generation based on it. CSG is comparable to state-of-the-art GAN methods in terms of the benchmark distance metrics, while being simple and useful for simulation and data augmentation for different contexts such as fast or slow paced environments. Additionally, we compare the effect of different aggregation mechanisms and show that a naive approach of concatenation works comparable to its attention and pooling alternatives. △ Less

Submitted 21 March, 2021; originally announced March 2021.

arXiv:2010.15996 [pdf, other]

Lessons Learned from the 1st ARIEL Machine Learning Challenge: Correcting Transiting Exoplanet Light Curves for Stellar Spots

Authors: Nikolaos Nikolaou, Ingo P. Waldmann, Angelos Tsiaras, Mario Morvan, Billy Edwards, Kai Hou Yip, Giovanna Tinetti, Subhajit Sarkar, James M. Dawson, Vadim Borisov, Gjergji Kasneci, Matej Petkovic, Tomaz Stepisnik, Tarek Al-Ubaidi, Rachel Louise Bailey, Michael Granitzer, Sahib Julka, Roman Kern, Patrick Ofner, Stefan Wagner, Lukas Heppe, Mirko Bunse, Katharina Morik

Abstract: The last decade has witnessed a rapid growth of the field of exoplanet discovery and characterisation. However, several big challenges remain, many of which could be addressed using machine learning methodology. For instance, the most prolific method for detecting exoplanets and inferring several of their characteristics, transit photometry, is very sensitive to the presence of stellar spots. The… ▽ More The last decade has witnessed a rapid growth of the field of exoplanet discovery and characterisation. However, several big challenges remain, many of which could be addressed using machine learning methodology. For instance, the most prolific method for detecting exoplanets and inferring several of their characteristics, transit photometry, is very sensitive to the presence of stellar spots. The current practice in the literature is to identify the effects of spots visually and correct for them manually or discard the affected data. This paper explores a first step towards fully automating the efficient and precise derivation of transit depths from transit light curves in the presence of stellar spots. The methods and results we present were obtained in the context of the 1st Machine Learning Challenge organized for the European Space Agency's upcoming Ariel mission. We first present the problem, the simulated Ariel-like data and outline the Challenge while identifying best practices for organizing similar challenges in the future. Finally, we present the solutions obtained by the top-5 winning teams, provide their code and discuss their implications. Successful solutions either construct highly non-linear (w.r.t. the raw data) models with minimal preprocessing -deep neural networks and ensemble methods- or amount to obtaining meaningful statistics from the light curves, constructing linear models on which yields comparably good predictive performance. △ Less

Submitted 29 October, 2020; originally announced October 2020.

Comments: 20 pages, 7 figures, 2 tables, Submitted to The Astrophysics Journal (ApJ)

arXiv:2010.12472 [pdf, other]

HateBERT: Retraining BERT for Abusive Language Detection in English

Authors: Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer

Abstract: In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have collected and made available to the public. We present the results of a detailed comparison between a general pre-trained language mo… ▽ More In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have collected and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the abuse-inclined version obtained by retraining with posts from the banned communities on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the generic pre-trained language model and its corresponding abusive language-inclined counterpart across the datasets, indicating that portability is affected by compatibility of the annotated phenomena. △ Less

Submitted 4 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

arXiv:2007.06985 [pdf, other]

ADSAGE: Anomaly Detection in Sequences of Attributed Graph Edges applied to insider threat detection at fine-grained level

Authors: Mathieu Garchery, Michael Granitzer

Abstract: Previous works on the CERT insider threat detection case have neglected graph and text features despite their relevance to describe user behavior. Additionally, existing systems heavily rely on feature engineering and audit data aggregation to detect malicious activities. This is time consuming, requires expert knowledge and prevents tracing back alerts to precise user actions. To address these is… ▽ More Previous works on the CERT insider threat detection case have neglected graph and text features despite their relevance to describe user behavior. Additionally, existing systems heavily rely on feature engineering and audit data aggregation to detect malicious activities. This is time consuming, requires expert knowledge and prevents tracing back alerts to precise user actions. To address these issues we introduce ADSAGE to detect anomalies in audit log events modeled as graph edges. Our general method is the first to perform anomaly detection at edge level while supporting both edge sequences and attributes, which can be numeric, categorical or even text. We describe how ADSAGE can be used for fine-grained, event level insider threat detection in different audit logs from the CERT use case. Remarking that there is no standard benchmark for the CERT problem, we use a previously proposed evaluation setting based on realistic recall-based metrics. We evaluate ADSAGE on authentication, email traffic and web browsing logs from the CERT insider threat datasets, as well as on real-world authentication events. ADSAGE is effective to detect anomalies in authentications, modeled as user to computer interactions, and in email communications. Simple baselines give surprisingly strong results as well. We also report performance split by malicious scenarios present in the CERT datasets: interestingly, several detectors are complementary and could be combined to improve detection. Overall, our results show that graph features are informative to characterize malicious insider activities, and that detection at fine-grained level is possible. △ Less

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2006.04159 [pdf, other]

doi 10.1007/978-3-030-74251-5_25

DeepGG: a Deep Graph Generator

Authors: Julian Stier, Michael Granitzer

Abstract: Learning distributions of graphs can be used for automatic drug discovery, molecular design, complex network analysis, and much more. We present an improved framework for learning generative models of graphs based on the idea of deep state machines. To learn state transition decisions we use a set of graph and node embedding techniques as memory of the state machine. Our analysis is based on lea… ▽ More Learning distributions of graphs can be used for automatic drug discovery, molecular design, complex network analysis, and much more. We present an improved framework for learning generative models of graphs based on the idea of deep state machines. To learn state transition decisions we use a set of graph and node embedding techniques as memory of the state machine. Our analysis is based on learning the distribution of random graph generators for which we provide statistical tests to determine which properties can be learned and how well the original distribution of graphs is represented. We show that the design of the state machine favors specific distributions. Models of graphs of size up to 150 vertices are learned. Code and parameters are publicly available to reproduce our results. △ Less

Submitted 25 November, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 8 pages condensed preprint with github link, under review

Journal ref: IDA 2021: Advances in Intelligent Data Analysis XIX

arXiv:2002.07252 [pdf, ps, other]

doi 10.1109/ICCC.2019.00026

Investigating Extensions to Random Walk Based Graph Embedding

Authors: Joerg Schloetterer, Martin Wehking, Fatemeh Salehi Rizi, Michael Granitzer

Abstract: Graph embedding has recently gained momentum in the research community, in particular after the introduction of random walk and neural network based approaches. However, most of the embedding approaches focus on representing the local neighborhood of nodes and fail to capture the global graph structure, i.e. to retain the relations to distant nodes. To counter that problem, we propose a novel exte… ▽ More Graph embedding has recently gained momentum in the research community, in particular after the introduction of random walk and neural network based approaches. However, most of the embedding approaches focus on representing the local neighborhood of nodes and fail to capture the global graph structure, i.e. to retain the relations to distant nodes. To counter that problem, we propose a novel extension to random walk based graph embedding, which removes a percentage of least frequent nodes from the walks at different levels. By this removal, we simulate farther distant nodes to reside in the close neighborhood of a node and hence explicitly represent their connection. Besides the common evaluation tasks for graph embeddings, such as node classification and link prediction, we evaluate and compare our approach against related methods on shortest path approximation. The results indicate, that extensions to random walk based methods (including our own) improve the predictive performance only slightly - if at all. △ Less

Submitted 17 February, 2020; originally announced February 2020.

arXiv:2002.06685 [pdf, other]

doi 10.1109/DEXA.2017.36

Global and Local Feature Learning for Ego-Network Analysis

Authors: Fatemeh Salehi Rizi, Michael Granitzer, Konstantin Ziegler

Abstract: In an ego-network, an individual (ego) organizes its friends (alters) in different groups (social circles). This social network can be efficiently analyzed after learning representations of the ego and its alters in a low-dimensional, real vector space. These representations are then easily exploited via statistical models for tasks such as social circle detection and prediction. Recent advances i… ▽ More In an ego-network, an individual (ego) organizes its friends (alters) in different groups (social circles). This social network can be efficiently analyzed after learning representations of the ego and its alters in a low-dimensional, real vector space. These representations are then easily exploited via statistical models for tasks such as social circle detection and prediction. Recent advances in language modeling via deep learning have inspired new methods for learning network representations. These methods can capture the global structure of networks. In this paper, we evolve these techniques to also encode the local structure of neighborhoods. Therefore, our local representations capture network features that are hidden in the global representation of large networks. We show that the task of social circle prediction benefits from a combination of global and local features generated by our technique. △ Less

Submitted 16 February, 2020; originally announced February 2020.

arXiv:2002.06665 [pdf, other]

doi 10.1145/3297280.3297622

Predicting event attendance exploring social influence

Authors: Fatemeh Salehi Rizi, Michael Granitzer

Abstract: The problem of predicting people's participation in real-world events has received considerable attention as it offers valuable insights for human behavior analysis and event-related advertisement. Today social networks (e.g. Twitter) widely reflect large popular events where people discuss their interest with friends. Event participants usually stimulate friends to join the event which propagates… ▽ More The problem of predicting people's participation in real-world events has received considerable attention as it offers valuable insights for human behavior analysis and event-related advertisement. Today social networks (e.g. Twitter) widely reflect large popular events where people discuss their interest with friends. Event participants usually stimulate friends to join the event which propagates a social influence in the network. In this paper, we propose to model the social influence of friends on event attendance. We consider non-geotagged posts besides structures of social groups to infer users' attendance. To leverage the information on network topology we apply some of recent graph embedding techniques such as node2vec, HARP and Poincar`e. We describe the approach followed to design the feature space and feed it to a neural network. The performance evaluation is conducted using two large music festivals datasets, namely the VFestival and Creamfields. The experimental results show that our classifier outperforms the state-of-the-art baseline with 89% accuracy observed for the VFestival dataset. △ Less

Submitted 16 February, 2020; originally announced February 2020.

arXiv:2002.05257 [pdf, other]

doi 10.1109/ASONAM.2018.8508763

Shortest path distance approximation using deep learning techniques

Authors: Fatemeh Salehi Rizi, Joerg Schloetterer, Michael Granitzer

Abstract: Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first-search (BFS) do not scale up to contemporary, rapidly evolving today's massive networks. Therefore, it is required to find approximation methods to enable scalable graph processing with a significant speedup. In this paper, we utilize vector e… ▽ More Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first-search (BFS) do not scale up to contemporary, rapidly evolving today's massive networks. Therefore, it is required to find approximation methods to enable scalable graph processing with a significant speedup. In this paper, we utilize vector embeddings learnt by deep learning techniques to approximate the shortest paths distances in large graphs. We show that a feedforward neural network fed with embeddings can approximate distances with relatively low distortion error. The suggested method is evaluated on the Facebook, BlogCatalog, Youtube and Flickr social networks. △ Less

Submitted 12 February, 2020; originally announced February 2020.

arXiv:1912.12283 [pdf, other]

Competitive Influence Maximization: Integrating Budget Allocation and Seed Selection

Authors: Amirhossein Ansari, Masoud Dadgar, Ali Hamzeh, Jörg Schlötterer, Michael Granitzer

Abstract: Today, many companies take advantage of viral marketing to promote their new products, and since there are several competing companies in many markets, Competitive Influence Maximization has attracted much attention. Two categories of studies exist in the literature. First, studies that focus on which nodes from the network to select considering the existence of the opponents. Second, studies that… ▽ More Today, many companies take advantage of viral marketing to promote their new products, and since there are several competing companies in many markets, Competitive Influence Maximization has attracted much attention. Two categories of studies exist in the literature. First, studies that focus on which nodes from the network to select considering the existence of the opponents. Second, studies that focus on the problem of budget allocation. In this work, we integrate these two lines of research and propose a new scenario where competition happens in two phases. First, parties identify the most influential nodes of the network. Then they compete over only these influential nodes by the amount of budget they allocate to each node. We provide a strong motivation for this integration. Also, in previous budget allocation studies, any fraction of the budget could be allocated to a node, which means that the action space was continuous. However, in our scenario, we consider that the action space is discrete. This consideration is far more realistic, and it significantly changes the problem. We model the scenario as a game and propose a novel framework for calculating the Nash equilibrium. Notably, building an efficient framework for this problem with such a huge action space and also handling the stochastic environment of influence maximization is very challenging. To tackle these difficulties, we devise a new payoff estimation method and a novel best response oracle to boost the efficiency of our framework. Also, we validate each aspect of our work through extensive experiments. △ Less

Submitted 27 December, 2019; originally announced December 2019.

Comments: 16 pages, 2 figure

arXiv:1912.04022 [pdf, other]

Parallel Total Variation Distance Estimation with Neural Networks for Merging Over-Clusterings

Authors: Christian Reiser, Jörg Schlötterer, Michael Granitzer

Abstract: We consider the initial situation where a dataset has been over-partitioned into $k$ clusters and seek a domain independent way to merge those initial clusters. We identify the total variation distance (TVD) as suitable for this goal. By exploiting the relation of the TVD to the Bayes accuracy we show how neural networks can be used to estimate TVDs between all pairs of clusters in parallel. Cruci… ▽ More We consider the initial situation where a dataset has been over-partitioned into $k$ clusters and seek a domain independent way to merge those initial clusters. We identify the total variation distance (TVD) as suitable for this goal. By exploiting the relation of the TVD to the Bayes accuracy we show how neural networks can be used to estimate TVDs between all pairs of clusters in parallel. Crucially, the needed memory space is decreased by reducing the required number of output neurons from $k^2$ to $k$. On realistically obtained over-clusterings of ImageNet subsets it is demonstrated that our TVD estimates lead to better merge decisions than those obtained by relying on state-of-the-art unsupervised representations. Further the generality of the approach is verified by evaluating it on a a point cloud dataset. △ Less

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1910.08926 [pdf, other]

Policy Learning for Malaria Control

Authors: Van Bach Nguyen, Belaid Mohamed Karim, Bao Long Vu, Jörg Schlötterer, Michael Granitzer

Abstract: Sequential decision making is a typical problem in reinforcement learning with plenty of algorithms to solve it. However, only a few of them can work effectively with a very small number of observations. In this report, we introduce the progress to learn the policy for Malaria Control as a Reinforcement Learning problem in the KDD Cup Challenge 2019 and propose diverse solutions to deal with the l… ▽ More Sequential decision making is a typical problem in reinforcement learning with plenty of algorithms to solve it. However, only a few of them can work effectively with a very small number of observations. In this report, we introduce the progress to learn the policy for Malaria Control as a Reinforcement Learning problem in the KDD Cup Challenge 2019 and propose diverse solutions to deal with the limited observations problem. We apply the Genetic Algorithm, Bayesian Optimization, Q-learning with sequence breaking to find the optimal policy for five years in a row with only 20 episodes/100 evaluations. We evaluate those algorithms and compare their performance with Random Search as a baseline. Among these algorithms, Q-Learning with sequence breaking has been submitted to the challenge and got ranked 7th in KDD Cup. △ Less

Submitted 20 October, 2019; originally announced October 2019.

arXiv:1910.07225 [pdf, other]

doi 10.1016/j.procs.2019.09.165

Structural Analysis of Sparse Neural Networks

Authors: Julian Stier, Michael Granitzer

Abstract: Sparse Neural Networks regained attention due to their potential for mathematical and computational advantages. We give motivation to study Artificial Neural Networks (ANNs) from a network science perspective, provide a technique to embed arbitrary Directed Acyclic Graphs into ANNs and report study results on predicting the performance of image classifiers based on the structural properties of the… ▽ More Sparse Neural Networks regained attention due to their potential for mathematical and computational advantages. We give motivation to study Artificial Neural Networks (ANNs) from a network science perspective, provide a technique to embed arbitrary Directed Acyclic Graphs into ANNs and report study results on predicting the performance of image classifiers based on the structural properties of the networks' underlying graph. Results could further progress neuroevolution and add explanations for the success of distinct architectures from a structural perspective. △ Less

Submitted 16 October, 2019; originally announced October 2019.

Comments: 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

Journal ref: Procedia Computer Science, 2018

arXiv:1909.01185 [pdf, other]

Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Liyun He-Guelton, Olivier Caelen, Michael Granitzer, Sylvie Calabretto

Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely (i) The sequence contains or doesn't contain a fraud (ii) The seque… ▽ More Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely (i) The sequence contains or doesn't contain a fraud (ii) The sequence is obtained by fixing the card-holder or the payment terminal (iii) It is a sequence of spent amount or of elapsed time between the current and previous transactions. Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sequences is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. Our multiple perspectives HMM-based approach offers automated feature engineering to model temporal correlations so as to improve the effectiveness of the classification task and allows for an increase in the detection of fraudulent transactions when combined with the state of the art expert based feature engineering strategy for credit card fraud detection. In extension to previous works, we show that this approach goes beyond ecommerce transactions and provides a robust feature engineering over different datasets, hyperparameters and classifiers. Moreover, we compare strategies to deal with structural missing values. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: published in the journal "future generation computer systems", in the special issue: "data exploration in the web 3.0 age"

arXiv:1906.06977 [pdf, other]

Dataset shift quantification for credit card fraud detection

Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Sylvie Calabretto, Liyun He-Guelton, Frederic Oblé, Michael Granitzer

Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify day-by-day the dataset shift in our face-to-face credit card transactions datas… ▽ More Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify day-by-day the dataset shift in our face-to-face credit card transactions dataset (card holder located in the shop) . In practice, we classify the days against each other and measure the efficiency of the classification. The more efficient the classification, the more different the buying behaviour between two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, week-ends, etc). We then incorporate this dataset shift knowledge in the credit card fraud detection task as a new feature. This leads to a small improvement of the detection. △ Less

Submitted 17 June, 2019; originally announced June 2019.

Comments: Presented at IEEE Artificial Intelligence and Knowledge Engineering (AIKE 2019)

arXiv:1905.06247 [pdf, other]

Multiple perspectives HMM-based feature engineering for credit card fraud detection

Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Olivier Caelen, Liyun He-Guelton, Sylvie Calabretto, Michael Granitzer

Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a Fraud? (ii) Is the sequence obtaine… ▽ More Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a Fraud? (ii) Is the sequence obtained by fixing the card-holder or the payment terminal? (iii) Is it a sequence of spent amount or of elapsed time between the current and previous transactions? Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sets is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. This multiple perspectives HMM-based approach enables an automatic feature engineering in order to model the sequential properties of the dataset with respect to the classification task. This strategy allows for a 15% increase in the precision-recall AUC compared to the state of the art feature engineering strategy for credit card fraud detection. △ Less

Submitted 15 May, 2019; originally announced May 2019.

Comments: Presented as a poster in the conference SAC 2019: 34th ACM/SIGAPP Symposium on Applied Computing in April 2019

arXiv:1904.08166 [pdf, other]

doi 10.1016/j.procs.2018.07.257

Analysing Neural Network Topologies: a Game Theoretic Approach

Authors: Julian Stier, Gabriele Gianini, Michael Granitzer, Konstantin Ziegler

Abstract: Artificial Neural Networks have shown impressive success in very different application cases. Choosing a proper network architecture is a critical decision for a network's success, usually done in a manual manner. As a straightforward strategy, large, mostly fully connected architectures are selected, thereby relying on a good optimization strategy to find proper weights while at the same time avo… ▽ More Artificial Neural Networks have shown impressive success in very different application cases. Choosing a proper network architecture is a critical decision for a network's success, usually done in a manual manner. As a straightforward strategy, large, mostly fully connected architectures are selected, thereby relying on a good optimization strategy to find proper weights while at the same time avoiding overfitting. However, large parts of the final network are redundant. In the best case, large parts of the network become simply irrelevant for later inferencing. In the worst case, highly parameterized architectures hinder proper optimization and allow the easy creation of adverserial examples fooling the network. A first step in removing irrelevant architectural parts lies in identifying those parts, which requires measuring the contribution of individual components such as neurons. In previous work, heuristics based on using the weight distribution of a neuron as contribution measure have shown some success, but do not provide a proper theoretical understanding. Therefore, in our work we investigate game theoretic measures, namely the Shapley value (SV), in order to separate relevant from irrelevant parts of an artificial neural network. We begin by designing a coalitional game for an artificial neural network, where neurons form coalitions and the average contributions of neurons to coalitions yield to the Shapley value. In order to measure how well the Shapley value measures the contribution of individual neurons, we remove low-contributing neurons and measure its impact on the network performance. In our experiments we show that the Shapley value outperforms other heuristics for measuring the contribution of neurons. △ Less

Submitted 17 April, 2019; originally announced April 2019.

Journal ref: Procedia Computer Science 126 (2018): 234-243

arXiv:1409.1357 [pdf, other]

Recommending Scientific Literature: Comparing Use-Cases and Algorithms

Authors: Roman Kern, Kris Jack, Michael Granitzer

Abstract: An important aspect of a researcher's activities is to find relevant and related publications. The task of a recommender system for scientific publications is to provide a list of papers that match these criteria. Based on the collection of publications managed by Mendeley, four data sets have been assembled that reflect different aspects of relatedness. Each of these relatedness scenarios reflect… ▽ More An important aspect of a researcher's activities is to find relevant and related publications. The task of a recommender system for scientific publications is to provide a list of papers that match these criteria. Based on the collection of publications managed by Mendeley, four data sets have been assembled that reflect different aspects of relatedness. Each of these relatedness scenarios reflect a user's search strategy. These scenarios are public groups, venues, author publications and user libraries. The first three of these data sets are being made publicly available for other researchers to compare algorithms against. Three recommender systems have been implemented: a collaborative filtering system; a content-based filtering system; and a hybrid of these two systems. Results from testing demonstrate that collaborative filtering slightly outperforms the content-based approach, but fails in some scenarios. The hybrid system, that combines the two recommendation methods, provides the best performance, achieving a precision of up to 70%. This suggests that both techniques contribute complementary information in the context of recommending scientific literature and different approaches suite for different information needs. △ Less

Submitted 4 September, 2014; originally announced September 2014.

Comments: 12 pages, 2 figures, 5 tables

ACM Class: H.3.3; H.3.7

arXiv:1406.3188 [pdf, ps, other]

Assessing the Quality of Web Content

Authors: Elisabeth Lex, Inayat Khan, Horst Bischof, Michael Granitzer

Abstract: This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a… ▽ More This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0:575 for Task 1, 0:852 for Task 2, and 0:81 (French) and 0:77 (German) for Task 3, which ranks second place in the ECML/PKDD Discovery Challenge 2010. △ Less

Submitted 12 June, 2014; originally announced June 2014.

Comments: 4 pages, ECML/PKDD 2010 Discovery Challenge Workshop

ACM Class: H.4; D.2.8

Showing 1–37 of 37 results for author: Granitzer, M