Showing 1–37 of 37 results for author: Granitzer, M

  1. arXiv:2407.09394  [pdf, other]

    cs.IR

    PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents

    Authors: Saber Zerhoudi, Michael Granitzer

    Abstract: Large Language Models (LLMs) struggle with generating reliable outputs due to outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by enhancing LLMs with external knowledge, but often fail to personalize the retrieval process. This paper introduces PersonaRAG, a novel framework incorporating user-centric agents to adapt retrieval and generation based on r…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.08275  [pdf, other]

    cs.IR

    Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems

    Authors: Laura Caspari, Kanishka Ghosh Dastidar, Saber Zerhoudi, Jelena Mitrovic, Michael Granitzer

    Abstract: The choice of embedding model is a crucial step in the design of Retrieval Augmented Generation (RAG) systems. Given the sheer volume of available options, identifying clusters of similar models streamlines this model selection process. Relying solely on benchmark performance scores only allows for a weak assessment of model similarity. Thus, in this study, we evaluate the similarity of embedding…

    Submitted 11 July, 2024; originally announced July 2024.

  3. DriftGAN: Using historical data for Unsupervised Recurring Drift Detection

    Authors: Christofer Fellicious, Sahib Julka, Lorenz Wendlinger, Michael Granitzer

    Abstract: In real-world applications, input data distributions are rarely static over a period of time, a phenomenon known as concept drift. Such concept drifts degrade the model's prediction performance, and therefore we require methods to overcome these issues. The initial step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detectio…

    Submitted 9 July, 2024; originally announced July 2024.

    Journal ref: In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 368-369. 2024

  4. arXiv:2406.16674  [pdf, other]

    cs.CL

    Computational Approaches to the Detection of Lesser-Known Rhetorical Figures: A Systematic Survey and Research Challenges

    Authors: Ramona Kühn, Jelena Mitrović, Michael Granitzer

    Abstract: Rhetorical figures play a major role in our everyday communication as they make text more interesting, more memorable, or more persuasive. Therefore, it is important to computationally detect rhetorical figures to fully understand the meaning of a text. We provide a comprehensive overview of computational approaches to lesser-known rhetorical figures. We explore the linguistic and computational pe…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Submitted to ACM Computing Surveys. 35 pages

  5. Efficient NAS with FaDE on Hierarchical Spaces

    Authors: Simon Neumeyer, Julian Stier, Michael Granitzer

    Abstract: Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network sub modules to serve as surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE which uses differentiable architecture search to obtain relative performance predictions on finite reg…

    Submitted 24 April, 2024; originally announced April 2024.

    ACM Class: I.2.6

    Journal ref: Advances in Intelligent Data Analysis XXII. IDA 2024. Lecture Notes in Computer Science, vol 14642. Springer, Cham

  6. arXiv:2404.02309  [pdf, other]

    cs.IR

    A Survey of Web Content Control for Generative AI

    Authors: Michael Dinzinger, Florian Heß, Michael Granitzer

    Abstract: The groundbreaking advancements around generative AI have recently caused a wave of concern culminating in a row of lawsuits, including high-profile actions against Stability AI and OpenAI. This situation of legal uncertainty has sparked a broad discussion on the rights of content creators and publishers to protect their intellectual property on the web. European as well as US law already provides…

    Submitted 2 April, 2024; originally announced April 2024.

  7. arXiv:2404.02261  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

    Authors: Nataliia Kholodna, Sahib Julka, Mohammad Khodadadi, Muhammed Nurullah Gumus, Michael Granitzer

    Abstract: Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potent…

    Submitted 23 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 20 pages, 6 tables. The source code related to this paper is available at https://github.com/mkandai/llms-in-the-loop. This paper has been accepted for publication at ECML PKDD 2024

    ACM Class: I.2.7; I.2.6

  8. arXiv:2312.02730  [pdf, other]

    cs.LG cs.CL

    Towards Measuring Representational Similarity of Large Language Models

    Authors: Max Klabunde, Mehdi Ben Amor, Michael Granitzer, Florian Lemmerich

    Abstract: Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others.…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Extended abstract in UniReps Workshop @ NeurIPS 2023

  9. Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings

    Authors: Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer

    Abstract: This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to le…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 14 pages, 7 tables, 1 figure

  10. arXiv:2307.06709  [pdf, other]

    cs.LG cs.AI cs.NE

    GRAN is superior to GraphRNN: node orderings, kernel- and graph embeddings-based metrics for graph generators

    Authors: Ousmane Touat, Julian Stier, Pierre-Edouard Portier, Michael Granitzer

    Abstract: A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs has theoretical challenges, such as isomorphic representations -- evaluating how well a generative model performs is difficult. Which model to choose depending on the application domain? We extensively study kernel…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Preprint for The 9th International Conference on machine Learning, Optimization and Data science - LOD 2023

  11. arXiv:2305.07586  [pdf, other]

    cs.CV eess.SY

    Knowledge distillation with Segment Anything (SAM) model for Planetary Geological Mapping

    Authors: Sahib Julka, Michael Granitzer

    Abstract: Planetary science research involves analysing vast amounts of remote sensing data, which are often costly and time-consuming to annotate and process. One of the essential tasks in this field is geological mapping, which requires identifying and outlining regions of interest in planetary images, including geological features and landforms. However, manually labelling these images is a complex and c…

    Submitted 15 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

  12. Technical Report: Impact of Position Bias on Language Models in Token Classification

    Authors: Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović

    Abstract: Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models,…

    Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Updated content of the preprint

  13. German BERT Model for Legal Named Entity Recognition

    Authors: Harshil Darji, Jelena Mitrović, Michael Granitzer

    Abstract: The use of BERT, one of the most popular language models, has led to improvements in many Natural Language Processing (NLP) tasks. One such task is Named Entity Recognition (NER) i.e. automatic identification of named entities such as location, person, organization, etc. from a given text. It is also an important base step for many NLP tasks such as information extraction and argumentation mining.…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Presented at ICAART 2023

    Journal ref: Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART (2023) 723-728

  14. arXiv:2209.05243  [pdf, ps, other]

    cs.CR cs.LG

    SmartKex: Machine Learning Assisted SSH Keys Extraction From The Heap Dump

    Authors: Christofer Fellicious, Stewart Sentanoe, Michael Granitzer, Hans P. Reiser

    Abstract: Digital forensics is the process of extracting, preserving, and documenting evidence in digital devices. A commonly used method in digital forensics is to extract data from the main memory of a digital device. However, the main challenge is identifying the important data to be extracted. Several pieces of crucial information reside in the main memory, like usernames, passwords, and cryptographic k…

    Submitted 13 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

  15. arXiv:2204.05265  [pdf, other]

    cs.LG

    The Importance of Future Information in Credit Card Fraud Detection

    Authors: Van Bach Nguyen, Kanishka Ghosh Dastidar, Michael Granitzer, Wissam Siblini

    Abstract: Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection to block the card retrospectively and avoid further frauds. Since human verification is often necessary and the payment processing time is limited, the second task manages the largest volume of transactions. In the literature, fraud detection challenges…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 11 pages, 4 figures, to be published at AISTATS 2022

  16. deepstruct -- linking deep learning and graph theory

    Authors: Julian Stier, Michael Granitzer

    Abstract: deepstruct connects deep learning models and graph theory such that different graph structures can be imposed on neural networks or graph structures can be extracted from trained neural network models. For this, deepstruct provides deep neural network models with different restrictions which can be created based on an initial graph. Further, tools to extract graph structures from trained models ar…

    Submitted 5 December, 2021; v1 submitted 12 November, 2021; originally announced November 2021.

    ACM Class: I.2.0; F.0

    Journal ref: Software Impacts, 2021

  17. arXiv:2107.12917  [pdf, other]

    cs.LG cs.AI cs.NE

    Experiments on Properties of Hidden Structures of Sparse Neural Networks

    Authors: Julian Stier, Harshil Darji, Michael Granitzer

    Abstract: Sparsity in the structure of Neural Networks can lead to less energy consumption, less memory usage, faster computation times on convenient hardware, and automated machine learning. If sparsity gives rise to certain kinds of structure, it can explain automatically obtained features during learning. We provide insights into experiments in which we show how sparsity can be achieved through prior i…

    Submitted 27 July, 2021; originally announced July 2021.

  18. Correlation Analysis between the Robustness of Sparse Neural Networks and their Random Hidden Structural Priors

    Authors: M. Ben Amor, J. Stier, M. Granitzer

    Abstract: Deep learning models have been shown to be vulnerable to adversarial attacks. This perception led to analyzing deep learning models not only from the perspective of their performance measures but also their robustness to certain types of adversarial attacks. We take another step forward in relating the architectural structure of neural networks from a graph theoretic perspective to their robustnes…

    Submitted 13 July, 2021; originally announced July 2021.

    Journal ref: Procedia Computer Science 192C (2021) pp. 4073-4082

  19. arXiv:2103.11471  [pdf, other]

    cs.CV

    Conditional Generative Adversarial Networks for Speed Control in Trajectory Simulation

    Authors: Sahib Julka, Vishal Sowrirajan, Joerg Schloetterer, Michael Granitzer

    Abstract: Motion behaviour is driven by several factors -- goals, presence and actions of neighbouring agents, social relations, physical and social norms, the environment with its variable characteristics, and further. Most factors are not directly observable and must be modelled from context. Trajectory prediction, is thus a hard problem, and has seen increasing attention from researchers in the recent ye…

    Submitted 21 March, 2021; originally announced March 2021.

  20. arXiv:2010.15996  [pdf, other]

    astro-ph.IM cs.LG

    Lessons Learned from the 1st ARIEL Machine Learning Challenge: Correcting Transiting Exoplanet Light Curves for Stellar Spots

    Authors: Nikolaos Nikolaou, Ingo P. Waldmann, Angelos Tsiaras, Mario Morvan, Billy Edwards, Kai Hou Yip, Giovanna Tinetti, Subhajit Sarkar, James M. Dawson, Vadim Borisov, Gjergji Kasneci, Matej Petkovic, Tomaz Stepisnik, Tarek Al-Ubaidi, Rachel Louise Bailey, Michael Granitzer, Sahib Julka, Roman Kern, Patrick Ofner, Stefan Wagner, Lukas Heppe, Mirko Bunse, Katharina Morik

    Abstract: The last decade has witnessed a rapid growth of the field of exoplanet discovery and characterisation. However, several big challenges remain, many of which could be addressed using machine learning methodology. For instance, the most prolific method for detecting exoplanets and inferring several of their characteristics, transit photometry, is very sensitive to the presence of stellar spots. The…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: 20 pages, 7 figures, 2 tables, Submitted to The Astrophysics Journal (ApJ)

  21. arXiv:2010.12472  [pdf, other]

    cs.CL

    HateBERT: Retraining BERT for Abusive Language Detection in English

    Authors: Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer

    Abstract: In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have collected and made available to the public. We present the results of a detailed comparison between a general pre-trained language mo…

    Submitted 4 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

  22. arXiv:2007.06985  [pdf, other]

    cs.LG cs.CR stat.ML

    ADSAGE: Anomaly Detection in Sequences of Attributed Graph Edges applied to insider threat detection at fine-grained level

    Authors: Mathieu Garchery, Michael Granitzer

    Abstract: Previous works on the CERT insider threat detection case have neglected graph and text features despite their relevance to describe user behavior. Additionally, existing systems heavily rely on feature engineering and audit data aggregation to detect malicious activities. This is time consuming, requires expert knowledge and prevents tracing back alerts to precise user actions. To address these is…

    Submitted 14 July, 2020; originally announced July 2020.

  23. DeepGG: a Deep Graph Generator

    Authors: Julian Stier, Michael Granitzer

    Abstract: Learning distributions of graphs can be used for automatic drug discovery, molecular design, complex network analysis, and much more. We present an improved framework for learning generative models of graphs based on the idea of deep state machines. To learn state transition decisions we use a set of graph and node embedding techniques as memory of the state machine. Our analysis is based on lea…

    Submitted 25 November, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: 8 pages condensed preprint with github link, under review

    Journal ref: IDA 2021: Advances in Intelligent Data Analysis XIX

  24. Investigating Extensions to Random Walk Based Graph Embedding

    Authors: Joerg Schloetterer, Martin Wehking, Fatemeh Salehi Rizi, Michael Granitzer

    Abstract: Graph embedding has recently gained momentum in the research community, in particular after the introduction of random walk and neural network based approaches. However, most of the embedding approaches focus on representing the local neighborhood of nodes and fail to capture the global graph structure, i.e. to retain the relations to distant nodes. To counter that problem, we propose a novel exte…

    Submitted 17 February, 2020; originally announced February 2020.

  25. arXiv:2002.06685  [pdf, other]

    cs.SI cs.LG stat.ML

    Global and Local Feature Learning for Ego-Network Analysis

    Authors: Fatemeh Salehi Rizi, Michael Granitzer, Konstantin Ziegler

    Abstract: In an ego-network, an individual (ego) organizes its friends (alters) in different groups (social circles). This social network can be efficiently analyzed after learning representations of the ego and its alters in a low-dimensional, real vector space. These representations are then easily exploited via statistical models for tasks such as social circle detection and prediction. Recent advances i…

    Submitted 16 February, 2020; originally announced February 2020.

  26. arXiv:2002.06665  [pdf, other]

    cs.SI cs.LG stat.ML

    Predicting event attendance exploring social influence

    Authors: Fatemeh Salehi Rizi, Michael Granitzer

    Abstract: The problem of predicting people's participation in real-world events has received considerable attention as it offers valuable insights for human behavior analysis and event-related advertisement. Today social networks (e.g. Twitter) widely reflect large popular events where people discuss their interest with friends. Event participants usually stimulate friends to join the event which propagates…

    Submitted 16 February, 2020; originally announced February 2020.

  27. Shortest path distance approximation using deep learning techniques

    Authors: Fatemeh Salehi Rizi, Joerg Schloetterer, Michael Granitzer

    Abstract: Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first-search (BFS) do not scale up to contemporary, rapidly evolving today's massive networks. Therefore, it is required to find approximation methods to enable scalable graph processing with a significant speedup. In this paper, we utilize vector e…

    Submitted 12 February, 2020; originally announced February 2020.

  28. arXiv:1912.12283  [pdf, other]

    cs.SI cs.GT

    Competitive Influence Maximization: Integrating Budget Allocation and Seed Selection

    Authors: Amirhossein Ansari, Masoud Dadgar, Ali Hamzeh, Jörg Schlötterer, Michael Granitzer

    Abstract: Today, many companies take advantage of viral marketing to promote their new products, and since there are several competing companies in many markets, Competitive Influence Maximization has attracted much attention. Two categories of studies exist in the literature. First, studies that focus on which nodes from the network to select considering the existence of the opponents. Second, studies that…

    Submitted 27 December, 2019; originally announced December 2019.

    Comments: 16 pages, 2 figures

  29. arXiv:1912.04022  [pdf, other]

    cs.LG cs.CV stat.ML

    Parallel Total Variation Distance Estimation with Neural Networks for Merging Over-Clusterings

    Authors: Christian Reiser, Jörg Schlötterer, Michael Granitzer

    Abstract: We consider the initial situation where a dataset has been over-partitioned into $k$ clusters and seek a domain independent way to merge those initial clusters. We identify the total variation distance (TVD) as suitable for this goal. By exploiting the relation of the TVD to the Bayes accuracy we show how neural networks can be used to estimate TVDs between all pairs of clusters in parallel. Cruci…

    Submitted 9 December, 2019; originally announced December 2019.

  30. arXiv:1910.08926  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Policy Learning for Malaria Control

    Authors: Van Bach Nguyen, Belaid Mohamed Karim, Bao Long Vu, Jörg Schlötterer, Michael Granitzer

    Abstract: Sequential decision making is a typical problem in reinforcement learning with plenty of algorithms to solve it. However, only a few of them can work effectively with a very small number of observations. In this report, we introduce the progress to learn the policy for Malaria Control as a Reinforcement Learning problem in the KDD Cup Challenge 2019 and propose diverse solutions to deal with the l…

    Submitted 20 October, 2019; originally announced October 2019.

  31. Structural Analysis of Sparse Neural Networks

    Authors: Julian Stier, Michael Granitzer

    Abstract: Sparse Neural Networks regained attention due to their potential for mathematical and computational advantages. We give motivation to study Artificial Neural Networks (ANNs) from a network science perspective, provide a technique to embed arbitrary Directed Acyclic Graphs into ANNs and report study results on predicting the performance of image classifiers based on the structural properties of the…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

    Journal ref: Procedia Computer Science, 2018

  32. arXiv:1909.01185  [pdf, other]

    cs.LG stat.ML

    Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

    Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Liyun He-Guelton, Olivier Caelen, Michael Granitzer, Sylvie Calabretto

    Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely (i) The sequence contains or doesn't contain a fraud (ii) The seque…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: published in the journal "future generation computer systems", in the special issue: "data exploration in the web 3.0 age"

  33. arXiv:1906.06977  [pdf, other]

    cs.LG cs.AI stat.ML

    Dataset shift quantification for credit card fraud detection

    Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Sylvie Calabretto, Liyun He-Guelton, Frederic Oblé, Michael Granitzer

    Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify day-by-day the dataset shift in our face-to-face credit card transactions datas…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: Presented at IEEE Artificial Intelligence and Knowledge Engineering (AIKE 2019)

  34. arXiv:1905.06247  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Multiple perspectives HMM-based feature engineering for credit card fraud detection

    Authors: Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Olivier Caelen, Liyun He-Guelton, Sylvie Calabretto, Michael Granitzer

    Abstract: Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a Fraud? (ii) Is the sequence obtaine…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: Presented as a poster in the conference SAC 2019: 34th ACM/SIGAPP Symposium on Applied Computing in April 2019

  35. Analysing Neural Network Topologies: a Game Theoretic Approach

    Authors: Julian Stier, Gabriele Gianini, Michael Granitzer, Konstantin Ziegler

    Abstract: Artificial Neural Networks have shown impressive success in very different application cases. Choosing a proper network architecture is a critical decision for a network's success, usually done in a manual manner. As a straightforward strategy, large, mostly fully connected architectures are selected, thereby relying on a good optimization strategy to find proper weights while at the same time avo…

    Submitted 17 April, 2019; originally announced April 2019.

    Journal ref: Procedia Computer Science 126 (2018): 234-243

  36. arXiv:1409.1357  [pdf, other]

    cs.IR

    Recommending Scientific Literature: Comparing Use-Cases and Algorithms

    Authors: Roman Kern, Kris Jack, Michael Granitzer

    Abstract: An important aspect of a researcher's activities is to find relevant and related publications. The task of a recommender system for scientific publications is to provide a list of papers that match these criteria. Based on the collection of publications managed by Mendeley, four data sets have been assembled that reflect different aspects of relatedness. Each of these relatedness scenarios reflect…

    Submitted 4 September, 2014; originally announced September 2014.

    Comments: 12 pages, 2 figures, 5 tables

    ACM Class: H.3.3; H.3.7

  37. arXiv:1406.3188  [pdf, ps, other]

    cs.IR

    Assessing the Quality of Web Content

    Authors: Elisabeth Lex, Inayat Khan, Horst Bischof, Michael Granitzer

    Abstract: This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a…

    Submitted 12 June, 2014; originally announced June 2014.

    Comments: 4 pages, ECML/PKDD 2010 Discovery Challenge Workshop

    ACM Class: H.4; D.2.8