Showing 1–16 of 16 results for author: Bogatinovski, J

  1. arXiv:2406.05354  [pdf, other]

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  2. arXiv:2404.16446  [pdf, other]

    cs.DC

    On Software Ageing Indicators in OpenStack

    Authors: Yevhen Yazvinskyi, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Distributed systems in general and cloud systems in particular, are susceptible to failures that can lead to substantial economic and data losses, security breaches, and even potential threats to human safety. Software ageing is an example of one such vulnerability. It emerges due to routine re-usage of computational systems units which induce fatigue within the components, resulting in an increas…

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2212.10441  [pdf, other]

    cs.DC

    First CE Matters: On the Importance of Long Term Properties on Memory Failure Prediction

    Authors: Jasmin Bogatinovski, Qiao Yu, Jorge Cardoso, Odej Kao

    Abstract: Dynamic random access memory failures are a threat to the reliability of data centres as they lead to data loss and system crashes. Timely predictions of memory failures allow for taking preventive measures such as server migration and memory replacement. Thereby, memory failure prediction prevents failures from externalizing, and it is a vital task to improve system reliability. In this paper, we…

    Submitted 21 November, 2022; originally announced December 2022.

    Comments: This paper is accepted to appear in the proceedings of IEEE Big Data 2022. All publishing licenses belong to IEEE

  4. arXiv:2211.12757  [pdf, other]

    cs.LG cs.AI cs.CY

    FAIRification of MLC data

    Authors: Ana Kostovska, Jasmin Bogatinovski, Andrej Treven, Sašo Džeroski, Dragi Kocev, Panče Panov

    Abstract: The multi-label classification (MLC) task has increasingly been receiving interest from the machine learning (ML) community, as evidenced by the growing number of papers and methods that appear in the literature. Hence, ensuring proper, correct, robust, and trustworthy benchmarking is of utmost importance for the further development of the field. We believe that this can be achieved by adhering to…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: This paper was accepted at ECML PKDD 2022

  5. arXiv:2207.03206  [pdf, other]

    cs.AI

    Leveraging Log Instructions in Log-based Anomaly Detection

    Authors: Jasmin Bogatinovski, Gjorgji Madjarov, Sasho Nedelkoski, Jorge Cardoso, Odej Kao

    Abstract: Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT systems using diverse AI-enabled methods and tools for, e.g., anomaly detection and root cause analysis, to support the remediation, optimization, and automatic initiation of self-stabilizing IT activities. The core step of any AIOps workflow is anomaly detection, typically performed on hi…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted for publication in IEEE Service Computing Conference, 2022, Barcelona

  6. arXiv:2204.02636  [pdf, other]

    cs.SE cs.LG

    Failure Identification from Unstable Log Data using Deep Learning

    Authors: Jasmin Bogatinovski, Sasho Nedelkoski, Li Wu, Jorge Cardoso, Odej Kao

    Abstract: The reliability of cloud platforms is of significant relevance because society increasingly relies on complex software systems running on the cloud. To improve it, cloud providers are automating various maintenance tasks, with failure identification frequently being considered. The precondition for automation is the availability of observability tools, with system logs commonly being used. The foc…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: This paper is accepted for publication at IEEE CCGrid 2022. For fairest citation, please use the original proceedings credentials

  7. Data-Driven Approach for Log Instruction Quality Assessment

    Authors: Jasmin Bogatinovski, Sasho Nedelkoski, Alexander Acker, Jorge Cardoso, Odej Kao

    Abstract: In the current IT world, developers write code while system operators run the code mostly as a black box. The connection between both worlds is typically established with log messages: the developer provides hints to the (unknown) operator, where the cause of an occurred issue is, and vice versa, the operator can report bugs during operation. To fulfil this purpose, developers write log instructio…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: This paper is accepted for publication at the 30th International Conference on Program Comprehension under doi: 10.1145/3524610.3527906. The copyrights are handled following the corresponding agreement between the author and publisher

  8. arXiv:2109.09537  [pdf, other]

    cs.LG

    A2Log: Attentive Augmented Log Anomaly Detection

    Authors: Thorsten Wittkopp, Alexander Acker, Sasho Nedelkoski, Jasmin Bogatinovski, Dominik Scheinert, Wu Fan, Odej Kao

    Abstract: Anomaly detection becomes increasingly important for the dependability and serviceability of IT services. As log lines record events during the execution of IT services, they are a primary source for diagnostics. Thereby, unsupervised methods provide a significant benefit since not all anomalies can be known at training time. Existing unsupervised methods need anomaly examples to obtain a suitable…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: This paper has been accepted for HICSS 2022 and will appear in the conference proceedings

  9. arXiv:2106.15411  [pdf, other]

    cs.LG cs.AI

    Explaining the Performance of Multi-label Classification Methods with Data Set Properties

    Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski, Dragi Kocev

    Abstract: Meta learning generalizes the empirical experience with different learning tasks and holds promise for providing important empirical insight into the behaviour of machine learning algorithms. In this paper, we present a comprehensive meta-learning study of data sets and methods for multi-label classification (MLC). MLC is a practically relevant machine learning task where each example is labelled…

    Submitted 28 June, 2021; originally announced June 2021.

  10. arXiv:2102.11570  [pdf, other]

    cs.AI cs.CL cs.SE

    Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

    Authors: Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao

    Abstract: Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users that communicate, compute, and store information. Therefore, timely and accurate anomaly detection is necessary for reliability, security, safe operation, and mitigation of losses in these increasingly important systems. Recently, the evolution of the software industry opens up several pro…

    Submitted 23 February, 2021; originally announced February 2021.

  11. arXiv:2102.07113  [pdf, other]

    cs.LG cs.AI cs.CC

    Comprehensive Comparative Study of Multi-Label Classification Methods

    Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski, Dragi Kocev

    Abstract: Multi-label classification (MLC) has recently received increasing interest from the machine learning community. Several studies provide reviews of methods and datasets for MLC and a few provide empirical comparisons of MLC methods. However, they are limited in the number of methods and datasets considered. This work provides a comprehensive empirical study of a wide range of MLC methods on a pleth…

    Submitted 16 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

  12. arXiv:2101.06054  [pdf, other]

    cs.LG cs.SE

    Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper

    Authors: Jasmin Bogatinovski, Sasho Nedelkoski, Alexander Acker, Florian Schmidt, Thorsten Wittkopp, Soeren Becker, Jorge Cardoso, Odej Kao

    Abstract: Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps has several challenges. First, it needs to combine sep…

    Submitted 15 January, 2021; originally announced January 2021.

    Comments: 8 pages, white paper for the AIOPS 2020 workshop at ICSOC 2020

  13. arXiv:2101.04977  [pdf, other]

    cs.LG cs.DC cs.SC cs.SE

    Multi-Source Anomaly Detection in Distributed IT Systems

    Authors: Jasmin Bogatinovski, Sasho Nedelkoski

    Abstract: The multi-source data generated by distributed systems, provide a holistic description of the system. Harnessing the joint distribution of the different modalities by a learning model can be beneficial for critical applications for maintenance of the distributed systems. One such important task is the task of anomaly detection where we are interested in detecting the deviation of the current behav…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: 12 pages. Presented at AIOPS 2020 workshop

  14. arXiv:2008.09340  [pdf, other]

    cs.LG cs.IR stat.ML

    Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs

    Authors: Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao

    Abstract: The detection of anomalies is essential mining task for the security and reliability in computer systems. Logs are a common and major data source for anomaly detection methods in almost every computer system. They collect a range of significant events describing the runtime system status. Recent studies have focused predominantly on one-class deep learning methods on predefined non-learnable numer…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: 11 pages, 8 figures, Accepted at ICDM 2020: 20th IEEE International Conference on Data Mining

  15. arXiv:2007.03568  [pdf, other]

    cs.LG eess.SY stat.ML

    Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction

    Authors: Alexander Acker, Thorsten Wittkopp, Sasho Nedelkoski, Jasmin Bogatinovski, Odej Kao

    Abstract: The rapid growth and distribution of IT systems increases their complexity and aggravates operation and maintenance. To sustain control over large sets of hosts and the connecting networks, monitoring solutions are employed and constantly enhanced. They collect diverse key performance indicators (KPIs) (e.g. CPU utilization, allocated memory, etc.) and provide detailed information about the system…

    Submitted 7 July, 2020; originally announced July 2020.

  16. arXiv:2003.07905  [pdf, other]

    cs.LG cs.SE

    Self-Supervised Log Parsing

    Authors: Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao

    Abstract: Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-stru…

    Submitted 17 March, 2020; originally announced March 2020.