-
ESRO: Experience Assisted Service Reliability against Outages
Authors:
Sarthak Chakraborty,
Shubham Agarwal,
Shaddy Garg,
Abhimanyu Sethia,
Udit Narayan Pandey,
Videh Aggarwal,
Shiv Saini
Abstract:
Modern cloud services are prone to failures due to their complex architecture, making diagnosis a critical process. Site Reliability Engineers (SREs) spend hours leveraging multiple sources of data, including the alerts, error logs, and domain expertise through past experiences to locate the root cause(s). These experiences are documented as natural language text in outage reports for previous outages. However, utilizing the raw yet rich semi-structured information in the reports systematically is time-consuming. Structured information, on the other hand, such as alerts that are often used during fault diagnosis, is voluminous and requires expert knowledge to discern. Several strategies have been proposed to use each source of data separately for root cause analysis. In this work, we build a diagnostic service called ESRO that recommends root causes and remediation for failures by utilizing structured as well as semi-structured sources of data systematically. ESRO constructs a causal graph using alerts and a knowledge graph using outage reports, and merges them in a novel way to form a unified graph during training. A retrieval-based mechanism is then used to search the unified graph and rank the likely root causes and remediation techniques based on the alerts fired during an outage at inference time. Not only the individual alerts, but their respective importance in predicting an outage group is taken into account during recommendation. We evaluated our model on several cloud service outages of a large SaaS enterprise over the course of ~2 years, and obtained an average improvement of 27% in rouge scores after comparing the likely root causes against the ground truth over state-of-the-art baselines. We further establish the effectiveness of ESRO through qualitative analysis on multiple real outage examples.
Submitted 13 September, 2023;
originally announced September 2023.
-
ScriptWorld: Text Based Environment For Learning Procedural Knowledge
Authors:
Abhinav Joshi,
Areeb Ahmad,
Umang Pandey,
Ashutosh Modi
Abstract:
Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via GitHub: https://github.com/Exploration-Lab/ScriptWorld
Submitted 8 July, 2023;
originally announced July 2023.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
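The abstract above says that federated learning shares only numerical model updates rather than data. As a minimal sketch of what aggregating such updates can look like, the snippet below computes a sample-weighted average of per-site parameter vectors. This is an illustration only: the abstract does not name the aggregation rule used in the study, and the function name `federated_average` and its arguments are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_parameters: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one consensus vector.

    Each site contributes in proportion to the number of local samples it
    trained on. Only these numeric vectors leave a site, never the raw data.
    The weighting scheme is an assumption made for this sketch.
    """
    if not site_parameters or len(site_parameters) != len(sample_counts):
        raise ValueError("need one sample count per site and at least one site")
    total_samples = sum(sample_counts)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dimension = len(site_parameters[0])
    consensus = [0.0] * dimension
    for parameters, count in zip(site_parameters, sample_counts):
        if len(parameters) != dimension:
            raise ValueError("all sites must send vectors of the same length")
        weight = count / total_samples
        for index, value in enumerate(parameters):
            consensus[index] += weight * value
    return consensus
```

For example, calling `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second site three times as heavily as the first.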
-
Data Mining Application to Attract Students in HEI
Authors:
Umesh Kumar Pandey,
Surjeet Kumar Yadav,
Saurabh Pal
Abstract:
Over the last two decades, the number of Higher Education Institutions (HEIs) has grown by leaps and bounds. This has caused cut-throat competition among these institutions to attract students for admission. To reach students, institutions invest in advertising. Similarly, both developing and developed institutions launch various services to attract students. Most of the institutions operate in self-financed mode, so they are always short of funds. Nowadays a number of advertising methods are available, and various constraints make it difficult for an institution to advertise through all modes and launch all services at the same time. In this paper we use the support and confidence method to find the best way of advertising.
Submitted 17 June, 2012;
originally announced June 2012.
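The support and confidence measures named in the abstract above are the standard association-rule metrics: support is the fraction of records containing an itemset, and confidence is the support of antecedent and consequent together divided by the support of the antecedent. A minimal sketch follows; the function names and any record contents (for instance advertising channels) are hypothetical and not taken from the paper.

```python
from typing import FrozenSet, Iterable, Sequence


def support(records: Sequence[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    items = frozenset(itemset)
    matching = sum(1 for record in records if items <= record)
    return matching / len(records)


def confidence(
    records: Sequence[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_items = frozenset(antecedent)
    antecedent_support = support(records, antecedent_items)
    if antecedent_support == 0.0:
        return 0.0  # the rule never fires, so its confidence is undefined; report 0
    joint_support = support(records, antecedent_items | frozenset(consequent))
    return joint_support / antecedent_support
```

With four hypothetical records, three mentioning `"newspaper"` and two of those also containing `"admitted"`, the rule `newspaper -> admitted` has support 2/4 and confidence 2/3.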
-
Data Mining as a Torch Bearer in Education Sector
Authors:
Umesh Kumar Pandey,
Brijesh Kumar Bhardwaj,
Saurabh Pal
Abstract:
All data contains a lot of hidden information. The method used to process the data decides what type of information it produces. In India, the education sector has a lot of data that can produce valuable information. This information can be used to increase the quality of education. But educational institutions do not apply any knowledge discovery process to these data. Information and communication technology has entered the education sector to capture and compile information at low cost. Nowadays a new research community, educational data mining (EDM), is growing at the intersection of data mining and pedagogy. In this paper we present a roadmap of the research done in EDM in various segments of the education sector.
Submitted 24 January, 2012;
originally announced January 2012.
-
A Data Mining view on Class Room Teaching Language
Authors:
Umesh Kumar Pandey,
Saurabh Pal
Abstract:
Since ancient times, educational institutions in India have used classroom teaching, where a teacher explains the material and students understand and learn the lesson. There is no absolute scale for measuring knowledge, but the examination score is one scale that indicates student performance. So it is important that appropriate material is taught, but the language chosen for teaching, the preparation of class notes, and attendance are vital as well. This study analyses the impact of language on the presence of students in the classroom. The main idea is to find the support, confidence and interestingness levels for appropriate language and attendance in the classroom. For this purpose, association rules are used.
Submitted 20 April, 2011;
originally announced April 2011.
-
Data Mining : A prediction of performer or underperformer using classification
Authors:
Umesh Kumar Pandey,
Saurabh Pal
Abstract:
Nowadays there is a large set of data about students with precious information hidden in it. Data mining techniques can help to find this hidden information. In this paper, a data mining technique named the Bayes classification method is applied to these data to help an institution. Institutions can identify those students who consistently perform well. This study will help institutions reduce the dropout ratio to a significant extent and improve the performance level of the institution.
Submitted 20 April, 2011;
originally announced April 2011.
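The abstract above applies a Bayes classification method to student data. As a rough illustration, the sketch below implements a small naive Bayes classifier over categorical features with add-one (Laplace) smoothing. It assumes the naive (feature-independence) variant, which the abstract does not confirm, and all function names, feature names and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def train_naive_bayes(rows: List[Dict[str, str]], labels: List[str]):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row: Dict[str, str]) -> str:
    """Return the label with the highest smoothed log-posterior for `row`."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for feature, value in row.items():
            seen = value_counts[(label, feature)][value]
            distinct = max(len(feature_values[feature]), 1)
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((seen + 1) / (count + distinct))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained on records such as `{"attendance": "high", "assignments": "done"}` labelled `"performer"` or `"underperformer"` can then be queried with `predict(model, new_record)`.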