ProvLet: A Provenance Management Service for Long Tail Microscopy Data
Authors:
Hessam Moeini,
Todd Nicholson,
Klara Nahrstedt,
Gianni Pezzarossi
Abstract:
Provenance management is needed to enhance the overall security and reliability of long-tail microscopy (LTM) data management systems. However, provenance poses challenges in domains with LTM data. Provenance data need to be collected more frequently, which increases system overhead (in computation and storage) and causes scalability issues. Moreover, in most scientific application domains a provenance solution must consider network-related events as well. Therefore, provenance data in LTM data management systems are highly diverse and must be organized and processed carefully. In this paper, we introduce a novel provenance service, called ProvLet, to collect, distribute, analyze, and visualize provenance data in LTM data management systems. Specifically, (1) we address how to filter and store the desired transactions on disk; (2) we consider a data organization model at higher-level data abstractions, such as datasets and collections, suitable for step-by-step scientific experiments, and develop provenance algorithms over these abstractions rather than over low-level abstractions such as files and folders; and (3) we use ProvLet's log files to visualize provenance information for further forensic exploration. Validation of ProvLet with actual long-tail microscopy data, collected over a period of six years, shows that the service yields low system overhead and enables scalability.
Submitted 22 September, 2021;
originally announced September 2021.
Summarization in Semantic Based Service Discovery in Dynamic IoT-Edge Networks
Authors:
Hessam Moeini,
I-Ling Yen,
Farokh Bastani
Abstract:
In the last decade, many semantic-based routing protocols have been designed for peer-to-peer systems. However, they are not suitable for IoT systems, mainly because they demand memory and computing power that many IoT devices lack. In this paper, we develop a semantic-based routing protocol for dynamic IoT systems to facilitate dynamic IoT capability discovery and composition. Our protocol is fully decentralized. To reduce the space required for routing, each node maintains a summarized routing table. We design an ontology-based summarization algorithm that groups similar capabilities in the routing tables and supports adaptive routing table compression. We also design an ontology coding scheme to encode the keywords used in the routing tables and query messages. To complete the summarization scheme, we consider the metrics for choosing summarization candidates in an overflowing routing table. Some of these metrics, such as coverage and stability, are novel and difficult to measure. Our solutions significantly reduce the routing table size, ensuring that it can be bounded by the available memory of the IoT devices while still supporting efficient IoT capability lookup. Experimental results show that our approach yields significantly lower network traffic and memory requirements for IoT capability lookup than existing semantic-based routing algorithms, including a centralized solution, a DHT-based approach, a controlled flooding scheme, and a cache-based solution.
Submitted 6 September, 2020;
originally announced September 2020.