subscribe to arXiv mailings

Recording provenance of workflow runs with RO-Crate

Authors: Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno P. Kinoshita, Stian Soiland-Reyes

Abstract: Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to… ▽ More Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain. A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.10368989 △ Less

Submitted 16 July, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 38 pages, 5 figures, 3 tables. Resubmitted to PLOS ONE following peer review

arXiv:2305.03107 [pdf, other]

Homomorphisms between graphs embedded on surfaces

Authors: Delia Garijo, Andrew Goodall, Lluís Vena

Abstract: We extend the notion of graph homomorphism to cellularly embedded graphs (maps) by designing operations on vertices and edges that respect the surface topology; we thus obtain the first definition of map homomorphism that preserves both the combinatorial structure (as a graph homomorphism) and the topological structure of the surface (in particular, orientability and genus). Notions such as the co… ▽ More We extend the notion of graph homomorphism to cellularly embedded graphs (maps) by designing operations on vertices and edges that respect the surface topology; we thus obtain the first definition of map homomorphism that preserves both the combinatorial structure (as a graph homomorphism) and the topological structure of the surface (in particular, orientability and genus). Notions such as the core of a graph and the homomorphism order on cores are then extended to maps. We also develop a purely combinatorial framework for various topological features of a map such as the contractibility of closed walks, which in particular allows us to characterize map cores. We then show that the poset of map cores ordered by the existence of a homomorphism is connected and, in contrast to graph homomorphisms, does not contain any dense interval (so it is not universal for countable posets). Finally, we give examples of a pair of cores with an infinite number of cores between them, an infinite chain of gaps, and arbitrarily large antichains with a common homomorphic image. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: 46 pages, 11 figures

arXiv:2304.00019 [pdf, other]

doi 10.5281/zenodo.7750670

Workflows Community Summit 2022: A Roadmap Revolution

Authors: Rafael Ferreira da Silva, Rosa M. Badia, Venkat Bala, Debbie Bard, Peer-Timo Bremer, Ian Buckley, Silvina Caino-Lores, Kyle Chard, Carole Goble, Shantenu Jha, Daniel S. Katz, Daniel Laney, Manish Parashar, Frederic Suter, Nick Tyler, Thomas Uram, Ilkay Altintas, Stefan Andersson, William Arndt, Juan Aznar, Jonathan Bader, Bartosz Balis, Chris Blanton, Kelly Rosa Braghetto, Aharon Brodutch , et al. (80 additional authors not shown)

Abstract: Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t… ▽ More Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC enable the management of many small-sized files, allow data reduction while ensuring high accuracy, orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022. △ Less

Submitted 31 March, 2023; originally announced April 2023.

Report number: ORNL/TM-2023/2885

arXiv:2201.12650 [pdf, other]

New results on the robust coloring problem

Authors: Delia Garijo, Alberto Márquez, Rafael Robles

Abstract: Many variations of the classical graph coloring model have been intensively studied due to their multiple applications; scheduling problems and aircraft assignments, for instance, motivate the robust coloring problem. This model gets to capture natural constraints of those optimization problems by combining the information provided by two colorings: a vertex coloring of a graph and the induced edg… ▽ More Many variations of the classical graph coloring model have been intensively studied due to their multiple applications; scheduling problems and aircraft assignments, for instance, motivate the robust coloring problem. This model gets to capture natural constraints of those optimization problems by combining the information provided by two colorings: a vertex coloring of a graph and the induced edge coloring on a subgraph of its complement; the goal is to minimize, among all proper colorings of the graph for a fixed number of colors, the number of edges in the subgraph with the endpoints of the same color. The study of the robust coloring model has been focused on the search for heuristics due to its NP-hard character when using at least three colors, but little progress has been made in other directions. We present a new approach on the problem obtaining the first collection of non-heuristic results for general graphs; among them, we prove that robust coloring is the model that better approaches the equitable partition of the vertex set, even when the graph does not admit a so-called \emph{equitable coloring}. We also show the NP-completeness of its decision problem for the unsolved case of two colors, obtain bounds on the associated robust coloring parameter, and solve a conjecture on paths that illustrates the complexity of studying this coloring model. △ Less

Submitted 16 May, 2023; v1 submitted 29 January, 2022; originally announced January 2022.

MSC Class: 05C15; 68R10

arXiv:2110.02168 [pdf, ps, other]

doi 10.1109/WORKS54523.2021.00016

A Community Roadmap for Scientific Workflows Research and Development

Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Ilkay Altintas, Rosa M Badia, Bartosz Balis, Tainã Coleman, Frederik Coppens, Frank Di Natale, Bjoern Enders, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Daniel Garijo, Carole Goble, Dorran Howell, Shantenu Jha, Daniel S. Katz, Daniel Laney, Ulf Leser, Maciej Malawski, Kshitij Mehta, Loïc Pottier, Jonathan Ozik, J. Luc Peterson , et al. (4 additional authors not shown)

Abstract: The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows… ▽ More The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes. △ Less

Submitted 8 October, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.09181

arXiv:2108.07119 [pdf, ps, other]

Creating and Querying Personalized Versions of Wikidata on a Laptop

Authors: Hans Chalupsky, Pedro Szekely, Filip Ilievski, Daniel Garijo, Kartik Shenoy

Abstract: Application developers today have three choices for exploiting the knowledge present in Wikidata: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such as retrieval of large amounts of data or aggregati… ▽ More Application developers today have three choices for exploiting the knowledge present in Wikidata: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such as retrieval of large amounts of data or aggregations over large fractions of Wikidata. This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop. We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop, combining data from external resources such as DBpedia. The Kypher queries for these use cases run much faster on a laptop than the equivalent SPARQL queries on a Wikidata clone running on a powerful server with 24h time-out limits. △ Less

Submitted 18 August, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

ACM Class: H.3.3; I.2

arXiv:2108.06503 [pdf]

doi 10.3233/DS-210053

Packaging research artefacts with RO-Crate

Authors: Stian Soiland-Reyes, Peter Sefton, Mercè Crosas, Leyla Jael Castro, Frederik Coppens, José M. Fernández, Daniel Garijo, Björn Grüning, Marco La Rosa, Simone Leo, Eoghan Ó Carragáin, Marc Portier, Ana Trisovic, RO-Crate Community, Paul Groth, Carole Goble

Abstract: An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with thei… ▽ More An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine readable manner. RO-Crate is based on Schema$.$org annotations in JSON-LD, aiming to establish best practices to formally describe metadata in an accessible and practical way for their use in a wide variety of situations. An RO-Crate is a structured archive of all the items that contributed to a research outcome, including their identifiers, provenance, relations and annotations. As a general purpose packaging approach for data and their metadata, RO-Crate is used across multiple areas, including bioinformatics, digital humanities and regulatory sciences. By applying "just enough" Linked Data standards, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility. An RO-Crate for this article is available at https://w3id.org/ro/doi/10.5281/zenodo.5146227 △ Less

Submitted 6 December, 2021; v1 submitted 14 August, 2021; originally announced August 2021.

Comments: 44 pages. Accepted for Data Science

ACM Class: H.1.1; H.3.2

Journal ref: Data Science 2022

arXiv:2107.00156 [pdf, other]

A Study of the Quality of Wikidata

Authors: Kartik Shenoy, Filip Ilievski, Daniel Garijo, Daniel Schwabe, Pedro Szekely

Abstract: Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: 1) co… ▽ More Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: 1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality; 2) statements that have been deprecated; and 3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement ongoing efforts by the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes. △ Less

Submitted 18 November, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

Comments: 12 pages

Journal ref: Journal of Web Semantics, Special issue on Community-Based Knowledge Bases, 2021

arXiv:2106.05177 [pdf, other]

doi 10.5281/zenodo.4915801

Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux , et al. (33 additional authors not shown)

Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i… ▽ More Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, long-running machine learning model training, amongst others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical △ Less

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2103.11676 [pdf, other]

Continuous mean distance of a weighted graph

Authors: Delia Garijo, Alberto Márquez, Rodrigo I. Silveira

Abstract: We study the concept of the continuous mean distance of a weighted graph. For connected unweighted graphs, the mean distance can be defined as the arithmetic mean of the distances between all pairs of vertices. This parameter provides a natural measure of the compactness of the graph, and has been intensively studied, together with several variants, including its version for weighted graphs. The c… ▽ More We study the concept of the continuous mean distance of a weighted graph. For connected unweighted graphs, the mean distance can be defined as the arithmetic mean of the distances between all pairs of vertices. This parameter provides a natural measure of the compactness of the graph, and has been intensively studied, together with several variants, including its version for weighted graphs. The continuous analog of the (discrete) mean distance is the mean of the distances between all pairs of points on the edges of the graph. Despite being a very natural generalization, to the best of our knowledge this concept has been barely studied, since the jump from discrete to continuous implies having to deal with an infinite number of distances, something that increases the difficulty of the parameter. In this paper we show that the continuous mean distance of a weighted graph can be computed in time quadratic in the number of edges, by two different methods that apply fundamental concepts in discrete algorithms and computational geometry. We also present structural results that allow a faster computation of this continuous parameter for several classes of weighted graphs. Finally, we study the relation between the (discrete) mean distance and its continuous counterpart, mainly focusing on the relevant question of the convergence when iteratively subdividing the edges of the weighted graph. △ Less

Submitted 13 January, 2023; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: Revised version

arXiv:2012.13117 [pdf, other]

Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Authors: Task Force on Best Practices for Software Registries, :, Alain Monteil, Alejandra Gonzalez-Beltran, Alexandros Ioannidis, Alice Allen, Allen Lee, Anita Bandrowski, Bruce E. Wilson, Bryce Mecum, Cai Fan Du, Carly Robinson, Daniel Garijo, Daniel S. Katz, David Long, Genevieve Milliken, Hervé Ménager, Jessica Hausman, Jurriaan H. Spaaks, Katrina Fenlon, Kristin Vanderbilt, Lorraine Hwang, Lynn Davis, Martin Fenner, Michael R. Crusoe , et al. (8 additional authors not shown)

Abstract: Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these r… ▽ More Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines. △ Less

Submitted 24 December, 2020; originally announced December 2020.

Comments: 18 pages

arXiv:2009.10263 [pdf, other]

Semantic Workflows and Machine Learning for the Assessment of Carbon Storage by Urban Trees

Authors: Juan Carrillo, Daniel Garijo, Mark Crowley, Rober Carrillo, Yolanda Gil, Katherine Borda

Abstract: Climate science is critical for understanding both the causes and consequences of changes in global temperatures and has become imperative for decisive policy-making. However, climate science studies commonly require addressing complex interoperability issues between data, software, and experimental approaches from multiple fields. Scientific workflow systems provide unparalleled advantages to add… ▽ More Climate science is critical for understanding both the causes and consequences of changes in global temperatures and has become imperative for decisive policy-making. However, climate science studies commonly require addressing complex interoperability issues between data, software, and experimental approaches from multiple fields. Scientific workflow systems provide unparalleled advantages to address these issues, including reproducibility of experiments, provenance capture, software reusability and knowledge sharing. In this paper, we introduce a novel workflow with a series of connected components to perform spatial data preparation, classification of satellite imagery with machine learning algorithms, and assessment of carbon stored by urban trees. To the best of our knowledge, this is the first study that estimates carbon storage for a region in Africa following the guidelines from the Intergovernmental Panel on Climate Change (IPCC). △ Less

Submitted 21 September, 2020; originally announced September 2020.

Comments: Previously published as part of the SciKnow 2019 Workshop, November 19th, 2019. Los Angeles, California, USA. Collocated with the tenth International Conference on Knowledge Capture (K-CAP)

Journal ref: Proceedings of the Third International Workshop on Capturing Scientific Knowledge co-located with the 10th International Conference on Knowledge Capture (K-CAP 2019)

arXiv:2007.09206 [pdf, other]

OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs

Authors: Daniel Garijo, Maximiliano Osorio

Abstract: In recent years, Semantic Web technologies have been increasingly adopted by researchers, industry and public institutions to describe and link data on the Web, create web annotations and consume large knowledge graphs like Wikidata and DBPedia. However, there is still a knowledge gap between ontology engineers, who design, populate and create knowledge graphs; and web developers, who need to unde… ▽ More In recent years, Semantic Web technologies have been increasingly adopted by researchers, industry and public institutions to describe and link data on the Web, create web annotations and consume large knowledge graphs like Wikidata and DBPedia. However, there is still a knowledge gap between ontology engineers, who design, populate and create knowledge graphs; and web developers, who need to understand, access and query these knowledge graphs but are not familiar with ontologies, RDF or SPARQL. In this paper we describe the Ontology-Based APIs framework (OBA), our approach to automatically create REST APIs from ontologies while following RESTful API best practices. Given an ontology (or ontology network) OBA uses standard technologies familiar to web developers (OpenAPI Specification, JSON) and combines them with W3C standards (OWL, JSON-LD frames and SPARQL) to create maintainable APIs with documentation, units tests, automated validation of resources and clients (in Python, Javascript, etc.) for non Semantic Web experts to access the contents of a target knowledge graph. We showcase OBA with three examples that illustrate the capabilities of the framework for different ontologies. △ Less

Submitted 17 July, 2020; originally announced July 2020.

ACM Class: D.2.12

arXiv:2006.00088 [pdf, other]

KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis

Authors: Filip Ilievski, Daniel Garijo, Hans Chalupsky, Naren Teja Divvala, Yixiang Yao, Craig Rogers, Rongpeng Li, Jun Liu, Amandeep Singh, Daniel Schwabe, Pedro Szekely

Abstract: Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper we… ▽ More Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper we present KGTK, a data science-centric toolkit designed to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet. △ Less

Submitted 26 May, 2021; v1 submitted 29 May, 2020; originally announced June 2020.

Comments: 16 pages

arXiv:2003.13084 [pdf, ps, other]

Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web

Authors: Daniel Garijo, María Poveda-Villalón

Abstract: With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versioning problems, etc. In this chapter we describe g… ▽ More With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versioning problems, etc. In this chapter we describe guidelines and best practices for creating accessible, understandable and reusable ontologies on the Web, using standard practices and pointing to existing tools and frameworks developed by the Semantic Web community. We illustrate our guidelines with concrete examples, in order to help researchers implement these practices in their future vocabularies. △ Less

Submitted 29 March, 2020; originally announced March 2020.

Comments: 16 pages, 4 figures

ACM Class: I.2; I.2.4

arXiv:1807.10093 [pdf, other]

Computing optimal shortcuts for networks

Authors: Delia Garijo, Alberto Márquez, Natalia Rodríguez, Rodrigo I. Silveira

Abstract: We study augmenting a plane Euclidean network with a segment, called a shortcut, to minimize the largest distance between any two points along the edges of the resulting network. Problems of this type have received considerable attention recently, mostly for discrete variants of the problem. We consider a fully continuous setting, where the problem of computing distances and placing a shortcut i… ▽ More We study augmenting a plane Euclidean network with a segment, called a shortcut, to minimize the largest distance between any two points along the edges of the resulting network. Problems of this type have received considerable attention recently, mostly for discrete variants of the problem. We consider a fully continuous setting, where the problem of computing distances and placing a shortcut is much harder as all points on the network, instead of only the vertices, must be taken into account. We present the first results on the computation of optimal shortcuts for general networks in this model: a polynomial time algorithm and a discretization of the problem that leads to an approximation algorithm. We also improve the general method for networks that are paths, restricted to two types of shortcuts: those with a fixed orientation and simple shortcuts. △ Less

Submitted 26 July, 2018; originally announced July 2018.

Comments: 21 pages

arXiv:1703.04329 [pdf, other]

Stabbing segments with rectilinear objects

Authors: Mercè Claverol, Delia Garijo, Matias Korman, Carlos Seara, Rodrigo I. Silveira

Abstract: Given a set $S$ of $n$ line segments in the plane, we say that a region $\mathcal{R}\subseteq \mathbb{R}^2$ is a {\em stabber} for $S$ if $\mathcal{R}$ contains exactly one endpoint of each segment of $S$. In this paper we provide optimal or near-optimal algorithms for reporting all combinatorially different stabbers for several shapes of stabbers. Specifically, we consider the case in which the s… ▽ More Given a set $S$ of $n$ line segments in the plane, we say that a region $\mathcal{R}\subseteq \mathbb{R}^2$ is a {\em stabber} for $S$ if $\mathcal{R}$ contains exactly one endpoint of each segment of $S$. In this paper we provide optimal or near-optimal algorithms for reporting all combinatorially different stabbers for several shapes of stabbers. Specifically, we consider the case in which the stabber can be described as the intersection of axis-parallel halfplanes (thus the stabbers are halfplanes, strips, quadrants, $3$-sided rectangles, or rectangles). The running times are $O(n)$ (for the halfplane case), $O(n\log n)$ (for strips, quadrants, and 3-sided rectangles), and $O(n^2 \log n)$ (for rectangles). △ Less

Submitted 13 March, 2017; originally announced March 2017.

arXiv:1603.06764 [pdf, other]

On Hamiltonian alternating cycles and paths

Authors: Mercè Claverol, Alfredo García, Delia Garijo, Carlos Seara, Javier Tejel

Abstract: We undertake a study on computing Hamiltonian alternating cycles and paths on bicolored point sets. This has been an intensively studied problem, not always with a solution, when the paths and cycles are also required to be plane. In this paper, we relax the constraint on the cycles and paths from being plane to being 1-plane, and deal with the same type of questions as those for the plane case, o… ▽ More We undertake a study on computing Hamiltonian alternating cycles and paths on bicolored point sets. This has been an intensively studied problem, not always with a solution, when the paths and cycles are also required to be plane. In this paper, we relax the constraint on the cycles and paths from being plane to being 1-plane, and deal with the same type of questions as those for the plane case, obtaining a remarkable variety of results. Among them, we prove that a 1-plane Hamiltonian alternating cycle on a bicolored point set in general position can always be obtained, and that when the point set is in convex position, every Hamiltonian alternating cycle with minimum number of crossings is 1-plane. Further, for point sets in convex position, we provide $O(n)$ and $O(n^2)$ time algorithms for computing, respectively, Hamiltonian alternating cycles and paths with minimum number of crossings. △ Less

Submitted 31 March, 2017; v1 submitted 22 March, 2016; originally announced March 2016.

arXiv:1401.4307 [pdf, other]

The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

Authors: Khalid Belhajjame, Jun Zhao, Daniel Garijo, Kristina Hettne, Raul Palma, Óscar Corcho, José-Manuel Gómez-Pérez, Sean Bechhofer, Graham Klyne, Carole Goble

Abstract: Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVers… ▽ More Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption. △ Less

Submitted 3 February, 2014; v1 submitted 17 January, 2014; originally announced January 2014.

Comments: 20 pages

Showing 1–19 of 19 results for author: Garijo, D