subscribe to arXiv mailings

Network Representation and Modular Decomposition of Combinatorial Structures: A Galled-Tree Perspective

Authors: Anna Lindeberg, Guillaume E. Scholz, Marc Hellmuth

Abstract: In phylogenetics, reconstructing rooted trees from distances between taxa is a common task. Böcker and Dress generalized this concept by introducing symbolic dated maps $δ:X \times X \to Υ$, where distances are replaced by symbols, and showed that there is a one-to-one correspondence between symbolic ultrametrics and labeled rooted phylogenetic trees. Many combinatorial structures fall under the u… ▽ More In phylogenetics, reconstructing rooted trees from distances between taxa is a common task. Böcker and Dress generalized this concept by introducing symbolic dated maps $δ:X \times X \to Υ$, where distances are replaced by symbols, and showed that there is a one-to-one correspondence between symbolic ultrametrics and labeled rooted phylogenetic trees. Many combinatorial structures fall under the umbrella of symbolic dated maps, such as 2-dissimilarities, symmetric labeled 2-structures, or edge-colored complete graphs, and are here referred to as strudigrams. Strudigrams have a unique decomposition into non-overlapping modules, which can be represented by a modular decomposition tree (MDT). In the absence of prime modules, strudigrams are equivalent to symbolic ultrametrics, and the MDT fully captures the relationships $δ(x,y)$ between pairs of vertices $x,y$ in $X$ through the label of their least common ancestor in the MDT. However, in the presence of prime vertices, this information is generally hidden. To provide this missing structural information, we aim to locally replace the prime vertices in the MDT to obtain networks that capture full information about the strudigrams. While starting with the general framework of prime-vertex replacement networks, we then focus on a specific type of such networks obtained by replacing prime vertices with so-called galls, resulting in labeled galled-trees. We introduce the concept of galled-tree explainable (GATEX) strudigrams, provide their characterization, and demonstrate that recognizing these structures and reconstructing the labeled networks that explain them can be achieved in polynomial time. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2311.18284 [pdf, other]

The Complement of the Djokovic-Winkler Relation

Authors: Marc Hellmuth, Bruno J. Schmidt, Guillaume E. Scholz, Sandhya Thekkumpadan Puthiyaveedu

Abstract: The Djoković-Winkler relation $Θ$ is a binary relation defined on the edge set of a given graph that is based on the distances of certain vertices and which plays a prominent role in graph theory. In this paper, we explore the relatively uncharted ``reflexive complement'' $\overlineΘ$ of $Θ$, where $(e,f)\in \overlineΘ$ if and only if $e=f$ or $(e,f)\notin Θ$ for edges $e$ and $f$. We establish th… ▽ More The Djoković-Winkler relation $Θ$ is a binary relation defined on the edge set of a given graph that is based on the distances of certain vertices and which plays a prominent role in graph theory. In this paper, we explore the relatively uncharted ``reflexive complement'' $\overlineΘ$ of $Θ$, where $(e,f)\in \overlineΘ$ if and only if $e=f$ or $(e,f)\notin Θ$ for edges $e$ and $f$. We establish the relationship between $\overlineΘ$ and the set $Δ_{ef}$, comprising the distances between the vertices of $e$ and $f$ and shed some light on the intricacies of its transitive closure $\overlineΘ^*$. Notably, we demonstrate that $\overlineΘ^*$ exhibits multiple equivalence classes only within a restricted subclass of complete multipartite graphs. In addition, we characterize non-trivial relations $R$ that coincide with $\overlineΘ$ as those where the graph representation is disconnected, with each connected component being the (join of) Cartesian product of complete graphs. The latter results imply, somewhat surprisingly, that knowledge about the distances between vertices is not required to determine $\overlineΘ^*$. Moreover, $\overlineΘ^*$ has either exactly one or three equivalence classes. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2308.06139 [pdf, other]

Shared ancestry graphs and symbolic arboreal maps

Authors: Katharina T. Huber, Vincent Moulton, Guillaume E. Scholz

Abstract: A network $N$ on a finite set $X$, $|X|\geq 2$, is a connected directed acyclic graph with leaf set $X$ in which every root in $N$ has outdegree at least 2 and no vertex in $N$ has indegree and outdegree equal to 1; $N$ is arboreal if the underlying unrooted, undirected graph of $N$ is a tree. Networks are of interest in evolutionary biology since they are used, for example, to represent the evolu… ▽ More A network $N$ on a finite set $X$, $|X|\geq 2$, is a connected directed acyclic graph with leaf set $X$ in which every root in $N$ has outdegree at least 2 and no vertex in $N$ has indegree and outdegree equal to 1; $N$ is arboreal if the underlying unrooted, undirected graph of $N$ is a tree. Networks are of interest in evolutionary biology since they are used, for example, to represent the evolutionary history of a set $X$ of species whose ancestors have exchanged genes in the past. For $M$ some arbitrary set of symbols, $d:{X \choose 2} \to M \cup \{\odot\}$ is a symbolic arboreal map if there exists some arboreal network $N$ whose vertices with outdegree two or more are labelled by elements in $M$ and so that $d(\{x,y\})$, $\{x,y\} \in {X \choose 2}$, is equal to the label of the least common ancestor of $x$ and $y$ in $N$ if this exists and $\odot$ else. Important examples of symbolic arboreal maps include the symbolic ultrametrics, which arise in areas such as game theory, phylogenetics and cograph theory. In this paper we show that a map $d:{X \choose 2} \to M \cup \{\odot\}$ is a symbolic arboreal map if and only if $d$ satisfies certain 3- and 4-point conditions and the graph with vertex set $X$ and edge set consisting of those pairs $\{x,y\} \in {X \choose 2}$ with $d(\{x,y\}) \neq \odot$ is Ptolemaic. To do this, we introduce and prove a key theorem concerning the shared ancestry graph for a network $N$ on $X$, where this is the graph with vertex set $X$ and edge set consisting of those $\{x,y\} \in {X \choose 2}$ such that $x$ and $y$ share a common ancestor in $N$. In particular, we show that for any connected graph $G$ with vertex set $X$ and edge clique cover $K$ in which there are no two distinct sets in $K$ with one a subset of the other, there is some network with $|K|$ roots and leaf set $X$ whose shared ancestry graph is $G$. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 19 pages, 5 figures

MSC Class: 05C05; 05C20; 05C90

arXiv:2306.04367 [pdf, other]

Solving NP-hard Problems on \textsc{GaTEx} Graphs: Linear-Time Algorithms for Perfect Orderings, Cliques, Colorings, and Independent Sets

Authors: Marc Hellmuth, Guillaume E. Scholz

Abstract: The class of $\mathsf{Ga}$lled-$\mathsf{T}$ree $\mathsf{Ex}$plainable ($\mathsf{GaTEx}$) graphs has recently been discovered as a natural generalization of cographs. Cographs are precisely those graphs that can be uniquely represented by a rooted tree where the leaves correspond to the vertices of the graph. As a generalization, $\mathsf{GaTEx}$ graphs are precisely those that can be uniquely repr… ▽ More The class of $\mathsf{Ga}$lled-$\mathsf{T}$ree $\mathsf{Ex}$plainable ($\mathsf{GaTEx}$) graphs has recently been discovered as a natural generalization of cographs. Cographs are precisely those graphs that can be uniquely represented by a rooted tree where the leaves correspond to the vertices of the graph. As a generalization, $\mathsf{GaTEx}$ graphs are precisely those that can be uniquely represented by a particular rooted acyclic network, called a galled-tree. This paper explores the use of galled-trees to solve combinatorial problems on $\mathsf{GaTEx}$ graphs that are, in general, NP-hard. We demonstrate that finding a maximum clique, an optimal vertex coloring, a perfect order, as well as a maximum independent set in $\mathsf{GaTEx}$ graphs can be efficiently done in linear time. The key idea behind the linear-time algorithms is to utilize the galled-trees that explain the $\mathsf{GaTEx}$ graphs as a guide for computing the respective cliques, colorings, perfect orders, or independent sets. △ Less

Submitted 26 April, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2211.16854 [pdf, other]

Resolving Prime Modules: The Structure of Pseudo-cographs and Galled-Tree Explainable Graphs

Authors: Marc Hellmuth, Guillaume E. Scholz

Abstract: The modular decomposition of a graph $G$ is a natural construction to capture key features of $G$ in terms of a labeled tree $(T,t)$ whose vertices are labeled as "series" ($1$), "parallel" ($0$) or "prime". However, full information of $G$ is provided by its modular decomposition tree $(T,t)$ only, if $G$ is a cograph, i.e., $G$ does not contain prime modules. In this case, $(T,t)$ explains $G$,… ▽ More The modular decomposition of a graph $G$ is a natural construction to capture key features of $G$ in terms of a labeled tree $(T,t)$ whose vertices are labeled as "series" ($1$), "parallel" ($0$) or "prime". However, full information of $G$ is provided by its modular decomposition tree $(T,t)$ only, if $G$ is a cograph, i.e., $G$ does not contain prime modules. In this case, $(T,t)$ explains $G$, i.e., $\{x,y\}\in E(G)$ if and only if the lowest common ancestor $\mathrm{lca}_T(x,y)$ of $x$ and $y$ has label "$1$". Pseudo-cographs, or, more general, GaTEx graphs $G$ are graphs that can be explained by labeled galled-trees, i.e., labeled networks $(N,t)$ that are obtained from the modular decomposition tree $(T,t)$ of $G$ by replacing the prime vertices in $T$ by simple labeled cycles. GaTEx graphs can be recognized and labeled galled-trees that explain these graphs can be constructed in linear time. In this contribution, we provide a novel characterization of GaTEx graphs in terms of a set $\mathfrak{F}_{\mathrm{GT}}$ of 25 forbidden induced subgraphs. This characterization, in turn, allows us to show that GaTEx graphs are closely related to many other well-known graph classes such as $P_4$-sparse and $P_4$-reducible graphs, weakly-chordal graphs, perfect graphs with perfect order, comparability and permutation graphs, murky graphs as well as interval graphs, Meyniel graphs or very strongly-perfect and brittle graphs. Moreover, we show that every GaTEx graph as twin-width at most 1 and and provide linear-time algorithms to solve several NP-hard problems (clique, coloring, independent set) on GaTEx graphs by utilizing the structure of the underlying galled-trees they explain. △ Less

Submitted 4 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: 18 pages, 3 figures

arXiv:2211.04322 [pdf, other]

Injective split systems

Authors: M. Hellmuth, K. T. Huber, V. Moulton, G. E. Scholz, P. F. Stadler

Abstract: A split system $\mathcal S$ on a finite set $X$, $|X|\ge3$, is a set of bipartitions or splits of $X$ which contains all splits of the form $\{x,X-\{x\}\}$, $x \in X$. To any such split system $\mathcal S$ we can associate the Buneman graph $\mathcal B(\mathcal S)$ which is essentially a median graph with leaf-set $X$ that displays the splits in $\mathcal S$. In this paper, we consider properties… ▽ More A split system $\mathcal S$ on a finite set $X$, $|X|\ge3$, is a set of bipartitions or splits of $X$ which contains all splits of the form $\{x,X-\{x\}\}$, $x \in X$. To any such split system $\mathcal S$ we can associate the Buneman graph $\mathcal B(\mathcal S)$ which is essentially a median graph with leaf-set $X$ that displays the splits in $\mathcal S$. In this paper, we consider properties of injective split systems, that is, split systems $\mathcal S$ with the property that $\mathrm{med}_{\mathcal B(\mathcal S)}(Y) \neq \mathrm{med}_{\mathrm B(\mathcal S)}(Y')$ for any 3-subsets $Y,Y'$ in $X$, where $\mathrm {med}_{\mathcal B(\mathcal S)}(Y)$ denotes the median in $\mathcal B(\mathcal S)$ of the three elements in $Y$ considered as leaves in $\mathcal B(\mathcal S)$. In particular, we show that for any set $X$ there always exists an injective split system on $X$, and we also give a characterization for when a split system is injective. We also consider how complex the Buneman graph $\mathcal B(\mathcal S)$ needs to become in order for a split system $\mathcal S$ on $X$ to be injective. We do this by introducing a quantity for $|X|$ which we call the injective dimension for $|X|$, as well as two related quantities, called the injective 2-split and the rooted-injective dimension. We derive some upper and lower bounds for all three of these dimensions and also prove that some of these bounds are tight. An underlying motivation for studying injective split systems is that they can be used to obtain a natural generalization of symbolic tree maps. An important consequence of our results is that any three-way symbolic map on $X$ can be represented using Buneman graphs. △ Less

Submitted 8 November, 2022; originally announced November 2022.

Comments: 22 pages, 3 figures

arXiv:2203.04630 [pdf, other]

Compatible split systems on a multiset

Authors: Vincent Moulton, Guillaume E. Scholz

Abstract: A split system on a multiset $\mathcal M$ is a set of bipartitions of $\mathcal M$. Such a split system $\mathfrak S$ is compatible if it can be represented by a tree in such a way that the vertices of the tree are labelled by the elements in $\mathcal M$, the removal of each edge in the tree yields a bipartition in $\mathfrak S$ by taking the labels of the two resulting components, and every bipa… ▽ More A split system on a multiset $\mathcal M$ is a set of bipartitions of $\mathcal M$. Such a split system $\mathfrak S$ is compatible if it can be represented by a tree in such a way that the vertices of the tree are labelled by the elements in $\mathcal M$, the removal of each edge in the tree yields a bipartition in $\mathfrak S$ by taking the labels of the two resulting components, and every bipartition in $\mathfrak S$ can be obtained from the tree in this way. In this contribution, we present a novel characterization for compatible split systems, and for split systems admitting a unique tree representation. In addition, we show that a conjecture on compatibility stated in 2008 holds for some large classes of split systems. △ Less

Submitted 9 March, 2022; originally announced March 2022.

Comments: 16 pages, 4 figures

arXiv:2202.06644 [pdf, other]

Forest-based networks

Authors: Katharina T. Huber, Vincent Moulton, Guillaume E. Scholz

Abstract: In evolutionary studies it is common to use phylogenetic trees to represent the evolutionary history of a set of species. However, in case the transfer of genes or other genetic information between the species or their ancestors has occurred in the past, a tree may not provide a complete picture of their history. In such cases,tree-based phylogenetic networks can provide a useful, more refined rep… ▽ More In evolutionary studies it is common to use phylogenetic trees to represent the evolutionary history of a set of species. However, in case the transfer of genes or other genetic information between the species or their ancestors has occurred in the past, a tree may not provide a complete picture of their history. In such cases,tree-based phylogenetic networks can provide a useful, more refined representation of the species evolution. Such a network is essentially a phylogenetic tree with some arcs added between the tree edges so as to represent reticulate events such as gene transfer. Even so, this model does not permit the representation of evolutionary scenarios where reticulate events have taken place between different subfamilies or lineages of species. To represent such scenarios, in this paper we introduce the notion of a forest-based phylogenetic network, that is, a collection of leaf-disjoint phylogenetic trees on a set of species with arcs added between the edges of distinct trees within the collection. Forest-based networks include the recently introduced class of overlaid species forests which are used to model introgression. As we shall see, even though the definition of forest-based networks is closely related to that of tree-based networks, they lead to new mathematical theory which complements that of tree-based networks. As well as studying the relationship of forest-based networks with other classes of phylogenetic networks, such as tree-child networks and universal tree-based networks, we present some characterizations of some special classes of forest-based networks. We expect that our results will be useful for developing new models and algorithms to understand reticulate evolution, such as gene transfer between collections of bacteria that live in different environments. △ Less

Submitted 14 February, 2022; originally announced February 2022.

Comments: 17 pages, 11 figures

arXiv:2112.05537 [pdf, other]

From Modular Decomposition Trees to Level-1 Networks: Pseudo-Cographs, Polar-Cats and Prime Polar-Cats

Authors: Marc Hellmuth, Guillaume E. Scholz

Abstract: The modular decomposition of a graph $G$ is a natural construction to capture key features of $G$ in terms of a labeled tree $(T,t)$ whose vertices are labeled as "series" ($1$), "parallel" ($0$) or "prime". However, full information of $G$ is provided by its modular decomposition tree $(T,t)$ only, if $G$ does not contain prime modules. In this case, $(T,t)$ explains $G$, i.e., $\{x,y\}\in E(G)$… ▽ More The modular decomposition of a graph $G$ is a natural construction to capture key features of $G$ in terms of a labeled tree $(T,t)$ whose vertices are labeled as "series" ($1$), "parallel" ($0$) or "prime". However, full information of $G$ is provided by its modular decomposition tree $(T,t)$ only, if $G$ does not contain prime modules. In this case, $(T,t)$ explains $G$, i.e., $\{x,y\}\in E(G)$ if and only if the lowest common ancestor $\mathrm{lca}_T(x,y)$ of $x$ and $y$ has label "$1$". This information, however, gets lost whenever $(T,t)$ contains vertices with label "prime". In this contribution, we aim at replacing "prime" vertices in $(T,t)$ by simple 0/1-labeled cycles, which leads to the concept of rooted labeled level-1 networks $(N,t)$. We characterize graphs that can be explained by such level-1 networks $(N,t)$, which generalizes the concept of graphs that can be explained by labeled trees, that is, cographs. We provide three novel graph classes: \emph{polar-cats} are a proper subclass of \emph{pseudo-cographs} which forms a proper subclass of \emph{prime polar-cats}. In particular, every cograph is a pseudo-cograph and prime polar-cats are precisely those graphs that can be explained by a labeled level-1 network. The class of prime polar-cats is defined in terms of the modular decomposition of graphs and the property that all prime modules "induce" polar-cats. We provide a plethora of structural results and characterizations for graphs of these new classes. In addition, we show under which conditions there is a unique least-resolved labeled level-1 network that explains a given graph and provide linear-time algorithms to recognize all these types of graphs and to construct level-1 networks to explain them. △ Less

Submitted 15 June, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

arXiv:2006.14275 [pdf, other]

Overlaid species forests

Authors: K. T. Huber, V. Moulton, G. E. Scholz

Abstract: Introgression is an evolutionary process in which genes or other types of genetic material are introduced into a genome. It is an important evolutionary process that can, for example, play a fundamental role in speciation. Recently the concept of an overlaid species forest was introduced to represent introgression histories. Basically this approach takes a putative gene history in the form of a ph… ▽ More Introgression is an evolutionary process in which genes or other types of genetic material are introduced into a genome. It is an important evolutionary process that can, for example, play a fundamental role in speciation. Recently the concept of an overlaid species forest was introduced to represent introgression histories. Basically this approach takes a putative gene history in the form of a phylogenetic gene tree and tries to overlay this onto a forest which usually consists of a collection of lineage trees for the species of interest. The result is a network called an overlaid species forest in which genes jump or introgress between lineages. In this paper we study properties of overlaid species forests, showing that they have various connections with models for lateral gene transfer, maximum parsimony, and unfolding of phylogenetic networks. In particular, we show that a certain algorithm called OSF-B UILDER for constructing overlaid species forests is guaranteed to a produce a special type of overlaid species forest with a minimum number introgressions, as well as providing some characterizations for networks that can arise from overlaid species forests. We expect that these results will be useful in developing new methods for representing introgression histories, a growing area of interest in phylogenetics. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Comments: 18 pages, 5 figures

MSC Class: 05C90; 92D15

arXiv:1804.01841 [pdf, other]

Phylogenetic networks that are their own fold-ups

Authors: Katharina T. Huber, Guillaume E. Scholz

Abstract: Phylogenetic networks are becoming of increasing interest to evolutionary biologists due to their ability to capture complex non-treelike evolutionary processes. From a combinatorial point of view, such networks are certain types of rooted directed acyclic graphs whose leaves are labelled by, for example, species. A number of mathematically interesting classes of phylogenetic networks are known. T… ▽ More Phylogenetic networks are becoming of increasing interest to evolutionary biologists due to their ability to capture complex non-treelike evolutionary processes. From a combinatorial point of view, such networks are certain types of rooted directed acyclic graphs whose leaves are labelled by, for example, species. A number of mathematically interesting classes of phylogenetic networks are known. These include the biologically relevant class of stable phylogenetic networks whose members are defined via certain "fold-up" and "un-fold" operations that link them with concepts arising within the theory of, for example, graph fibrations. Despite this exciting link, the structural complexity of stable phylogenetic networks is still relatively poorly understood. Employing the popular tree-based, reticulation-visible, and tree-child properties which allow one to gauge this complexity in one way or another, we provide novel characterizations for when a stable phylogenetic network satisfies either one of these three properties. △ Less

Submitted 18 October, 2019; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 25 pages, 6 figures

MSC Class: 05C05; 05C20; 05C85; 05D15; 92D15

arXiv:1707.08010 [pdf, other]

Three-way symbolic tree-maps and ultrametrics

Authors: Katharina T. Huber, Vincent Moulton, Guillaume E. Scholz

Abstract: Three-way dissimilarities are a generalization of (two-way) dissimilarities which can be used to indicate the lack of homogeneity or resemblance between any three objects. Such maps have applications in cluster analysis, and have been used in areas such as psychology and phylogenetics, where three-way data tables can arise. Special examples of such dissimilarities are three-way tree-metrics and ul… ▽ More Three-way dissimilarities are a generalization of (two-way) dissimilarities which can be used to indicate the lack of homogeneity or resemblance between any three objects. Such maps have applications in cluster analysis, and have been used in areas such as psychology and phylogenetics, where three-way data tables can arise. Special examples of such dissimilarities are three-way tree-metrics and ultrametrics, which arise from leaf-labelled trees with edges labelled by positive real numbers. Here we consider three-way maps which arise from leaf-labelled trees where instead the interior vertices are labelled by an arbitrary set of values. For unrooted trees we call such maps three-way symbolic tree-maps; for rooted trees we call them three-way symbolic ultrametrics since they can be considered as a generalization of the (two-way) symbolic ultrametrics of Böcker and Dress. We show that, as with two- and three-way tree-metrics and ultrametrics, three-way symbolic tree-maps and ultrametrics can be characterized via certain $k$-point conditions. In the unrooted case, our characterization is mathematically equivalent to one presented by Gurvich for a certain class of edge-labelled hypergraphs. We also show that it can be decided whether or not an arbitrary three-way symbolic map is a tree-map or a symbolic ultrametric using a triplet-based approach that relies on the so-called BUILD algorithm for deciding when a set of 3-leaved trees or triplets can be displayed by a single tree. We envisage that our results will be useful in developing new approaches and algorithms for understanding 3-way data, especially within the area of phylogenetics. △ Less

Submitted 18 October, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

arXiv:1511.08387 [pdf, other]

Bridging the gap between rooted and unrooted phylogenetic networks

Authors: Philippe Gambette, Katharina T. Huber, Guillaume E. Scholz

Abstract: The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard phylogenetic tree model by also allowing cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees, however, surprisingly little is known about the interp… ▽ More The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard phylogenetic tree model by also allowing cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees, however, surprisingly little is known about the interplay between both types thus hampering our ability to make much needed progress for rooted phylogenetic networks by drawing on insights from their much better understood unrooted counterparts. Unrooted phylogenetic networks are underpinned by split systems and by focusing on them we establish a first link between both types. More precisely, we develop a link between 1-nested phylogenetic networks which are examples of rooted phylogenetic networks and the well-studied median networks (aka Buneman graph) which are examples of unrooted phylogenetic networks. In particular, we show that not only can a 1-nested network be obtained from a median network but also that that network is, in a well-defined sense, optimal. Along the way, we characterize circular split systems in terms of the novel $\mathcal I$-intersection closure of a split system and establish the 1-nested analogue of the fundamental "Splits Equivalence Theorem" for phylogenetic trees. △ Less

Submitted 26 November, 2015; originally announced November 2015.

Comments: 28 pages, 5 figures

MSC Class: 92D15; 92B10

Showing 1–13 of 13 results for author: Scholz, G E