subscribe to arXiv mailings

A Spectrum of Approximate Probabilistic Bisimulations

Authors: Timm Spork, Christel Baier, Joost-Pieter Katoen, Jakob Piribauer, Tim Quatmann

Abstract: This paper studies various notions of approximate probabilistic bisimulation on labeled Markov chains (LMCs). We introduce approximate versions of weak and branching bisimulation, as well as a notion of $\varepsilon$-perturbed bisimulation that relates LMCs that can be made (exactly) probabilistically bisimilar by small perturbations of their transition probabilities. We explore how the notions in… ▽ More This paper studies various notions of approximate probabilistic bisimulation on labeled Markov chains (LMCs). We introduce approximate versions of weak and branching bisimulation, as well as a notion of $\varepsilon$-perturbed bisimulation that relates LMCs that can be made (exactly) probabilistically bisimilar by small perturbations of their transition probabilities. We explore how the notions interrelate and establish their connections to other well-known notions like $\varepsilon$-bisimulation. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Full version of a paper accepted for publication at CONCUR 2024

arXiv:2407.06887 [pdf, other]

Risk-averse optimization of total rewards in Markovian models using deviation measures

Authors: Christel Baier, Jakob Piribauer, Maximilian Starke

Abstract: This paper addresses objectives tailored to the risk-averse optimization of accumulated rewards in Markov decision processes (MDPs). The studied objectives require maximizing the expected value of the accumulated rewards minus a penalty factor times a deviation measure of the resulting distribution of rewards. Using the variance in this penalty mechanism leads to the variance-penalized expectation… ▽ More This paper addresses objectives tailored to the risk-averse optimization of accumulated rewards in Markov decision processes (MDPs). The studied objectives require maximizing the expected value of the accumulated rewards minus a penalty factor times a deviation measure of the resulting distribution of rewards. Using the variance in this penalty mechanism leads to the variance-penalized expectation (VPE) for which it is known that optimal schedulers have to minimize future expected rewards when a high amount of rewards has been accumulated. This behavior is undesirable as risk-averse behavior should keep the probability of particularly low outcomes low, but not discourage the accumulation of additional rewards on already good executions. The paper investigates the semi-variance, which only takes outcomes below the expected value into account, the mean absolute deviation (MAD), and the semi-MAD as alternative deviation measures. Furthermore, a penalty mechanism that penalizes outcomes below a fixed threshold is studied. For all of these objectives, the properties of optimal schedulers are specified and in particular the question whether these objectives overcome the problem observed for the VPE is answered. Further, the resulting algorithmic problems on MDPs and Markov chains are investigated. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: This is the extended version of a paper accepted for publication at CONCUR 2024

arXiv:2406.18727 [pdf, other]

Demonic variance and a non-determinism score for Markov decision processes

Authors: Jakob Piribauer

Abstract: This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: For a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance of the values of X in two independent execution of M in which… ▽ More This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: For a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance of the values of X in two independent execution of M in which also the non-deterministic choices are resolved independently by two distinct schedulers. It is shown that the demonic variance is between 1 and 2 times as large as the maximal variance of X in M that can be achieved by a single scheduler. This allows defining a non-determinism score for M and X measuring how strongly the difference of X in two executions of M can be influenced by the non-deterministic choices. Properties of MDPs M with extremal values of the non-determinism score are established. Further, the algorithmic problems of computing the maximal variance and the demonic variance are investigated for two random variables, namely weighted reachability and accumulated rewards. In the process, also the structure of schedulers maximizing the variance and of scheduler pairs realizing the demonic variance is analyzed. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: This is the extended version of a conference paper accepted for publication at MFCS 2024

arXiv:2406.15087 [pdf, ps, other]

Model Checking Markov Chains as Distribution Transformers

Authors: Rajab Aghamov, Christel Baier, Toghrul Karimov, Joris Nieuwveld, Joël Ouaknine, Jakob Piribauer, Mihir Vahanwala

Abstract: The conventional perspective on Markov chains considers decision problems concerning the probabilities of temporal properties being satisfied by traces of visited states. However, consider the following query made of a stochastic system modelling the weather: given the conditions today, will there be a day with less than 50\% chance of rain? The conventional perspective is ill-equipped to decide s… ▽ More The conventional perspective on Markov chains considers decision problems concerning the probabilities of temporal properties being satisfied by traces of visited states. However, consider the following query made of a stochastic system modelling the weather: given the conditions today, will there be a day with less than 50\% chance of rain? The conventional perspective is ill-equipped to decide such problems regarding the evolution of the initial distribution. The alternate perspective we consider views Markov chains as distribution transformers: the focus is on the sequence of distributions on states at each step, where the evolution is driven by the underlying stochastic transition matrix. More precisely, given an initial distribution vector $μ$, a stochastic update transition matrix $M$, we ask whether the ensuing sequence of distributions $(μ, Mμ, M^2μ, \dots)$ satisfies a given temporal property. This is a special case of the model-checking problem for linear dynamical systems, which is not known to be decidable in full generality. The goal of this article is to delineate the classes of instances for which this problem can be solved, under the assumption that the dynamics is governed by stochastic matrices. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2402.01539 [pdf, ps, other]

Backward Responsibility in Transition Systems Using General Power Indices

Authors: Christel Baier, Roxane van den Bossche, Sascha Klüppelholz, Johannes Lehmann, Jakob Piribauer

Abstract: To improve reliability and the understanding of AI systems, there is increasing interest in the use of formal methods, e.g. model checking. Model checking tools produce a counterexample when a model does not satisfy a property. Understanding these counterexamples is critical for efficient debugging, as it allows the developer to focus on the parts of the program that caused the issue. To this en… ▽ More To improve reliability and the understanding of AI systems, there is increasing interest in the use of formal methods, e.g. model checking. Model checking tools produce a counterexample when a model does not satisfy a property. Understanding these counterexamples is critical for efficient debugging, as it allows the developer to focus on the parts of the program that caused the issue. To this end, we present a new technique that ascribes a responsibility value to each state in a transition system that does not satisfy a given safety property. The value is higher if the non-deterministic choices in a state have more power to change the outcome, given the behaviour observed in the counterexample. For this, we employ a concept from cooperative game theory -- namely general power indices, such as the Shapley value -- to compute the responsibility of the states. We present an optimistic and pessimistic version of responsibility that differ in how they treat the states that do not lie on the counterexample. We give a characterisation of optimistic responsibility that leads to an efficient algorithm for it and show computational hardness of the pessimistic version. We also present a tool to compute responsibility and show how a stochastic algorithm can be used to approximate responsibility in larger models. These methods can be deployed in the design phase, at runtime and at inspection time to gain insights on causal relations within the behavior of AI systems. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2310.01001 [pdf, other]

doi 10.4204/EPTCS.390.9

Counterfactual Causality for Reachability and Safety based on Distance Functions

Authors: Julie Parreaux, Jakob Piribauer, Christel Baier

Abstract: Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems base… ▽ More Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems based on Stalnaker's and Lewis' semantics of counterfactuals in terms of most similar possible worlds and introduces a novel corresponding notion of counterfactual causality in two-player games. Using distance functions between paths in transition systems, this notion defines whether reaching a certain set of states is a cause for the violation of a reachability or safety property. Similarly, using distance functions between memoryless strategies in reachability and safety games, it is defined whether reaching a set of states is a cause for the fact that a given strategy for the player under investigation is losing. The contribution of the paper is two-fold: In transition systems, it is shown that counterfactual causality can be checked in polynomial time for three prominent distance functions between paths. In two-player games, the introduced notion of counterfactual causality is shown to be checkable in polynomial time for two natural distance functions between memoryless strategies. Further, a notion of explanation that can be extracted from a counterfactual cause and that pinpoints changes to be made to the given strategy in order to transform it into a winning strategy is defined. For the two distance functions under consideration, the problem to decide whether such an explanation imposes only minimal necessary changes to the given strategy with respect to the used distance function turns out to be coNP-complete and not to be solvable in polynomial time if P is not equal to NP, respectively. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: In Proceedings GandALF 2023, arXiv:2309.17318. An extended version can be found at arXiv:2308.11385v1

Report number: EPTCS 390-9

Journal ref: EPTCS 390, 2023, pp. 132-149

arXiv:2308.11385 [pdf, other]

Counterfactual Causality for Reachability and Safety based on Distance Functions

Authors: Julie Parreaux, Jakob Piribauer, Christel Baier

Abstract: Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems base… ▽ More Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems based on Stalnaker's and Lewis' semantics of counterfactuals in terms of most similar possible worlds and introduces a novel corresponding notion of counterfactual causality in two-player games. Using distance functions between paths in transition systems, this notion defines whether reaching a certain set of states is a cause for the violation of a reachability or safety property. Similarly, using distance functions between memoryless strategies in reachability and safety games, it is defined whether reaching a set of states is a cause for the fact that a given strategy for the player under investigation is losing. The contribution of the paper is two-fold: In transition systems, it is shown that counterfactual causality can be checked in polynomial time for three prominent distance functions between paths. In two-player games, the introduced notion of counterfactual causality is shown to be checkable in polynomial time for two natural distance functions between memoryless strategies. Further, a notion of explanation that can be extracted from a counterfactual cause and that pinpoints changes to be made to the given strategy in order to transform it into a winning strategy is defined. For the two distance functions under consideration, the problem to decide whether such an explanation imposes only minimal necessary changes to the given strategy with respect to the used distance function turns out to be coNP-complete and not to be solvable in polynomial time if P is not equal to NP, respectively. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: This is the extended version of a paper accepted for publication at GandALF 2023

arXiv:2307.06611 [pdf, other]

Entropic Risk for Turn-Based Stochastic Games

Authors: Christel Baier, Krishnendu Chatterjee, Tobias Meggendorfer, Jakob Piribauer

Abstract: Entropic risk (ERisk) is an established risk measure in finance, quantifying risk by an exponential re-weighting of rewards. We study ERisk for the first time in the context of turn-based stochastic games with the total reward objective. This gives rise to an objective function that demands the control of systems in a risk-averse manner. We show that the resulting games are determined and, in part… ▽ More Entropic risk (ERisk) is an established risk measure in finance, quantifying risk by an exponential re-weighting of rewards. We study ERisk for the first time in the context of turn-based stochastic games with the total reward objective. This gives rise to an objective function that demands the control of systems in a risk-averse manner. We show that the resulting games are determined and, in particular, admit optimal memoryless deterministic strategies. This contrasts risk measures that previously have been considered in the special case of Markov decision processes and that require randomization and/or memory. We provide several results on the decidability and the computational complexity of the threshold problem, i.e. whether the optimal value of ERisk exceeds a given threshold. In the most general case, the problem is decidable subject to Shanuel's conjecture. If all inputs are rational, the resulting threshold problem can be solved using algebraic numbers, leading to decidability via a polynomial-time reduction to the existential theory of the reals. Further restrictions on the encoding of the input allow the solution of the threshold problem in NP$\cap$coNP. Finally, an approximation algorithm for the optimal value of ERisk is provided. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: This is the extended version of a paper accepted for publication at MFCS 2023

arXiv:2302.13675 [pdf, other]

doi 10.46298/theoretics.24.9

Positivity-hardness results on Markov decision processes

Authors: Jakob Piribauer, Christel Baier

Abstract: This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction proba… ▽ More This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction probabilities of constraints on the total accumulated weight, the computation of quantiles for the accumulated weight, and the conditional value-at-risk for accumulated weights for integer-weighted MDPs. Although algorithmic results are available for some special instances, the decidability status of the decision versions of these problems is unknown in general. The paper demonstrates that these optimization problems are inherently mathematically difficult by providing polynomial-time reductions from the Positivity problem for linear recurrence sequences. This problem is a well-known number-theoretic problem whose decidability status has been open for decades and it is known that decidability of the Positivity problem would have far-reaching consequences in analytic number theory. So, the reductions presented in the paper show that an algorithmic solution to any of the investigated problems is not possible without a major breakthrough in analytic number theory. The reductions rely on the construction of MDP-gadgets that encode the initial values and linear recurrence relations of linear recurrence sequences. These gadgets can flexibly be adjusted to prove the various Positivity-hardness results. △ Less

Submitted 5 April, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2004.11441

ACM Class: G.3

Journal ref: TheoretiCS (April 8, 2024) theoretics:11035

arXiv:2209.02973 [pdf, other]

doi 10.46298/lmcs-20(1:4)2024

Foundations of probability-raising causality in Markov decision processes

Authors: Christel Baier, Jakob Piribauer, Robin Ziemek

Abstract: This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This incl… ▽ More This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This includes algorithms for checking cause conditions given an effect and deciding the existence of probability-raising causes. As the definition allows for sub-optimal coverage properties, quality measures for causes inspired by concepts of statistical analysis are studied. These include recall, coverage ratio and f-score. The computational complexity for finding optimal causes with respect to these measures is analyzed. △ Less

Submitted 18 January, 2024; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.08768

Journal ref: Logical Methods in Computer Science (January 19, 2024) lmcs:10015

arXiv:2204.12280 [pdf, other]

The variance-penalized stochastic shortest path problem

Authors: Jakob Piribauer, Ocan Sankur, Christel Baier

Abstract: The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulat… ▽ More The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time. △ Less

Submitted 21 April, 2022; originally announced April 2022.

Comments: This is the extended version of the conference version accepted for publication at ICALP 2022

arXiv:2201.08768 [pdf, other]

On probability-raising causality in Markov decision processes

Authors: Christel Baier, Florian Funke, Jakob Piribauer, Robin Ziemek

Abstract: The purpose of this paper is to introduce a notion of causality in Markov decision processes based on the probability-raising principle and to analyze its algorithmic properties. The latter includes algorithms for checking cause-effect relationships and the existence of probability-raising causes for given effect scenarios. Inspired by concepts of statistical analysis, we study quality measures (r… ▽ More The purpose of this paper is to introduce a notion of causality in Markov decision processes based on the probability-raising principle and to analyze its algorithmic properties. The latter includes algorithms for checking cause-effect relationships and the existence of probability-raising causes for given effect scenarios. Inspired by concepts of statistical analysis, we study quality measures (recall, coverage ratio and f-score) for causes and develop algorithms for their computation. Finally, the computational complexity for finding optimal causes with respect to these measures is analyzed. △ Less

Submitted 21 January, 2022; originally announced January 2022.

Comments: This is the extended version of a conference version accepted for publication at FoSSaCS 2022

arXiv:2109.08326 [pdf, other]

doi 10.4204/EPTCS.346.3

Witnessing Subsystems for Probabilistic Systems with Low Tree Width

Authors: Simon Jantsch, Jakob Piribauer, Christel Baier

Abstract: A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic sys… ▽ More A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic systems that are similar to trees or paths. It introduces the parameters directed tree-partition width (dtpw) and directed path-partition width (dppw) and shows that computing minimal witnesses remains NP-hard for Markov chains with bounded dppw (and hence also for Markov chains with bounded dtpw). By observing that graphs of bounded dtpw have bounded width with respect to all known tree similarity measures for directed graphs, the hardness result carries over to these other tree similarity measures. Technically, the reduction proceeds via the conceptually simpler matrix-pair chain problem, which is introduced and shown to be NP-complete for nonnegative matrices of fixed dimension. Furthermore, an algorithm which aims to utilise a given directed tree partition of the system to compute a minimal witnessing subsystem is described. It enumerates partial subsystems for the blocks of the partition along the tree order, and keeps only necessary ones. A preliminary experimental analysis shows that it outperforms other approaches on certain benchmarks which have directed tree partitions of small width. △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: In Proceedings GandALF 2021, arXiv:2109.07798. A full version of this paper, containing all proofs, appears at arXiv:2108.08070

Report number: EPTCS 346-3

Journal ref: EPTCS 346, 2021, pp. 35-51

arXiv:2108.08070 [pdf, other]

Witnessing subsystems for probabilistic systems with low tree width

Authors: Simon Jantsch, Jakob Piribauer, Christel Baier

Abstract: A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic sys… ▽ More A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic systems that are similar to trees or paths. It introduces the parameters directed tree-partition width (dtpw) and directed path-partition width (dppw) and shows that computing minimal witnesses remains NP-hard for Markov chains with bounded dppw (and hence also for Markov chains with bounded dtpw). By observing that graphs of bounded dtpw have bounded width with respect to all known tree similarity measures for directed graphs, the hardness result carries over to these other tree similarity measures. Technically, the reduction proceeds via the conceptually simpler matrix-pair chain problem, which is introduced and shown to be NP-complete for nonnegative matrices of fixed dimension. Furthermore, an algorithm which aims to utilise a given directed tree partition of the system to compute a minimal witnessing subsystem is described. It enumerates partial subsystems for the blocks of the partition along the tree order, and keeps only necessary ones. A preliminary experimental analysis shows that it outperforms other approaches on certain benchmarks which have directed tree partitions of small width. △ Less

Submitted 18 August, 2021; originally announced August 2021.

Comments: conference version accepted for publication at GandALF 2021

arXiv:2105.09533 [pdf, other]

From Verification to Causality-based Explications

Authors: Christel Baier, Clemens Dubslaff, Florian Funke, Simon Jantsch, Rupak Majumdar, Jakob Piribauer, Robin Ziemek

Abstract: In view of the growing complexity of modern software architectures, formal models are increasingly used to understand why a system works the way it does, opposed to simply verifying that it behaves as intended. This paper surveys approaches to formally explicate the observable behavior of reactive systems. We describe how Halpern and Pearl's notion of actual causation inspired verification-oriente… ▽ More In view of the growing complexity of modern software architectures, formal models are increasingly used to understand why a system works the way it does, opposed to simply verifying that it behaves as intended. This paper surveys approaches to formally explicate the observable behavior of reactive systems. We describe how Halpern and Pearl's notion of actual causation inspired verification-oriented studies of cause-effect relationships in the evolution of a system. A second focus lies on applications of the Shapley value to responsibility ascriptions, aimed to measure the influence of an event on an observable effect. Finally, formal approaches to probabilistic causation are collected and connected, and their relevance to the understanding of probabilistic systems is discussed. △ Less

Submitted 20 May, 2021; originally announced May 2021.

Comments: 20 pages, to appear in Proceedings of ICALP'21

arXiv:2104.13604 [pdf, ps, other]

Probabilistic causes in Markov chains

Authors: Christel Baier, Florian Funke, Simon Jantsch, Jakob Piribauer, Robin Ziemek

Abstract: The paper studies a probabilistic notion of causes in Markov chains that relies on the counterfactuality principle and the probability-raising property. This notion is motivated by the use of causes for monitoring purposes where the aim is to detect faulty or undesired behaviours before they actually occur. A cause is a set of finite executions of the system after which the probability of the effe… ▽ More The paper studies a probabilistic notion of causes in Markov chains that relies on the counterfactuality principle and the probability-raising property. This notion is motivated by the use of causes for monitoring purposes where the aim is to detect faulty or undesired behaviours before they actually occur. A cause is a set of finite executions of the system after which the probability of the effect exceeds a given threshold. We introduce multiple types of costs that capture the consumption of resources from different perspectives, and study the complexity of computing cost-minimal causes. △ Less

Submitted 8 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: Full version of a conference paper at ATVA'21; 26 pages, 9 figures

arXiv:2004.11441 [pdf, other]

On Skolem-hardness and saturation points in Markov decision processes

Authors: Jakob Piribauer, Christel Baier

Abstract: The Skolem problem and the related Positivity problem for linear recurrence sequences are outstanding number-theoretic problems whose decidability has been open for many decades. In this paper, the inherent mathematical difficulty of a series of optimization problems on Markov decision processes (MDPs) is shown by a reduction from the Positivity problem to the associated decision problems which es… ▽ More The Skolem problem and the related Positivity problem for linear recurrence sequences are outstanding number-theoretic problems whose decidability has been open for many decades. In this paper, the inherent mathematical difficulty of a series of optimization problems on Markov decision processes (MDPs) is shown by a reduction from the Positivity problem to the associated decision problems which establishes that the problems are also at least as hard as the Skolem problem as an immediate consequence. The optimization problems under consideration are two non-classical variants of the stochastic shortest path problem (SSPP) in terms of expected partial or conditional accumulated weights, the optimization of the conditional value-at-risk for accumulated weights, and two problems addressing the long-run satisfaction of path properties, namely the optimization of long-run probabilities of regular co-safety properties and the model-checking problem of the logic frequency-LTL. To prove the Positivity- and hence Skolem-hardness for the latter two problems, a new auxiliary path measure, called weighted long-run frequency, is introduced and the Positivity-hardness of the corresponding decision problem is shown as an intermediate step. For the partial and conditional SSPP on MDPs with non-negative weights and for the optimization of long-run probabilities of constrained reachability properties (a U b), solutions are known that rely on the identification of a bound on the accumulated weight or the number of consecutive visits to certain sates, called a saturation point, from which on optimal schedulers behave memorylessly. In this paper, it is shown that also the optimization of the conditional value-at-risk for the classical SSPP and of weighted long-run frequencies on MDPs with non-negative weights can be solved in pseudo-polynomial time exploiting the existence of a saturation point. △ Less

Submitted 23 April, 2020; originally announced April 2020.

Comments: Conference version accepted for publication at ICALP'20

arXiv:1902.04538 [pdf, ps, other]

Partial and Conditional Expectations in Markov Decision Processes with Integer Weights

Authors: Jakob Piribauer, Christel Baier

Abstract: The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights,… ▽ More The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supremum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of $ε$ in time exponential in the size of the MDP and polynomial in $\log(1/ε)$. △ Less

Submitted 30 April, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: This is the extended version of a paper published at FoSSaCS'19

Showing 1–18 of 18 results for author: Piribauer, J