-
A Spectrum of Approximate Probabilistic Bisimulations
Authors:
Timm Spork,
Christel Baier,
Joost-Pieter Katoen,
Jakob Piribauer,
Tim Quatmann
Abstract:
This paper studies various notions of approximate probabilistic bisimulation on labeled Markov chains (LMCs). We introduce approximate versions of weak and branching bisimulation, as well as a notion of $\varepsilon$-perturbed bisimulation that relates LMCs that can be made (exactly) probabilistically bisimilar by small perturbations of their transition probabilities. We explore how the notions in…
▽ More
This paper studies various notions of approximate probabilistic bisimulation on labeled Markov chains (LMCs). We introduce approximate versions of weak and branching bisimulation, as well as a notion of $\varepsilon$-perturbed bisimulation that relates LMCs that can be made (exactly) probabilistically bisimilar by small perturbations of their transition probabilities. We explore how the notions interrelate and establish their connections to other well-known notions like $\varepsilon$-bisimulation.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Risk-averse optimization of total rewards in Markovian models using deviation measures
Authors:
Christel Baier,
Jakob Piribauer,
Maximilian Starke
Abstract:
This paper addresses objectives tailored to the risk-averse optimization of accumulated rewards in Markov decision processes (MDPs). The studied objectives require maximizing the expected value of the accumulated rewards minus a penalty factor times a deviation measure of the resulting distribution of rewards. Using the variance in this penalty mechanism leads to the variance-penalized expectation…
▽ More
This paper addresses objectives tailored to the risk-averse optimization of accumulated rewards in Markov decision processes (MDPs). The studied objectives require maximizing the expected value of the accumulated rewards minus a penalty factor times a deviation measure of the resulting distribution of rewards. Using the variance in this penalty mechanism leads to the variance-penalized expectation (VPE) for which it is known that optimal schedulers have to minimize future expected rewards when a high amount of rewards has been accumulated. This behavior is undesirable as risk-averse behavior should keep the probability of particularly low outcomes low, but not discourage the accumulation of additional rewards on already good executions. The paper investigates the semi-variance, which only takes outcomes below the expected value into account, the mean absolute deviation (MAD), and the semi-MAD as alternative deviation measures. Furthermore, a penalty mechanism that penalizes outcomes below a fixed threshold is studied. For all of these objectives, the properties of optimal schedulers are specified and in particular the question whether these objectives overcome the problem observed for the VPE is answered. Further, the resulting algorithmic problems on MDPs and Markov chains are investigated.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Demonic variance and a non-determinism score for Markov decision processes
Authors:
Jakob Piribauer
Abstract:
This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: For a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance of the values of X in two independent execution of M in which…
▽ More
This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: For a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance of the values of X in two independent execution of M in which also the non-deterministic choices are resolved independently by two distinct schedulers. It is shown that the demonic variance is between 1 and 2 times as large as the maximal variance of X in M that can be achieved by a single scheduler. This allows defining a non-determinism score for M and X measuring how strongly the difference of X in two executions of M can be influenced by the non-deterministic choices. Properties of MDPs M with extremal values of the non-determinism score are established. Further, the algorithmic problems of computing the maximal variance and the demonic variance are investigated for two random variables, namely weighted reachability and accumulated rewards. In the process, also the structure of schedulers maximizing the variance and of scheduler pairs realizing the demonic variance is analyzed.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Model Checking Markov Chains as Distribution Transformers
Authors:
Rajab Aghamov,
Christel Baier,
Toghrul Karimov,
Joris Nieuwveld,
Joël Ouaknine,
Jakob Piribauer,
Mihir Vahanwala
Abstract:
The conventional perspective on Markov chains considers decision problems concerning the probabilities of temporal properties being satisfied by traces of visited states. However, consider the following query made of a stochastic system modelling the weather: given the conditions today, will there be a day with less than 50\% chance of rain? The conventional perspective is ill-equipped to decide s…
▽ More
The conventional perspective on Markov chains considers decision problems concerning the probabilities of temporal properties being satisfied by traces of visited states. However, consider the following query made of a stochastic system modelling the weather: given the conditions today, will there be a day with less than 50\% chance of rain? The conventional perspective is ill-equipped to decide such problems regarding the evolution of the initial distribution. The alternate perspective we consider views Markov chains as distribution transformers: the focus is on the sequence of distributions on states at each step, where the evolution is driven by the underlying stochastic transition matrix. More precisely, given an initial distribution vector $μ$, a stochastic update transition matrix $M$, we ask whether the ensuing sequence of distributions $(μ, Mμ, M^2μ, \dots)$ satisfies a given temporal property. This is a special case of the model-checking problem for linear dynamical systems, which is not known to be decidable in full generality. The goal of this article is to delineate the classes of instances for which this problem can be solved, under the assumption that the dynamics is governed by stochastic matrices.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Backward Responsibility in Transition Systems Using General Power Indices
Authors:
Christel Baier,
Roxane van den Bossche,
Sascha Klüppelholz,
Johannes Lehmann,
Jakob Piribauer
Abstract:
To improve reliability and the understanding of AI systems, there is increasing interest in the use of formal methods, e.g. model checking. Model checking tools produce a counterexample when a model does not satisfy a property. Understanding these counterexamples is critical for efficient debugging, as it allows the developer to focus on the parts of the program that caused the issue.
To this en…
▽ More
To improve reliability and the understanding of AI systems, there is increasing interest in the use of formal methods, e.g. model checking. Model checking tools produce a counterexample when a model does not satisfy a property. Understanding these counterexamples is critical for efficient debugging, as it allows the developer to focus on the parts of the program that caused the issue.
To this end, we present a new technique that ascribes a responsibility value to each state in a transition system that does not satisfy a given safety property. The value is higher if the non-deterministic choices in a state have more power to change the outcome, given the behaviour observed in the counterexample. For this, we employ a concept from cooperative game theory -- namely general power indices, such as the Shapley value -- to compute the responsibility of the states.
We present an optimistic and pessimistic version of responsibility that differ in how they treat the states that do not lie on the counterexample. We give a characterisation of optimistic responsibility that leads to an efficient algorithm for it and show computational hardness of the pessimistic version. We also present a tool to compute responsibility and show how a stochastic algorithm can be used to approximate responsibility in larger models. These methods can be deployed in the design phase, at runtime and at inspection time to gain insights on causal relations within the behavior of AI systems.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Counterfactual Causality for Reachability and Safety based on Distance Functions
Authors:
Julie Parreaux,
Jakob Piribauer,
Christel Baier
Abstract:
Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems base…
▽ More
Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems based on Stalnaker's and Lewis' semantics of counterfactuals in terms of most similar possible worlds and introduces a novel corresponding notion of counterfactual causality in two-player games. Using distance functions between paths in transition systems, this notion defines whether reaching a certain set of states is a cause for the violation of a reachability or safety property. Similarly, using distance functions between memoryless strategies in reachability and safety games, it is defined whether reaching a set of states is a cause for the fact that a given strategy for the player under investigation is losing. The contribution of the paper is two-fold: In transition systems, it is shown that counterfactual causality can be checked in polynomial time for three prominent distance functions between paths. In two-player games, the introduced notion of counterfactual causality is shown to be checkable in polynomial time for two natural distance functions between memoryless strategies. Further, a notion of explanation that can be extracted from a counterfactual cause and that pinpoints changes to be made to the given strategy in order to transform it into a winning strategy is defined. For the two distance functions under consideration, the problem to decide whether such an explanation imposes only minimal necessary changes to the given strategy with respect to the used distance function turns out to be coNP-complete and not to be solvable in polynomial time if P is not equal to NP, respectively.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Counterfactual Causality for Reachability and Safety based on Distance Functions
Authors:
Julie Parreaux,
Jakob Piribauer,
Christel Baier
Abstract:
Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems base…
▽ More
Investigations of causality in operational systems aim at providing human-understandable explanations of why a system behaves as it does. There is, in particular, a demand to explain what went wrong on a given counterexample execution that shows that a system does not satisfy a given specification. To this end, this paper investigates a notion of counterfactual causality in transition systems based on Stalnaker's and Lewis' semantics of counterfactuals in terms of most similar possible worlds and introduces a novel corresponding notion of counterfactual causality in two-player games. Using distance functions between paths in transition systems, this notion defines whether reaching a certain set of states is a cause for the violation of a reachability or safety property. Similarly, using distance functions between memoryless strategies in reachability and safety games, it is defined whether reaching a set of states is a cause for the fact that a given strategy for the player under investigation is losing. The contribution of the paper is two-fold: In transition systems, it is shown that counterfactual causality can be checked in polynomial time for three prominent distance functions between paths. In two-player games, the introduced notion of counterfactual causality is shown to be checkable in polynomial time for two natural distance functions between memoryless strategies. Further, a notion of explanation that can be extracted from a counterfactual cause and that pinpoints changes to be made to the given strategy in order to transform it into a winning strategy is defined. For the two distance functions under consideration, the problem to decide whether such an explanation imposes only minimal necessary changes to the given strategy with respect to the used distance function turns out to be coNP-complete and not to be solvable in polynomial time if P is not equal to NP, respectively.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
Entropic Risk for Turn-Based Stochastic Games
Authors:
Christel Baier,
Krishnendu Chatterjee,
Tobias Meggendorfer,
Jakob Piribauer
Abstract:
Entropic risk (ERisk) is an established risk measure in finance, quantifying risk by an exponential re-weighting of rewards. We study ERisk for the first time in the context of turn-based stochastic games with the total reward objective. This gives rise to an objective function that demands the control of systems in a risk-averse manner. We show that the resulting games are determined and, in part…
▽ More
Entropic risk (ERisk) is an established risk measure in finance, quantifying risk by an exponential re-weighting of rewards. We study ERisk for the first time in the context of turn-based stochastic games with the total reward objective. This gives rise to an objective function that demands the control of systems in a risk-averse manner. We show that the resulting games are determined and, in particular, admit optimal memoryless deterministic strategies. This contrasts risk measures that previously have been considered in the special case of Markov decision processes and that require randomization and/or memory. We provide several results on the decidability and the computational complexity of the threshold problem, i.e. whether the optimal value of ERisk exceeds a given threshold. In the most general case, the problem is decidable subject to Shanuel's conjecture. If all inputs are rational, the resulting threshold problem can be solved using algebraic numbers, leading to decidability via a polynomial-time reduction to the existential theory of the reals. Further restrictions on the encoding of the input allow the solution of the threshold problem in NP$\cap$coNP. Finally, an approximation algorithm for the optimal value of ERisk is provided.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Positivity-hardness results on Markov decision processes
Authors:
Jakob Piribauer,
Christel Baier
Abstract:
This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction proba…
▽ More
This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction probabilities of constraints on the total accumulated weight, the computation of quantiles for the accumulated weight, and the conditional value-at-risk for accumulated weights for integer-weighted MDPs. Although algorithmic results are available for some special instances, the decidability status of the decision versions of these problems is unknown in general. The paper demonstrates that these optimization problems are inherently mathematically difficult by providing polynomial-time reductions from the Positivity problem for linear recurrence sequences. This problem is a well-known number-theoretic problem whose decidability status has been open for decades and it is known that decidability of the Positivity problem would have far-reaching consequences in analytic number theory. So, the reductions presented in the paper show that an algorithmic solution to any of the investigated problems is not possible without a major breakthrough in analytic number theory. The reductions rely on the construction of MDP-gadgets that encode the initial values and linear recurrence relations of linear recurrence sequences. These gadgets can flexibly be adjusted to prove the various Positivity-hardness results.
△ Less
Submitted 5 April, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Foundations of probability-raising causality in Markov decision processes
Authors:
Christel Baier,
Jakob Piribauer,
Robin Ziemek
Abstract:
This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This incl…
▽ More
This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This includes algorithms for checking cause conditions given an effect and deciding the existence of probability-raising causes. As the definition allows for sub-optimal coverage properties, quality measures for causes inspired by concepts of statistical analysis are studied. These include recall, coverage ratio and f-score. The computational complexity for finding optimal causes with respect to these measures is analyzed.
△ Less
Submitted 18 January, 2024; v1 submitted 7 September, 2022;
originally announced September 2022.
-
The variance-penalized stochastic shortest path problem
Authors:
Jakob Piribauer,
Ocan Sankur,
Christel Baier
Abstract:
The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulat…
▽ More
The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time.
△ Less
Submitted 21 April, 2022;
originally announced April 2022.
-
On probability-raising causality in Markov decision processes
Authors:
Christel Baier,
Florian Funke,
Jakob Piribauer,
Robin Ziemek
Abstract:
The purpose of this paper is to introduce a notion of causality in Markov decision processes based on the probability-raising principle and to analyze its algorithmic properties. The latter includes algorithms for checking cause-effect relationships and the existence of probability-raising causes for given effect scenarios. Inspired by concepts of statistical analysis, we study quality measures (r…
▽ More
The purpose of this paper is to introduce a notion of causality in Markov decision processes based on the probability-raising principle and to analyze its algorithmic properties. The latter includes algorithms for checking cause-effect relationships and the existence of probability-raising causes for given effect scenarios. Inspired by concepts of statistical analysis, we study quality measures (recall, coverage ratio and f-score) for causes and develop algorithms for their computation. Finally, the computational complexity for finding optimal causes with respect to these measures is analyzed.
△ Less
Submitted 21 January, 2022;
originally announced January 2022.
-
Witnessing Subsystems for Probabilistic Systems with Low Tree Width
Authors:
Simon Jantsch,
Jakob Piribauer,
Christel Baier
Abstract:
A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic sys…
▽ More
A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic systems that are similar to trees or paths. It introduces the parameters directed tree-partition width (dtpw) and directed path-partition width (dppw) and shows that computing minimal witnesses remains NP-hard for Markov chains with bounded dppw (and hence also for Markov chains with bounded dtpw). By observing that graphs of bounded dtpw have bounded width with respect to all known tree similarity measures for directed graphs, the hardness result carries over to these other tree similarity measures. Technically, the reduction proceeds via the conceptually simpler matrix-pair chain problem, which is introduced and shown to be NP-complete for nonnegative matrices of fixed dimension. Furthermore, an algorithm which aims to utilise a given directed tree partition of the system to compute a minimal witnessing subsystem is described. It enumerates partial subsystems for the blocks of the partition along the tree order, and keeps only necessary ones. A preliminary experimental analysis shows that it outperforms other approaches on certain benchmarks which have directed tree partitions of small width.
△ Less
Submitted 16 September, 2021;
originally announced September 2021.
-
Witnessing subsystems for probabilistic systems with low tree width
Authors:
Simon Jantsch,
Jakob Piribauer,
Christel Baier
Abstract:
A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic sys…
▽ More
A standard way of justifying that a certain probabilistic property holds in a system is to provide a witnessing subsystem (also called critical subsystem) for the property. Computing minimal witnessing subsystems is NP-hard already for acyclic Markov chains, but can be done in polynomial time for Markov chains whose underlying graph is a tree. This paper considers the problem for probabilistic systems that are similar to trees or paths. It introduces the parameters directed tree-partition width (dtpw) and directed path-partition width (dppw) and shows that computing minimal witnesses remains NP-hard for Markov chains with bounded dppw (and hence also for Markov chains with bounded dtpw). By observing that graphs of bounded dtpw have bounded width with respect to all known tree similarity measures for directed graphs, the hardness result carries over to these other tree similarity measures. Technically, the reduction proceeds via the conceptually simpler matrix-pair chain problem, which is introduced and shown to be NP-complete for nonnegative matrices of fixed dimension. Furthermore, an algorithm which aims to utilise a given directed tree partition of the system to compute a minimal witnessing subsystem is described. It enumerates partial subsystems for the blocks of the partition along the tree order, and keeps only necessary ones. A preliminary experimental analysis shows that it outperforms other approaches on certain benchmarks which have directed tree partitions of small width.
△ Less
Submitted 18 August, 2021;
originally announced August 2021.
-
From Verification to Causality-based Explications
Authors:
Christel Baier,
Clemens Dubslaff,
Florian Funke,
Simon Jantsch,
Rupak Majumdar,
Jakob Piribauer,
Robin Ziemek
Abstract:
In view of the growing complexity of modern software architectures, formal models are increasingly used to understand why a system works the way it does, opposed to simply verifying that it behaves as intended. This paper surveys approaches to formally explicate the observable behavior of reactive systems. We describe how Halpern and Pearl's notion of actual causation inspired verification-oriente…
▽ More
In view of the growing complexity of modern software architectures, formal models are increasingly used to understand why a system works the way it does, opposed to simply verifying that it behaves as intended. This paper surveys approaches to formally explicate the observable behavior of reactive systems. We describe how Halpern and Pearl's notion of actual causation inspired verification-oriented studies of cause-effect relationships in the evolution of a system. A second focus lies on applications of the Shapley value to responsibility ascriptions, aimed to measure the influence of an event on an observable effect. Finally, formal approaches to probabilistic causation are collected and connected, and their relevance to the understanding of probabilistic systems is discussed.
△ Less
Submitted 20 May, 2021;
originally announced May 2021.
-
Probabilistic causes in Markov chains
Authors:
Christel Baier,
Florian Funke,
Simon Jantsch,
Jakob Piribauer,
Robin Ziemek
Abstract:
The paper studies a probabilistic notion of causes in Markov chains that relies on the counterfactuality principle and the probability-raising property. This notion is motivated by the use of causes for monitoring purposes where the aim is to detect faulty or undesired behaviours before they actually occur. A cause is a set of finite executions of the system after which the probability of the effe…
▽ More
The paper studies a probabilistic notion of causes in Markov chains that relies on the counterfactuality principle and the probability-raising property. This notion is motivated by the use of causes for monitoring purposes where the aim is to detect faulty or undesired behaviours before they actually occur. A cause is a set of finite executions of the system after which the probability of the effect exceeds a given threshold. We introduce multiple types of costs that capture the consumption of resources from different perspectives, and study the complexity of computing cost-minimal causes.
△ Less
Submitted 8 July, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
On Skolem-hardness and saturation points in Markov decision processes
Authors:
Jakob Piribauer,
Christel Baier
Abstract:
The Skolem problem and the related Positivity problem for linear recurrence sequences are outstanding number-theoretic problems whose decidability has been open for many decades. In this paper, the inherent mathematical difficulty of a series of optimization problems on Markov decision processes (MDPs) is shown by a reduction from the Positivity problem to the associated decision problems which es…
▽ More
The Skolem problem and the related Positivity problem for linear recurrence sequences are outstanding number-theoretic problems whose decidability has been open for many decades. In this paper, the inherent mathematical difficulty of a series of optimization problems on Markov decision processes (MDPs) is shown by a reduction from the Positivity problem to the associated decision problems which establishes that the problems are also at least as hard as the Skolem problem as an immediate consequence. The optimization problems under consideration are two non-classical variants of the stochastic shortest path problem (SSPP) in terms of expected partial or conditional accumulated weights, the optimization of the conditional value-at-risk for accumulated weights, and two problems addressing the long-run satisfaction of path properties, namely the optimization of long-run probabilities of regular co-safety properties and the model-checking problem of the logic frequency-LTL. To prove the Positivity- and hence Skolem-hardness for the latter two problems, a new auxiliary path measure, called weighted long-run frequency, is introduced and the Positivity-hardness of the corresponding decision problem is shown as an intermediate step. For the partial and conditional SSPP on MDPs with non-negative weights and for the optimization of long-run probabilities of constrained reachability properties (a U b), solutions are known that rely on the identification of a bound on the accumulated weight or the number of consecutive visits to certain sates, called a saturation point, from which on optimal schedulers behave memorylessly. In this paper, it is shown that also the optimization of the conditional value-at-risk for the classical SSPP and of weighted long-run frequencies on MDPs with non-negative weights can be solved in pseudo-polynomial time exploiting the existence of a saturation point.
△ Less
Submitted 23 April, 2020;
originally announced April 2020.
-
Partial and Conditional Expectations in Markov Decision Processes with Integer Weights
Authors:
Jakob Piribauer,
Christel Baier
Abstract:
The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights,…
▽ More
The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supremum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of $ε$ in time exponential in the size of the MDP and polynomial in $\log(1/ε)$.
△ Less
Submitted 30 April, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.