Expected Reachability-Time Games

Probabilistic timed automata are a suitable formalism to model systems with real-time, nondeterministic and probabilistic behaviour. We study two-player zero-sum games on such automata where the objective of the game is specified as the expected time to reach a target. The two players, called player Min and player Max, compete by proposing timed moves simultaneously, and the move with the shorter delay is performed. Player Min attempts to minimise the given objective, while player Max tries to maximise it. We observe that these games are not determined, and study decision problems related to computing the upper and lower values, showing that the problems are decidable and lie in the complexity class NEXPTIME $\cap$ co-NEXPTIME.


Introduction
Two-player zero-sum games on finite automata, as a mechanism for supervisory controller synthesis of discrete event systems, were introduced by Ramadge and Wonham [1]. In this setting the two players, called Min and Max, represent the controller and the environment, and controller synthesis corresponds to finding a winning (or optimal) strategy of the controller for some given performance objective. Timed automata [2] extend finite automata by providing a mechanism to model real-time behaviour, while priced timed automata are timed automata with (time-dependent) prices attached to the locations of the automata. If the game structure or objectives depend on time or price, e.g. when the objective corresponds to completing a given set of tasks within some deadline or within some cost, then games on timed automata are a well-established approach for controller synthesis, see e.g. [3,4,5,6,7].
In this paper we extend the above approach to a setting that is quantitative in terms of both timed and probabilistic behaviour. Probabilistic behaviour is important in modelling, e.g., faulty or unreliable components, the random coin flips of distributed communication and security protocols, and performance characteristics. We consider an extension of probabilistic timed automata (PTA) [8,9,10], a model for real-time systems exhibiting nondeterministic and probabilistic behaviour.
In our model, called probabilistic timed game arena (PTGA), a token is placed on a configuration of a PTA, and a play of the game consists of both players repeatedly proposing a timed move of the PTA, i.e. a time delay and an action under their control (we assume each action of the PTA is under the control of exactly one of the players). Once the players have made their choices, the timed move with the shorter delay is performed and the token is moved according to the probabilistic transition function of the PTA. Intuitively, players Min and Max represent two different forms of nondeterminism, called angelic and demonic. To avoid introducing a third form of nondeterminism, we assume that the move of Max (the environment) is taken if the delays are equal. The converse convention, in which Min's move is taken on ties, could be adopted without changing the presented results.
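The tie-breaking rule just described can be made concrete with a small Python sketch. It encodes a timed move as a hypothetical `(delay, action)` pair and uses `None` for "no move available". This is an illustration under those encoding assumptions, not the paper's formal definition.

```python
from typing import Optional, Tuple

# A timed move is a (delay, action) pair; None plays the role of "no move" (⊥).
TimedMove = Optional[Tuple[float, str]]


def win(min_move: TimedMove, max_move: TimedMove) -> TimedMove:
    """Return the timed move that is performed.

    The move with the strictly shorter delay is taken; on equal delays Max's
    move is taken. If one player proposes no move, the other player's move is
    performed (both proposing None is assumed not to happen).
    """
    if min_move is None:
        return max_move
    if max_move is None:
        return min_move
    return min_move if min_move[0] < max_move[0] else max_move


print(win((0.5, "a"), (1.0, "b")))  # (0.5, 'a'): Min's move is faster
print(win((1.0, "a"), (1.0, "b")))  # (1.0, 'b'): ties go to Max
```

Flipping `<` to `<=` would implement the converse convention mentioned above.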
Players Min and Max choose their moves in order to minimise and maximise, respectively, the objective function. The upper value of a game is the minimum expected time that Min can ensure, while the lower value of a game is the maximum expected time that Max can ensure. A game is determined if the lower and upper values are equal; in this case the optimal value of the game exists and equals the upper and lower values.
The objectives frequently studied include reachability, which asks for certain locations to be eventually reached; safety, which asks for a given target set to be avoided; and more complex properties expressed using a formula of a linear temporal logic. The objective function is then an indicator function saying whether the property is satisfied on a play, and the expected value corresponds to the probability of the property being true. In this paper we consider a more complex setting and study reachability-time objectives, which express the expected time to reach a given target set. These objectives have many practical applications, e.g. in job-shop scheduling, where machines can be faulty or have variable execution time, and in both routing and task graph scheduling problems. For real-life examples relevant to our setting, see e.g. [11,7]. Reachability-time objectives are a special case of weight or price objectives, in which different numbers are assigned to locations and the value of the objective function depends on these numbers and the time spent in the locations. In our setting, the numbers are fixed to be 1 and the objective function simply sums the time spent in each location. Computing properties related to price functions often leads to undecidability, even in the non-probabilistic setting [12,13]. Studying simpler objectives is thus motivated by the desire to retain decidability while still being able to express a sufficiently rich class of properties.

Contribution
We demonstrate the decidability of the problem of whether the upper (lower, or the optimal when it exists) value of a game with reachability-time objectives is at most a given bound. Our proofs immediately yield a NEXPTIME ∩ co-NEXPTIME complexity bound. To the best of our knowledge, this is the first decidability result for stochastic games on timed automata in which the objective concerns a random variable that takes non-binary values.
Our approach is based on extending the boundary region graph construction for timed automata [14] to PTGAs and demonstrating that the reachability-time problem can be reduced to the same problem on the boundary region graph. In particular, our proof aims to show that the limits of the step-bounded value functions in the PTGA and in its boundary region graph coincide.
Generic results exist that allow one to prove that step-bounded values converge to the step-unbounded value, but to the best of our knowledge none is readily applicable in our setting, where the state space is uncountable and little is known a priori about the value functions. For example, the Banach fixpoint theorem requires the value iteration function (which takes an n-step value function and returns the (n+1)-step value function) to be a contraction on an underlying metric space, and it appears difficult to devise a metric space on which the contraction property is easily obtained. Another possible proof direction is the Kleene fixpoint theorem, which requires Scott-continuity of the value iteration function, again a property that is difficult to establish in our setting. We are able to partly rely on the Knaster-Tarski fixpoint theorem, which characterises the set of fixpoints, but it is not strong enough to prove the convergence itself, for reasons similar to the ones above. Other results, such as the Brouwer or Kakutani fixpoint theorems, are generally not suitable for proving the properties that we require in turn-based stochastic games.
Hence, to prove that the limit of the step-bounded value functions is the desired value, we take a tailor-made approach. We first show by induction that, when the number of steps is bounded, the value functions in the PTGA and its boundary region graph coincide and are non-expansive within a region. Here we make use of quasi-simple functions, which generalise the simple functions previously used by Asarin and Maler in the study of games over non-probabilistic timed automata [3]. Then, using the non-expansiveness property, we show that the limits of the step-bounded value functions in the PTGA and its boundary region graph also coincide. In this part we use the Knaster-Tarski fixpoint theorem.
The definition of quasi-simple functions is a central component of our proof, as it is strong enough to enable us to utilise an approach used in proofs of fixpoint theorems, but on the other hand general enough to capture the values of reachability-time objectives. We believe that it can serve as a step from simple functions towards functions describing even more complex but still decidable objectives.

Related Work
Hoffman and Wong-Toi [15] were the first to define and solve the optimal controller synthesis problem for timed automata. For a detailed introduction to the topic of qualitative games on timed automata, see e.g. [16]. Asarin and Maler [3] initiated the study of quantitative games on timed automata by providing a symbolic algorithm to solve reachability-time objectives. The works of [17] and [14] show that the decision problem for such games over timed automata with at least two clocks is EXPTIME-complete. The tool UPPAAL Tiga [6] is capable of solving reachability and safety objectives for games on timed automata. Jurdziński and Trivedi [18] show EXPTIME-completeness for average-time games on automata with two or more clocks.
A natural extension of games with reachability-time objectives are games on priced timed automata where the objective concerns the cumulated price of reaching a target. Both [4] and [5] present semi-algorithms for computing the value of such games for linear prices. In [12] the problem of checking the existence of optimal strategies is shown to be undecidable, with [13] showing undecidability holds even for three clocks and stopwatch prices.
As for two-player quantitative games on PTAs, for a significantly different model of stochastic timed games, deciding whether a target is reachable within a given bound is undecidable [19]. In [20], continuous-time games are verified against timed-automata objectives, giving rise to systems whose semantics is related to that of [19]. The work of [21] studies the probability of satisfying Büchi objectives in a timed game where perturbations of probabilities can take place, and [22] studies games on interactive Markov chains, which are modelled as a game extension of timed automata.
Regarding one-player games on PTAs, in [23] the problem of deciding whether a target can be reached within a given price and probability bound is shown to be undecidable for priced PTAs with three clocks and stopwatch prices. The work of [24] shows that the problem becomes decidable when the price functions are of a restricted form. In [25], simple functions are extended to devise a symbolic algorithm for computing minimum expected time to reach a target in one-player games on PTAs; the extension differs fundamentally from our quasi-simple functions. We also mention the approaches for analysing unpriced probabilistic timed automata against temporal logic specifications based on the region graph [8,9] and either forwards [8] or backwards [26] reachability. The complexity of performing such verification is studied in [27] for almost-sure reachability, and in [28] for PCTL properties and a restricted number of clocks. Finally, [29] deals with a model similar to PTAs in which time evolves continuously and controllable "fixed delay" events are introduced.
A preliminary version of this work was published in conference proceedings [30]. The result presented in [30] required an assumption on the structure of the PTAs which ensured that a terminal state is reached almost surely under any pair of strategies. In this paper we lift this restriction and consider arbitrary PTAs. Further, the proofs in [30] contain a significant flaw which required major changes to the proof, even for the restricted case. Thus, although the high-level idea behind the proof (bounding the difference of values for two configurations whose clock values are close to each other) stays the same, the actual steps of the proof have changed significantly. Note that, although [30] also introduces quasi-simple functions, the definition used here is different (and not equivalent). Most notably, our proofs here take a much more "constructive" approach when defining value functions.

Outline
The structure of the paper is as follows. In Section 2 we introduce Stochastic Game Arenas, which serve as semantics for games on PTAs. Games on PTAs are then introduced in Section 3, and Section 4 defines the boundary region abstraction, which plays a fundamental role in our proofs. Section 5 provides the proofs of the main result.

Stochastic Game Arena
We now introduce a general notion of stochastic game arenas (SGAs), which will later serve as semantics for the model we study. The reader may notice that our definition of a stochastic game arena differs from the standard concurrent stochastic game arena [31,32]. However, as we shall demonstrate later, it captures precisely the semantics of probabilistic timed game arenas. In addition, presenting the basic concepts relating to values in the general setting of SGAs allows us to use these concepts in the context of both probabilistic timed game arenas and their abstractions.

Stochastic Game Arena: Syntax and Semantics
We write N for the set of non-negative integers, Q for the rational numbers, $\mathbb{R}_{\ge 0}$ for the non-negative reals, and $\mathbb{R}^{\infty}_{\ge 0}$ for the non-negative reals extended with the maximum element ∞. A function $f : Q\to[0,1]$ is a discrete distribution over a set Q if $\sum_{q\in Q} f(q)=1$ and the support $\{q\in Q \mid f(q)>0\}$ is at most countable. Let D(Q) denote the set of all discrete distributions over Q. We say a distribution d ∈ D(Q) is a point distribution if d(q)=1 for some q ∈ Q. Given a set Q and two functions f, f′ : …

Note that the SGAs introduced above are more general than classical stochastic games; in particular, SGAs contain information about the time delays of actions. We say that an SGA is finite if S, $A_{Min}$ and $A_{Max}$ are finite. For any state s ∈ S, we let $A_{Min}(s)$ denote the set of actions available to player Min in s, i.e. the actions $\alpha \in A_{Min}$ for which $p_{Min}(s, \alpha)$ is defined, letting $A_{Min}(s)=\{\bot\}$ if no such action exists. Similarly, $A_{Max}(s)$ denotes the actions available to player Max in s, and we let $A(s)=A_{Min}(s)\times A_{Max}(s)$. From the conditions required of the probabilistic transition functions of the players, we have $(\bot, \bot) \notin A(s)$ for all s ∈ S.
A game on SGA G starts with a token in an initial state s ∈ S and players Min and Max construct an infinite play by repeatedly choosing enabled actions, and then moving the token to a successor state determined by the probabilistic transition function of the player proposing the action that is favoured by the win function. Formally, we introduce the following auxiliary definition for an SGA.

Reachability-time objective in Stochastic Game Arena
We now define the reachability-time objective for plays of SGAs.

Definition 4.
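For an SGA G and a target set of states F of G, the (finite-horizon) n-step reachability-time objective associated with an infinite play $\rho = s_0, (\alpha_1, \beta_1), s_1, \dots$ is given by

$$\mathrm{Reach}^n_F(\rho) = \sum_{i=1}^{k} \tau\big(s_{i-1}, \mathrm{win}(\alpha_i, \beta_i)\big),$$

where $k = \min\{j \mid s_j \in F\}$ if $s_j \in F$ for some $j < n$, and $k = n$ otherwise. Here τ stands for $\tau_{Min}$ or $\tau_{Max}$, according to whose action is performed. Furthermore, the (infinite-horizon) reachability-time objective (with target set F ⊆ S) associated with an infinite play ρ is given by

$$\mathrm{Reach}_F(\rho) = \lim_{n\to\infty} \mathrm{Reach}^n_F(\rho).$$

In the definition of the infinite-horizon objective the limit always exists, because delays are non-negative and the sequence is therefore non-decreasing in n, but it can be infinite. To simplify notation, we often omit the target set F when it is clear from the context. In games on an SGA, players Min and Max move a token along the edges in order to minimise and maximise, respectively, the (n-step) reachability-time objective function. Formally, for an SGA G and an objective $\mathrm{Reach}^n$ we define the lower and upper values with respect to $\mathrm{Reach}^n$ for G in state s ∈ S by

$$\underline{\mathrm{Val}}^n_G(s) = \sup_{\chi\in\Sigma_{Max}} \inf_{\mu\in\Sigma_{Min}} E^{\mu,\chi}_s(\mathrm{Reach}^n) \qquad\text{and}\qquad \overline{\mathrm{Val}}^n_G(s) = \inf_{\mu\in\Sigma_{Min}} \sup_{\chi\in\Sigma_{Max}} E^{\mu,\chi}_s(\mathrm{Reach}^n),$$

respectively. Similarly, for the objective Reach we define the lower and upper values $\underline{\mathrm{Val}}_G(s)$ and $\overline{\mathrm{Val}}_G(s)$ by replacing $\mathrm{Reach}^n$ with Reach in these expressions. When the lower and upper values coincide, we denote this value simply by $\mathrm{Val}^n_G(s)$ or $\mathrm{Val}_G(s)$ and say that the corresponding game is determined. We omit G if it is clear from the context, e.g. we write simply Val instead of $\mathrm{Val}_G$.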
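The n-step objective can be computed on a finite play prefix directly from the definition of $\mathrm{Reach}^n_F$: sum the delays up to the first visit to F, or up to step n if F is not visited earlier. The Python sketch below assumes a hypothetical encoding in which the play is a list of states plus the list of delays of the performed moves.

```python
from typing import Hashable, Sequence, Set


def reach_n(states: Sequence[Hashable], delays: Sequence[float],
            target: Set[Hashable], n: int) -> float:
    """n-step reachability-time Reach^n_F of a play, given as a finite prefix.

    states[i] is s_i and delays[i] is the delay tau(s_i, win(alpha_{i+1}, beta_{i+1}))
    of the move performed from s_i; the prefix must cover at least n steps.
    """
    k = n  # k = n if no target state occurs among s_0, ..., s_{n-1}
    for j in range(n):
        if states[j] in target:
            k = j  # index of the first visit to the target
            break
    return sum(delays[:k])


play = ["u", "v", "goal", "u"]
steps = [0.5, 1.0, 2.0]
print([reach_n(play, steps, {"goal"}, n) for n in range(4)])  # [0, 0.5, 1.5, 1.5]
```

The printed values are non-decreasing in n and become constant once the target is reached, which is why the infinite-horizon limit always exists.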
For $\mu \in \Sigma_{Min}$, $\chi \in \Sigma_{Max}$ and s ∈ S, let … If G is determined, then each player has an ε-optimal strategy for every ε>0.
Since we will consider two-player games on SGAs that are not determined, we are interested in the following problem with respect to the upper value of a game.

Definition 5.
Given an SGA G, an initial state s ∈ S, a reachability-time objective and a value B ∈ Q, the corresponding game reachability-time problem is to decide whether $\overline{\mathrm{Val}}(s) \le B$.
All results presented in the paper are still valid when replacing the upper value with the lower value. The following is a well-known result.

Optimality Equations for SGAs
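We now introduce optimality equations for reachability objectives over SGAs. For the remainder of this section we fix an SGA G=(S, A_Min, A_Max, p_Min, p_Max, τ_Min, τ_Max) and a target set F ⊆ S.

Definition 7. The Bellman-style equations for the n-step reachability-time objective are given as follows: $\overline{\mathrm{Val}}^n(s)=0$ whenever n=0 or s ∈ F, and for n ≥ 0 and s ∉ F:

$$\overline{\mathrm{Val}}^{n+1}(s) = \inf_{\alpha\in A_{Min}(s)}\ \sup_{\beta\in A_{Max}(s)} \Big\{ \tau\big(s, \mathrm{win}(\alpha,\beta)\big) + \sum_{s'\in S} p\big(s, \mathrm{win}(\alpha,\beta)\big)(s') \cdot \overline{\mathrm{Val}}^{n}(s') \Big\},$$

where τ and p stand for $\tau_\star$ and $p_\star$ of the player ⋆ whose action is performed. The equations for $\underline{\mathrm{Val}}^{n+1}$ are obtained by exchanging inf and sup. The correctness of these equations follows easily from the fact that for any n ≥ 0, s ∉ F, path ρ with last(ρ)=s and strategies µ and χ, where α=win(µ(ρ), χ(ρ)), by definition of $E^{\mu,\chi}_\rho$ we have: … (by properties of µ_ρ, χ_ρ and the definition of expectation). Let us now turn to the equations for infinite-horizon objectives.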
… and is a solution of the optimality equations Opt_G, written P |= Opt_G, if for any s ∈ S: … To simplify the presentation, from now on we concentrate only on the upper value Val.
Analogous results for the lower value follow in a straightforward manner. Our aim is to use the optimality equations Opt_G to prove that Val and $\lim_{n\to\infty}\mathrm{Val}^n$ are equal, as a first step towards computing or approximating Val.
Although this equivalence may seem obvious, it is far from trivial: because the state space of an SGA can be uncountable, results such as the Kleene fixpoint theorem cannot be used out of the box. In fact, in this paper we only prove the equivalence for a special case of SGAs (sufficient for our purpose). Nevertheless, the following two lemmas can be established for SGAs in general.

Lemma 9. For any solution V |= Opt_G and any s ∈ S, we have Val(s) ≤ V(s).
PROOF. Consider any ε>0 and let µ be a strategy for player Min that, for any finite play ρ, selects an $\varepsilon\cdot 2^{-(\mathrm{len}(\rho)+1)}$-optimal action. For an initial state s ∈ S and a finite play ρ such that last(ρ)=s, it follows that: … We will now show that for any path ρ, counter-strategy χ for Max and n ∈ N we have: … (2). We prove (2) by induction on n ∈ N. The case n=0 follows from Definition 4 and Definition 8. Now suppose (2) holds for some n ∈ N. Consider any finite path ρ with last(ρ) = s and any counter-strategy χ for Max. If s ∈ F, then by Definition 4 we have: … On the other hand, if s ∉ F, letting a=win(µ(ρ), χ(ρ)), by Definition 4 and Definition 8 we have: … Since these are all the cases to consider, (2) holds by induction on n.
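Letting ρ = s and taking the limit as n → ∞ in (2), we obtain $E^{\mu,\chi}_s(\mathrm{Reach}_F) \le V(s) + \varepsilon$. Here the limit can be exchanged with the expectation by the monotone convergence theorem, since $\mathrm{Reach}^n_F$ is non-decreasing in n and converges to $\mathrm{Reach}_F$. Since ε and χ were arbitrary, it follows that Val(s) ≤ V(s), as required.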
PROOF. The proof follows straightforwardly from the fact that for any n ∈ N and infinite play ρ we have $\mathrm{Reach}_F(\rho) \ge \mathrm{Reach}^n_F(\rho)$, since all delays are non-negative.
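For a finite SGA, the n-step equations of Definition 7 can be evaluated by plain value iteration. The following Python sketch does this for the upper value. Its arena encoding (a dict of per-player moves, each a delay plus a successor distribution) is hypothetical, and it hard-codes the PTGA-style win function (shorter delay wins, ties go to Max) instead of a general one. It only computes $\overline{\mathrm{Val}}^n$; it does not by itself establish that these values converge to $\overline{\mathrm{Val}}$, which is exactly the difficulty discussed above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical finite-SGA encoding: arena[s]["Min"] / arena[s]["Max"] map an
# action name to (delay, {successor: probability}).
Move = Tuple[float, Dict[str, float]]
Arena = Dict[str, Dict[str, Dict[str, Move]]]


def performed(m_min: Optional[Move], m_max: Optional[Move]) -> Move:
    """Shorter delay wins, ties go to Max; None plays the role of ⊥."""
    if m_min is None and m_max is None:
        raise ValueError("(⊥, ⊥) must not be available")
    if m_min is None:
        return m_max
    if m_max is None:
        return m_min
    return m_min if m_min[0] < m_max[0] else m_max


def upper_values(arena: Arena, target: Set[str], n: int) -> Dict[str, float]:
    """Upper n-step values: inf over Min's action, sup over Max's reply."""
    val = {s: 0.0 for s in arena}  # Val^0 = 0 everywhere
    for _ in range(n):
        nxt: Dict[str, float] = {}
        for s, moves in arena.items():
            if s in target:
                nxt[s] = 0.0
                continue
            mins: List[Optional[Move]] = list(moves.get("Min", {}).values()) or [None]
            maxs: List[Optional[Move]] = list(moves.get("Max", {}).values()) or [None]
            best = float("inf")
            for a in mins:
                worst = float("-inf")
                for b in maxs:
                    delay, dist = performed(a, b)
                    cont = sum(p * val[s2] for s2, p in dist.items())
                    worst = max(worst, delay + cont)
                best = min(best, worst)
            nxt[s] = best
        val = nxt
    return val


arena: Arena = {
    "s": {"Min": {"retry": (1.0, {"s": 0.5, "goal": 0.5})}},
    "t": {"Min": {"late": (1.0, {"goal": 1.0}), "early": (0.5, {"goal": 1.0})},
          "Max": {"detour": (1.0, {"s": 1.0})}},
    "goal": {},
}
print(upper_values(arena, {"goal"}, 3))  # {'s': 1.75, 't': 0.5, 'goal': 0.0}
```

In this toy arena $\overline{\mathrm{Val}}^n(s) = 2(1-2^{-n})$ increases towards 2. In state t, Min's "late" move ties with Max's "detour" and therefore loses to it, so Min prefers "early".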

Probabilistic Timed Game Arenas
In this section we introduce Probabilistic Timed Game Arenas (PTGAs), which extend classical timed automata [2] with discrete distributions and a partition of the actions between two players Min and Max. Before we present the syntax and semantics of PTGAs, we introduce clock variables and related notions.

Clocks, Constraints, Regions, and Zones
Clocks. Let C be a finite set of clocks. A clock valuation on C is a function $\nu : C\to\mathbb{R}_{\ge 0}$, and we write V(C) (or just V when C is clear from the context) for the set of clock valuations. Abusing notation, we also treat a valuation ν as a point in $(\mathbb{R}_{\ge 0})^{|C|}$. Let 0 denote the clock valuation that assigns 0 to all clocks. If ν ∈ V and $t \in \mathbb{R}_{\ge 0}$, then we write ν+t for the clock valuation defined by (ν+t)(c)=ν(c)+t for all c ∈ C. For X ⊆ V we write $\overline{X}$ for the smallest closed set in V containing X. Although clocks are usually allowed to take arbitrary non-negative values, for notational convenience we assume that there is an upper bound K ∈ N such that ν(c) ≤ K for every clock c ∈ C.
For ν ∈ V (C) and K ∈ N, let SCC(ν, K) be the set of clock constraints with upper bound K which hold in ν, i.e. those constraints that resolve to true after substituting each occurrence of a clock x with ν(x).
Clock regions. Every clock region is an equivalence class of the indistinguishability-by-clock-constraints relation, and vice versa. For a given set of clocks C and upper bound K ∈ N on clock constraints, a clock region is a maximal set ζ⊆V(C) such that SCC(ν, K)=SCC(ν′, K) for all ν, ν′ ∈ ζ. For the set of clocks C and upper bound K we write R(C, K) for the corresponding finite set of clock regions. We write [ν] for the clock region of ν. If ζ=[ν], we write ζ[C:=0] for [ν[C:=0]]; this is well-defined, since for any clock valuations ν and ν′ in the same region, ν[C:=0] and ν′[C:=0] are also in the same region.

Clock zones. A clock zone is a convex set of clock valuations that is a union of clock regions. We write Z(C, K) for the set of clock zones over the set of clocks C and upper bound K. Observe that a set of clock valuations is a clock zone if and only if it is definable by a clock constraint. Although more than one clock constraint can represent the same zone, for any clock zone ζ there exists an O(|C|^3) algorithm to compute the (unique) canonical clock constraint of ζ [38]. We therefore use the semantic and syntactic interpretations of clock zones interchangeably.
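In code, region membership is usually decided through the classical characterisation from the timed-automata literature instead of enumerating SCC(ν, K) directly. Two K-bounded valuations lie in the same region iff they agree on the integer part of every clock and on the ordering (including ties, and ties with 0) of the fractional parts. The Python sketch below assumes that this characterisation matches the SCC-based definition above for valuations with ν(c) ≤ K. It uses exact fractions to avoid floating-point artefacts.

```python
from fractions import Fraction
from math import floor
from typing import Dict, Tuple

Valuation = Dict[str, Fraction]


def region_key(nu: Valuation, K: int) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Key identifying the clock region of nu, assuming 0 <= nu(c) <= K for all c.

    Two such valuations get the same key iff they have the same integer parts
    and the same ordering of fractional parts (with fractional part 0 distinguished).
    """
    assert all(0 <= v <= K for v in nu.values()), "clocks must be K-bounded"
    clocks = sorted(nu)
    frac = {c: nu[c] - floor(nu[c]) for c in clocks}
    # Dense ranking of fractional parts; rank 0 is reserved for fractional part 0.
    levels = sorted(set(frac.values()) | {Fraction(0)})
    return (tuple(floor(nu[c]) for c in clocks),
            tuple(levels.index(frac[c]) for c in clocks))


a = {"x": Fraction(3, 10), "y": Fraction(1, 10)}  # 0 < y < x < 1
b = {"x": Fraction(1, 2), "y": Fraction(1, 5)}    # 0 < y < x < 1
c = {"x": Fraction(1, 10), "y": Fraction(3, 10)}  # 0 < x < y < 1
print(region_key(a, 2) == region_key(b, 2), region_key(a, 2) == region_key(c, 2))  # True False
```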
When the set of clocks and upper bound is clear from the context we write R and Z for the set of regions and zones respectively.

Probabilistic Timed Game Arena: Syntax
For the remainder of the paper we fix a positive integer K, and work with K-bounded clocks and clock constraints. A probabilistic timed game arena (PTGA) is a tuple T=(L, C, Inv, Act_Min, Act_Max, E, δ) where:
• L is a finite set of locations;
• C is a finite set of clocks;
• Inv : L→Z is an invariant condition;
• Act_Min and Act_Max are disjoint finite sets of actions, and we use Act for the set Act_Min ∪ Act_Max;
• E : L×Act→Z is an action enabling condition;
• δ : L×Act→D(2^C×L) is a probabilistic transition function.
When we consider a PTGA as the input of an algorithm, its size is understood as the sum of the sizes of the encodings of L, C, Inv, Act, E and δ. As usual [28], we assume that probabilities are expressed as ratios of two natural numbers, each written in binary, and that the zones in the definitions of Inv and E are expressed as clock constraints. A standard probabilistic timed automaton (PTA) is a PTGA where one of Act_Min and Act_Max is empty. Similarly, a standard (non-probabilistic) timed game arena (respectively, timed automaton) is a PTGA (respectively, PTA) in which δ(ℓ, a) is a point distribution for all ℓ ∈ L and a ∈ Act.

Probabilistic Timed Game Arena: Semantics
Let T=(L, C, Inv, Act Min , Act Max , E, δ) be a probabilistic timed game arena. A configuration of a PTGA is a pair (ℓ, ν), where ℓ is a location and ν a clock valuation such that ν ∈ Inv (ℓ). For any t ∈ R 0 , we let (ℓ, ν)+t equal the configuration (ℓ, ν+t). In a configuration (ℓ, ν), a timed action (time-action pair) (t, a) is available if and only if the invariant condition Inv (ℓ) is continuously satisfied while t time units elapse, and a is enabled (i.e. the enabling condition E(ℓ, a) is satisfied) after t time units have elapsed. Furthermore, if the timed action (t, a) is performed, then the next configuration is determined by the probabilistic transition relation δ, i.e. with probability δ[ℓ, a](C, ℓ ′ ) the clocks in C are reset and we move to the location ℓ ′ .
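Because zones are convex, the "continuously satisfied" condition on the invariant reduces to two checks: the invariant holds at ν and at ν+t. Each point ν+t′ with 0 ≤ t′ ≤ t is a convex combination of these two endpoints. The Python sketch below assumes a hypothetical zone encoding as a conjunction of non-diagonal constraints `(clock, op, bound)`. Its example reads the invariant of ℓ1 in Example 13 as (0<y≤2)∧(x≤2) and uses only the stated guard y>1 for action a.

```python
import operator
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical zone encoding: a conjunction of atomic constraints (clock, op, bound).
Zone = List[Tuple[str, str, int]]
Valuation = Dict[str, Fraction]
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


def holds(zone: Zone, nu: Valuation) -> bool:
    return all(OPS[op](nu[c], k) for c, op, k in zone)


def delayed(nu: Valuation, t: Fraction) -> Valuation:
    return {c: v + t for c, v in nu.items()}


def available(inv: Zone, guard: Zone, nu: Valuation, t: Fraction) -> bool:
    """Is (t, a) available in (l, nu), where inv = Inv(l) and guard = E(l, a)?

    Zones are convex, so the invariant holding at nu and at nu + t implies it
    holds at every nu + t' with 0 <= t' <= t.
    """
    end = delayed(nu, t)
    return t >= 0 and holds(inv, nu) and holds(inv, end) and holds(guard, end)


inv_l1 = [("y", ">", 0), ("y", "<=", 2), ("x", "<=", 2)]  # (0<y<=2) and (x<=2)
guard_a = [("y", ">", 1)]                                 # y > 1
nu = {"x": Fraction(0), "y": Fraction(1, 2)}
print([available(inv_l1, guard_a, nu, Fraction(t)) for t in ("1/4", "1", "2")])
# [False, True, False]
```

A delay of 1/4 fails the guard, a delay of 2 violates the invariant (y would reach 5/2), and a delay of 1 satisfies both.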
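A game on a PTGA starts in an initial configuration (ℓ, ν) ∈ L×V, and Min and Max construct an infinite play by repeatedly choosing available timed actions $(t_a, a) \in \mathbb{R}_{\ge 0}\times Act_{Min}$ and $(t_b, b) \in \mathbb{R}_{\ge 0}\times Act_{Max}$, proposing ⊥ if no timed action is available. The player responsible for the move is Min if the time delay of Min's choice is less than that of Max's choice or Max chooses ⊥; otherwise Max is responsible. We assume the players cannot simultaneously choose ⊥, i.e. that in any configuration there is at least one timed action available. Formally, the semantics of T is the SGA [[T]] = (S, A_Min, A_Max, p_Min, p_Max, τ_Min, τ_Max) where:
• S = {(ℓ, ν) ∈ L×V | ν ∈ Inv(ℓ)} is the set of configurations;
• $A_\star = \mathbb{R}_{\ge 0}\times Act_\star$ for ⋆ ∈ {Min, Max};
• for ⋆ ∈ {Min, Max}, (ℓ, ν) ∈ S and (t, a) ∈ A_⋆, the probabilistic transition function p_⋆ is defined when ν+t′ ∈ Inv(ℓ) for all 0 ≤ t′ ≤ t and ν+t ∈ E(ℓ, a), and then for any (ℓ′, ν′):
$$p_\star\big((\ell,\nu),(t,a)\big)(\ell',\nu') = \sum_{X\subseteq C \,:\, (\nu+t)[X:=0]=\nu'} \delta[\ell,a](X,\ell');$$
• the win function is given by $\mathrm{win}((t_a,a),(t_b,b)) = (t_a,a)$ if $t_a < t_b$, and $(t_b,b)$ otherwise; if one of the arguments to win is ⊥, we define the returned value to be the other argument;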
• the time delay function is given by τ ⋆ (s, (t, a)) = t for all ⋆ ∈ {Min, Max}, s ∈ S and (t, a) ∈ A ⋆ such that p ⋆ (s, (t, a)) is defined.
The sum in the definitions of p_Min and p_Max captures the fact that resetting different subsets of C may result in the same clock valuation (e.g. if all clocks are zero and the action is taken without delay, we end up with the same valuation no matter which clocks are reset). Also, notice that the time delay function of the SGA corresponds to the elapsed time of each move.
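The summation over reset sets can be illustrated by computing the successor distribution of a timed action (t, a). The Python sketch below assumes a hypothetical encoding of δ(ℓ, a) as a list of `(reset set, target location, probability)` branches. Branches that produce the same (ℓ′, ν′) are merged by adding their probabilities, reproducing the example in the paragraph above.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

Valuation = Dict[str, Fraction]
# Hypothetical encoding of delta(l, a): (reset set X, target location, probability).
Branches = List[Tuple[FrozenSet[str], str, Fraction]]


def successors(nu: Valuation, t: Fraction, branches: Branches) -> Dict[Tuple, Fraction]:
    """Distribution over successor configurations (l', nu') after the timed action (t, a).

    Branches whose resets yield the same (l', nu') have their probabilities summed.
    """
    out: Dict[Tuple, Fraction] = defaultdict(Fraction)
    for reset, loc, prob in branches:
        nu_next = {c: Fraction(0) if c in reset else v + t for c, v in nu.items()}
        out[(loc, tuple(sorted(nu_next.items())))] += prob  # hashable key for (l', nu')
    return dict(out)


zero = {"x": Fraction(0), "y": Fraction(0)}
branches = [(frozenset({"x"}), "l2", Fraction(1, 2)),
            (frozenset({"y"}), "l2", Fraction(1, 2))]
print(successors(zero, Fraction(0), branches))          # one successor, probability 1
print(len(successors(zero, Fraction(1, 2), branches)))  # 2: after a delay the resets differ
```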
Time Divergence. When modelling real-time systems it is important to restrict attention to time-divergent (or non-Zeno) behaviour. More precisely, one should not consider strategies which lead to behaviour in which time does not advance beyond a certain point, as this cannot occur in a real system. We achieve this by restricting attention to structurally non-Zeno PTGAs, i.e. PTGAs in which all strategies yield time-divergent behaviour by construction. We use the syntactic conditions given in [39] for PTAs, which are derived from those for timed automata [40,41].
Example 13. Consider the PTGA in Figure 1; we use solid and dashed lines to indicate actions controlled by Min and Max, respectively. Considering location ℓ1, the invariant condition is (0<y≤2)∧(x≤2), actions a and c are enabled when y>1, and …

[Figure 1: example PTGA; edge labels include clock constraints on x and the reset x:=0.]

Reachability-time problem over PTGA
We are interested in the reachability-time problem for games over the semantics of a PTGA T. We assume that the target set is given as a set L F of locations (the corresponding target of the SGA [[T]], with state space S, is given by F ={(ℓ, ν) ∈ S | ℓ ∈ L F }). However, the results presented can be easily generalised to target sets of location-zone pairs.

Non-determinacy of PTGA with reachability-time objectives
Before proceeding with the definitions needed to prove the main decidability result of the paper, we show by the following counter-example that games on PTGAs with reachability-time objectives are not determined in general, even when all clock constraints of the game use only non-strict inequalities.

Example 14.
Consider the PTGA given in Figure 2 with target set L_F={ℓ4}; recall that we use solid and dashed lines to indicate actions controlled by Min and Max, respectively. Constructing the optimality equations Opt_G for the upper value of the SGA semantics of this PTGA, we obtain, after some simplifications, the values for the locations ℓ1, …, ℓ4, and P(ℓ0, 0) is equal to the minimum of two expressions. Expression (3) corresponds to player Min leaving ℓ0 immediately (when the clock x equals 0) and is equal to 1. Expression (4), of the form $\inf_{0<t\le 1} \max \sup(\cdots)$, corresponds to the infimum over leaving ℓ0 after a non-zero delay (when the clock x is greater than 0) and is also equal to 1. Combining these results, P(ℓ0, 0)=1.
On the other hand, considering the optimality equations Opt_G for the lower value, the values for the locations ℓ1, …, ℓ4 are as above, while P(ℓ0, 0) equals the maximum of the two expressions in (5). The first expression in (5) equals 0 and corresponds to player Max leaving ℓ0 immediately. The second corresponds to the supremum over leaving ℓ0 after a non-zero delay and is also equal to 0; therefore P(ℓ0, 0)=0. Hence the game is not determined, as its upper and lower values differ in the state (ℓ0, 0).

Boundary region abstraction
The region graph [2] is useful for solving time-abstract optimisation problems on timed automata. The region graph, however, is not suitable for solving competitive optimisation problems and games on timed automata as it abstracts away the timing information.
The corner-point abstraction [42], which captures the digital clock semantics [43] of timed automata, is an abstraction of timed automata in which the configurations of the system are restricted to L×N^|C|, i.e. transitions are allowed only when all clocks have non-negative integer values. Although this abstraction retains some timing information, it is not convenient for the proof techniques based on dynamic programming that we use in this paper. The boundary region abstraction (BRA) [14], a generalisation of the corner-point abstraction, is better suited to such proof techniques. More precisely, we need to prove certain properties of values in a PTGA, which we can do only when reasoning about all the states of the PTGA. This is not possible in the corner-point abstraction, since it represents only states corresponding to corner points of regions. Here, we generalise the BRA of [14] to handle PTGAs. First, we require a number of preliminary concepts.

Timed Successor Regions.
Recall that R is the set of clock regions. For ζ, ζ′ ∈ R, we say that ζ′ is in the future of ζ, denoted ζ →* ζ′, if there exist ν ∈ ζ, ν′ ∈ ζ′ and $t \in \mathbb{R}_{\ge 0}$ such that ν′ = ν+t. We say that ζ′ is the time successor of ζ, written ζ → ζ′, if ζ ≠ ζ′ and ν+t′ ∈ ζ ∪ ζ′ for all t′ ≤ t. For regions with ζ →* ζ′′, we also write [ζ, ζ′′] for the union of all regions ζ′ such that ζ →* ζ′ and ζ′ →* ζ′′.

Intuition for the Boundary Region Abstraction. In our definition of the boundary region abstraction (BRA) we capture the intuition that, when studying the "optimal" behaviour of the players, it is sufficient to consider moves that take place near the start or end of the regions. This allows us to abstract from moves that specify the precise time. Instead, the players say which region they wish to enter, and then whether they want to take the move at the start of the region (inf) or at its end (sup). Based on this intuition, we define the boundary region abstraction of a probabilistic timed game arena as follows.
• for ⋆ ∈ {Min, Max}, ŝ = (ℓ, ν, ζ) ∈ Ŝ and α = (a, ζ′′, opt) ∈ Â_⋆ such that ζ →* ζ′′, the probabilistic transition function p̂_⋆ is defined if [ζ, ζ′′] ⊆ Inv(ℓ) and ζ′′ ⊆ E(ℓ, a), and for any (ℓ′, ν′, ζ′) ∈ Ŝ:
Given a target set of locations L F of T, the corresponding target set of the BRA is given by F ={(ℓ, ν, ζ) ∈ S | ℓ ∈ L F }.
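The "inf" and "sup" choices of the BRA correspond to the smallest and largest delays t for which ν+t lies in the chosen region ζ′′. The Python sketch below computes these two delays under assumptions of our own. Region membership is tested with the integer-part and fractional-ordering characterisation of regions, copied into the block so that it is self-contained. The search relies on the region of ν+t changing only at delays of the form k − ν(c), when some clock reaches an integer. The returned delays have exactly this form, which is one way to see why functions such as e − ν(c) appear later.

```python
from fractions import Fraction
from math import floor
from typing import Dict, Optional, Tuple

Valuation = Dict[str, Fraction]


def region_key(nu: Valuation) -> Tuple:
    # Integer parts plus dense ranking of fractional parts (rank 0 means "integer").
    clocks = sorted(nu)
    frac = {c: nu[c] - floor(nu[c]) for c in clocks}
    levels = sorted(set(frac.values()) | {Fraction(0)})
    return (tuple(floor(nu[c]) for c in clocks),
            tuple(levels.index(frac[c]) for c in clocks))


def boundary_delays(nu: Valuation, target: Tuple, K: int) -> Optional[Tuple[Fraction, Fraction]]:
    """(inf, sup) of {t >= 0 : nu + t is in the region with key `target`}, or None.

    The region of nu + t changes only at delays k - nu(c); between two such
    delays it is constant. We stop when the largest clock reaches K.
    """
    horizon = K - max(nu.values())
    crit = sorted({Fraction(k) - v for v in nu.values() for k in range(K + 1)
                   if 0 <= k - v <= horizon} | {Fraction(0), horizon})
    lo = hi = None
    for i, t in enumerate(crit):
        pieces = [(t, t, t)]  # (probe delay, start, end): the single point t
        if i + 1 < len(crit):  # and the open interval up to the next critical delay
            pieces.append(((t + crit[i + 1]) / 2, t, crit[i + 1]))
        for probe, start, end in pieces:
            if region_key({c: v + probe for c, v in nu.items()}) == target:
                lo = start if lo is None else min(lo, start)
                hi = end if hi is None else max(hi, end)
    return None if lo is None else (lo, hi)


nu = {"x": Fraction(3, 10), "y": Fraction(1, 10)}
open_region = region_key({"x": Fraction(11, 10), "y": Fraction(9, 10)})  # 1<x<2, 0<y<1, x-1<y
print(boundary_delays(nu, open_region, 2))  # (Fraction(7, 10), Fraction(9, 10))
```

For a region reached only at a single instant (for example x=1 with 0<y<1), the two delays coincide. This mirrors the remark in the example below that for some regions the two boundaries coincide.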
To simplify notation, for two elements a ∈ A_Min and b ∈ A_Max we write a b to denote that win(a, b)=a. We use analogous notation for other SGAs. For an element s=(ℓ, ν) ∈ L×V, we use ŝ to denote the element (ℓ, ν, [ν]) ∈ Ŝ. Although the boundary region abstraction is not a finite SGA, for a fixed initial state we can restrict attention to a finite SGA, adapting an approach from [44] as follows.

Proposition 16. Let T be a PTGA and $\hat{T}$ the corresponding BRA. For any state of $\hat{T}$, its reachable sub-graph is finite and constructible in time exponential in the size of T.
PROOF. The most demanding part of the proof is to show that there is a set V of valuations that has exponential size and contains ν for any state (ℓ, ν, ζ) reachable in the sub-graph of $\hat{T}$.
For $r \in \mathbb{R}_{\ge 0}$ we write ⟨r⟩ for the fractional part of r, i.e. r−⌊r⌋. For a clock valuation ν we define its fractional signature $\overline{\nu}$ to be the sequence … , f_m) because for any i we have: … This means that, by successive application of shifts, only m different fractional signatures can be obtained. We further say that a fractional signature $(f'_0, f'_1, \dots, f'_n)$ is a subsequence of another fractional signature $(f_0, f_1, \dots, f_m)$ if n ≤ m and for all i ≤ n there exists j ≤ m such that $f'_i = f_j$. For any state (ℓ, ν, ζ) of the BRA $\hat{T}$, we claim that it is only possible to transition to states (ℓ′, ν′, ζ′) such that $\overline{\nu'}$ is a subsequence of a k-shift of $\overline{\nu}$, for some k. To see this, notice that the valuation $\nu_\alpha$ in the definition of p̂_⋆ (Definition 15) is such that $\overline{\nu_\alpha}$ is a k-shift of $\overline{\nu} = (f_0, \dots, f_m)$ for k chosen so that $f_m$ is the fractional part of the clocks that have integer value in $\nu_\alpha$. Subsequently resetting clocks gives rise to a subsequence of a fractional signature, and so $\overline{\nu'}$ (for ν′ from the defining sum of p̂_⋆) is a subsequence of $\overline{\nu_\alpha}$.

Example 17. A sub-graph of the BRA reachable from (ℓ0, (0.3, 0.1), 0<y<x<1) for the PTGA of Figure 1 is shown in Figure 3. The names of the regions correspond to the regions depicted in the bottom right corner. Edges are labelled (a, ζ, opt), and the intuitive meaning is to wait until we reach the lower or upper (depending on opt) boundary of the region. For some regions, for example ζ4, the boundaries coincide; we keep this redundancy to simplify the notation. The region ζ1 is determined by the constraints (1<x<2)∧(0<y<1)∧(y<x−1). The bold numbers on edges correspond to the time delay before the action labelling the edge is taken. Figure 3 includes the actions available in the initial state and one of the action pairs available in the state (ℓ1, (0, 1), (x=0)∧(1<y<2)). To simplify the figure, probabilities equal to 0.5 are omitted.

Decidability of the Reachability-Time Problem
In this section we show decidability of the reachability-time problem, which is the main result of the paper. The result is formalised in the following theorem.

Theorem 18. Let T be a PTGA. The reachability-time problem for infinite-horizon objectives in T is in NEXPTIME∩co-NEXPTIME.
The crucial, and most demanding, step of the proof of Theorem 18 is proving that the problems on PTGAs can be reduced to problems on BRAs. This fact is formalised in Theorem 19. Theorem 18 then follows straightforwardly from Theorem 19, Proposition 16 and Theorem 6.

Theorem 19. Let T be a PTGA and $\widehat{T}$ the corresponding BRA. The answers to the reachability-time problems for T and $\widehat{T}$ are the same.
The remainder of this paper is devoted to the proof of Theorem 19. First, in Section 5.1 we introduce quasi-simple functions and prove some of their properties. Then, in Section 5.2 we show that values in the games we study can be characterised using quasi-simple functions, which allows us to establish the correspondence between a PTGA and its boundary region abstraction.
For the remainder of this section, unless otherwise specified, we fix a PTGA $T = (L, \mathcal{C}, Inv, Act_{Min}, Act_{Max}, E, \delta)$ and a set of target locations $F_L \subseteq L$; we suppose the semantics of T is given by … with corresponding target set $F = \{(\ell,\nu) \in S \mid \ell \in F_L\}$, and the boundary region abstraction $\widehat{T}$ of T is given by … with corresponding target set $\widehat{F} = \{(\ell,\nu,\zeta) \in \widehat{S} \mid \ell \in F_L\}$.

Quasi-simple Functions
To prove properties of controllers for (non-probabilistic) timed systems, Asarin and Maler [3] introduced simple functions, a finitely representable class of functions with the property that every decreasing sequence is finite. We define these functions here and show that they are not sufficient for our purpose.

Definition 20 (Simple Functions). Given a set of valuations X ⊆ V, a function $f : X \to \mathbb{R}_{\geq 0}$ is simple if there exists e ∈ ℕ such that either f(ν) = e for all ν ∈ X, or there exists a clock $c \in \mathcal{C}$ such that f(ν) = e − ν(c) for all ν ∈ X. Furthermore, a function $f : \widehat{S} \to \mathbb{R}_{\geq 0}$ is regionally simple if f(ℓ, ·, ζ) is simple for all ℓ ∈ L and ζ ∈ R.
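A simple function is fully determined by the natural number e and, optionally, a single clock. The Python sketch below encodes exactly that data; the class name `SimpleFunction` is hypothetical, and the caller is assumed to use it only on a set X where e − ν(c) stays non-negative.

```python
from dataclasses import dataclass
from typing import Optional

Valuation = dict[str, float]

@dataclass(frozen=True)
class SimpleFunction:
    """Hypothetical encoding of Definition 20: nu -> e, or nu -> e - nu(clock)."""
    e: int                        # the natural number e
    clock: Optional[str] = None   # None encodes the constant case

    def __call__(self, nu: Valuation) -> float:
        return self.e if self.clock is None else self.e - nu[self.clock]

# On the region 0 < y < x < 1, "time until x reaches 1" is the simple function 1 - nu(x).
time_to_x_1 = SimpleFunction(1, "x")
print(time_to_x_1({"x": 0.25, "y": 0.125}))
print(SimpleFunction(3)({"x": 0.25, "y": 0.125}))
```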
For timed games, Asarin and Maler showed that upper values for n-step reachability-time objectives are regionally simple; since the fixpoint is reached after finitely many steps, the upper value for the reachability-time objective is regionally simple as well. Also, using the properties of simple functions, [14] shows that, for non-probabilistic games with reachability-time objectives, the optimal strategies are regionally positional, i.e., in every state of a region the strategy chooses the same action. Unfortunately, in the case of PTGAs, applying the value improvement function does not necessarily preserve regional simplicity. Moreover, as the example below demonstrates, the value of the game is not necessarily regionally simple, nor are optimal strategies necessarily regionally positional.

Example 21.
Consider the one-player PTGA shown in Figure 4. Observe that, for every state (ℓ0, ν) in the region (ℓ0, 0<x<1), the optimal expected time to reach ℓ2 equals … Hence the values in PTGAs with reachability-time objectives need not be regionally simple. Moreover, the optimal strategy is not regionally positional: if ν(x) ≤ 0.5, then the optimal strategy is to take action a immediately, while otherwise the optimal strategy is to wait until ν(x)=1 and then take action b.
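For these reasons we cannot work with simple functions. Our proof instead relies on regional non-expansiveness of value functions. Given X ⊆ V, a function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ is non-expansive if $|f(\nu)-f(\nu')| \le \|\nu-\nu'\|$ for all ν, ν′ ∈ X, where $\|\nu-\nu'\| = \max_{c}|\nu(c)-\nu'(c)|$ is the sup-norm distance. A function $f : \widehat{S} \to \mathbb{R}^{\infty}_{\geq 0}$ is regionally non-expansive if f(ℓ, ·, ζ) is non-expansive for any ℓ and ζ, and similarly any $f : S \to \mathbb{R}^{\infty}_{\geq 0}$ is regionally non-expansive if f(ℓ, ·) is non-expansive when its domain is restricted to a single region.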
The proof direction that we take requires us to establish that $\lim_{n\to\infty}\mathrm{Val}^n$ is non-expansive. To do this, we will show that for each n ∈ ℕ the function $\mathrm{Val}^n$ is non-expansive. However, a direct proof by induction would fail; instead we need to prove a stronger claim about the functions $\mathrm{Val}^n$. To this end, we first introduce quasi-simple functions.

Definition 22 (Quasi-Simple Functions). Let X ⊆ V be a set of clock valuations. The class of quasi-simple functions is built by first defining every simple function to be quasi-simple, and then inductively by stipulating that convex combination, maximum and minimum of finitely many quasi-simple functions are quasi-simple.
A function $f : \widehat{S} \to \mathbb{R}^{\infty}_{\geq 0}$ is regionally quasi-simple if f(ℓ, ·, ζ) is quasi-simple for all ℓ ∈ L and ζ ∈ R, and any $f : S \to \mathbb{R}^{\infty}_{\geq 0}$ is regionally quasi-simple if f(ℓ, ·) is quasi-simple when its domain is restricted to a single region.
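We will later show that the functions $\mathrm{Val}^{n} : S \to \mathbb{R}_{\geq 0}$ for n ∈ ℕ are regionally quasi-simple. Using the lemma below, we can then demonstrate that these functions are regionally non-expansive.

Lemma 23. Every quasi-simple function is non-expansive.
PROOF. Consider any quasi-simple function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ and valuations ν1, ν2 ∈ X. We show $|f(\nu_1)-f(\nu_2)| \le \|\nu_1-\nu_2\|$ by induction on the structure of f.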
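• If f is a simple function, then either f is a constant function e, and hence $|f(\nu_1)-f(\nu_2)| = |e-e| = 0 \le \|\nu_1-\nu_2\|$, or $f(\nu) = e-\nu(c)$ for some clock c, in which case $|f(\nu_1)-f(\nu_2)| = |\nu_2(c)-\nu_1(c)| \le \|\nu_1-\nu_2\|$.
• If f is a convex combination, with coefficients p1, …, pn, of quasi-simple functions f1, …, fn, then $|f(\nu_1)-f(\nu_2)| = \big|\sum_{i=1}^{n} p_i\,(f_i(\nu_1)-f_i(\nu_2))\big| \le \sum_{i=1}^{n} p_i\,|f_i(\nu_1)-f_i(\nu_2)| \le \sum_{i=1}^{n} p_i\,\|\nu_1-\nu_2\| = \|\nu_1-\nu_2\|$, where the second inequality uses the induction hypothesis and the final equality holds since we are considering a convex combination, i.e. $\sum_{i} p_i = 1$.
• If f is the maximum of two quasi-simple functions f1 and f2, then without loss of generality we suppose $f_1(\nu_1) \ge f_2(\nu_1)$. In the case when $f_1(\nu_2) \ge f_2(\nu_2)$ we have $|f(\nu_1)-f(\nu_2)| = |f_1(\nu_1)-f_1(\nu_2)| \le \|\nu_1-\nu_2\|$. On the other hand, in the case when $f_1(\nu_2) < f_2(\nu_2)$, we have $f(\nu_1)-f(\nu_2) = f_1(\nu_1)-f_2(\nu_2) \le f_1(\nu_1)-f_1(\nu_2) \le \|\nu_1-\nu_2\|$ and $f(\nu_2)-f(\nu_1) = f_2(\nu_2)-f_1(\nu_1) \le f_2(\nu_2)-f_2(\nu_1) \le \|\nu_1-\nu_2\|$, and therefore $|f(\nu_1)-f(\nu_2)| \le \|\nu_1-\nu_2\|$. Since these are all the cases to consider, f is non-expansive as required.
• If f is the minimum of two quasi-simple functions, the proof follows similarly to the case when f is the maximum of two quasi-simple functions.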
Since these are the only cases to consider the proof is complete.
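Lemma 23 can be sanity-checked numerically, although a finite sample can only refute non-expansiveness, never prove it. The Python sketch below builds quasi-simple functions from hypothetical combinators `simple`, `convex`, `qmax` and `qmin`, samples a grid of valuations from the region 0 < y < x < 1, and measures the largest excess of |f(ν1) − f(ν2)| over the sup-norm distance (the sup-norm is an assumption of the sketch). The example min(1 − ν(x)/2, 1 − ν(y)) switches branch inside the region, along y = x/2, so it is quasi-simple but not simple.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence

Valuation = dict[str, float]
Fn = Callable[[Valuation], float]

def simple(e: int, clock: Optional[str] = None) -> Fn:
    """Simple function: nu -> e, or nu -> e - nu(clock)."""
    if clock is None:
        return lambda nu: e
    return lambda nu: e - nu[clock]

def convex(weights: Sequence[float], fs: Sequence[Fn]) -> Fn:
    """Convex combination sum_i p_i * f_i; the weights must form a probability vector."""
    assert all(p >= 0 for p in weights) and abs(sum(weights) - 1) < 1e-12
    return lambda nu: sum(p * fi(nu) for p, fi in zip(weights, fs))

def qmax(*fs: Fn) -> Fn:
    return lambda nu: max(fi(nu) for fi in fs)

def qmin(*fs: Fn) -> Fn:
    return lambda nu: min(fi(nu) for fi in fs)

def sup_dist(a: Valuation, b: Valuation) -> float:
    return max(abs(a[c] - b[c]) for c in a)

samples = [{"x": i / 20, "y": j / 20} for i in range(1, 20) for j in range(1, i)]

def worst_excess(h: Fn) -> float:
    """Largest amount by which a sampled pair exceeds the non-expansiveness bound."""
    return max(abs(h(a) - h(b)) - sup_dist(a, b) for a, b in combinations(samples, 2))

f = qmin(convex([0.5, 0.5], [simple(1, "x"), simple(1)]), simple(1, "y"))
g = qmax(f, convex([0.25, 0.75], [simple(1, "y"), simple(1, "x")]))
print(worst_excess(f) <= 1e-12, worst_excess(g) <= 1e-12)
```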
In the proofs below we will make use of several technical properties of quasi-simple functions. First, however, we require an alternative representation of quasi-simple functions in terms of parse trees. Let Υ be the set of all parse trees whose leaves are simple functions and whose nodes are the operations: min, max and convex combination. Clearly, every tree ∆ ∈ Υ corresponds to a unique quasi-simple function which we will call qs(∆). Conversely, every quasi-simple function corresponds to infinitely many trees from Υ. The definition below gives us a unique representative.

Definition 24.
Let the rank of a quasi-simple function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$, denoted rank(f), be the smallest k such that there is a tree ∆ ∈ Υ of height k with qs(∆) = f. For any quasi-simple function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ we define a unique representative parse tree $\Delta_f$ by induction on the rank of f.
• If rank(f) = 0, then let $\Delta_f$ be any tree of height 0 such that $qs(\Delta_f) = f$.
• If rank(f) = k+1 for some k ∈ ℕ, there must be an operation op (either min, max or convex combination) and an integer n such that f is obtained by taking the op of quasi-simple functions f1, …, fn, each of which has rank at most k. Therefore, by induction we have representatives $\Delta_{f_1}, \ldots, \Delta_{f_n}$ for f1, …, fn. Let $\Delta_f$ be the tree with root op and subtrees $\Delta_{f_1}, \ldots, \Delta_{f_n}$. By construction we have $qs(\Delta_f) = f$.
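The parse-tree view is easy to mirror in code. In the Python sketch below, the classes `Leaf` and `Node` and the functions `qs` and `height` are hypothetical names. The three trees all denote the function 1 − ν(x) on valuations with 0 ≤ ν(x) ≤ 1 but have heights 0, 1 and 2, illustrating why one function corresponds to many trees and why the rank is defined as a minimum height. Computing the rank itself would require a search over equivalent trees, which the sketch does not attempt.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Union

@dataclass(frozen=True)
class Leaf:
    e: int
    clock: Optional[str] = None      # None: constant e; otherwise nu -> e - nu(clock)

@dataclass(frozen=True)
class Node:
    op: str                          # "min", "max" or "convex"
    children: tuple                  # subtrees (Leaf or Node)
    weights: tuple = ()              # probabilities, used only when op == "convex"

Tree = Union[Leaf, Node]

def qs(tree: Tree, nu: dict) -> Union[int, Fraction]:
    """Evaluate the quasi-simple function denoted by a parse tree at valuation nu."""
    if isinstance(tree, Leaf):
        return tree.e if tree.clock is None else tree.e - nu[tree.clock]
    vals = [qs(child, nu) for child in tree.children]
    if tree.op == "min":
        return min(vals)
    if tree.op == "max":
        return max(vals)
    if tree.op == "convex":
        return sum(p * v for p, v in zip(tree.weights, vals))
    raise ValueError(f"unknown operation {tree.op!r}")

def height(tree: Tree) -> int:
    return 0 if isinstance(tree, Leaf) else 1 + max(height(c) for c in tree.children)

half = Fraction(1, 2)
t0 = Leaf(1, "x")                               # height 0
t1 = Node("min", (Leaf(1, "x"), Leaf(2)))       # height 1; min(1 - x, 2) = 1 - x here
t2 = Node("convex", (t0, t1), (half, half))     # height 2; still 1 - x
nu = {"x": Fraction(1, 4), "y": Fraction(1, 10)}
print([qs(t, nu) for t in (t0, t1, t2)], [height(t) for t in (t0, t1, t2)])
```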
The first technical property of quasi-simple functions will allow us to establish that when we take a delay so that a boundary of a region is reached, quasi-simplicity is preserved.
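Lemma 25. Let $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ be a quasi-simple function, c a clock and i an integer such that ν(c) ≤ i for all ν ∈ X. If $f^{\mathrm{elapse}} : X \to \mathbb{R}^{\infty}_{\geq 0}$ is the function where for any ν ∈ X we have $f^{\mathrm{elapse}}(\nu) = t_\nu + f(\nu+t_\nu)$ with $t_\nu = i-\nu(c)$, then $f^{\mathrm{elapse}}$ is quasi-simple.
PROOF. Consider any quasi-simple function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ and let $\Delta_f$ be its representative parse tree constructed using Definition 24. Let $\Delta^{\mathrm{mod}}_f$ be the modified parse tree in which any leaf labelled with a constant simple function e is replaced by the non-constant simple function $e'-\nu(c)$, where $e' = e+i$.
We will prove that $f^{\mathrm{elapse}} = qs(\Delta^{\mathrm{mod}}_f)$, which demonstrates that $f^{\mathrm{elapse}}$ is quasi-simple as required. The proof is by induction on the rank of f. If rank(f) = 0, then there are two cases to consider.
• If $\Delta_f$ is a leaf labelled with a constant simple function which for any ν ∈ X returns e for some e ∈ ℕ, then for any ν ∈ X: $f^{\mathrm{elapse}}(\nu) = t_\nu + f(\nu+t_\nu) = (i-\nu(c)) + e = (e+i)-\nu(c)$, which equals $qs(\Delta^{\mathrm{mod}}_f)(\nu)$ as required.
• If $\Delta_f$ is a leaf labelled with a simple function which for any ν ∈ X returns $e-\nu(c')$ for some e ∈ ℕ and clock c′, then for any ν ∈ X: $f^{\mathrm{elapse}}(\nu) = t_\nu + e - (\nu(c')+t_\nu) = e-\nu(c')$, which again equals $qs(\Delta^{\mathrm{mod}}_f)(\nu)$, since such leaves are left unchanged.
For the inductive step, suppose rank(f) = k+1 for some k ∈ ℕ and that the result holds for any quasi-simple function of rank at most k. Since rank(f) = k+1 there must be an operation op (either min, max or convex combination) and an integer n such that f is obtained by taking the op of some quasi-simple functions f1, …, fn, each of which has rank at most k. Now, for any ν ∈ X:
$f^{\mathrm{elapse}}(\nu) = t_\nu + \mathrm{op}(f_1(\nu+t_\nu), \ldots, f_n(\nu+t_\nu))$ (by definition of f)
$= \mathrm{op}(t_\nu+f_1(\nu+t_\nu), \ldots, t_\nu+f_n(\nu+t_\nu))$ (rearranging)
$= \mathrm{op}(f_1^{\mathrm{elapse}}(\nu), \ldots, f_n^{\mathrm{elapse}}(\nu))$ (by definition of $f_j^{\mathrm{elapse}}$)
$= \mathrm{op}(qs(\Delta^{\mathrm{mod}}_{f_1})(\nu), \ldots, qs(\Delta^{\mathrm{mod}}_{f_n})(\nu))$ (by induction)
$= qs(\Delta^{\mathrm{mod}}_f)(\nu)$ (by construction),
where the rearranging step is valid because adding a constant commutes with min, max and (since the coefficients sum to 1) convex combinations.
The next lemma states that resetting clocks preserves quasi-simplicity.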
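PROOF. For a quasi-simple function f, let $\Delta^{\mathrm{mod}}_f$ be the modified parse tree of $\Delta_f$ in which any leaf labelled with a non-constant simple function which for any $\nu \in \zeta_C$ returns e − ν(c) for some integer e and clock c ∈ C is replaced with a leaf labelled by the constant function e. Here, for ν ∈ ζ, $\nu_C$ denotes the valuation obtained from ν by resetting the clocks in C to 0, so that $f^{\mathrm{reset}}(\nu) = f(\nu_C)$. The proof follows by showing $f^{\mathrm{reset}} = qs(\Delta^{\mathrm{mod}}_f)$ for all quasi-simple functions f. This proof is by induction on the rank of f.
If rank(f) = 0, then $\Delta_f$ is a leaf and there are three cases to consider.
• If $\Delta_f$ is a leaf labelled with a constant simple function which for any $\nu \in \zeta_C$ returns e for some e ∈ ℕ, then for any ν ∈ ζ, by construction: $f^{\mathrm{reset}}(\nu) = f(\nu_C) = e = qs(\Delta^{\mathrm{mod}}_f)(\nu)$.
• If $\Delta_f$ is a leaf labelled with a simple function which for any $\nu \in \zeta_C$ returns $e-\nu(c')$ for some e ∈ ℕ and clock c′ ∉ C, then for any ν ∈ ζ: $f^{\mathrm{reset}}(\nu) = e-\nu_C(c') = e-\nu(c') = qs(\Delta^{\mathrm{mod}}_f)(\nu)$, since resetting the clocks in C does not change c′.
• If $\Delta_f$ is a leaf labelled with a simple function which for any $\nu \in \zeta_C$ returns $e-\nu(c)$ for some e ∈ ℕ and clock c ∈ C, then for any ν ∈ ζ: $f^{\mathrm{reset}}(\nu) = e-\nu_C(c) = e-0 = e = qs(\Delta^{\mathrm{mod}}_f)(\nu)$.
For the inductive step, suppose rank(f) = k+1 for some k ∈ ℕ and that the result holds for any quasi-simple function of rank at most k. Since rank(f) = k+1 there must be an operation op (either min, max or convex combination) and an integer n such that f is obtained by taking the op of some quasi-simple functions f1, …, fn, each of which has rank at most k. Therefore, for any ν ∈ ζ, by construction:
$f^{\mathrm{reset}}(\nu) = \mathrm{op}(f_1(\nu_C), \ldots, f_n(\nu_C)) = \mathrm{op}(f_1^{\mathrm{reset}}(\nu), \ldots, f_n^{\mathrm{reset}}(\nu)) = \mathrm{op}(qs(\Delta^{\mathrm{mod}}_{f_1})(\nu), \ldots, qs(\Delta^{\mathrm{mod}}_{f_n})(\nu)) = qs(\Delta^{\mathrm{mod}}_f)(\nu)$,
which completes the proof.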
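The two leaf-rewriting constructions in the proofs of Lemmas 25 and 26 can be spot-checked on concrete trees. In the Python sketch below (all names hypothetical), `mod_elapse` rewrites every constant leaf e into (e + i) − ν(c), and `mod_reset` rewrites every leaf e − ν(c) with c in the reset set into the constant e. The script compares each rewritten tree with the direct definitions: the delayed value t + f(ν + t), assuming the delay is t = i − ν(c), and the value of f after resetting clocks. A single valuation is a spot check, not a proof.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass(frozen=True)
class Leaf:
    e: int
    clock: Optional[str] = None      # None: constant e; otherwise nu -> e - nu(clock)

@dataclass(frozen=True)
class Node:
    op: str                          # "min", "max" or "convex"
    children: tuple
    weights: tuple = ()              # only used for "convex"

def qs(tree, nu):
    """Evaluate a parse tree at valuation nu."""
    if isinstance(tree, Leaf):
        return tree.e if tree.clock is None else tree.e - nu[tree.clock]
    vals = [qs(ch, nu) for ch in tree.children]
    if tree.op == "convex":
        return sum(p * v for p, v in zip(tree.weights, vals))
    return min(vals) if tree.op == "min" else max(vals)

def mod_elapse(tree, c, i):
    """Lemma 25: replace each constant leaf e by the leaf (e + i) - nu(c)."""
    if isinstance(tree, Leaf):
        return Leaf(tree.e + i, c) if tree.clock is None else tree
    return Node(tree.op, tuple(mod_elapse(ch, c, i) for ch in tree.children), tree.weights)

def mod_reset(tree, reset_clocks):
    """Lemma 26: replace each leaf e - nu(c) with c in the reset set by the constant leaf e."""
    if isinstance(tree, Leaf):
        return Leaf(tree.e) if tree.clock in reset_clocks else tree
    return Node(tree.op, tuple(mod_reset(ch, reset_clocks) for ch in tree.children), tree.weights)

half = Fraction(1, 2)
f = Node("convex", (Leaf(2), Node("min", (Leaf(2, "y"), Leaf(3)))), (half, half))
g = Node("max", (Leaf(2, "y"), Leaf(2, "x")))
nu = {"x": Fraction(1, 4), "y": Fraction(1, 10)}

# Elapse until clock x reaches i = 1: delay t = i - nu(x).
t = 1 - nu["x"]
shifted = {k: v + t for k, v in nu.items()}
print(t + qs(f, shifted) == qs(mod_elapse(f, "x", 1), nu))

# Reset clock y and evaluate g there.
reset_nu = {k: (Fraction(0) if k in {"y"} else v) for k, v in nu.items()}
print(qs(g, reset_nu) == qs(mod_reset(g, {"y"}), nu))
```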
The following technical lemma will allow us to establish that, assuming quasi-simplicity in successor states, the players' optimal behaviour is to pick delays so that boundaries of regions are reached.
Lemma 27. Let $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ be a quasi-simple function. For any x ∈ X and $t \in \mathbb{R}_{\geq 0}$ such that x+t ∈ X: $f(x) \le t + f(x+t)$.
PROOF. Consider any quasi-simple function $f : X \to \mathbb{R}^{\infty}_{\geq 0}$ and valuation x ∈ X. It suffices to show that the function $t \mapsto t + f(x+t)$ is nondecreasing. Now for any $t_1, t_2 \in \mathbb{R}_{\geq 0}$ such that $t_1 \le t_2$ and $x+t_1, x+t_2 \in X$, we have $(t_2 + f(x+t_2)) - (t_1 + f(x+t_1)) = (t_2-t_1) + (f(x+t_2)-f(x+t_1)) \ge 0$, where the inequality follows since $|f(x+t_2)-f(x+t_1)| \le \|(x+t_2)-(x+t_1)\| = t_2-t_1$ by the non-expansiveness of f (see Lemma 23).
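The monotonicity argument behind Lemma 27 can be observed on a concrete function. The Python sketch below uses the hypothetical example f(ν) = max(1 − ν(x), 1/2), which is quasi-simple but not simple, and evaluates t + f(x + t) along a delay ray starting at x = (0.1, 0) that keeps ν(x) < 1. The resulting sequence is constant and then strictly increasing, so it never drops below f(x); exact rationals avoid rounding artefacts.

```python
from fractions import Fraction

def f(nu):
    """Hypothetical quasi-simple example: max(1 - nu(x), 1/2)."""
    return max(1 - nu["x"], Fraction(1, 2))

x0 = {"x": Fraction(1, 10), "y": Fraction(0)}
delays = [Fraction(k, 20) for k in range(18)]   # keeps x0(x) + t below 1
g = [t + f({c: v + t for c, v in x0.items()}) for t in delays]

print(all(a <= b for a, b in zip(g, g[1:])))   # t -> t + f(x0 + t) is nondecreasing
print(all(f(x0) <= v for v in g))              # hence f(x0) <= t + f(x0 + t)
```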

Establishing correspondence of PTGA and boundary region abstraction
Having introduced quasi-simple functions and their properties, we will now show how they relate to PTGAs and how they can be used to finish the proof of Theorem 19. The proof is notationally heavy, and to alleviate some of the technical notation we first introduce a number of functions (and properties of these functions) that will allow us to abbreviate some of the notation. Intuitively, these functions are counterparts of the Val functions that, in addition to an initial state, also take as an argument the first action to be performed.
Furthermore, for ν ∈ Inv(ℓ) and $(t, a) \in A_{Min} \cup A_{Max}$ such that ν+t ∈ ζ let: … Intuitively, $\mathrm{Val}^{n+1}_{\widehat{T}}((\ell,\nu,\zeta),(a,\zeta',\mathrm{opt}))$ corresponds to the optimal value in (ℓ, ν, ζ) when the length of the horizon is n+1 and the first action performed is fixed to be (a, ζ′, opt). Similarly, $\mathrm{Val}$ …

Next we show that, within a region, the values in the BRA $\widehat{T}$ are quasi-simple when we restrict to finite-horizon reachability-time objectives. To simplify the notation, we assume that in any state player Min can pick at least one action, and that, for each action a player Min can select, there exists an action b player Max can select that is preferred, i.e. b = win(a, b), and also an action b that is not preferred, i.e. a = win(a, b) (in addition, there can be actions b satisfying neither of the two conditions). We refer to this assumption as choice freedom.

… $, (a, \zeta, \sup))$ for all a ∈ Act. The latter follows from Definition 15.
From now on, we assume that T is choice-free. Note that this is purely a notational convenience, which allows us to use Lemma 29. The proofs we give can be easily …

We now proceed with the following lemma, which states that the n-step value functions on a BRA are regionally quasi-simple.
Using induction, Lemma 26, Lemma 25 and the quasi-simplicity of convex combinations of quasi-simple functions, it follows that the function Val(ℓ, ·, ζ) : ζ → $\mathbb{R}_{\geq 0}$ equals an expression built from maxima and minima of quasi-simple functions, and therefore by Definition 22 is also quasi-simple.
The following lemma demonstrates that, for finite-horizon reachability-time objectives, the values in the BRA and the PTGA coincide.

… $: \zeta \to \mathbb{R}_{\geq 0}$ is regionally quasi-simple for any ℓ ∈ L and ζ ∈ R such that ζ ⊆ Inv(ℓ).
PROOF. Consider any s = (ℓ, ν) ∈ S. We proceed by induction on n ∈ ℕ. If n = 0, then by Definition 7 both Val …

… Furthermore, letting $R(\nu) = \{\zeta \in R \mid \zeta \subseteq Inv(\ell) \wedge [\nu] \to^{*} \zeta\}$ be the set of regions obtainable from ν by some delay and $Act_{Min}(\ell,\zeta) = \{a \in Act_{Min} \mid \zeta \subseteq E(\ell,a)\}$ the set of actions of player Min available in location ℓ and region ζ, again by Definition 12 we have: … Now, by Definition 28, letting $t^{+}_{\nu,\zeta'} = \sup\{t' \mid \nu+t' \in \zeta'\}$, we have: … For any ζ ∈ R(ν) and $(t, a) \in A_{Min}(\ell, \zeta)$, the expression … and the expression …, and both of these expressions decrease as t decreases. Moreover, letting $t^{-}_{\nu,\zeta} = \inf\{t' \mid \nu+t' \in \zeta\}$ and using Lemma 27, Definition 28 and induction, we have: … Consequently, it follows that Val …

In the rest of this subsection we use the lemmas to prove properties of the infinite-horizon setting.

… PROOF. The proof follows from Lemma 31 and the fact that a pointwise limit of non-expansive functions is a non-expansive function.

… and from the Knaster-Tarski fixpoint theorem, which implies that for all ordinals $o_1 \le o_2$ we have $F^{o_1}(\mathbf{0}) \le F^{o_2}(\mathbf{0})$, where $\mathbf{0}$ is the least element of the complete lattice of functions $S \to \mathbb{R}^{\infty}_{\geq 0}$ ordered pointwise by ≤. We complete the proof of (9) by showing that the left-hand side is greater than or equal to the right-hand side. Consider any s ∈ S. If Γ(s) is infinite, then the result follows. On the other hand, if Γ(s) is finite, it is sufficient to show that for any ε > 0: … We begin by selecting a finite sequence t1, …, tm of positive reals such that for any possible delay t in s = (ℓ, ν) there exists ti (denoted nr(t)) with [ν+t] = [ν+nr(t)] and |t − nr(t)| ≤ ε/6. Such a sequence t1, …, tm can always be selected, as the clock values are bounded.
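The choice of the finite sequence t1, …, tm and of nr(t) can be sketched concretely. The Python sketch below assumes, as in the standard region construction, that the region [ν + t] can only change at "critical" delays where some clock value becomes an integer, and that delays are bounded by a fixed horizon (standing in for the bound on clock values). It keeps every critical delay and adds, inside each open gap between consecutive critical delays, equally spaced points less than h = ε/6 apart; nr(t) then returns the nearest point in the same gap. All names (`critical_delays`, `delay_grid`, `nr`, `horizon`) are hypothetical.

```python
from fractions import Fraction
from math import ceil

Valuation = dict[str, Fraction]

def critical_delays(nu: Valuation, horizon: Fraction) -> list[Fraction]:
    """Delays in [0, horizon] at which some clock value becomes an integer, plus both endpoints."""
    points = {Fraction(0), horizon}
    for v in nu.values():
        k = ceil(v)                    # smallest integer >= v
        while k - v <= horizon:
            points.add(k - v)
            k += 1
    return sorted(points)

def delay_grid(nu: Valuation, horizon: Fraction, h: Fraction) -> tuple[list[Fraction], list[Fraction]]:
    """Critical delays plus, inside each open gap (a, b), n equally spaced points with spacing < h."""
    crit = critical_delays(nu, horizon)
    grid = set(crit)
    for a, b in zip(crit, crit[1:]):
        n = ceil((b - a) / h)          # ensures (b - a) / (n + 1) < h
        grid.update(a + (b - a) * j / (n + 1) for j in range(1, n + 1))
    return sorted(grid), crit

def nr(t: Fraction, grid: list[Fraction], crit: list[Fraction]) -> Fraction:
    """Grid delay in the same gap as t (hence the same region), closer than h to t."""
    if t in crit:
        return t
    a = max(c for c in crit if c < t)
    b = min(c for c in crit if c > t)
    return min((g for g in grid if a < g < b), key=lambda g: abs(g - t))

eps = Fraction(1, 10)
nu = {"x": Fraction(3, 10), "y": Fraction(1, 10)}
grid, crit = delay_grid(nu, Fraction(2), eps / 6)
t = Fraction(1, 2)
print(nr(t, grid, crit), abs(t - nr(t, grid, crit)) <= eps / 6)
```

Keeping the critical delays themselves as grid points handles the case where t lands exactly on a region boundary, where no nearby interior point would lie in the same region.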
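By construction we have, for any $t \in \mathbb{R}_{\geq 0}$ and $C \subseteq \mathcal{C}$:
$\|(\nu+t)-(\nu+nr(t))\| \le \varepsilon/6$ and $\|(\nu+t)_C-(\nu+nr(t))_C\| \le \varepsilon/6$. (10)
Now for any $(t, a) \in A_{Min} \cup A_{Max}$ we have: … $\le |t-nr(t)| + \varepsilon/6$ (since δ[ℓ, a](C, ℓ′) is a distribution) $\le \varepsilon/6 + \varepsilon/6$ (by construction of nr(t)) $= \varepsilon/3$.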
By similar arguments (using Lemma 23 and Lemma 31) we can show that for any n ∈ ℕ: Val …, which completes the proof.
We are now a few steps away from concluding the proof of the main result of the paper. … [36,37]. This completes the proof of Theorem 19.

Conclusions
In this paper we introduced the reachability-time problem for PTGAs and showed that it is decidable and in NEXPTIME ∩ co-NEXPTIME. Our proof relies on an analysis of step-bounded value functions, showing that they are regionally quasi-simple and that, in the infinite-horizon limit, the value functions are regionally non-expansive. This allows us to reduce the problem to the reachability-time problem on a finite abstraction. In contrast to the preliminary version of this work presented in [30], the reduction works for an unrestricted class of PTGAs.
Although the computational complexity of solving games on timed automata is high, UPPAAL Tiga [6] can solve practical timed games with reachability and safety objectives by using efficient symbolic zone-based algorithms [7,11]. A natural direction for future work is to investigate whether similar algorithms can be devised for probabilistic timed games.
On the theoretical level, we plan to study whether our approach can be applied to extensions of reachability-time objectives, by considering an appropriate class of reward-based properties.