Dyadic Existential Rules

Existential rules form an expressive Datalog-based language to specify ontological knowledge. The presence of existential quantification in rule-heads, however, makes the main reasoning tasks undecidable. To overcome this limitation, in the last two decades, a number of classes of existential rules guaranteeing the decidability of query answering have been proposed. Unfortunately, only some of these classes fully encompass Datalog and, often, this comes at the price of higher computational complexity. Moreover, expressive classes are typically unable to exploit tools developed for less expressive classes. To mitigate these shortcomings, this paper introduces a novel general syntactic condition that allows us to define, systematically and in a uniform way, from any decidable class $\mathcal{C}$ of existential rules, a new class called Dyadic-$\mathcal{C}$ enjoying the following properties: $(i)$ it is decidable; $(ii)$ it generalises Datalog; $(iii)$ it generalises $\mathcal{C}$; $(iv)$ it can effectively exploit any reasoner for query answering over $\mathcal{C}$; and $(v)$ its computational complexity does not exceed the higher of the complexities of query answering over $\mathcal{C}$ and over Datalog. Under consideration in Theory and Practice of Logic Programming (TPLP).


Introduction
In ontology-based query answering, a conjunctive query is typically evaluated over a logical theory consisting of a relational database paired with an ontology. Description Logics (Baader et al. 2003) and existential rules, also known as tuple-generating dependencies or Datalog$^{\pm}$ rules (Baget et al. 2011), are the main languages used to specify ontologies. In particular, the latter are essentially classical datalog rules (Abiteboul et al. 1995) extended with existentially quantified variables in rule-heads. The presence of existential quantification in the head of rules, however, makes query answering undecidable in the general case. To overcome this limitation, in the last two decades, a number of classes of existential rules, based on both semantic and syntactic conditions, that guarantee the decidability of query answering have been proposed. Concerning the semantic conditions, we recall finite expansion sets, bounded treewidth sets, finite unification sets, and strongly parsimonious sets (Baget et al. 2009; Baget et al. 2011; Leone et al. 2019). Each of these classes encompasses a number of concrete classes based on syntactic conditions (Baget et al. 2011; Calì et al. 2013; Fagin et al. 2005; Krötzsch and Rudolph 2011; Ceri et al. 1989; Leone et al. 2019; Gottlob and Pieris 2015; Baldazzi et al. 2022; Calì et al. 2012b; Calì et al. 2012a; Gogacz and Marcinkowski 2017; Johnson and Klug 1984). Table 1 summarises these classes and their computational complexity with respect to query answering, distinguishing between combined complexity (the input consists of a database, an ontology, a conjunctive query, and a tuple of constants) and data complexity (only a database is given as input, whereas the remaining parameters are considered fixed).
Unfortunately, on the one hand, although existential rules generalise datalog rules, only some of these syntactic classes fully encompass Datalog and, in some cases, this even comes at the price of a higher computational complexity of query answering. On the other hand, expressive classes typically need ad hoc reasoners and cannot exploit the mature tools developed for less expressive classes.
With the aim of mitigating the two aforementioned shortcomings, this paper introduces a novel general syntactic condition that allows one to define, systematically and in a uniform way, from any decidable class $\mathcal{C}$ of existential rules, a new class called Dyadic-$\mathcal{C}$ that enjoys the following properties: (i) it is decidable; (ii) it generalises Datalog;¹ (iii) it generalises $\mathcal{C}$; and (iv) it can effectively exploit any reasoner for query answering over $\mathcal{C}$. In particular, let $\mathcal{C}_d$ (resp., $\mathcal{C}_c$) be the data (resp., combined) complexity of query answering over $\mathcal{C}$; then query answering over Dyadic-$\mathcal{C}$ is in $\mathrm{PTIME}^{\mathcal{C}_d}$ (resp., in $\mathrm{EXPTIME}^{\mathcal{C}_c}$, provided that there is at least an exponential jump from $\mathcal{C}_d$ to $\mathcal{C}_c$). Since all the classes reported in Table 1 comply with the exponential jump assumption, we obtain the following: (a) whenever $\mathcal{C}_d \supseteq \mathrm{PTIME}$ (entries 1-8 of Table 1), query answering over Dyadic-$\mathcal{C}$ is complete for $\mathcal{C}_d$ (resp., $\mathcal{C}_c$); (b) in all the remaining cases (entries 9-12 of Table 1), query answering over Dyadic-$\mathcal{C}$ is complete for PTIME (resp., EXPTIME), namely it has the same complexity as query answering over Datalog.
Concerning the key principle at the heart of this new general syntactic condition, an ontology $\Sigma$ belongs to Dyadic-$\mathcal{C}$ if one can easily construct a pair $(\Sigma_{HG}, \Sigma_{\mathcal{C}})$ of ontologies, called dyadic, such that: (i) $\Sigma_{HG} \cup \Sigma_{\mathcal{C}}$ is equivalent to $\Sigma$ with respect to query answering; (ii) $\Sigma_{\mathcal{C}} \in \mathcal{C}$; and (iii) $\Sigma_{HG}$ is a set of rules called head-ground with respect to $\Sigma_{HG} \cup \Sigma_{\mathcal{C}}$ (Gottlob and Pieris 2015). Intuitively, $\Sigma_{HG}$ satisfies the following properties: (1) it belongs to Datalog; (2) for each database $D$, the chase procedure (Deutsch et al. 2008) over $D \cup \Sigma_{HG} \cup \Sigma_{\mathcal{C}}$ never generates atoms containing null values via rules of $\Sigma_{HG}$; (3) the head-predicates of $\Sigma_{HG}$ and the body-predicates of $\Sigma_{HG}$ are disjoint; and (4) the head-predicates of $\Sigma_{HG}$ and the head-predicates of $\Sigma_{\mathcal{C}}$ are disjoint. Finally, Dyadic-$\mathcal{C}$ is well-defined even if $\mathcal{C}$ is a class of existential rules based on some semantic condition, and query answering over Dyadic-$\mathcal{C}$ remains decidable in that case; hence, in analogy with the existing semantic classes, the union of all the Dyadic-$\mathcal{C}$ classes is called dyadic decomposable sets.
The article is a revised version of an earlier workshop paper (Gottlob et al. 2022). Specifically, the content that was previously presented in a single preliminary section has been expanded and reorganised into two longer separate sections, namely Sections 2 and 3. These sections now contain the necessary background information, ensuring that the paper is self-contained. Furthermore, the previous notion of "dyadic decomposition" has evolved into the novel notion of a "Dyadic Pair of TGDs", which is discussed in Section 4. This new notion captures the essential properties of dyadic decompositions and also generalises the notion of ontology, providing new perspectives and insights. Additionally, in Section 5, the notion of "Dyadic Decomposable Sets" is now supported by a canonical concrete algorithm that produces a Dyadic Pair of TGDs from each Dyadic Decomposable Set. The revisions have also led to new results regarding decidability and complexity. First, if $\mathcal{C}$ is an abstract (resp., concrete) and decidable class, then Dyadic-$\mathcal{C}$ is now also an abstract (resp., concrete) and decidable class. Second, the relationship between Datalog and any Dyadic-$\mathcal{C}$ is made explicit, emphasising the low expressive power required for $\mathcal{C}$ to ensure that Dyadic-$\mathcal{C}$ fully encompasses Datalog. Finally, the computational complexity analysis is completed in Section 6, where both the data and the combined complexity of any Dyadic-$\mathcal{C}$ class are systematically studied.

Preliminaries
In this section, we introduce the syntax and the semantics of existential rules, the class of rules that generalises Datalog with existential quantifiers in rule-heads. Regarding computational complexity, we assume the reader is familiar with the basic complexity classes used in the subsequent sections. Moreover, for a complexity class $C$, we denote by $\mathrm{PTIME}^{C}$ (resp., $\mathrm{EXPTIME}^{C}$) the class of decision problems that can be solved by an oracle Turing machine operating in polynomial (resp., exponential) time with the aid of an oracle that decides a problem in $C$.

Basics on Relational Structures
Fix three pairwise disjoint, lexicographically enumerable, infinite sets: $\mathbf{C}$ of constants, $\mathbf{N}$ of nulls ($\varphi, \varphi_0, \varphi_1, \ldots$), and $\mathbf{V}$ of variables ($x$, $y$, $z$, and variations thereof). Their union is denoted by $\mathbf{T}$ and its elements are called terms. For any integer $k \geq 0$, we may write $[k]$ for the set $\{1, \ldots, k\}$; in particular, as usual, if $k = 0$, then $[k] = \emptyset$.
An atom $a$ is an expression of the form $P(\mathbf{t})$, where $\mathit{preds}(a) = P$ is a (relational) predicate, $\mathbf{t} = t_1, \ldots, t_k$ is a tuple of terms, $\mathit{arity}(a) = \mathit{arity}(P) = k \geq 0$ is the arity of both $a$ and $P$, and $a[i]$ denotes the $i$-th term $\mathbf{t}[i] = t_i$ of $a$, for each $i \in [k]$. In particular, if $k = 0$, then $\mathbf{t}$ is the empty tuple and $a = P()$. By $\mathit{consts}(a)$ and $\mathit{vars}(a)$ we denote, respectively, the sets of constants and variables occurring in $a$. A fact is an atom that contains only constants.
A (relational) schema S is a finite set of predicates, each with its own arity.The set of positions of S, denoted by pos(S), is defined as the set {P [i] | P ∈ S ∧ 1 ≤ i ≤ arity (P )}, where each P [i] denotes the i-th position of P .A (relational) structure over S is any (possibly infinite) set of atoms using only predicates from S. The domain of a structure S, denoted by dom(S), is the set of all the terms forming the atoms of S.An instance over S is any structure I over S such that dom(I) ⊆ C ∪ N. A database over S is any finite instance over S containing only facts.The active domain of an instance I, denoted by dom(I), is the set of all the terms occurring in I, whereas the Herbrand Base of I, denoted by HB (I), is the set of all the atoms that can be formed using the predicate symbols of S and terms of dom(I).
Consider two sets of terms $T_1$ and $T_2$ and a map $\mu: T_1 \to T_2$. Given a set $T$ of terms, the restriction of $\mu$ with respect to $T$ is the map $\mu|_T = \{t \mapsto \mu(t) : t \in T_1 \cap T\}$. An extension of $\mu$ is any map $\mu'$ between terms, denoted by $\mu' \supseteq \mu$, such that $\mu'|_{T_1} = \mu$. A homomorphism from a structure $S_1$ to a structure $S_2$ is any map $h: \mathit{dom}(S_1) \to \mathit{dom}(S_2)$ such that both the following hold: (i) if $t \in \mathbf{C} \cap \mathit{dom}(S_1)$, then $h(t) = t$; and (ii) $h(S_1) = \{P(h(\mathbf{t})) : P(\mathbf{t}) \in S_1\} \subseteq S_2$.
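As an illustration of conditions (i) and (ii), the following minimal Python sketch checks whether a given map is a homomorphism between two finite structures. The encoding is an assumption made only for this example: an atom is a pair `(predicate, tuple_of_terms)`, nulls are strings prefixed with `_:`, variables are prefixed with `?`, and every other string is a constant.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: an atom is (predicate, tuple of terms).
Atom = Tuple[str, Tuple[str, ...]]


def is_constant(term: str) -> bool:
    # Assumed convention: nulls start with "_:", variables with "?", the rest are constants.
    return not term.startswith(("_:", "?"))


def dom(structure: Iterable[Atom]) -> Set[str]:
    """All terms forming the atoms of the structure."""
    return {t for _, args in structure for t in args}


def is_homomorphism(h: Dict[str, str], s1: Set[Atom], s2: Set[Atom]) -> bool:
    """Check that h: dom(s1) -> dom(s2) is a homomorphism from s1 to s2."""
    if set(h) != dom(s1):
        return False  # h must be defined exactly on dom(s1)
    # (i) constants are mapped to themselves
    if any(is_constant(t) and h[t] != t for t in h):
        return False
    # (ii) h(s1) = {P(h(t)) : P(t) in s1} must be contained in s2
    image = {(p, tuple(h[t] for t in args)) for p, args in s1}
    return image <= s2


s1 = {("R", ("a", "_:n1"))}
print(is_homomorphism({"a": "a", "_:n1": "b"}, s1, {("R", ("a", "b"))}))  # True
print(is_homomorphism({"a": "b", "_:n1": "b"}, s1, {("R", ("b", "b"))}))  # False: a is a constant
```

Nulls may be mapped to any term, which is exactly why nulls behave as labelled unknowns in the chase.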

Conjunctive Queries
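A conjunctive query (CQ) $q$ over a schema $S$ is a (first-order) formula of the form

$\mathbf{x} \leftarrow \exists \mathbf{y}\, \Phi(\mathbf{x}, \mathbf{y})$ (1)

where $\mathbf{x}$ and $\mathbf{y}$ are tuples (often seen as sets) of variables such that $\mathbf{x} \cap \mathbf{y} = \emptyset$, and $\Phi(\mathbf{x}, \mathbf{y})$ is a conjunction (often seen as a set) of atoms using only predicates from $S$. In particular:
- $\mathit{dom}(\Phi) \subseteq \mathbf{x} \cup \mathbf{y} \cup \mathbf{C}$;
- whenever a variable $z$ belongs to $\mathbf{x} \cup \mathbf{y}$, then $z$ also occurs in $\Phi$;
- $\mathbf{x}$ are the output variables of $q$, and $\mathbf{y}$ are the existential variables of $q$.
To highlight the output variables, we may write $q(\mathbf{x})$ instead of $q$. The evaluation of $q$ over an instance $I$ is the set $q(I)$ of every tuple $\mathbf{t}$ of constants admitting a homomorphism $h_{\mathbf{t}}$ from $\Phi(\mathbf{x}, \mathbf{y})$ to $I$ such that $h_{\mathbf{t}}(\mathbf{x}) = \mathbf{t}$.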
A Boolean conjunctive query (BCQ) is a CQ with no output variables, namely an expression of the form $\leftarrow \exists \mathbf{y}\, \Phi(\mathbf{y})$. An instance $I$ satisfies a BCQ $q$, denoted $I \models q$, if $q(I)$ is nonempty, namely $q(I)$ contains only the empty tuple $\langle\rangle$.
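Evaluating a CQ amounts to enumerating homomorphisms from $\Phi(\mathbf{x}, \mathbf{y})$ to $I$. The sketch below does this by naive backtracking, under a hypothetical encoding (atoms as `(predicate, tuple_of_terms)`, variables prefixed with `?`, nulls with `_:`); it illustrates the definition of $q(I)$ and is not meant as an efficient query engine. Answers containing nulls are discarded because $q(I)$ only contains tuples of constants, and a BCQ is satisfied exactly when the result is `{()}`.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def is_var(term: str) -> bool:
    return term.startswith("?")  # assumed convention for variables


def is_constant(term: str) -> bool:
    return not term.startswith(("?", "_:"))  # "_:" marks nulls


def match(pattern: Atom, fact: Atom, h: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend h so that it maps the query atom `pattern` onto `fact`, if possible."""
    (p, args), (fp, fargs) = pattern, fact
    if p != fp or len(args) != len(fargs):
        return None
    h = dict(h)
    for t, v in zip(args, fargs):
        if is_var(t):
            if h.setdefault(t, v) != v:
                return None  # variable already bound to a different term
        elif t != v:
            return None  # constants in the query must match exactly
    return h


def homomorphisms(atoms: List[Atom], instance: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Backtracking enumeration of homomorphisms from `atoms` to `instance` extending h."""
    if not atoms:
        yield h
        return
    for fact in instance:
        h2 = match(atoms[0], fact, h)
        if h2 is not None:
            yield from homomorphisms(atoms[1:], instance, h2)


def evaluate(output_vars: Tuple[str, ...], body: List[Atom], instance: Set[Atom]) -> Set[Tuple[str, ...]]:
    """q(I): tuples of constants t admitting a homomorphism h from the body to I with h(x) = t."""
    answers = set()
    for h in homomorphisms(body, instance, {}):
        t = tuple(h[x] for x in output_vars)
        if all(is_constant(c) for c in t):
            answers.add(t)
    return answers


I = {("E", ("a", "b")), ("E", ("b", "c")), ("E", ("c", "_:n1"))}
print(evaluate(("?x", "?z"), [("E", ("?x", "?y")), ("E", ("?y", "?z"))], I))  # {('a', 'c')}
print(evaluate((), [("E", ("a", "?y"))], I) == {()})  # True: the BCQ is satisfied
```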

Tuple-Generating Dependencies
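A tuple-generating dependency (TGD) $\sigma$, also known as an (existential) rule, over a schema $S$ is a (first-order) formula of the form

$\Phi(\mathbf{x}, \mathbf{y}) \rightarrow \exists \mathbf{z}\, \Psi(\mathbf{x}, \mathbf{z})$ (2)

where $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ are pairwise disjoint tuples of variables, and both $\Phi(\mathbf{x}, \mathbf{y})$ and $\Psi(\mathbf{x}, \mathbf{z})$ are conjunctions (often seen as sets) of atoms using only predicates from $S$. In particular:
- $\Phi$ (resp., $\Psi$) contains all and only the variables in $\mathbf{x} \cup \mathbf{y}$ (resp., $\mathbf{x} \cup \mathbf{z}$);
- constants (but not nulls) may also occur in $\sigma$;
- $\mathbf{x} \cup \mathbf{y}$ are the universal variables of $\sigma$, denoted by $\mathit{vars}_\forall(\sigma)$; $\mathbf{z}$ are the existential variables of $\sigma$, denoted by $\mathit{vars}_\exists(\sigma)$; and $\mathbf{x}$ are the frontier variables of $\sigma$, denoted by $\mathit{front}(\sigma)$.
We refer to $\mathit{body}(\sigma) = \Phi$ and $\mathit{head}(\sigma) = \Psi$ as the body and head of $\sigma$, respectively. We denote by h-preds($\sigma$) the set of predicates in $\mathit{head}(\sigma)$, and analogously for $\mathit{body}(\sigma)$; for a set of TGDs, this notation is extended by taking the union over its rules. An instance $I$ satisfies $\sigma$, written $I \models \sigma$, if, for each homomorphism $h$ from $\Phi$ to $I$, there exists a homomorphism $h' \supseteq h|_{\mathbf{x}}$ from $\Psi$ to $I$.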
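On a finite instance, $I \models \sigma$ can be checked by enumerating the homomorphisms $h$ of the body and searching, for each of them, an extension of $h|_{\mathbf{x}}$ that maps the head into $I$. A minimal sketch, assuming a hypothetical encoding of a rule as a `(body, head)` pair of atom lists (variables prefixed with `?`) and computing the frontier as the variables shared by body and head:

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(term: str) -> bool:
    return term.startswith("?")


def homomorphisms(atoms: List[Atom], instance: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Backtracking search for extensions of h mapping all `atoms` into `instance`."""
    if not atoms:
        yield h
        return
    (p, args), rest = atoms[0], atoms[1:]
    for fp, fargs in instance:
        if fp != p or len(fargs) != len(args):
            continue
        h2, ok = dict(h), True
        for t, v in zip(args, fargs):
            ok = h2.setdefault(t, v) == v if is_var(t) else t == v
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, instance, h2)


def variables(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def satisfies(instance: Set[Atom], rule: Rule) -> bool:
    """I |= sigma: every body match h has a head match h' that extends h restricted to the frontier."""
    body, head = rule
    frontier = variables(body) & variables(head)
    for h in homomorphisms(body, instance, {}):
        h_front = {x: h[x] for x in frontier}
        if next(homomorphisms(head, instance, h_front), None) is None:
            return False
    return True


# sigma: Person(x) -> ∃z HasParent(x, z)
sigma = ([("Person", ("?x",))], [("HasParent", ("?x", "?z"))])
print(satisfies({("Person", ("alice",)), ("HasParent", ("alice", "_:n1"))}, sigma))  # True
print(satisfies({("Person", ("bob",))}, sigma))  # False
```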
A class $\mathcal{C}$ of ontologies is any (typically infinite) set of ontologies, namely finite sets of TGDs, fulfilling some syntactic or semantic conditions (see, for example, the classes shown in Table 1, some of which are formally defined in the subsequent sections). In particular, Datalog is the class of ontologies containing only datalog rules, namely TGDs without existential variables.

Ontological Query Answering
Consider a database $D$ and a set $\Sigma$ of TGDs. A model of $D$ and $\Sigma$ is an instance $I$ such that $I \supseteq D$ and $I \models \Sigma$. Let $\mathit{mods}(D, \Sigma)$ be the set of all models of $D$ and $\Sigma$. The certain answers to a CQ $q$ w.r.t. $D$ and $\Sigma$ are defined as the set of tuples $\mathit{cert}(q, D, \Sigma) = \bigcap_{M \in \mathit{mods}(D, \Sigma)} q(M)$. Accordingly, for any fixed schema $S$, two ontologies $\Sigma_1$ and $\Sigma_2$ over $S$ are said to be $S$-equivalent (in symbols $\Sigma_1 \equiv_S \Sigma_2$) if, for each $D$ and $q$ over $S$, it holds that $\mathit{cert}(q, D, \Sigma_1) = \mathit{cert}(q, D, \Sigma_2)$. The pair $D$ and $\Sigma$ satisfies a BCQ $q$, written $D \cup \Sigma \models q$, if $\mathit{cert}(q, D, \Sigma) = \{\langle\rangle\}$, namely $M \models q$ for each $M \in \mathit{mods}(D, \Sigma)$. Fix a class $\mathcal{C}$ of ontologies. The computational problem studied in this work, called cert-eval[$\mathcal{C}$], can be schematised as follows:

Input: A database $D$, an ontology $\Sigma \in \mathcal{C}$, a conjunctive query $q(\mathbf{x})$, and a tuple $\mathbf{c} \in \mathbf{C}^{|\mathbf{x}|}$.
Question: Does $\mathbf{c} \in \mathit{cert}(q, D, \Sigma)$ hold?
In what follows, with a slight abuse of terminology, whenever we say that $\mathcal{C}$ is decidable, we mean that cert-eval[$\mathcal{C}$] is decidable. Note that $\mathbf{c} \in \mathit{cert}(q, D, \Sigma)$ if, and only if, $D \cup \Sigma \models q(\mathbf{c})$, where $q(\mathbf{c})$ is the BCQ obtained from $q(\mathbf{x})$ by replacing, for each $i \in \{1, \ldots, |\mathbf{x}|\}$, every occurrence of the variable $\mathbf{x}[i]$ with the constant $\mathbf{c}[i]$. In fact, the former problem is $\mathrm{AC}^0$ reducible to the latter.
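The reduction from checking $\mathbf{c} \in \mathit{cert}(q, D, \Sigma)$ to checking $D \cup \Sigma \models q(\mathbf{c})$ only substitutes constants for output variables, which is why it is so cheap. A minimal sketch, assuming pairwise distinct output variables and a hypothetical encoding where atoms are `(predicate, tuple_of_terms)` pairs and variables carry a `?` prefix:

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def instantiate(output_vars: Tuple[str, ...], body: List[Atom], c: Tuple[str, ...]) -> List[Atom]:
    """Build the BCQ q(c) by replacing each output variable x[i] with the constant c[i]."""
    if len(output_vars) != len(c):
        raise ValueError("c must have one constant per output variable")
    sub = dict(zip(output_vars, c))  # assumes the output variables are pairwise distinct
    return [(p, tuple(sub.get(t, t) for t in args)) for p, args in body]


# q(x) <- ∃y R(x, y), S(y)   with   c = (a)
print(instantiate(("?x",), [("R", ("?x", "?y")), ("S", ("?y",))], ("a",)))
# [('R', ('a', '?y')), ('S', ('?y',))]
```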
When considering the computational complexity of cert-eval[$\mathcal{C}$], we adopt the following convention: (i) combined complexity means that $D$, $\Sigma$, $q$, and $\mathbf{c}$ are given as input; and (ii) data complexity means that only $D$ and $\mathbf{c}$ are given as input, whereas $\Sigma$ and $q$ are considered fixed. Accordingly, the complexity results reported in Table 1 refer to cert-eval[$\mathcal{C}$] under this convention.

The Chase Procedure
The chase procedure (Deutsch et al. 2008) is a tool exploited for reasoning with TGDs. Consider a database $D$ and a set $\Sigma$ of TGDs. Given an instance $I \supseteq D$, a trigger for $I$ is any pair $\langle \sigma, h \rangle$, where $\sigma \in \Sigma$ is a rule as in Equation 2 and $h$ is a homomorphism from $\mathit{body}(\sigma)$ to $I$. Let $I' = I \cup h'(\mathit{head}(\sigma))$, where $h' \supseteq h|_{\mathbf{x}}$ maps each $z \in \mathit{vars}_\exists(\sigma)$ to a "fresh" null $h'(z)$ not occurring in $I$ such that $z_1 \neq z_2$ in $\mathit{vars}_\exists(\sigma)$ implies $h'(z_1) \neq h'(z_2)$. Such an operation, which constructs $I'$ from $I$, is called a chase step and is denoted by $\langle \sigma, h \rangle(I) = I'$.
Without loss of generality, we assume that the nulls introduced by each trigger functionally depend on the pair $\langle \sigma, h \rangle$ involved in the trigger. For example, given a rule $\sigma$ as in Equation 2 and a homomorphism $h$, it suffices to pick $\varphi_{z, h(\mathbf{x}, \mathbf{y})}$ as the fresh null replacing $z$ when the chase applies the trigger $\langle \sigma, h \rangle$. Accordingly, the processing order of rules and triggers does not change the result of the chase, and hence $\mathit{chase}(D, \Sigma)$ can be considered unique. The chase procedure of $D \cup \Sigma$ is an exhaustive application of chase steps, starting from $D$, which produces a sequence $I_0 = D, I_1, I_2, \ldots$ of instances in such a way that: (i) for each $i \geq 0$, $I_{i+1} = \langle \sigma, h \rangle(I_i)$ is a chase step obtained via some trigger $\langle \sigma, h \rangle$ for $I_i$; (ii) for each $i \geq 0$, if there exists a trigger $\langle \sigma, h \rangle$ for $I_i$, then there exists some $j > i$ such that $I_j = \langle \sigma, h \rangle(I_{j-1})$ is a chase step; and (iii) each trigger $\langle \sigma, h \rangle$ is used only once. We define $\mathit{chase}(D, \Sigma) = \bigcup_{i \geq 0} I_i$.
The chase bottom is the finite set of all null-free atoms in $\mathit{chase}(D, \Sigma)$. We recall that $\mathit{chase}(D, \Sigma)$ can be decomposed into levels (Calì et al. 2010): each atom of $D$ has level $0$; an atom of $\mathit{chase}(D, \Sigma)$ has level $\gamma + 1$ if, during its generation, the exploited trigger $\langle \sigma, h \rangle$ maps the body of $\sigma$ via $h$ to atoms whose maximum level is $\gamma$. We refer to the part of the chase up to level $\gamma$ as $\mathit{chase}_\gamma(D, \Sigma)$. Clearly, $\mathit{chase}(D, \Sigma) = \bigcup_{\gamma \geq 0} \mathit{chase}_\gamma(D, \Sigma)$. Finally, a trigger is involved at level $j$ if it gives rise to an atom of level $j$.
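To illustrate triggers, functionally named nulls, levels, and the chase bottom together, here is a minimal Python sketch of a chase computed level by level. It is a toy under stated assumptions, not the procedure of Deutsch et al.: rules are hypothetical `(body, head)` pairs with `?`-prefixed variables, each null is named after the rule index and $h(\mathbf{x}, \mathbf{y})$, every trigger is applied once (as in condition (iii) above), and the computation is cut off after `max_level` levels because the chase may be infinite. Atoms produced in round $\gamma$ get level $\gamma$, since a trigger that first becomes applicable in that round necessarily uses an atom of level $\gamma - 1$.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def homomorphisms(atoms: List[Atom], instance: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Backtracking search for extensions of h mapping all `atoms` into `instance`."""
    if not atoms:
        yield h
        return
    (p, args), rest = atoms[0], atoms[1:]
    for fp, fargs in instance:
        if fp != p or len(fargs) != len(args):
            continue
        h2, ok = dict(h), True
        for t, v in zip(args, fargs):
            ok = h2.setdefault(t, v) == v if is_var(t) else t == v
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, instance, h2)


def chase_levels(database: Set[Atom], rules: List[Rule], max_level: int) -> Dict[Atom, int]:
    """Map each atom of chase_{max_level}(D, Sigma) to its level."""
    level = {a: 0 for a in database}
    applied = set()  # each trigger <sigma, h> is used only once
    for gamma in range(1, max_level + 1):
        current = set(level)  # chase_{gamma - 1}(D, Sigma)
        new_atoms = set()
        for i, (body, head) in enumerate(rules):
            body_vars = sorted({t for _, args in body for t in args if is_var(t)})
            for h in homomorphisms(body, current, {}):
                key = (i, tuple(h[v] for v in body_vars))  # identifies the trigger
                if key in applied:
                    continue
                applied.add(key)
                ext = dict(h)
                for _, args in head:
                    for t in args:
                        if is_var(t) and t not in ext:  # existential variable z
                            ext[t] = f"_:{t[1:]}[{i}:{','.join(key[1])}]"
                for p, args in head:
                    atom = (p, tuple(ext.get(t, t) for t in args))
                    if atom not in level:
                        new_atoms.add(atom)
        if not new_atoms:
            break  # no new atoms: the chase has terminated
        for atom in new_atoms:
            level[atom] = gamma
    return level


def chase_bottom(level: Dict[Atom, int]) -> Set[Atom]:
    """Null-free atoms of the computed part of the chase."""
    return {a for a in level if not any(t.startswith("_:") for t in a[1])}


rules = [
    ([("Person", ("?x",))], [("HasParent", ("?x", "?z"))]),  # Person(x) -> ∃z HasParent(x, z)
    ([("HasParent", ("?x", "?z"))], [("Person", ("?z",))]),  # HasParent(x, z) -> Person(z)
]
lv = chase_levels({("Person", ("alice",))}, rules, max_level=3)
for atom, g in sorted(lv.items(), key=lambda kv: kv[1]):
    print(g, atom)
print(chase_bottom(lv))  # {('Person', ('alice',))}
```

Naming nulls after the trigger is what makes the result independent of the order in which triggers are processed, as stated above.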

Considered Decidable Classes of TGDs
In this section, we provide an overview of the main existing decidable classes of TGDs. We recall both syntactic and semantic classes: the former are based on a specific syntactic condition that can be checked on the rules, while the latter do not come with a syntactic property that can be checked on rules and, hence, are not recognizable in general. Finally, we introduce a very simple new class of existential rules called Af-Inds, which we will exploit to sharpen the results presented in Sections 5 and 6.

Preliminary Notions
We start by fixing some basic notions. We provide a uniform notation for the key existing notions of affected and invaded positions, and of attacked, protected, harmless, harmful, and dangerous variables (Leone et al. 2019; Calì et al. 2013; Krötzsch and Rudolph 2011; Berger et al. 2022; Gottlob et al. 2022). Basically, these notions serve to separate the positions in which the chase can introduce only constants from those where nulls might appear.

Definition 1 (S-affected positions)
Consider an ontology $\Sigma$ and a variable $z \in \mathit{vars}_\exists(\Sigma)$. A position $\pi \in \mathit{pos}(\Sigma)$ is $z$-affected (or invaded by $z$) if one of the following two properties holds: (i) there exists $\sigma \in \Sigma$ such that $z$ appears in the head of $\sigma$ at position $\pi$; (ii) there exist $\sigma \in \Sigma$ and $x \in \mathit{front}(\sigma)$ such that $x$ occurs both in $\mathit{head}(\sigma)$ at position $\pi$ and in $\mathit{body}(\sigma)$ at $z$-affected positions only. Moreover, a position $\pi \in \mathit{pos}(\Sigma)$ is $S$-affected, where $S \subseteq \mathit{vars}_\exists(\Sigma)$, if: (i) for each $z \in S$, $\pi$ is $z$-affected; and (ii) for each $z \in \mathit{vars}_\exists(\Sigma)$, if $\pi$ is $z$-affected, then $z \in S$.
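The fixpoint behind Definition 1 can be computed directly. The sketch below assumes a hypothetical encoding: rules are `(body, head)` pairs of atom lists, variables carry a `?` prefix, a position $P[i]$ is the pair `(P, i)` with 1-based `i`, and each existential variable is identified by its rule index and name, since variables are local to rules. For every position $\pi$ it returns the unique set $S$ such that $\pi$ is $S$-affected; positions mapped to the empty set are invaded by no existential variable.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding
Position = Tuple[str, int]  # P[i] encoded as (P, i), 1-based


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def positions_of(var: str, atoms: List[Atom]) -> Set[Position]:
    return {(p, i + 1) for p, args in atoms for i, t in enumerate(args) if t == var}


def s_affected(rules: List[Rule]) -> Dict[Position, FrozenSet[Tuple[int, str]]]:
    """Map every position of Sigma to the set S of existential variables invading it."""
    all_pos = {(p, i + 1) for body, head in rules for p, args in body + head for i in range(len(args))}
    exist = [(k, z) for k, (b, hd) in enumerate(rules) for z in sorted(vars_of(hd) - vars_of(b))]
    invaded = {}
    for k, z in exist:
        covered = positions_of(z, rules[k][1])  # (i) z appears in the head at pi
        changed = True
        while changed:  # (ii) propagate through frontier variables until a fixpoint
            changed = False
            for body, head in rules:
                for x in vars_of(body) & vars_of(head):
                    if positions_of(x, body) <= covered:  # x only at z-affected body positions
                        new = positions_of(x, head) - covered
                        if new:
                            covered |= new
                            changed = True
        invaded[(k, z)] = covered
    return {pi: frozenset(zid for zid, cov in invaded.items() if pi in cov) for pi in all_pos}


rules = [
    ([("A", ("?x",))], [("B", ("?x", "?z"))]),                  # A(x) -> ∃z B(x, z)
    ([("B", ("?x", "?y")), ("C", ("?y",))], [("D", ("?y",))]),  # B(x, y), C(y) -> D(y)
]
for pi, S in sorted(s_affected(rules).items()):
    print(pi, set(S))
```

In this example only $B[2]$ is invaded; $D[1]$ is not, because $y$ also occurs at the non-affected position $C[1]$.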
We point out that the notion presented above is a refined version of the classical notion of affected position (Calì et al. 2013). In particular, if a position $\pi$ is $S$-affected for some nonempty $S$, then $\pi$ is also affected; conversely, if $\pi$ is affected, then $\pi$ need not be $S$-affected for a given $S$. Moreover, the notion of $S$-affected position coincides with the one of positions attacked by a variable (Leone et al. 2019; Krötzsch and Rudolph 2011). The refinement does not alter the key nature and properties of affected positions; hence, for simplicity of exposition, we give only this refined definition. In the same spirit, we classify the variables occurring in a conjunction of atoms.

Definition 2 (Variables classification)
Let $\sigma$ be a TGD and $x \in \mathit{vars}(\mathit{body}(\sigma))$. Then, (i) if $x$ occurs at positions $\pi_1, \ldots, \pi_n$ and …

Given a variable $x$ that is $S$-dangerous, we write $\mathit{dang}(x)$ for the set $S$. Hereinafter, for simplicity of exposition, the prefix "$S$-" is omitted when it is not necessary. Consider an ontology $\Sigma$. Given a rule $\sigma \in \Sigma$, we denote by $\mathit{dang}(\sigma)$ (resp., $\mathit{harmless}(\sigma)$ and $\mathit{harmful}(\sigma)$) the dangerous (resp., harmless and harmful) variables in $\sigma$. These sets of variables naturally extend to the whole $\Sigma$ by taking, for each of them, the union over all the rules of $\Sigma$.

Decidable Classes of Existential Rules
We now survey the fifteen concrete classes reported in Table 1 as well as the known abstract classes based on semantic conditions. On the one hand, we report some specific syntactic conditions whenever these are useful for the rest of the presentation; on the other hand, for all of them (both concrete and abstract), we recall their containment relationships. For the rest of the section, fix a Datalog$^\exists$ ontology $\Sigma$.
The class FES (Baget et al. 2009) stands for finite expansion sets, which intuitively are sets of TGDs ensuring the termination of the chase procedure. The class BTS (Baget et al. 2009) stands for bounded treewidth sets, which intuitively are sets of TGDs guaranteeing that the (possibly infinite) instance constructed by the chase procedure has bounded treewidth. The class FUS (Baget et al. 2011) stands for finite unification sets, which intuitively are sets of TGDs guaranteeing the termination of (resolution-based) backward chaining procedures. The class SPS (Leone et al. 2019) stands for strongly parsimonious sets, which intuitively are sets of TGDs guaranteeing that the parsimonious chase procedure can be reapplied a number of times that is linear in the size of the query.
We now recall the notion of marked variable, in order to define the class Sticky (Calì et al. 2012b). A variable $x$ of $\Sigma$ is marked if (i) there is $\sigma \in \Sigma$ such that $x$ occurs in $\mathit{body}(\sigma)$ but not in $\mathit{head}(\sigma)$; or (ii) there is $\sigma \in \Sigma$ such that $x$ occurs in $\mathit{head}(\sigma)$ at position $\pi$ and some $\sigma' \in \Sigma$ has a marked variable in its body at position $\pi$. Accordingly, the stickiness condition states that $\Sigma$ is Sticky if, for each $\sigma \in \Sigma$ and each variable $x$, whenever $x$ occurs more than once in $\mathit{body}(\sigma)$, $x$ is not marked.
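Marking is another fixpoint over positions. The following sketch implements the two marking rules exactly as stated above (initial marking of body variables missing from the head, then propagation through head positions) together with the stickiness test. It assumes the hypothetical `(body, head)` encoding with `?`-prefixed variables, and treats each variable as local to its rule by identifying it with the pair `(rule index, name)`.

```python
from collections import Counter
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def positions_of(var: str, atoms: List[Atom]) -> Set[Tuple[str, int]]:
    return {(p, i + 1) for p, args in atoms for i, t in enumerate(args) if t == var}


def marked_variables(rules: List[Rule]) -> Set[Tuple[int, str]]:
    # (i) initial marking: body variables that do not occur in the head
    marked = {(k, x) for k, (body, head) in enumerate(rules) for x in vars_of(body) - vars_of(head)}
    changed = True
    while changed:
        changed = False
        # positions at which some marked variable occurs in a rule body
        marked_pos = {pos for k, x in marked for pos in positions_of(x, rules[k][0])}
        for k, (body, head) in enumerate(rules):
            for x in vars_of(head):
                # (ii) x occurs in the head at a position hosting a marked body variable
                if (k, x) not in marked and positions_of(x, head) & marked_pos:
                    marked.add((k, x))
                    changed = True
    return marked


def is_sticky(rules: List[Rule]) -> bool:
    """No variable occurring more than once in a rule body is marked."""
    marked = marked_variables(rules)
    for k, (body, _) in enumerate(rules):
        occurrences = Counter(t for _, args in body for t in args if is_var(t))
        if any(n > 1 and (k, x) in marked for x, n in occurrences.items()):
            return False
    return True


sigma1 = ([("P", ("?u", "?v")), ("Q", ("?v",))], [("R", ("?u", "?v"))])  # P(u, v), Q(v) -> R(u, v)
sigma0 = ([("R", ("?x", "?y"))], [("S", ("?x",))])                     # R(x, y) -> S(x)
print(is_sticky([sigma1]))          # True: v is repeated but not marked
print(is_sticky([sigma0, sigma1]))  # False: y marks R[2], which then marks v
```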
The class Linear (Calì et al. 2012a) is based on the linearity condition: an ontology $\Sigma$ belongs to Linear if each rule contains at most one body atom. This class generalises the class Inclusion-Dependencies (Abiteboul et al. 1995; Johnson and Klug 1984), in which rules contain only one body atom and one head atom, and variables may be repeated neither in the body nor in the head.
The class Guarded (Calì et al. 2013) is based on the guardedness condition: an ontology $\Sigma$ belongs to Guarded if, for each rule $\sigma \in \Sigma$, there is an atom $a \in \mathit{body}(\sigma)$ such that $\mathit{vars}_\forall(\sigma) = \mathit{vars}(a)$. In a similar fashion, $\Sigma$ belongs to Weakly-Guarded if, for each $\sigma \in \Sigma$, there is an atom of $\mathit{body}(\sigma)$ containing all the affected variables of $\sigma$.
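The linearity, inclusion-dependency, and guardedness conditions are purely syntactic, so they can be checked rule by rule. A minimal sketch under the hypothetical `(body, head)` encoding with `?`-prefixed variables; weak guardedness is omitted because it additionally needs the affected variables.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def no_repeated_variables(atom: Atom) -> bool:
    vs = [t for t in atom[1] if is_var(t)]
    return len(vs) == len(set(vs))


def is_linear(rules: List[Rule]) -> bool:
    return all(len(body) <= 1 for body, _ in rules)


def is_inclusion_dependencies(rules: List[Rule]) -> bool:
    return all(
        len(body) == 1 and len(head) == 1
        and no_repeated_variables(body[0]) and no_repeated_variables(head[0])
        for body, head in rules
    )


def is_guarded(rules: List[Rule]) -> bool:
    # some body atom (the guard) contains all universal variables of the rule
    return all(any(vars_of(body) <= vars_of([a]) for a in body) for body, _ in rules)


g = [([("R", ("?x", "?y")), ("S", ("?y",))], [("T", ("?x", "?z"))])]  # R(x, y), S(y) -> ∃z T(x, z)
print(is_guarded(g), is_linear(g))  # True False
```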
We recall the shyness condition underlying the class Shy (Leone et al. 2019). An ontology $\Sigma$ is Shy if, for each $\sigma \in \Sigma$, both the following conditions hold: (i) if a variable $x$ occurs in more than one body atom, then $x$ is harmless; (ii) for every pair of distinct dangerous variables $z$ and $w$ occurring in different atoms, $\mathit{dang}(z) \cap \mathit{dang}(w) = \emptyset$.
The class Ward (Gottlob and Pieris 2015) is based on the wardedness condition: Σ ∈ Ward if, for each σ ∈ Σ, there are no dangerous variables in body (σ), or there exists an atom a ∈ body(σ), called a ward, such that (i) all the dangerous variables in body(σ) occur in a, and (ii) each variable of vars(a) ∩ vars(body(σ) \ {a}) is harmless.
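Since the variable classification of Definition 2 is not fully reproduced here, the sketch below takes the harmless variables and the map $x \mapsto \mathit{dang}(x)$ of a rule as precomputed inputs (an assumption) and only checks the shyness and wardedness conditions stated above for a single rule body. The encoding (`?`-prefixed variables, atoms as `(predicate, tuple_of_terms)`) and the example body are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def is_warded_rule(body: List[Atom], harmless: Set[str], dangerous: Set[str]) -> bool:
    dang_body = vars_of(body) & dangerous
    if not dang_body:
        return True  # no dangerous variables in the body
    for i, ward in enumerate(body):
        shared = vars_of([ward]) & vars_of(body[:i] + body[i + 1:])
        # (i) the ward contains all dangerous variables; (ii) it shares only harmless ones
        if dang_body <= vars_of([ward]) and shared <= harmless:
            return True
    return False


def is_shy_rule(body: List[Atom], harmless: Set[str], dang: Dict[str, FrozenSet[str]]) -> bool:
    atoms_with = {x: {i for i, (_, args) in enumerate(body) if x in args} for x in vars_of(body)}
    # (i) a variable occurring in more than one body atom must be harmless
    if any(len(ix) > 1 and x not in harmless for x, ix in atoms_with.items()):
        return False
    # (ii) distinct dangerous variables in different atoms need disjoint dang sets
    dz = sorted(x for x in dang if x in atoms_with)
    for i, z in enumerate(dz):
        for w in dz[i + 1:]:
            if atoms_with[z] != atoms_with[w] and dang[z] & dang[w]:
                return False
    return True


body = [("R", ("?y", "?x")), ("S", ("?x",))]  # hypothetical rule body
print(is_warded_rule(body, {"?x"}, {"?y"}))  # True: R(y, x) acts as a ward
print(is_shy_rule(body, {"?x"}, {"?y": frozenset({"z1"})}))  # True
```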
Having finished with syntactic and semantic conditions, we close the section with a proposition stating their containment relationships (Baget et al. 2011;Krötzsch and Rudolph 2011;Leone et al. 2019;Baldazzi et al. 2022).

Proposition 1
The following classes are pairwise incomparable, except for: …

Throughout the remainder of the paper, let $E_{syn}$ denote the set of all fifteen decidable syntactic classes reported in Table 1. Analogously, let $E_{sem}$ denote the set of known decidable abstract classes considered in this paper, namely FES, FUS, BTS, and SPS.

Autonomous Full Inclusion Dependencies
The aim of this section is to introduce a very simple new class of existential rules called Af-Inds.Additionally, we characterise the main properties of this class.
Definition 3 (Af-Inds)
An ontology $\Sigma$ belongs to Af-Inds (autonomous full inclusion dependencies) if $\Sigma$ belongs to Inclusion-Dependencies and the following conditions are also satisfied: (1) head predicates do not appear in bodies (autonomous property); (2) rules have no existential variables (full property). We now show that every class $\mathcal{C}$ of TGDs in $E_{syn} \cup E_{sem}$ includes the class just defined. Formally, the following holds.
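Membership in Af-Inds is easy to test syntactically. A sketch, assuming the hypothetical `(body, head)` encoding with `?`-prefixed variables and reading "no repetition of variables" as applying separately to the body atom and to the head atom:

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def no_repeated_variables(atom: Atom) -> bool:
    vs = [t for t in atom[1] if is_var(t)]
    return len(vs) == len(set(vs))


def is_inclusion_dependencies(rules: List[Rule]) -> bool:
    return all(
        len(body) == 1 and len(head) == 1
        and no_repeated_variables(body[0]) and no_repeated_variables(head[0])
        for body, head in rules
    )


def is_af_inds(rules: List[Rule]) -> bool:
    head_preds = {p for _, head in rules for p, _ in head}
    body_preds = {p for body, _ in rules for p, _ in body}
    return (
        is_inclusion_dependencies(rules)
        and not (head_preds & body_preds)                               # (1) autonomous
        and all(vars_of(head) <= vars_of(body) for body, head in rules)  # (2) full
    )


print(is_af_inds([([("R", ("?x", "?y"))], [("S", ("?y", "?x"))])]))  # True
print(is_af_inds([([("R", ("?x",))], [("S", ("?x", "?z"))])]))       # False: z is existential
```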

Proposition 2
Consider a class $\mathcal{C} \in E_{syn} \cup E_{sem}$ of TGDs. Then, Af-Inds $\subseteq \mathcal{C}$.

Proof
Thanks to Proposition 1, it suffices to show that (i) Af-Inds $\subseteq$ Inclusion-Dependencies and (ii) Af-Inds $\subseteq$ Datalog. By Definition 3, the class Af-Inds contains exactly the ontologies whose rules have a single body atom and a single head atom, without repeated variables either in the body or in the head, and that satisfy the autonomous property (head predicates do not appear in bodies) and the full property (rules have no existential variables). Accordingly, relations (i) and (ii) trivially hold.
We conclude the section by providing the complexity of the class Af-Inds.
Proposition 3
cert-eval[Af-Inds] is in $\mathrm{AC}^0$ in data complexity and NP-complete in combined complexity.

Proof
By Proposition 2, Af-Inds $\subseteq$ Inclusion-Dependencies. Hence, the data complexity of cert-eval[Af-Inds] is inherited from that of cert-eval[Inclusion-Dependencies], that is, $\mathrm{AC}^0$. For the combined complexity, we first observe that cert-eval[Af-Inds] is NP-hard, building upon the well-known fact that cert-eval[$\emptyset$] is already NP-hard; the latter refers to the problem of evaluating a query against a database in the absence of an ontology, and the empty ontology trivially belongs to Af-Inds. Secondly, to prove membership in NP, we show that, given a query $q(\mathbf{x})$ and an ontology $\Sigma \in$ Af-Inds, it is possible to guess in polynomial time a CQ $q_\Sigma(\mathbf{x})$ such that $\mathbf{c} \in \mathit{cert}(q, D, \Sigma)$ iff $\mathbf{c} \in q_\Sigma(D)$ for some such guess, with $\mathbf{c}$ being a tuple in $\mathbf{C}^{|\mathbf{x}|}$. To this aim, for each atom $a \in q(\mathbf{x})$, we guess whether to leave $a$ unchanged or to "resolve" $a$ with the body of some rule $\sigma \in \Sigma$ such that $\mathit{head}(\sigma)$ unifies with $a$; since head predicates do not occur in rule bodies, a single resolution step per atom suffices. Accordingly, the size of $q_\Sigma$ is polynomial in the size of the input and, finally, it is possible to guess in NP a homomorphism witnessing $\mathbf{c} \in q_\Sigma(D)$.
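The membership argument above can be made concrete by enumerating deterministically all the choices that the nondeterministic procedure would guess. The sketch assumes constant-free Af-Inds rules (so each head atom has pairwise distinct variables and unifies with any query atom over the same predicate and arity), a null-free database, and the hypothetical `(body, head)` encoding with `?`-prefixed variables; body-only rule variables are renamed apart with a hypothetical tag. Because head predicates never occur in bodies, a resolved atom cannot be resolved again. The exponential enumeration replaces the NP guess and is only meant for small examples.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def homomorphisms(atoms: List[Atom], instance: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    if not atoms:
        yield h
        return
    (p, args), rest = atoms[0], atoms[1:]
    for fp, fargs in instance:
        if fp != p or len(fargs) != len(args):
            continue
        h2, ok = dict(h), True
        for t, v in zip(args, fargs):
            ok = h2.setdefault(t, v) == v if is_var(t) else t == v
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, instance, h2)


def evaluate(output_vars: Tuple[str, ...], body: List[Atom], database: Set[Atom]) -> Set[Tuple[str, ...]]:
    return {tuple(h[x] for x in output_vars) for h in homomorphisms(body, database, {})}


def resolve(atom: Atom, rule: Rule, tag: str) -> Optional[Atom]:
    """Resolve a query atom with the single head atom of an Af-Inds rule."""
    (body_atom,), (head_atom,) = rule
    (hp, hargs), (qp, qargs) = head_atom, atom
    if hp != qp or len(hargs) != len(qargs):
        return None
    theta = dict(zip(hargs, qargs))  # head variables are pairwise distinct
    bp, bargs = body_atom
    # body-only variables are renamed apart with a tag unique to (query atom, rule)
    return (bp, tuple(theta.get(t, f"{t}_{tag}" if is_var(t) else t) for t in bargs))


def rewritings(body: List[Atom], rules: List[Rule]) -> Iterator[List[Atom]]:
    """All queries q_Sigma obtained by keeping or resolving each query atom once."""
    options = []
    for i, atom in enumerate(body):
        resolved = [resolve(atom, rule, f"{i}_{j}") for j, rule in enumerate(rules)]
        options.append([atom] + [r for r in resolved if r is not None])
    for choice in product(*options):
        yield list(choice)


def certain_answers(output_vars, body, database, rules) -> Set[Tuple[str, ...]]:
    answers: Set[Tuple[str, ...]] = set()
    for q_sigma in rewritings(body, rules):
        answers |= evaluate(output_vars, q_sigma, database)
    return answers


rules = [([("Emp", ("?x", "?y"))], [("Person", ("?x",))])]  # Emp(x, y) -> Person(x)
D = {("Emp", ("ann", "acme")), ("Person", ("bob",))}
print(sorted(certain_answers(("?p",), [("Person", ("?p",))], D, rules)))  # [('ann',), ('bob',)]
```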

Dyadic Pairs of TGDs
In this section, we lay the groundwork for the main contribution of the paper, namely the definition of a new decidable class of TGDs called Dyadic-$\mathcal{C}$. To this aim, we first introduce some preliminary notions needed to define a dyadic pair, and then we conclude with some computational properties.

Formal Definition
We start by introducing the concept of a head-ground set of rules, that is, roughly, a set of "nonrecursive" rules in which nulls are neither created nor propagated.
The following example is given to better understand the above definition.

Example 1
Consider the next set $\Sigma$ of rules: … $\sigma_5: A(x_5, z_5), D(y_5, z_5) \rightarrow Q(x_5, y_5)$. A head-ground subset of $\Sigma$ is given by $\Sigma_{HG} = \{\sigma_2, \sigma_3\}$. In fact, $\mathit{harmless}(\Sigma)$ is the set $\{x_1, y_1, y_2, x_2, z_2, x_3, y_3, z_3, y_5, z_5\}$; hence, according to Definition 4, it is easy to check that (i) $\sigma_2$ and $\sigma_3$ are datalog rules; (ii) the head atoms of $\sigma_2$ and $\sigma_3$ contain only harmless variables; (iii) neither $S$ nor $T$ occurs in the body of any rule in $\Sigma_{HG}$; and (iv) neither $S$ nor $T$ occurs in the head of any rule in $\{\sigma_1, \sigma_4, \sigma_5\}$. On the contrary, none of the rules in $\{\sigma_1, \sigma_4, \sigma_5\}$ can be part of any head-ground subset of $\Sigma$. Indeed, according to Definition 4, both $\sigma_1$ and $\sigma_4$ violate properties (1) and (2), whereas $\sigma_5$ violates property (2). Hence, the set $\Sigma_{HG}$ is also maximal.
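A sketch of the head-ground test, checking the four properties exactly as they are enumerated in Example 1 for a candidate subset $\Sigma_{HG} \subseteq \Sigma$ given by rule indices. Since the classification of harmless variables is not reproduced here, the harmless variables are assumed to be precomputed and encoded as `(rule index, variable)` pairs; the `(body, head)` encoding and the example rules are hypothetical.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding


def is_var(t: str) -> bool:
    return t.startswith("?")


def vars_of(atoms: List[Atom]) -> Set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


def is_head_ground(rules: List[Rule], hg_indices: Set[int], harmless: Set[Tuple[int, str]]) -> bool:
    hg = [rules[k] for k in hg_indices]
    rest = [r for k, r in enumerate(rules) if k not in hg_indices]
    hg_head_preds = {p for _, head in hg for p, _ in head}
    for k in hg_indices:
        body, head = rules[k]
        if not vars_of(head) <= vars_of(body):
            return False  # (i) every rule of Sigma_HG must be a datalog rule
        if any((k, x) not in harmless for x in vars_of(head)):
            return False  # (ii) head atoms contain only harmless variables
    if hg_head_preds & {p for body, _ in hg for p, _ in body}:
        return False  # (iii) head predicates of Sigma_HG do not occur in its bodies
    if hg_head_preds & {p for _, head in rest for p, _ in head}:
        return False  # (iv) nor in the heads of the remaining rules
    return True


rules = [
    ([("A", ("?x", "?y"))], [("S", ("?x",))]),                              # A(x, y) -> S(x)
    ([("S", ("?x",)), ("B", ("?x", "?y"))], [("C", ("?y", "?z"))]),  # S(x), B(x, y) -> ∃z C(y, z)
]
harmless = {(0, "?x"), (0, "?y"), (1, "?x")}
print(is_head_ground(rules, {0}, harmless), is_head_ground(rules, {0, 1}, harmless))  # True False
```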
Having in mind the notion of a head-ground set of rules, we can now formally define a dyadic pair. Whenever the above definition applies, we also say, for short, that $\Pi$ is a $\mathcal{C}$-dyadic pair. The following example illustrates the concept of dyadic pair.

Example 2
Consider the following pair $\Pi = (\Sigma_{HG}, \Sigma_{\mathcal{C}})$ of TGDs, where $\Sigma_{HG}$ is: … and $\Sigma_{\mathcal{C}}$ is: … In particular, $\Pi$ is a dyadic pair with respect to any $\mathcal{C} \in \{$Guarded, Shy, Ward$\}$. To this aim, let $\Sigma = \Sigma_{HG} \cup \Sigma_{\mathcal{C}}$; … and $\mathit{aff}(S[1]) = \{y_1\}$. Accordingly, $\mathit{harmless}(\Sigma) = \{x_1, x_2, x_3\}$, $\mathit{harmful}(\Sigma) = \{y_3\}$, and $\mathit{dang}(\Sigma) = \{y_3\}$. To prove that $\Pi$ is a dyadic pair, we first have to show that $\Sigma_{HG}$ is a head-ground set of rules with respect to $\Sigma$. Clearly, $\Sigma_{HG} \in$ Datalog and each head atom contains only harmless variables; moreover, the head predicates appear neither in body atoms of $\Sigma_{HG}$ nor in head atoms of $\Sigma_{\mathcal{C}}$. Hence, $\Sigma_{HG}$ is head-ground with respect to $\Sigma$. It remains to show that $\Sigma_{\mathcal{C}} \in \mathcal{C}$. We focus on the last rule of $\Sigma_{\mathcal{C}}$, since the first two rules are linear and, hence, trivially guarded, shy, and warded. The last rule belongs to Guarded since the atom $R(y_3, x_3)$ contains all the universal variables of the rule (guardedness condition); it belongs to Shy since the variable $x_3$, which occurs in two body atoms, is harmless (shyness condition); finally, it belongs to Ward since the atom $R(y_3, x_3)$ is a ward that contains the dangerous variable $y_3$ and shares with the rest of the body only the harmless variable $x_3$ (wardedness condition).

The next step is to extend the query answering problem, classically defined over an ontology, to a dyadic pair. Therefore, we extend both notions of chase and certain answers to a dyadic pair. Accordingly, given a dyadic pair $\Pi = (\Sigma_{HG}, \Sigma_{\mathcal{C}})$, we define … and … Now we can fix the problem studied in the rest of the paper.

Computational Properties
For the rest of the section, fix a decidable class $\mathcal{C}$ of TGDs. Given a database $D$ and a $\mathcal{C}$-dyadic pair $\Pi$ of TGDs, we define the set $\mathit{gra}(D, \Pi)$ of ground atoms given by Equation 5. Our idea is to reduce query answering over a dyadic pair $\Pi$ to query answering over $\mathcal{C}$, the latter being decidable by assumption.
In particular, given a database $D$ and a dyadic pair $\Pi = (\Sigma_{HG}, \Sigma_{\mathcal{C}})$, Algorithm 2 iteratively constructs the set $D^+ = D \cup \mathit{gra}(D, \Pi)$, with $\mathit{gra}(D, \Pi)$ being the set defined by Equation 5. Roughly speaking, the first two instructions are required, respectively, to add $D$ to $D^+$ and to initialise a temporary set $\tilde{D}$ used to store the ground consequences derived from $\Sigma_{HG}$. The rest of the algorithm is an iterative procedure that computes the certain answers (instruction 5) to the queries constructed from the rules of $\Sigma_{HG}$ (instruction 4) and completes the initial database $D$ (instruction 7) until no more auxiliary ground atoms can be produced (instruction 6). We point out that, in general, $\tilde{D} \subseteq \mathit{gra}(D, \Pi)$ holds; in particular, $\tilde{D} = \mathit{gra}(D, \Pi)$ holds in the last execution of instruction 7 or, equivalently, when the condition $D \cup \tilde{D} \supset D^+$ examined at instruction 6 is false, since all the auxiliary ground atoms have been added to $D^+$.
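Since the listing of Algorithm 2 is not reproduced in this text, the following Python sketch reconstructs it from the description above under stated assumptions: each rule of $\Sigma_{HG}$ has a single head atom $H_i(\mathbf{x})$ whose terms are variables, and `cert_oracle` is a hypothetical callback standing in for any reasoner for cert-eval[$\mathcal{C}$] over $\Sigma_{\mathcal{C}}$. For the demo, the oracle is a naive Datalog evaluator, so $\Sigma_{\mathcal{C}}$ is assumed to be a Datalog program; the instruction numbers in the comments follow the prose, not the original listing, and the `(body, head)` encoding with `?`-prefixed variables is hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); hypothetical encoding
# cert_oracle(x, body, database) plays the role of cert(x <- body, database, Sigma_C)
CertOracle = Callable[[Tuple[str, ...], List[Atom], Set[Atom]], Set[Tuple[str, ...]]]


def is_var(t: str) -> bool:
    return t.startswith("?")


def homomorphisms(atoms: List[Atom], instance: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    if not atoms:
        yield h
        return
    (p, args), rest = atoms[0], atoms[1:]
    for fp, fargs in instance:
        if fp != p or len(fargs) != len(args):
            continue
        h2, ok = dict(h), True
        for t, v in zip(args, fargs):
            ok = h2.setdefault(t, v) == v if is_var(t) else t == v
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, instance, h2)


def complete_database(database: Set[Atom], sigma_hg: List[Rule], cert_oracle: CertOracle) -> Set[Atom]:
    """Sketch of Algorithm 2: compute D+ = D ∪ gra(D, Π) using an oracle for Sigma_C."""
    d_plus = set(database)       # instruction 1
    d_tilde: Set[Atom] = set()   # instruction 2
    while True:
        for body, head in sigma_hg:  # instruction 3: rules Phi_i(x, y) -> H_i(x)
            (h_pred, h_args), = head  # single head atom, assumed
            query_vars = tuple(h_args)  # instruction 4: q_i = x <- Phi_i(x, y)
            for t in cert_oracle(query_vars, body, d_plus):  # instruction 5
                d_tilde.add((h_pred, t))
        if set(database) | d_tilde > d_plus:  # instruction 6: anything new?
            d_plus = set(database) | d_tilde  # instruction 7, then iterate again
        else:
            return d_plus


def datalog_oracle(sigma_c: List[Rule]) -> CertOracle:
    """Certain answers when Sigma_C is a Datalog program: evaluate over its least model."""
    def fixpoint(db: Set[Atom]) -> Set[Atom]:
        inst = set(db)
        while True:
            new = {(p, tuple(h.get(t, t) for t in args))
                   for body, head in sigma_c
                   for h in homomorphisms(body, inst, {})
                   for p, args in head} - inst
            if not new:
                return inst
            inst |= new

    def oracle(query_vars, body, db):
        model = fixpoint(db)
        return {tuple(h[x] for x in query_vars) for h in homomorphisms(body, model, {})}

    return oracle


D = {("Edge", ("a", "b")), ("Edge", ("b", "c")), ("Flag", ("a",))}
sigma_c = [([("Flag", ("?x",)), ("Edge", ("?x", "?y"))], [("Seen", ("?y",))])]  # Flag(x), Edge(x, y) -> Seen(y)
sigma_hg = [([("Seen", ("?x",))], [("Flag", ("?x",))])]                          # Seen(x) -> Flag(x)
print(sorted(complete_database(D, sigma_hg, datalog_oracle(sigma_c))))  # adds Flag(b) and Flag(c)
```

Because the reasoner for $\Sigma_{\mathcal{C}}$ is passed in as a function, swapping in a different one leaves the completion loop unchanged, which is the sense in which Dyadic-$\mathcal{C}$ can reuse existing reasoners for $\mathcal{C}$.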
Before we prove that Algorithm 2 always terminates and correctly constructs $D^+$, we show the following preliminary lemma.

Proof
Let $X = \{a_1, \ldots, a_n\}$. For each $j \in [n]$, let $\mathit{lev}(a_j)$ be the level of $a_j$ inside $\mathit{chase}(D, \Sigma)$, and let $p = \max_{j \in [n]} \mathit{lev}(a_j)$. The proof proceeds by induction on the level $i$ of $\mathit{chase}(D \cup X, \Sigma')$.
Base case: $i = 1$. We want to prove that $\mathit{chase}_1(D \cup X, \Sigma') \subseteq \mathit{chase}(D, \Sigma)$. Let $a$ be an atom of $\mathit{chase}_1(D \cup X, \Sigma')$ generated exactly at level $i = 1$. By definition, $a$ is obtained via some trigger $\langle \sigma, h \rangle$ such that $h$ maps $\mathit{body}(\sigma)$ to $D \cup X$. If $a \in \mathit{chase}_p(D, \Sigma)$, then the claim holds trivially. Otherwise, we can show that $a \in \mathit{chase}_{p+1}(D, \Sigma)$. Indeed, since $h$ maps $\mathit{body}(\sigma)$ to $D \cup X \subseteq \mathit{chase}_p(D, \Sigma)$, the pair $\langle \sigma, h \rangle$ is also a trigger involved at level $p + 1$ in $\mathit{chase}(D, \Sigma)$. In particular, $h$ maps at least one atom of $\mathit{body}(\sigma)$ to some atom $a_k \in X$ such that $\mathit{lev}(a_k) = p$. Since, by definition, the nulls introduced during the chase functionally depend on the involved triggers, $a$ necessarily belongs to $\mathit{chase}_{p+1}(D, \Sigma)$.
Induction step: $i = \ell$. Given that, for every level $i \leq \ell - 1$, $\mathit{chase}_i(D \cup X, \Sigma') \subseteq \mathit{chase}(D, \Sigma)$ (induction hypothesis), we prove that $\mathit{chase}_\ell(D \cup X, \Sigma') \subseteq \mathit{chase}(D, \Sigma)$ holds, too. Let $\beta$ be an atom of $\mathit{chase}_\ell(D \cup X, \Sigma')$ generated exactly at level $i = \ell$. By definition, $\beta$ is obtained via some trigger $\langle \sigma', h' \rangle$ such that $h'$ maps $\mathit{body}(\sigma')$ to atoms with level at most $\ell - 1$. Accordingly, by the induction hypothesis, $h'$ also maps $\mathit{body}(\sigma')$ to $\mathit{chase}(D, \Sigma)$. Hence, since the processing order of rules and triggers does not change the result of the chase and nulls functionally depend on the involved triggers, it follows that $\beta \in \mathit{chase}(D, \Sigma)$ as well.
With the next proposition, we prove that Algorithm 2 always terminates and correctly constructs D + .

Proposition 4
Consider a database $D$ and a $\mathcal{C}$-dyadic pair $\Pi$ of TGDs. Then, Algorithm 2 terminates and computes $D^+ = D \cup \mathit{gra}(D, \Pi)$.

Proof
Let $\Pi = (\Sigma_{HG}, \Sigma_{\mathcal{C}})$. We proceed by proving first the termination of Algorithm 2 and then its correctness.
Termination. To prove the termination of Algorithm 2, it suffices to show that each instruction alone always terminates and that the overall procedure never falls into an infinite loop. First, observe that $|\mathit{gra}(D, \Pi)| \leq |\text{h-preds}(\Sigma_{HG})| \cdot d^{\mu}$, where $d = |\mathit{consts}(D)|$ and $\mu = \max_{P \in \text{h-preds}(\Sigma_{HG})} \mathit{arity}(P)$. Instructions 1, 2, 4, 8, and 9 trivially terminate. Instructions 6 and 7 both terminate, since $\tilde{D} \subseteq \mathit{gra}(D, \Pi)$ always holds (see correctness below). Each time instruction 3 is reached, the for-loop simply scans the set $\Sigma_{HG}$, which is finite by definition. Concerning instruction 5, its termination relies on the termination of cert-eval[$\mathcal{C}$], which holds by hypothesis, and on the fact that, for each query $q$, to construct the set $\{H_i(\mathbf{t}) \mid \mathbf{t} \in \mathit{cert}(q, D^+, \Sigma_{\mathcal{C}})\}$, the problem cert-eval[$\mathcal{C}$] must be solved at most $d^{\mu}$ times, where $d^{\mu}$ bounds the number of tuples $\mathbf{t}$ for which the check $\mathbf{t} \in \mathit{cert}(q, D^+, \Sigma_{\mathcal{C}})$ has to be performed. Since each instruction alone terminates, it remains to analyse the overall procedure, which contains two loops. The first one, namely the for-loop at instruction 3, is not problematic, since we have shown that it locally terminates. The second one, namely the go to-loop, depends on the if-instruction at instruction 6, whose condition can be true at most $|\mathit{gra}(D, \Pi)|$ times, because each time it holds at least one new atom of $\mathit{gra}(D, \Pi)$ is added to $D^+$. Thus, the go to-loop is also executed at most $|\mathit{gra}(D, \Pi)|$ times.
Correctness. We now claim that Algorithm 2 correctly completes the database. Let D⁺ be the output of Algorithm 2. Our claim is that D⁺ = D ∪ gra(D, Π).
Inclusion 1 (D⁺ ⊇ D ∪ gra(D, Π)). Assume, by contradiction, that D ∪ gra(D, Π) contains some atom that does not belong to D⁺. This means that there exists some j > 0 such that both ((D ∪ gra(D, Π)) ∩ chase_{j−1}(D, Σ_HG ∪ Σ_C)) ⊆ D⁺ and ((D ∪ gra(D, Π)) ∩ chase_j(D, Σ_HG ∪ Σ_C)) \ D⁺ ≠ ∅ hold. Thus, there exists some a ∈ chase_j(D, Σ_HG ∪ Σ_C) whose level is exactly j and that does not belong to D⁺. Let ⟨σ, h⟩ be the trigger used by the chase to generate a, where σ is of the form Φ(x, y) → H(x). Clearly, h maps Φ(x, y) to chase_{j−1}(D, Σ_HG ∪ Σ_C), and we also have that a = H(h(x)). Consider now the query q = x ← Φ(x, y) constructed from σ by Algorithm 2 at instruction 4. Thus, . We proceed by induction on the number ℓ of iterations.
Base case: Let i = 1. We claim that D_1 ⊆ gra(D, Π). By construction, the set D_1 = Since D⁺_0 = D and the component Σ_C does not produce any atom in the first iteration of the algorithm, the latter is equal to
Induction step: Given that, for i = ℓ − 1, D_{ℓ−1} ⊆ gra(D, Π) (induction hypothesis), we prove that D_ℓ ⊆ gra(D, Π) holds, too. By construction, D_ℓ = {H_i(t

We conclude the section by proving the decidability of the problem dp-cert-eval[C], under the assumption that cert-eval[C] is decidable.

Proof
To prove the decidability of dp-cert-eval[C], we design Algorithm 1. Let D be a database, Π = (Σ_HG, Σ_C) a dyadic pair, q(x) a CQ, and c a tuple in C^|x|. Step 1 always terminates, since it invokes Algorithm 2, which, as shown in Proposition 4, always terminates and correctly constructs the set D⁺. By Theorem 1, checking whether c ∈ dp-cert(q, D, Π) boils down to checking whether c ∈ cert(q, D⁺, Σ_C), which is decidable by hypothesis. Hence, step 2 also terminates, and Algorithm 1 correctly decides dp-cert-eval[C].
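Algorithm 1 thus reduces to two calls, which can be sketched as follows. The function names and the calling convention (Algorithm 2 and the cert-eval[C] oracle passed in as callables) are assumptions made for illustration; the toy stubs only exercise the control flow and do not implement any real class C.

```python
from typing import Callable, Tuple


def dp_cert_eval(D, pair: Tuple[object, object], q, c: Tuple[str, ...],
                 complete: Callable, cert: Callable) -> bool:
    """Sketch of Algorithm 1: decide whether c ∈ dp-cert(q, D, Π).

    `complete` (Algorithm 2) and `cert` (an oracle for cert-eval[C]) are
    hypothetical callables supplied by the caller.
    """
    sigma_hg, sigma_c = pair
    d_plus = complete(D, sigma_hg, sigma_c, cert)  # step 1: D+ = D ∪ gra(D, Π)
    return tuple(c) in cert(q, d_plus, sigma_c)    # step 2: is c ∈ cert(q, D+, Σ_C)?


# Toy stubs, only to exercise the control flow.
def toy_complete(D, sigma_hg, sigma_c, cert):
    return frozenset(D) | {("Aux", ("a",))}


def toy_cert(q, db, sigma_c):
    return {args for pred, args in db if pred == "Aux"}


print(dp_cert_eval({("P", ("a",))}, ([], []), None, ("a",), toy_complete, toy_cert))  # True
```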

Dyadic Decomposable Sets
In this section, we introduce a novel general condition that allows us to define, from any decidable class C of ontologies, a new decidable class called Dyadic-C enjoying desirable properties. The union of all the Dyadic-C classes, with C being any decidable class of TGDs, forms what we call dyadic decomposable sets, which encompass and generalise any other existing decidable class, including those based on semantic conditions.
We start the section by classifying the atoms of a rule body according to where dangerous variables appear; then, we define the class Dyadic-C and prove that the query answering problem over this class is decidable.

Definition 6 (Atoms classification)
Consider a set Σ of TGDs and a rule σ ∈ Σ. An atom a of body(σ) is σ-problematic if (i) a contains a dangerous variable w.r.t. Σ, or (ii) a is connected to some σ-problematic atom via some harmful variable. The set of all the problematic atoms of σ is denoted by p-atoms(σ), whereas s-atoms(σ) = body(σ) \ p-atoms(σ) denotes the set of all the safe atoms of σ.
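Since condition (ii) is recursive, p-atoms(σ) can be computed as a least fixpoint, as in the sketch below. The sketch assumes that the sets of dangerous and harmful variables w.r.t. Σ have already been computed (their definitions are not reproduced in this section), and it reads "connected via some harmful variable" as "shares a harmful variable with". The function name `classify_atoms` is hypothetical.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, variables)


def classify_atoms(body: List[Atom], dangerous: Set[str],
                   harmful: Set[str]) -> Tuple[List[Atom], List[Atom]]:
    """Split body(σ) into (p-atoms(σ), s-atoms(σ)) as a least fixpoint."""
    # (i) atoms containing a dangerous variable are problematic
    problematic = {i for i, (_, terms) in enumerate(body) if set(terms) & dangerous}
    changed = True
    while changed:
        changed = False
        # harmful variables occurring in the problematic atoms found so far
        linked = {v for i in problematic for v in body[i][1] if v in harmful}
        for i, (_, terms) in enumerate(body):
            # (ii) atoms sharing a harmful variable with a problematic atom
            if i not in problematic and set(terms) & linked:
                problematic.add(i)
                changed = True
    p_atoms = [a for i, a in enumerate(body) if i in problematic]
    s_atoms = [a for i, a in enumerate(body) if i not in problematic]
    return p_atoms, s_atoms


body = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "w")), ("U", ("w", "v"))]
p, s = classify_atoms(body, dangerous={"x"}, harmful={"x", "y"})
print(p)  # [('R', ('x', 'y')), ('S', ('y', 'z'))]
print(s)  # [('T', ('z', 'w')), ('U', ('w', 'v'))]
```

Here S(y, z) becomes problematic only because it shares the harmful variable y with R(x, y); since z is not harmful, the propagation stops there.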
In the special case in which a variable x of vars⋆(σ) occurs n > 1 times in head(σ), x also occurs n times in the head of hg(σ). Accordingly, x occurs with different names (e.g., x_1, ..., x_n) both in the head and in the body of main(σ). For example, if the ontology contains only the rule σ : P(x) → R(x, x), then hg(σ) : P(x) → Aux_σ(x, x) and main(σ) : Aux_σ(x_1, x_2) → R(x_1, x_2). Clearly, the latter two rules together are equivalent to σ with respect to the schema {P, R}. We prefer to keep the formal definition of hg(Σ) and main(Σ) light, without formalising such special cases.
We are now ready to formally introduce the class Dyadic-C. By Definition 7, to check whether an ontology Σ belongs to Dyadic-C, one has to verify whether Σ ∈ C or main(Σ) ∈ C. We observe that the construction of the set main(Σ) explained above is polynomial (indeed linear) in the size of Σ. Hence, the following result holds.

Theorem 3
Consider a class C of TGDs and assume that checking whether an ontology belongs to C is doable in some complexity class C ⊇ PTIME. Then, checking whether an ontology belongs to Dyadic-C is decidable and doable in the complexity class C.

Proof
We start by recalling that, by Definition 7, an ontology Σ belongs to Dyadic-C if (i) Σ ∈ C, or (ii) main(Σ) ∈ C. Checking condition (i) is doable in some complexity class C ⊇ PTIME, by assumption. Otherwise, the set main(Σ) is constructed by a procedure that is polynomial (indeed linear) in the size of Σ and, hence, always terminates; accordingly, checking condition (ii) is also decidable and doable in the complexity class C.
∈ C and let Π = (hg(Σ), main(Σ)). Property (ii) is satisfied since, by hypothesis, Σ ∈ Dyadic-C; hence, by definition, main(Σ) ∈ C. It remains to show property (i). According to Definition 4, the set hg(Σ) has to satisfy four properties. Properties 1 and 2 are trivially fulfilled since, by construction, for each σ ∈ Σ, hg(σ) is a datalog rule and each head atom contains only harmless variables with respect to hg(Σ) ∪ main(Σ). Properties 3 and 4 state that h-preds(hg(Σ)) ∩ b-preds(hg(Σ)) = ∅ and h-preds(hg(Σ)) ∩ h-preds(main(Σ)) = ∅. These hold since, by construction, h-preds(hg(Σ)) = {Aux_σ : σ ∈ Σ}, where each Aux_σ is a predicate that occurs neither in any body of hg(Σ) nor in any head of main(Σ).
Concerning the equivalence between Σ_HG ∪ Σ_C and Σ, it follows easily from the shape of hg(σ) and main(σ) with respect to each original rule σ ∈ Σ. Indeed, first, the body of σ is partitioned into s-atoms(σ) and p-atoms(σ). Second, all the atoms of s-atoms(σ) form the body of hg(σ). Then, all the variables of hg(σ) that join with p-atoms(σ) or occur in the head of σ are collected in Aux_σ(vars⋆(σ)). Finally, Aux_σ(vars⋆(σ)) is put in conjunction with p-atoms(σ) to form the body of main(σ). This way of decomposing a rule σ is well known to be correct for query answering purposes, even when the variables in the auxiliary atom are harmful.
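The decomposition steps just listed can be sketched as follows, assuming the partition of body(σ) into p-atoms(σ) and s-atoms(σ) is already known. The function `decompose` and the naming scheme `Aux_<name>` are hypothetical; the order of the variables in vars⋆(σ) is fixed by sorting, which is an arbitrary choice, and the special case of repeated head variables discussed earlier is not handled.

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[str, ...]]                # (predicate, variables)
Rule = Tuple[Tuple[Atom, ...], Tuple[Atom, ...]]  # (body, head)


def decompose(name: str, body: List[Atom], head: List[Atom],
              p_atoms: List[Atom]) -> Tuple[Rule, Rule]:
    """Split σ = body -> head into (hg(σ), main(σ))."""
    s_atoms = [a for a in body if a not in p_atoms]

    def vars_of(atoms):
        return {v for _, terms in atoms for v in terms}

    # vars⋆(σ): variables of the safe atoms that join with p-atoms(σ) or occur in head(σ)
    star = tuple(sorted(vars_of(s_atoms) & (vars_of(p_atoms) | vars_of(head))))
    aux = ("Aux_" + name, star)
    hg = (tuple(s_atoms), (aux,))          # hg(σ):   s-atoms(σ) -> Aux_σ(vars⋆(σ))
    main = ((aux, *p_atoms), tuple(head))  # main(σ): Aux_σ(vars⋆(σ)) ∧ p-atoms(σ) -> head(σ)
    return hg, main


body = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "w"))]
hg, main = decompose("s1", body, head=[("H", ("x", "w"))], p_atoms=[("R", ("x", "y"))])
print(hg)    # ((('S', ('y', 'z')), ('T', ('z', 'w'))), (('Aux_s1', ('w', 'y')),))
print(main)  # ((('Aux_s1', ('w', 'y')), ('R', ('x', 'y'))), (('H', ('x', 'w')),))
```

In the example, y is kept because it joins with R(x, y), and w because it occurs in the head, while z is projected away inside hg(σ); the whole construction is a single pass over σ, in line with the linear bound stated above.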
It remains to show that Dyadic-C is decidable.We rely on Algorithm 3 together with Theorem 1 and Proposition 4 to state the following result.

Theorem 5
Consider a decidable class C of TGDs. Then, cert-eval[Dyadic-C] is decidable.

Proof
To prove the statement, we provide the terminating Algorithm 3. Let Σ ∈ Dyadic-C be an ontology. Instructions 1 and 2 of the algorithm construct the components hg(Σ) and main(Σ) of a dyadic pair Π, which is then initialised at instruction 3. The construction of Π is polynomial in the size of the input Σ; hence, these instructions always terminate. Finally, instruction 4 returns the result of the evaluation of the problem dp-cert-eval[C]. To solve the latter, Algorithm 1 is invoked, which in turn invokes Algorithm 2; their correctness is guaranteed by Theorem 1 and Proposition 4, respectively. Accordingly, cert-eval[Dyadic-C] is decidable. From the above theorem, we immediately get the following result.

Corollary 1
The complexity results in Table 2 hold.
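Algorithm 3, on which the decidability proof above relies, can be sketched as follows. The names `cert_eval_dyadic`, `decompose_rule`, and `dp_cert_eval` are hypothetical, and both helpers are passed in as callables; the toy stubs only show that hg(Σ) and main(Σ) are built in a single pass over Σ before the dyadic pair is handed to Algorithm 1.

```python
from typing import Callable, Iterable


def cert_eval_dyadic(D, sigma: Iterable, q, c,
                     decompose_rule: Callable, dp_cert_eval: Callable) -> bool:
    """Sketch of Algorithm 3 for an ontology sigma in Dyadic-C."""
    parts = [decompose_rule(rule) for rule in sigma]  # one pass over Σ, linear in |Σ|
    hg = [h for h, _ in parts]                        # hg(Σ)
    main = [m for _, m in parts]                      # main(Σ)
    pair = (hg, main)                                 # dyadic pair Π = (hg(Σ), main(Σ))
    return dp_cert_eval(D, pair, q, c)                # delegate to Algorithm 1


# Toy stubs, only to exercise the control flow.
received = []


def toy_decompose(rule):
    return "hg_" + rule, "main_" + rule


def toy_dp_cert_eval(D, pair, q, c):
    received.append(pair)
    return c in D


print(cert_eval_dyadic({"c1"}, ["r1", "r2"], None, "c1", toy_decompose, toy_dp_cert_eval))  # True
print(received[0])  # (['hg_r1', 'hg_r2'], ['main_r1', 'main_r2'])
```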
For studying the combined complexity, we need to take into account the fact that the database returned by Algorithm 2 (namely, D⁺) may be exponentially larger than the input one (namely, D). Indeed, the check c ∈ cert(q, D⁺, Σ_C) performed by Algorithm 1 is done on an exponentially bigger database. Thus, if cert-eval[C] had the same data complexity and combined complexity, the combined complexity of cert-eval[Dyadic-C] might be exponentially higher than that of cert-eval[C]. Although none of the considered classes in E⁺_syn suffers from this shortcoming, before stating our general result we need to focus on "well-behaved" classes of TGDs. A class C of TGDs enjoys the dropping data-complexity property if there is an exponential jump from the combined complexity of cert-eval[C] to its data complexity.
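To make the blow-up concrete, the following sketch evaluates the two bounds used in the complexity proofs: |D⁺| ≤ |h-preds(Σ_HG)| · d^µ + |D|, and the bound |Σ_HG| · |h-preds(Σ_HG)| · d^{2µ} on the steps of Algorithm 2 (oracle costs ignored). The function name and the parameter values are arbitrary illustrations. For fixed µ both bounds are polynomial in d, as in data complexity, whereas they grow exponentially with µ, as in combined complexity.

```python
def completion_bounds(n_hg_rules: int, n_h_preds: int, d: int, mu: int, db_size: int):
    """Return (bound on |D+|, bound on the steps of Algorithm 2, oracle cost ignored)."""
    d_plus_bound = n_h_preds * d ** mu + db_size         # |h-preds(Σ_HG)| · d^µ + |D|
    step_bound = n_hg_rules * n_h_preds * d ** (2 * mu)  # |Σ_HG| · |h-preds(Σ_HG)| · d^(2µ)
    return d_plus_bound, step_bound


# Arbitrary example: |Σ_HG| = 3, |h-preds(Σ_HG)| = 2, d = 10 constants, |D| = 50.
for mu in (1, 2, 4, 8):
    print(mu, completion_bounds(3, 2, 10, mu, 50))
```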

Proposition 6
Each class in E syn enjoys the dropping data-complexity property.
We can now state the last result of the section, providing the combined complexity of the problem cert-eval over Dyadic-C sets of TGDs.

Indeed, also in this case, this value is an upper bound for the number of calls to the oracle. This is enough to show point 2.
Concerning the memberships in points 1 and 3, differently from the proof of Theorem 7, in combined complexity the maximum arity µ, the size of the sets Σ_HG and Σ_C, as well as the size and the number of each query constructed at instruction 4 of Algorithm 2 are not bounded. We can now consider the cost function g(n) (resp., f(n)) of some algorithm/oracle that decides cert-eval[C] and witnesses that it belongs to C (resp., C_d) in combined (resp., data) complexity. According to the dropping data-complexity property, g(n) grows at least exponentially faster than f(n). Essentially, there is an exponential jump from C_d to C that does not depend on the size of the input database but only on the size of the other parameters, namely the ontology, the query, and the tuple of constants. Consider now the query q′ := x ← Φ(x, y) constructed at instruction 4 of Algorithm 2 (we call it q′ to avoid confusion with the query q(x) mentioned at the beginning of this proof). At instruction 5 of the same algorithm, the oracle for cert-eval[C] checks whether t ∈ cert(q′, D⁺, Σ_C) holds. Since g(n) grows at least exponentially faster than f(n), g(||t, q′, D⁺, Σ_C||) remains of the same exponential order as g(||t, q′, D, Σ_C||), although ||D⁺|| may be exponentially larger than ||D||.
For the membership in point 3, we already know that cert-eval[Dyadic-C] is in EXPTIME^C. Consider now a C-oracle O for cert-eval[C] characterised by the cost function g(n). If C ⊇ EXPTIME is deterministic, then O works exponentially faster with respect to ||t, q′, Σ_C|| than with respect to ||D⁺||; thus, also in this case, O cannot exceed the power of C. Therefore, EXPTIME^C, in a sense, collapses to C.
Finally, we conclude the proof by considering the hardness in points 3 and 4. For point 3, hardness follows from Proposition 5, since Dyadic-C includes the class C, which is C-hard by assumption. For point 4, we recall that, by Theorem 6, Datalog ⊆ Dyadic-C; hence, since cert-eval[Datalog] is EXPTIME-hard, the thesis follows.
The following corollary follows immediately from the above theorem.

Corollary 2
The complexity results in Table 3 hold.

Conclusion
Dyadic decomposable sets form a novel decidable class of TGDs that encompasses and generalises all the existing (syntactic and semantic) decidable classes of TGDs. In the near future, it would be interesting to implement a prototype for dyadic existential rules by exploiting different kinds of existing reasoners.

Definition 5 (Dyadic pairs)
Consider a class C of TGDs. A pair Π = (Σ_HG, Σ_C) of TGDs is dyadic with respect to C if the following hold: (1) Σ_HG is head-ground with respect to Σ_HG ∪ Σ_C; and (2) Σ_C ∈ C.

Algorithm 2: Complete[C](D, Π)
Input: A database D and a C-dyadic pair Π = (Σ_HG, Σ_C)
Output: The set D⁺ of ground atoms
and, thus, a ∈ D⁺, which is a contradiction.
Inclusion 2 (D⁺ ⊆ D ∪ gra(D, Π)). Let D⁺ be the set produced by Algorithm 2, and let ℓ be the number of times instruction 7 of Algorithm 2 is executed. At each execution i ∈ [ℓ] of instruction 7, the algorithm computes the set D_i containing only auxiliary ground atoms, and produces the set D⁺_i = D ∪ D_i. By construction, D⁺ = D ∪ D_ℓ. Let D_0 = ∅, D⁺_0 = D ∪ D_0, and

Theorem 8
Consider a class C of TGDs.In combined complexity, if cert-eval[C]  belongs to some decidable complexity class C and C enjoys the dropping data-complexity property, then the following hold:1.If C ⊆ EXPTIME, then cert-eval[Dyadic-C] is in EXPTIME; 2. If C ⊇ EXPTIME, then cert-eval[Dyadic-C] is in EXPTIME C ; 3. If C ⊇ EXPTIME is deterministic and cert-eval[C] is C-complete, then it holds that cert-eval[Dyadic-C] is C-complete too; 4. If C ⊇ Af-Inds, then cert-eval[Dyadic-C] is EXPTIME-hard.Proof The argument proceeds similarly to proof of Theorem 7 by arguing on Algorithm 3 to determine the complexity of cert-eval[Dyadic-C].Let D be a database, Σ ∈ Dyadic-C an ontology, q(x) a CQ, and c ∈ C |x| a tuple.Moreover, let d = |consts(D)| and µ = max P ∈h-preds(ΣHG) arity(P ).As previously shown, Algorithm 3 invokes Algorithm 1, which in turn invokes Algorithm 2. Concerning the latter, by ignoring the computational costs of the oracle, it overall performs a number of step that is linear in |Σ HG | • |h-preds(Σ HG )| • d 2µ not bounded.Accordingly, also the size of the completed database returned by Algorithm 2 (namely D + ) may become exponential with respect to the input.More precisely, |consts(D + )| = d and |D + | ≤ |h-preds(Σ HG )| • d µ + |D|.Let n generically denote the size ||seq|| of any sequence seq of objects given in input to cert-eval[C].
Definition 7 (Dyadic-C)
Consider a class C of TGDs such that cert-eval[C] is decidable. We say that Σ belongs to Dyadic-C if Σ belongs to C or if main(Σ) belongs to C.
According to the previous definition, one can easily state the following property.

Proposition 5
Consider a class C of TGDs. It holds that C ⊆ Dyadic-C.

Table 2. Data complexity comparison of cert-eval[C] with cert-eval[Dyadic-C].

at instruction 5, is called d^µ times, and the for-loop at instruction 3 is executed |Σ_HG| times. Therefore, ignoring the computational costs of the oracle (i.e., checking whether t ∈ cert(q, D⁺, Σ_C)), Algorithm 2 performs a number of steps that is linear in |Σ_HG| · |h-preds(Σ_HG)| · d^{2µ}. Indeed, this value is also an upper bound for the number of calls to the oracle. Since we are in data complexity, the following parameters are bounded: the maximum arity µ, the size of the sets Σ_HG and Σ_C, as well as the size and the number of each query q constructed at instruction 4 of Algorithm 2. PTIME^C. To prove point 3 of the theorem, we observe that the membership follows from point 2 and from the fact that, for any deterministic class C ⊇ PTIME, it holds that PTIME^C = C. Hence, the latter calls the problem cert-eval[C] polynomially many times. Accordingly, Algorithm 3 runs in polynomial time and, in turn, invokes an oracle for cert-eval[C] polynomially many times. Hence, if C ⊆ PTIME, trivially cert-eval[Dyadic-C] ∈ PTIME; whereas, if C ⊇ PTIME, cert-eval[Dyadic-C] ∈ PTIME^C = C; the hardness follows from Proposition 5, since Dyadic-C includes the class C, which is C-hard by assumption. Finally, to prove point 4, we recall that, by Theorem 6, Datalog ⊆ Dyadic-C; hence, since cert-eval[Datalog] is PTIME-hard, the thesis follows.