## Amie: association rule mining under incomplete evidence in ontological knowledge bases

**AMIE: Association Rule Mining under Incomplete Evidence**
**in Ontological Knowledge Bases**
Luis Galárraga1, Christina Teflioudi1, Katja Hose2, Fabian M. Suchanek1
1Max-Planck Institute for Informatics, Saarbrücken, Germany
2Aalborg University, Aalborg, Denmark
1{lgalarra, chteflio, suchanek}@mpi-inf.mpg.de, 2{khose}@cs.aau.dk
inevitably exhibit gaps. Others are created and extendedmanually. Making these KBs complete requires great effort
Recent advances in information extraction have led to huge
to extract facts, check them for correctness, and add them
knowledge bases (KBs), which capture knowledge in a ma-
to the KB. However, KBs themselves often already contain
chine-readable format. Inductive Logic Programming (ILP)
enough information to derive and add new facts. If, for in-
can be used to mine logical rules from the KB. These rules
stance, a KB contains the fact that a child has a mother,
can help deduce and add missing knowledge to the KB.

then the mother's husband is most likely the father:
While ILP is a mature field, mining logical rules from KBs isdifferent in two aspects: First, current rule mining systems
motherOf (m, c) ∧ marriedTo(m, f ) ⇒ fatherOf (f, c)
are easily overwhelmed by the amount of data (state-of-theart systems cannot even run on today's KBs). Second, ILP
As for any rule, there can be exceptions, but in the vast
usually requires counterexamples. KBs, however, implement
majority of cases, the rule will hold.

Finding such rules
the open world assumption (OWA), meaning that absent
can serve four purposes: First, by applying such rules on
data cannot be used as counterexamples. In this paper, we
the data, new facts can be derived that make the KB more
develop a rule mining model that is explicitly tailored to
complete. Second, such rules can identify potential errors
support the OWA scenario. It is inspired by association rule
in the knowledge base. If, for instance, the KB contains the
mining and introduces a novel measure for confidence. Our
statement that a totally unrelated person is the father of a
extensive experiments show that our approach outperforms
child, then maybe this statement is wrong. Third, the rules
state-of-the-art approaches in terms of precision and cover-
can be used for reasoning. Many reasoning approaches rely
age. Furthermore, our system, AMIE, mines rules orders of
on other parties to provide rules (e.g., . Last, rules
magnitude faster than state-of-the-art approaches.

describing general regularities can help us understand thedata better. We can, e.g., find out that countries often trade
Categories and Subject Descriptors
with countries speaking the same language, that marriage isa symmetric relationship, that musicians who influence each
H2.8 [Information Systems]: Database Applications
other often play the same instrument, and so on.

The goal of this paper is to mine such rules from KBs.

We focus on RDF-style KBs in the spirit of the Seman-
tic Web, such as YAGO Freebase, and DBpedia These KBs provide binary relationships in the form of RDF
triplSince RDF has only positive inference rules, theseKBs contain only positive statements and no negations. Fur-
Rule Mining, Inductive Logic Programming, ILP
thermore, they operate under the Open World Assumption(OWA). Under the OWA, a statement that is not contained
in the KB is not necessarily false; it is just unknown. This
In recent years, we have experienced the rise of large
is a crucial difference to many standard database settings
knowledge bases (KBs), such as Cyc YAGO DB-
that operate under the Closed World Assumption (CWA).

pedia and FreebaThese KBs provide information
Consider an example KB that does not contain the infor-
about a great variety of entities, such as people, countries,
mation that a particular person is married. Under CWA we
rivers, cities, universities, movies, animals, etc. Moreover,
can conclude that the person is not married. Under OWA,
KBs also contain facts relating these entities, e.g., who was
however, the person could be either married or single.

born where, which actor acted in which movie, or which city
Mining rules from a given dataset is a problem that has
is located in which country. Today's KBs contain millions
a long history. It has been studied in the context of asso-
of entities and hundreds of millions of facts.

ciation rule mining and inductive logic programming (ILP).

Yet, even these large KBs are not complete.

Association rule mining is well-known in the context of
them are extracted from natural language resources that
sales databases. It can find rules such as "If a client boughtbeer and wine, then he also bought aspirin". The confidence
of such a rule is the ratio of cases where beer and wine wasactually bought together with aspirin. Association rule min-
Copyright is held by the International World Wide Web ConferenceCommittee (IW3C2). IW3C2 reserves the right to provide a hyperlink
ing inherently implements a closed world assumption: A rule
to the author's site if the Material is used in electronic media.

WWW, '13 Rio de Janeiro, Brazil
that predicts new items that are not in the database has a
preprocessing step and by penalizing longer rules in the in-
low confidence. It cannot be used to (and is not intended to
ference part. For mining the rules, Sherlock uses 2 heuristics:
be used to) add new items to the database.

statistical significance and statistical relevance.

ILP approaches deduce logical rules from ground facts.

The WARMR system mines patterns in databases
Yet, current ILP systems cannot be applied to semantic
that correspond to conjunctive queries. It uses a declara-
KBs for two reasons: First, they usually require negative
tive language bias to reduce the search space. An extension
statements as counter-examples. Semantic KBs, however,
of the system, WARMER modified the approach to
usually do not contain negative statements. The semantics
support a broader range of conjunctive queries and increase
of RDF are too weak to deduce negative evidence from the
efficiency of search space exploration.

facts in a KBecause of the OWA, absent statements can-
ALEPis a general purpose ILP system, which imple-
not serve as counter-evidence either. Second, today's ILP
ments Muggleton's Inverse Entailment algorithm in Pro-
systems are slow and cannot handle the huge amount of
log. It employs a variety of evaluation functions for the rules,
data that KBs provide. In our experiments, we ran state-of-
and a variety of search strategies.

the-art approaches on YAGO2 for a couple of days without
These approaches are not tailored to deal with large KBs
obtaining any results.

under the Open World Assumption. We compare our sys-
In this paper, we propose a rule mining system that is
tem, AMIE, to WARMR and ALEPH, which are the only
inherently designed to work under the OWA, and efficient
ones available for download. Our experiments do not only
enough to handle the size of today's KBs. More precisely,
show that these systems mine less sensible rules than AMIE,
our contributions are as follows:
but also that it takes them much longer to do so.

(1) A method to simulate negative examples for positive KBs
Expert Rule Mining. Another rule mining approach over
(the Partial Completeness Assumption)
RDF data was proposed to discover causal relations in
(2) An algorithm for the efficient mining of rules.

RDF-based medical data. It requires a domain expert who
(3) A system, AMIE, that mines rules on millions of facts
defines targets and contexts of the mining process, so that
in a few minutes without the need for parameter tuning or
the correct transactions are generated. Our approach, in
expert input.

contrast, does not rely on the user to define any context or
The rest of this paper is structured as follows. Section dis-
target. It works out-of-the-box.

cusses related work and Section introduces preliminaries.

Generating Schemas. In this paper, we aim to generate
Sections and are the main part of the paper, present-
Horn rules on a KB. Other approaches use rule mining to
ing our mining model and its implementation. Section
generate the schema or taxonomy of a KB. applies clus-
presents our experiments before Section concludes.

tering techniques based on context vectors and formal con-cept analysis to construct taxonomies. Other approaches
use clustering and ILP-based approaches . For the
We aim to mine rules of the form
friend-of-a-friend network on the Semantic Web, ap-plies clustering to identify classes of people and ILP to learn
motherOf (m, c) ∧ marriedTo(m, f ) ⇒ fatherOf (f, c)
descriptions of these groups. Another example of an ILP-based approach is the DL-Learner , which has success-
Technically, these are Horn rules on binary predicates. Rule
fully been applied to generate OWL class expressions
mining has been an area of active research for the past couple
from YAGO As an alternative to ILP techniques,
of years. Some approaches mine association rules, some mine
propose a statistical method that does not require negative
logical rules, others mine a schema for the KB, and again
In contrast to our approach, these techniques
others use rule mining for application purposes.

aim at generating a schema for a given RDF repository, not
Association Rule Mining. Association rules are mined
logical rules in general.

on a list of transactions. A transaction is a set of items. For
Learning Rules From Hybrid Sources. proposes to
example, in the context of sales analysis, a transaction is the
learn association rules from hybrid sources (RDBMS and
set of products bought together by a customer in a specific
Ontologies) under the OWA. For this purpose, the defini-
event. The mined rules are of the form {ElvisCD, Elvis-
tion of frequency (and thus of support and confidence) is
Book } ⇒ ElvisCostume, meaning that people who bought
changed so that unknown statements contribute with half
an Elvis CD and an Elvis book usually also bought an Elvis
of the weight of the true statements. Another approach
costume. However, these are not the kind of rules that we
makes use of an ontology and a constraint Datalog program.

aim to mine in this paper. We aim to mine Horn rules.

The goal is to learn association rules at different levels of
One problem for association rule mining is that for some
granularity w.r.t. the type hierarchy of the ontology. While
applications the standard measurements for support and
these approaches focus more on the benefits of combining
confidence do not produce good results. discusses a num-
hybrid sources, our approach focuses on pure RDFS KBs.

ber of alternatives to measure the interestingness of a rule in
Further Applications of Rule Mining. proposes an
general. Our approach is inspired by this work and we also
algorithm for frequent pattern mining in KBs that use DL-
make use of a language bias to reduce the search space.

safe rules. Such KBs can be transformed into a disjunctive
Logical Rule Mining. Sherlock is an unsupervised
Datalog program, which allows seeing patterns as queries.

ILP method to learn first-order Horn clauses from a set of
This approach does not mine the Horn rules that we aim at.

extracted facts for a given target relation. It uses probabilis-
Some approaches use rule mining for ontology merging
tic graphical models (PGMs) to infer new facts. It tackles
and alignment The AROMA system , e.g.,
the noise of the extracted facts by extensive filtering in a
3RDF has only positive rules and no disjointness constraints
or similar concepts.

uses association rules on extracted terms to find subsump-
Rules. An atom is a fact that can have variables at the
tion relations between classes and properties of different on-
subject and/or object position. A (Horn) rule consists of a
tologies. Again, these systems do not mine the kind of rules
head and a body, where the head is a single atom and the
we are interested in.

body is a set of atoms. We denote a rule with head r(x, y)
In association rules and frequency analysis are used to
and body {B1, ., Bn} by an implication
identify and classify common misusage patterns for relations
in DBpedia. In contrast to our work, this approach does not
1 ∧ B2 ∧ . ∧ Bn ⇒ r(x, y)
mine logical rules, but association rules on the co-occurrence
which we abbreviate as
B ⇒ r(x, y). One example of such
of values. Since RDF data can be seen as a graph, mining
frequent subtrees is another related field of research.

However, as the URIs of resources in knowledge bases are
hasChild (p, c) ∧ isCitizenOf (p, s) ⇒ isCitizenOf (c, s)
unique, these techniques are limited to mining frequent com-
An instantiation of a rule is a copy of the rule, where all
binations of classes.

variables have been substituted by entities. A prediction of
Several approaches, such as Markov Logic or URDF
a rule is the head atom of an instantiated rule if all body
use Horn rules to perform reasoning. These approaches
atoms of the instantiated rule appear in the KB. For ex-
can be consumers of the rules we mine with AMIE.

ample, the above rule can predict isCitizenOf(Lisa,USA) ifthe KB knows a parent of Lisa (hasChild(Elvis,Lisa)) whois American (isCitizenOf(Elvis,USA)).

Language Bias. As most ILP systems, AMIE uses a lan-
RDF KBs. In this paper, we focus on RDF knowledge
guage bias to restrict the search space. We say that two
baseAn RDF KB can be considered a set of facts, where
atoms in a rule are connected if they share a variable or an
each fact is a triple of the form hx, r, yi with x denoting the
entity. A rule is connected if every atom is connected tran-
subject, r the relation (or predicate), and y the object of the
sitively to every other atom of the rule. AMIE mines only
fact. There are several equivalent alternative representations
connected rules, i.e., it avoids constructing rules that con-
of facts; in this paper we use a logical notation and represent
tain unrelated atoms. We say that a rule is closed if every
a fact as r(x, y). For example, we write father(Elvis,Lisa).

variable in the rule appears at least twice. Such rules do not
The facts of an RDF KB can usually be divided into an A-
predict merely the existence of a fact (e.g. diedIn(x, y) ⇒
Box and a T-Box. While the A-Box contains instance data,
∃z : wasBornIn(x, z)), but also concrete arguments for it
the T-Box is the subset of facts that define classes, domains,
(e.g. diedIn(x, y) ⇒ wasBornIn(x, y)). AMIE mines only
ranges for predicates, and the class hierarchy. Although T-
closed rules. We allow recursive rules that contain the head
Box information can also be used by our mining approach,
relation in the body.

we are mainly concerned with the A-Box, i.e., the set of facts
Parallels to Association Rule Mining. Association Rule
relating one particular entity to another.

Mining discovers correlations in shopping transactions. Thus,
In the following, we assume a given KB K as input. Let
association rules are different in nature from the Horn rules
R = πrelation(K) denote the set of relations contained in K
Still, we can show some similarities between
and E = πsubject(K) ∪ πobject(K) the set of entities.

the two approaches. Let us define one transaction for every
Functions. A function is a relation r that has at most one
set of n entities that are connected in the KB. For exam-
object for every subject, i.e., ∀x : {y : r(x, y)} ≤ 1. A
ple, in Figure we will define a transaction for the enti-
relation is an inverse function if each of its objects has at
ties Elvis, Lisa and Priscilla, because they are connected
most one subject. Since RDF KBs are usually noisy, even
through the facts mother(Priscilla,Lisa), father(Elvis,Lisa),
relations that should be functions (such as hasBirthdate)
marr(Elvis, Priscilla). We label the transaction with the
may exhibit two objects for the same subject. Therefore,
set of these entities. Each atom r(xi, xj ) on variables in-
we use the notion of functionality . The functionality of
dexed by 1 ≤ i, j ≤ n corresponds to an item. A transaction
a relation r is a value between 0 and 1, that is 1 if r is a
with label hC1, . . , Cni contains an item r(xi, xj ) if r(Ci, Cj )
is in the KB. For example, the transaction hElvis, Lisa,
#x : ∃y : r(x, y)
Priscillai contains the items {mother(x3,x2), father(x1,x2),
f un(r) := #(x, y) : r(x, y)
marr(x1,x3)}, since the ground atoms mother(Priscilla,Lisa),father(Elvis,Lisa) and marr(Elvis, Priscilla) are in the KB.

with #x : X as an abbreviation for {x : X ∈ K} . The
In this representation, association rules are Horn rules. In
inverse functionality is defined accordingly as if un(r) :=
the example, we can mine the association rule
f un(r−1). Without loss of generality, we assume that ∀r ∈R : f un(r) ≥ if un(r) (FUN-Property).

{mother(x3, x2), marr(x1, x3)} ⇒ {f ather(x1, x2)}
the case for a relation r, we can replace all facts r(x, y)
which corresponds to the Horn rule
with the inverse relation, r−(y, x), which entails f un(r−) ≥if un(r−).

For example, if the KB contains the inverse
mother(x3, x2) ∧ marr(x1, x3) ⇒ f ather(x1, x2)
functional relation directed(person,movie), we can create the
Transaction Label
Transaction Items
functional relation isDirectedBy(movie,person) and use only
that one in the rule mining process.

Manual inspection
shows, however, that relations in semantic KBs tend to be
more functional than inverse functional. Intuitively, this al-
Figure Mining Rules with 3 Variables
lows us to consider a fact r(x, y) as a fact about x.

Constructing such a table with all possible combinations
of entities is practically not very viable. Apart from that,
it faces a number of design issues (e.g., how to deal with
ically if more body atoms are added and avoids equivalent
transactions that contain the same entities in different or-
rules with different support values. With this in mind, we
derings). Therefore, association rule mining cannot be used
define the support of a rule as the number of distinct pairs
directly to mine Horn rules. However, we take inspiration
of subjects and objects in the head of all instantiations that
from the parallels between the two types of mining for our
appear in the KB:
system, AMIE.

B ⇒ r(x, y)) := #(x, y) : ∃z1, ., zm :
where z1, ., zm are the variables of the rule apart from xand y.

Model. Let us consider a given Horn rule
B ⇒ r(x, y). Let
Head Coverage. Support is an absolute number. This
us look at all facts with relation r (Figure . We distinguish
means that a user who thresholds on support has to know
4 types of facts: True facts that are known to the KB (KB-
the absolute size of the KB to give meaningful values. To
true), true facts that are unknown to the KB (NEWtrue),
avoid this, we also define a proportional version of support.

facts that are known to be false in the KB (KBfalse), and
A naive way would be to use the absolute number of support,
facts that are false but unknown to the KB (NEWfalse).

as defined in the previous paragraph, over the size of the KB.

The rule will make certain predictions (blue circle). These
In this case, however, relations that do not have many facts
predictions can be known to be true (A), known to be false
(either because of the incompleteness of the KB or because of
(C), or unknown (B and D). When they are unknown to the
their nature), will not be considered in the head of rules, i.e.

KB, they can still be true (B) or false (D) with respect to
we will not learn rules predicting such relations. Therefore,
the real world.

we propose to use the notion of head coverage. This is theproportion of pairs from the head relation that are covered
by the predictions of the rule
B ⇒ r(x, y)) := #(x0, y0) : r(x0, y0)
Negative Examples. The central challenge of our setting
is to provide counter-examples for the rule mining. These
can take the role of KBfalse, so that we can estimate the ar-
eas NEWtrue and NEWfalse. There are several approachesto this problem: The standard confidence, the standardpositive-only learning evaluation score of ILP, and our new
Figure Prediction under Incompleteness
partial completeness assumption.

Standard Confidence. The standard confidence measure
Goal. Our goal is to find rules that make true predictions
takes all facts that are not in the KB (i.e., NEWtrue and
that go beyond the current KB. In the figure, we wish maxi-
NEWfalse) as negative evidence. Thus, the standard confi-
mize the area B, and to minimize the area D. There are two
dence of a rule is the ratio of its predictions that are in the
obvious challenges in our context: First, the areas NEWtrue
KB, i.e., the share of A in the set of predictions:
and NEWfalse are unknown. So if we wish to maximize Bat the expense of D, we are operating in an area outside our
B ⇒ r(x, y)) :=
KB. We would want to use the areas KBtrue and KBfalse
#(x, y) : ∃z1, ., zm :
to estimate the unknown area. This, however, leads to the
The standard confidence is blind to the distinction between
second challenge: Semantic KBs do not contain negative
"false" and "unknown". Thus, it implements a closed world
evidence. Thus, the area KBfalse is empty. We will now
setting. It mainly describes the known data and penalizes
present different measures that address these challenges.

rules that make a large number of predictions in the un-
Support. The support of a rule quantifies the number of
known region. We, in contrast, aim to maximize the number
correct predictions, i.e., the size of A. There are several ways
of true predictions that go beyond the current knowledge.

to define the support: It can be the number of instantiations
We do not want to describe data, but to predict data.

of the rule that appear in the KB. This is what our analogy
Positive-Only Learning. For cases where the KB does
to association rule mining suggests (Section .

not contain negative examples, Muggleton has developed a
measure, however, is not monotonic if we add atoms to the
positive-only learning evaluation score for ILP , . It
body. Consider, for example, the rule
takes random facts as negative evidence:
marriedTo( x, y) ⇒ marriedTo( y, x)
Score = log(P ) − log
If we add hasGender(x,male) to the body, the number of
instantiations that are in the KB decreases. If we add an
Here, P is the number of known true facts covered (A in
atom with a fresh variable, e.g., hasFriend(x,z), to the body,
the figure), R is the number of randoms covered, Rsize is
the number of instantiations increases for every friend of x.

the total number of randoms and L is the number of atoms
This is true even if we add another atom with z to make the
in the hypothesis. The intuition is that a good rule should
rule closed. Alternatively, we can count the number of facts
cover many positive examples, and few or no randomly gen-
in one particular body atom. This definition, however, de-
erated examples. This ensures that the rule is not overly
pends on the choice of the body atom, so that the same rule
general. Furthermore, the rule should use as few atoms as
can have different supports. We can also count the number
possible, and thus achieve a high compression. This measure
of facts of the head atom. This measure decreases monoton-
is implemented (among others) in the ALEPH system.

Partial Completeness. We propose to generate negative
By repeated application of these operators, we can generate
evidence by the partial completeness assumption (PCA).

the entire space of rules as defined in Section The oper-
This is the assumption that if r(x, y) ∈ KBtrue for some
ators generate even more rules than those we are interested
in, because they also produce rules that are not closed. Analternative set of operators could consist of OD and an op-
∀y0 : r(x, y0) ∈ KBtrue ∪ NEWtrue ⇒ r(x, y0) ∈ KBtrue
erator for instantiation. But these operators would not be
In other words, we assume that if the database knows some
monotonic, in the sense that an atom generated by one oper-
r-attribute of x, then it knows all r-attributes of x. This
ator can be modified in the next step by the other operator.

assumption is certainly true for functional relations r, such
Therefore, we chose the above 3 operators as a canonic set.

as birth dates, capitals, etc. Thanks to the FUN-Property
Algorithm. We mine rules with Algorithm The algo-
(see Section , it is also true for inverse-functional relations,
rithm maintains a queue of rules, which initially just con-
such as owns, created, etc. The assumption is also true in the
tains the empty rule. The algorithm iteratively dequeues a
vast majority of cases for relations that are not functional,
rule from the queue. If the rule is closed (see Section ,
but that have a high functionality. Even for other relations,
the rule is output, otherwise, it is not. Then, the algorithm
the PCA is still reasonable for knowledge bases that have
applies all operators to the rule and adds the resulting rules
been extracted from a single source (such as DBpedia and
to the queue (unless they are pruned out, s.b.). This pro-
YAGO). These usually contain either all r-values or none for
cess is repeated until the queue is empty. We parallelize this
a given entity.

process by maintaining a centralized queue, from which the
PCA Confidence. Under the PCA, we normalize the con-
threads dequeue and enqueue. We do not feed predictions of
fidence not by the entire set of facts, but by the set of facts
the rules back into the KB. All measures (such as confidence
of which we know that they are true, together with the facts
and support) are always computed on the original KB.

of which we assume that they are false. If the head atomof the rule is r(x, y), then this set is just the set of facts
Algorithm 1 Rule Mining
{r(x, y0) : r(x, y0) ∈ K}. Thanks to the FUN-Property, the
1: function AMIE(KB K)
PCA is always applied to the first argument of the head
Execute in parallel:
while ¬q.isEmpty() do
B ⇒ r(x, y)) :=
#(x, y) : ∃z1, ., zm, y0 :
if r is closed ∧ r is not pruned for output then
We show in our experiments that the PCA confidence iden-
tifies much more productive rules than the other measures.

for all operators o do
for all rules r0 ∈ o(r) do
if r0 is not pruned then
After having outlined the basic definitions and the mining
model in Sections and we now outline the core algorithm
of our framework and its implementation.

Goal. Our goal is to mine rules of the form defined in Sec-
tion One of the main problems of any mining approachis to find an efficient way to explore the search space. The
Pruning. If executed naively, our algorithm will have pro-
naive algorithm of enumerating all possible rules is infeasi-
hibitively high run-times. The instantiation operator OI , in
ble for large KBs. Hence, we explore the search space by
particular, generates atoms in the order of R × E . We
iteratively extending rules by mining operators.

first observe that we are usually not interested in rules that
Mining Operators. We see a rule as a sequence of atoms.

cover only very few facts of the head relation. Rules that
The first atom is the head atom and the others are the body
cover, for example, less than 1% of the facts of the head
atoms. In the process of traversing the search space, we can
relation can safely assumed to be marginal. Therefore, we
extend a rule by using one of the following operators:
set θ = 0.01 as a lower bound for the head coverage. Weobserve that head coverage decreases monotonically as we
1. Add Dangling Atom (OD)
add more atoms. This allows us to safely discard any rule
This operator adds a new atom to a rule. The new
that trespasses the threshold (Lines 11 and 12).

atom uses a fresh variable for one of its two arguments.

The monotonicity of head coverage gives us another op-
The other argument (variable or entity) is shared with
portunity to prune: If a rule B
the rule, i.e., it occurs in some other atom of the rule.

1 ∧ . ∧ Bn ∧ Bn+1 ⇒ H does
not have larger confidence than the rule B1 ∧ . ∧ Bn ⇒ H,
2. Add Instantiated Atom (O
then we do not output the longer rule. This is because both
This operator adds a new atom to a rule that uses an
the confidence and the head coverage of the longer rule are
entity for one argument and shares the other argument
necessarily dominated by the shorter rule. This way, we can
(variable or entity) with the rule.

reduce the number of produced rules (Lines 6 and 7).

Last, we never enqueue a rule that is already in the queue.

3. Add Closing Atom (OC )
It is expensive to check two rules for equality. However, it
This operator adds a new atom to a rule so that both
is easy to compute measures such as head coverage, confi-
of its arguments are shared with the rule.

dence, and PCA confidence for each rule. Two rules can
only be equal if they have the same values for these mea-
GROUP BY(H.x1, H.xr, H.x2)
sures. This restricts the rules that have to be checked. If
HAVING COUNT(*) ≥ k
a rule is duplicate, we do not enqueue it (Lines 11 and 12).

where ?x is replaced with a reference to any of the intro-
We can be sure that any potential duplicates will still be in
duced columns. The WHERE clause lists all variables that
the queue. This is because the length of the rules increases
are shared between any two atoms in the rule, i.e., all join
monotonically: When we dequeue a rule with n atoms, no
columns and conditions between atom tables.

rule with n + 1 atoms has ever been dequeued. Thus, when
LECT can only select variables that appear in the GROUP
we apply the operators to the rule with n atoms, and gener-
BY statement, the above template is for the case where ?x
ate a rule with n + 1 atoms, any potential duplicate of that
appears in H. The case where ?x does not appear in H will
new rule must be in the queue.

require a nested query. Our experience shows that already
Projection Queries. No matter what operator is applied
running the non-nested query on a database of a few mil-
in particular, the algorithm needs to choose a relation for
lion facts can easily take several minutes on an off-the-shelf
the new atom that is added to the rule. In addition, the
RDBMS. Hence, efficient SPARQL engines such as RDF-
instantiation operator OI also allows the choice of an entity.

3X are an alternative option. In SPARQL 1.1, the pro-
In order to select only relations and entities that will fulfill
jection query template is:
the head coverage constraint, we rely on the KB to answerprojection queries. These are queries of the form
SELECT ?x WHERE H ∧ B1 ∧ . ∧ Bn
H.x1, H.xr, H.x2 .

HAVING COUNT(H)≥ k
B1.x1, B1.xr, B1.x2 .

. .

where B1, ., Bn are atoms and k is a natural number. H is
the projection atom on which we project. ?x is the selection
n.x1, Bn.xr , Bn.x2 .

variable. It is a variable that appears in one or more atoms
at the position of one of the arguments or at the position
HAVING COUNT(*) ≥ k
of the relation (as it is common in SPARQ. Such queriesselect an entity or relation x such that the result of the query
Again, this is only for the case where ?x appears in H. RDF-
H ∧ B1 ∧ . ∧ Bn on the KB contains more than k distinct
3X does not support aggregate functions in this way. Thus,
query answers for H.

we would need extensive postprocessing of query results to
Using Projection Queries. Projection queries allow us
compute a projection query – already in the case where ?x
to select the relationship for the operators OD, OI , and OC
in such a way that the head coverage of the resulting rule is
In-Memory Database. We have implemented a vanilla
above θ. This works by firing a projection query of the form
in-memory database for semantic KBs. Our implementa-tion indexes the facts aggressively with one index for each
SELECT ?r WHERE H ∧ B1 ∧ . ∧ Bn∧ ?r(X, Y )
permutation of subject, relation, and object. Each index is
HAVING COUNT(H)≥ k
a map from the first item to a map from the second itemto a set of third items (e.g., a map from relations to a map
where X and Y are variables or constants, depending on the
from subjects to a set of objects). This allows retrieving the
type of atoms that the operator generates. The results for
instantiations of a single atom in constant time. The exis-
?r will be the relations that, once bound in the query, ensure
tence of a query answer can be checked naively by selecting
that the support of the rule B1 ∧ . ∧ Bn∧?r(X, Y ) ⇒ H is
the atom with fewest instantiations, running through all of
greater than k. If we choose k equal to θ times the num-
its instantiations, instantiating the remaining atoms accord-
ber of facts of the relation of H, then the head coverage of
ingly, and repeating this process recursively until we find an
the resulting rules will be greater than θ – which is what
instantiation of the query that appears in the KB. Select
we want. For the instantiation operator OI , we first fire
queries are similar.

a projection query to select relations, and then fire projec-
Projection Queries. Algorithm shows how we answer
tion queries to retrieve entities. This way, projection queries
projection queries. The algorithm takes as input a selec-
allow us to choose the relationships and entities for the op-
tion variable ?x, a projection atom H = R(X, Y ), remain-
erators in such a way that the head coverage for the new
rules is guaranteed to be above θ. Next, we discuss how to
1, .Bn, a constant k, and a KB K.

check whether ?x appears in the projection atom. If that
implement projection queries efficiently.

is the case, we run through all instantiations of the projec-
tion atom, instantiate the query accordingly, and check forexistence. Each existing instantiation increases the counter
SQL and SPARQL. Projection queries are essential for
for the respective value of ?x. We return all values whose
the efficiency of our system. Yet, standard database imple-
counter exceeds k. If the selection variable does not appear
mentations do not provide special support for these types
in the projection atom, we iterate through all instantiations
of queries. Assuming that the KB K is stored as a three-
of the projection atom. We instantiate the query accord-
column table (i.e., each fact is a row with three elements),
ingly, and fire a SELECT query for ?x. We increase the
the projection query template in SQL would be:
counter for each value of ?x. We report all values whose
counter exceeds k.

Summary. We have identified projection queries as the cru-
cial type of queries for rule mining. Since standard database
i = Bj .xm, . .

systems and standard SPARQL systems provide no specifi-
cally tuned support for these queries, we have implemented
a vanilla in-memory database, which has specific support for
strings) from the KBs. Literal values (such as geographi-
projection queries. Our entire implementation is in Java.

cal coordinates) are shared by only very few entities, whichmakes them less interesting for rule mining.

Algorithm 2 Answering Projection Queries
Evaluations. In all experiments, our goal is twofold: First,
we want to produce as many predictions as possible beyond
SELECT(?x, R(X, Y ) ∧ B1 ∧ . ∧ Bn, k, K)
the current KB. Second, the percentage of correct predic-
∨ X ≡ ?x ∨ Y ≡ ?x
tions shall be as large as possible. The particular challenge
for all instantiations r(x, y) of R(X, Y ) ∈ K do
is that we want to evaluate predictions that go beyond the
current KB. We are not interested in describing the exist-
ing data, but in generating new data. Therefore, we pro-
if exists instantiation q ∈ K then
ceed as follows: We run the systems on an older dataset
map(value of ?x) + +
(YAGO2 . We generate all predictions, i.e., the head
atoms of the instantiated rules (see Section We remove
all predictions that are in the old KB. Then we compare the
remaining predicted facts to the successor of that dataset
for all instantiations r(x, y) of R(X, Y ) ∈ K do
(YAGO2s . A prediction is "correct" if it appears in the
newer KB. A prediction is "incorrect" if it has a highly func-
tional or highly inverse functional relation and contradicts
for all x ∈ SELECT ?x FROM K WHERE q do
an existing fact in the newer KB, e.g., a different birth place.

For all other predictions, we manually validated the facts by
checking a sample of 30 of them against Wikipedia pages.

This classifies the remaining predictions as "correct" or "in-
correct" – except for a few cases where the fact is "unknown",
return {x : map(x) ≥ k}
such as the death place of a person that is still alive. The
ratio of correct predictions out of the correct and incorrectpredictions yields the precision of the rule.

Outlook. We note that with the project of predicting be-
yond current knowledge, we are entering a new, and very
risky area of research. We do not expect Horn rules to haveextraordinary precisions in the unknown region. Rules can
Experiments. We conducted 3 groups of experiments: In
only yield hypotheses about possible facts.

the first group, we compare AMIE to two popular, state-of-the-art systems that are publicly available, WARMR
AMIE vs. WARMR and ALEPH
and ALEPH. In the second group of experiments, wecompare the standard confidence to the novel PCA confi-
In this section, we compare AMIE to WARMR and ALEPH.

dence that we have introduced in this paper (Section In
For each system, we conduct 3 experiments: We first com-
the third group of experiments, we run AMIE on different
pare the usability of the competitor system to AMIE. Then,
datasets to show the applicability of the system.

we compare their runtimes. Last, we compare their outputs.

Settings. By default, AMIE finds all rules whose head cov-erage exceeds the default threshold of θ = 1%. AMIE ranks
the resulting rules by decreasing PCA confidence. There is
Usability. WARMR is a system that unifies ILP and as-
no need to deviate from this default configuration when a
sociation rule mining. Similar to APRIORI algorithms ,
user runs AMIE. There are no parameters to tune. All ex-
it performs a breadth-first search in order to find frequent
periments with AMIE on all datasets are run in this setting,
patterns. WARMR generates Datalog queries of the form
unless otherwise mentioned.

"? − A1, A2, ., An", where Ai are logical atoms.

For some experiments, we want to compare AMIE's run-
To discover frequent patterns (as in association rule min-
time with other systems. To have an equal basis, we make
ing), we need to have a notion of frequency. Given that
AMIE simulate the metrics of the competitor systems. AMIE
WARMR considers queries as patterns and that queries can
can threshold on support, head coverage, confidence, and
have variables, it is not immediately obvious what the fre-
PCA confidence, and she can rank by any of these. AMIE
quency of a given query is. Therefore, the user needs to
can also count the support not on two variables, but on a sin-
specify the predicate that is being counted by the system
gle variable. AMIE can also output non-closed rules. Since
(the key predicate). In the usual scenario of market basket
this is just a choice of what to output, it does not influ-
analysis, e.g., the system counts customer transactions. In
ence runtime. All experiments with all systems are run on a
a scenario in which the database is a KB, one solution is to
server with 48GB RAM and 8 CPUs. We always mine rules
count entities. Since the key predicate determines what is
without constants (i.e., without the instantiation operator),
counted, it is necessary that it is contained in all queries.

unless otherwise mentioned.

Therefore, we add a predicate entity(x), which we fill with
Knowledge Bases. We run our experiments on different
all entities of the KB. AMIE does not require such a choice.

KBs. In all cases, we removed the rdf:type relationship, be-
For WARMR, the user needs to provide specific informa-
cause it inflates the size of the KBs. We are aware that
tion about which predicates can be added to a query, which
the rdf:type relationship can be very helpful for rule mining.

of their variables can be fresh, and which arguments of pred-
However, currently no approach (including ours) makes spe-
icates are allowed to be unified (type declarations). In con-
cific use of it. We plan to make use of it in future work. Fur-
trast, AMIE requires none of these. AMIE simply takes as
thermore, we removed all facts with literals (numbers and
input the KB in triple format.

WARMR is also able to mine rules with constants. The
rules, which included the ones mined by WARMR. We checked
user can define which predicates and arguments should be
back with the WARMR team and learned that for a given set
instantiated with constants (we call this mode MODE1).

of atoms B1, .Bn, WARMR will mine only one rule, picking
WARMR then checks all the constants appearing in the facts
one of the atoms as head atom (e.g., B1 ∧ . ∧ Bn−1 ⇒ Bn).

of that specific predicate and argument and afterwards uses
AMIE, in contrast, will mine one rule for each possible choice
them in the queries. MODE1 naturally entails an increase of
of head atom (as long as the thresholds are met). In other
the branching factor in the search space and an explosion in
words, AMIE with the standard support and confidence
the number of candidates that need to be evaluated. Alter-
measures simulates WARMR, but mines more rules. Fur-
natively, WARMR allows the user to set a maximum num-
thermore, it runs orders of magnitude faster.

ber of constants to be used for each predicate and argument
for large datasets for which the user would have needed to
(MODE2). Unfortunately, though, it does not provide a way
use complicated sampling schemes in order to use WARMR,
for the user to influence the selection of these constants. In
AMIE can be a very attractive alternative. Even for smaller
other words, there is no guarantee that the constants that
datasets with rules with constants, AMIE can provide results
WARMR will use are the most promising ones.

while WARMR cannot. Moreover, AMIE comes with met-
WARMR produces rules as output. These rules are not
rics that go beyond the standard confidence and the stan-
necessarily connected. For example, WARMR mines
dard support. We will show later that these improve thequality of the results.

isMarriedTo(B,C), ∧ isLeaderOf (A,D)⇒ hasAcademicAdvisor (C,E)
This rule is not only nonsensical from a semantic perspec-
Usability. ALEPH can be run with different commands
tive, but also redundant, because the second atom does not
that influence the search strategy. We chose the induce com-
influence the implication. Therefore, the user has to filter
mand, which runs fastest. For running ALEPH, the user
out these rules from the output.

has to specify the target predicate for learning (the head
Thus, we conclude that the broader mission and the broader
predicate of the rules). In the following, we ran ALEPH
applicability of WARMR entails that much more configura-
successively with all predicates of the KB as targets. In ad-
tion, acquaintance, and expert knowledge is needed to make
dition, the user has to specify a series of type and mode
it mine Horn rules on semantic KBs.

declarations (similar to WARMR), which will be used as a
Runtime. YAGO2 contains around 940K facts about
language bias in order to restrict the search space. In addi-
470K entities. WARMR was not able to terminate on this
tion, the user needs to provide ALEPH with files containing
data in a time period of 1 day. Therefore, we created a
the background knowledge and positive examples for the tar-
sample of YAGO2. Randomly selecting a number of facts
get predicate. In contrast, AMIE requires no such input. It
from the initial dataset could break the interesting links be-
will run on a KB without any prespecified choices of predi-
tween the entities. Therefore, we randomly selected 10,000
seed entities and included their 3-hop neighborhood. This
yielded 14K entities and 47K facts. This sample contains
4.96s to > 1 day
all available information in a radius of 3 hops around the
0.05s to > 1 day
seed entities, but much less information about the entities
Table Runtimes ALEPH vs. AMIE
at the periphery of the subgraph. Therefore, we restrictedthe values for the key predicate to the seed entities only.

Since the sample is much smaller than the original KB, we
isPoliticianOf, hasCapital, hasCurrency
lowered the support threshold to 5 entities. We ran AMIE
dealsWith, hasOfficialLanguage, imports
with these parameters on the sample. AMIE mined her rules
in 3.90 seconds. WARMR, in contrast, took 18 hours. We
also ran both systems allowing them to mine rules with con-
isMarriedTo, livesIn, worksAt, isLocatedIn
stants. AMIE completed the task in 1.53 minutes. WARMR
Table Runtimes of ALEPH on YAGO2
in MODE1 for all relations did not terminate in 3 days.

Therefore, we ran it also only for the relations diedIn, livesIn,
diedIn, directed, hasAcademicAdvisor
wasBornIn, for which it took 48h. We also ran WARMR in
graduatedFrom, isPoliticianOf, playsFor
MODE2. To have reasonable runtimes, we allowed WARMR
wasBornIn, worksAt, isLeaderOf
to find constants only for one predicate (diedIn). We also re-
exports, livesIn, isCitizenOf
stricted it to find only 20 constants. WARMR ran 19 hours.

actedIn, produced, hasChild, isMarriedTo
Table summarizes the runtime results. We conclude that
Table Runtimes of ALEPH on YAGO2 Sample
AMIE is better suited for large KBs than WARMR. Thisis because WARMR is an ILP algorithm written in a logic
We ran AMIE and ALEPH on YAGO2
programming environment, which makes the evaluation of
For ALEPH, we used the positive-only evaluation function
all candidate queries inefficient.

with Rsize = 50 and we considered only clauses that wereable to explain at least 2 positive examples, so that we will
not get grounded facts as rules in the output. For a fair
comparison, we also instructed AMIE to run with a support
Table Runtimes on YAGO2 Sample
threshold of 2 facts. AMIE terminated in 3.62 minutes, andfound rules for all relations. ALEPH ran for one head re-
Results. After filtering out non-connected rules, WARMR
lation at a time. For some relations (e.g.isPoliticianOf ), it
mined 41 closed rules. AMIE, in contrast, mined 207 closed
terminated in a few seconds. For others, however, we had to
abort the system after 1 day without results (Tables and
10 rules already produce 135K predictions – at a precision
. For each relation, ALEPH treats one positive example
of 39%. The top 30 rules produce 3 times more predictions
at a time. Some examples need little processing time, others
than the top 30 rules by standard confidence – at compa-
block the system for hours. We could not figure out a way
rable precision. This is because the PCA confidence is less
to choose examples in such a way that ALEPH runs faster.

conservative than the standard confidence.

Hence, we used the sample of YAGO2 that we created for
WARMR. Again, runtimes varied widely between relations
PCA Confidence (rules 1-30)
Std. Confidence (rules 1-30)
(Table . Some relations ran in a few seconds, others did
Std. Confidence (rules 31-46)
not terminate in a day. The runtimes with constants are
similarly heterogenous, with at least 7 relations not termi-
nating in 1 day.

Results. We compared the output of ALEPH on the head
relations for which it terminated to the output of AMIE
on these head relations, on the sample dataset.

100000 150000 200000 250000 300000 350000 400000 450000
Aggregated predictions (beyond the initial KB)
mined 56 rules, while AMIE mined 335 rules.

the rules by decreasing score (ALEPH) and decreasing PCA
Figure Std. Confidence vs. PCA Confidence
confidence (AMIE). Table shows the number of predic-tions, and their total precision as described in Section
Discussion. The precision of the rules is in the range of
We show the aggregated values at the points where both
30%-40%. Thus, only a third of the predictions in the un-
approaches have produced around 3K, 5K and 8K predic-
known region will be correct. Still, even imperfect rules can
tions. AMIE's PCA confidence succeeds in sorting the rules
be useful: If, e.g., a human checks the facts before they
roughly by descending precision, so that the initial rules have
are added, then reducing the number of false predictions is
an extraordinary precision compared to ALEPH's. AMIE
a great advantage. If games with a purpose are employed,
needs more rules to produce the same number of predictions
then the rules can help pre-select candidate facts. If multiple
as ALEPH (but she also mines more).

sources are combined, then rules can contribute evidence. If
ALEPH's positives-only evaluation function manages to fil-
reasoning approaches are used, then rules can be taken into
ter out overly general rules only to some extent. ALEPH
consideration according to their estimated performance. Fi-
will mine, e.g, livesIn(A, C),isLocatedIn(C, B) ⇒ isPoliti-
nally, the precision is better if standard confidence is used.

cianOf (A, B). The problem is that ALEPH generates coun-
Predicting Precision. The confidence measures can serve
terexamples by randomly using valid constants for variables
to estimate the actual precision of a rule. In Table we rank
A and B. This means that the probability of creating a
the mined rules by their precision and report the average ab-
random example in which B is the place of residence of the
solute error of the standard and PCA confidence weighted
specific person A is very low.

by the number of predictions produced by the rules. Wecan observe that, on average, the PCA confidence estimates
the precision of the rules better than the normal confidence.

Thus, reasoning approaches can use the PCA confidence as
a weight for the rule.

Table Top Rules of ALEPH vs. AMIE
Table Average Absolute Error to Precision
AMIE with Different Metrics
We also note that our rules are insightful. Table shows
In this section, we compare the standard confidence mea-
some of the rules we mined. Being able to mine reasonable
sure to the PCA confidence measure. We ran AMIE with
rules on semantic KBs of this size is an achievement beyond
the default head coverage threshold on the YAGO2 dataset.

the current state of the art.

It contains nearly 500K entities and 948K facts. We sortthe rules first by descending PCA confidence, and then by
isMarriedTo(x, y)∧ livesIn(x, z) ⇒ livesIn(y, z)
descending standard confidence, and look at the top rules.

isCitizenOf (x, y) ⇒ livesIn(x, y)
For each rule, we evaluated the predictions beyond YAGO2
hasAdvisor (x, y)∧ graduatedFrom(x, z) ⇒ worksAt (y, z)
as described in Section Figure uses aggregated predic-
wasBornIn(x, y)∧ isLocatedIn(y, z) ⇒ isCitizenOf (x, z)
tions and aggregated precision to illustrate the results. The
hasWonPrize(x, G. W. Leibniz) ⇒ livesIn(x, Germany)
n-th dot from the left tells us the total number of predictions
hasWonPrize(x, Grammy) ⇒ hasMusicalRole(x, Guitar)
and the total precision of these predictions, aggregated over
Table Some Rules by AMIE
the first n rules. As we see, ranking the rules by standardconfidence is a very conservative approach: It identifies rules
AMIE on Different Datasets
with reasonable precision, but these do not produce many
As a proof of concept, we ran AMIE on YAGO2
predictions. Going down in the list of ranked rules, the rules
YAGO2 with constants, and DBpedia We chose an older
produce more predictions – but at lower precision. The top
version of DBpedia (2.0), so that we can evaluate the out-
30 rules produce 113K predictions at an aggregated preci-
put to a newer version of DBpedia (3.8). Due to the large
sion of 32%. If we rank the rules by PCA confidence, in con-
number of relations in DBpedia 2.0, there is an enormous
trast, we quickly get large numbers of predictions. The top
number of rules to be found. We show the time taken to
mine rules with 2 atoms. We provide also the number of
[13] B. Goethals and J. Van den Bussche. Relational Association
predicted facts that are in the newer version of the KB but
Rules: Getting WARMER. In Pattern Detection and
not in the old one (hits). As Table shows, AMIE can pro-
Discovery, volume 2447. Springer Berlin / Heidelberg, 2002.

duce rules with or without constants in a reasonable time.

[14] G. A. Grimnes, P. Edwards, and A. D. Preece. Learning
Meta-descriptions of the FOAF Network. In ISWC, 2004.

[15] S. Hellmann, J. Lehmann, and S. Auer. Learning of OWL
Class Descriptions on Very Large Knowledge Bases. Int. J.

Semantic Web Inf. Syst., 5(2), 2009.

[16] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum.

YAGO2: a spatially and temporally enhanced knowledge
Table AMIE on Different Datasets
base from Wikipedia. Artificial Intelligence Journal, 2013.

All rules and results are available at
[17] J. Jozefowska, A. Lawrynowicz, and T. Lukaszewski. The
role of semantics in mining frequent patterns fromknowledge bases in description logics with rules. TheoryPract. Log. Program., 10(3), 2010.

[18] M. Kuramochi and G. Karypis. Frequent Subgraph
In this paper, we have presented an approach for mining
Discovery. In ICDM. IEEE Computer Society, 2001.

Horn rules on large RDF knowledge bases. We have intro-
[19] J. Lehmann. DL-Learner: Learning Concepts in
duced a formal model for rule mining under the Open World
Description Logics. Journal of Machine Learning Research
Assumption, a novel measure to simulate counter-examples,
(JMLR), 10, 2009.

[20] F. A. Lisi. Building rules on top of ontologies for the
and a scalable algorithm for the mining. In contrast to state-
semantic web with inductive logic programming. TPLP,
of-the-art approaches, our system (AMIE) requires no input
other than the KB, and does not need configurations or pa-
[21] A. Maedche and V. Zacharias. Clustering Ontology-Based
rameter tuning. As our extensive experiments have shown,
Metadata in the Semantic Web. In PKDD, 2002.

AMIE runs on millions of facts in only a few minutes and
[22] T. Mamer, C. Bryant, and J. McCall. L-modified ilp
outperforms state-of-the-art approaches not only in terms of
evaluation functions for positive-only biological grammar
runtime, but also in terms of the number and quality of the
learning. In Inductive logic programming, number 5194 in
output rules. Our confidence measure can reasonably pre-
LNAI. Springer-Verlag, 2008.

[23] C. Matuszek, J. Cabral, M. Witbrock, and J. Deoliveira.

dict the precision of the rules. In our future work, we plan
An introduction to the syntax and content of Cyc. In AAAI
to consider also the T-Box of the KB in order to produce
Spring Symposium, 2006.

more precise rules. We also aim to explore the synergies
[24] D. L. McGuinness, R. Fikes, J. Rice, and S. Wilder. An
when several rules predict the same fact, and extend the set
Environment for Merging and Testing Large Ontologies. In
of rules beyond Horn rules, so that even more complex facts
and hidden knowledge can be predicted.

[25] S. Muggleton. Inverse entailment and progol. New
Generation Comput., 13(3&4), 1995.

[26] S. Muggleton. Learning from positive data. In ILP.

[1] Z. Abedjan, J. Lorey, and F. Naumann. Reconciling
ontologies and the web of data. In CIKM, 2012.

[27] N. Nakashole, M. Sozio, F. Suchanek, and M. Theobald.

e, L. Raedt, and M. Bruynooghe. Declarative bias for
Query-time reasoning in uncertain rdf knowledge bases
specific-to-general ilp systems. Machine Learning, 20, 1995.

with soft and hard rules. In Workshop on Very Large Data
[3] R. Agrawal, T. Imieli´
nski, and A. Swami. Mining
Search (VLDS) at VLDB, 2012.

association rules between sets of items in large databases.

[28] V. Nebot and R. Berlanga. Finding association rules in
In SIGMOD, 1993.

semantic web data. Knowl.-Based Syst., 25(1), 2012.

[4] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I.

[29] T. Neumann and G. Weikum. RDF-3X: a RISC-style
Verkamo. Fast discovery of association rules. In Advances
engine for RDF. Proc. VLDB Endow., 1(1), Aug. 2008.

in knowledge discovery and data mining. 1996.

[30] N. F. Noy and M. A. Musen. PROMPT: Algorithm and
[5] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak,
Tool for Automated Ontology Merging and Alignment. In
and Z. G. Ives. DBpedia: A nucleus for a Web of open
AAAI/IAAI. AAAI Press, 2000.

data. In ISWC, 2007.

[31] M. Richardson and P. Domingos. Markov logic networks.

[6] Y. Chi, R. R. Muntz, S. Nijssen, and J. N. Kok. Frequent
Machine Learning, 62(1-2), 2006.

Subtree Mining - An Overview. Fundam. Inf., 66(1-2),
[32] S. Schoenmackers, O. Etzioni, D. S. Weld, and J. Davis.

Learning first-order Horn clauses from web text. In
[7] P. Cimiano, A. Hotho, and S. Staab. Comparing
EMNLP, 2010.

Conceptual, Divisive and Agglomerative Clustering for
[33] F. M. Suchanek, S. Abiteboul, and P. Senellart. PARIS:
Learning Taxonomies from Text. In ECAI, 2004.

Probabilistic Alignment of Relations, Instances, and
[8] C. d'Amato, V. Bryl, and L. Serafini. Data-driven logical
Schema. PVLDB, 5(3), 2011.

reasoning. In URSW, 2012.

[34] F. M. Suchanek, J. Hoffart, E. Kuzey, and
[9] C. d'Amato, N. Fanizzi, and F. Esposito. Inductive learning
E. Lewis-Kelham. YAGO2s: Modular High-Quality
for the Semantic Web: What does it buy? Semant. web,
Information Extraction. In German Database Symposium
1(1,2), Apr. 2010.

(BTW), 2013.

[10] J. David, F. Guillet, and H. Briand. Association Rule
[35] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core
Ontology Matching Approach. Int. J. Semantic Web Inf.

of semantic knowledge. In WWW, 2007.

Syst., 3(2), 2007.

[36] P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right
[11] L. Dehaspe and H. Toironen. Discovery of relational
interestingness measure for association patterns. In KDD,
association rules. In Relational Data Mining.

Springer-Verlag New York, Inc., 2000.

olker and M. Niepert. Statistical schema induction. In
[12] L. Dehaspe and H. Toivonen. Discovery of frequent
DATALOG patterns. Data Min. Knowl. Discov., 3(1), Mar.

1999.

Source: https://suchanek.name/work/publications/www2013.pdf

J Periodontol • August 2008 Subantimicrobial-Dose DoxycyclineModulates Gingival Crevicular FluidBiomarkers of Periodontitisin Postmenopausal Osteopenic WomenLorne M. Golub,* Hsi Ming Lee,* Julie A. Stoner,† Timo Sorsa,‡ Richard A. Reinhardt,§Mark S. Wolff,*i Maria E. Ryan,* Pirkka V. Nummikoski,¶ and Jeffrey B. Payne§ Background: We recently demonstrated that a 2-year subantimicrobial-

Alastair Baker 9/7/2013 Investigation and treatment of liver disease with acute onset – Local hospital protocol Defined as EITHER sudden onset of jaundice with evidence of liver aetiology OR incidental discovery of raised transaminases in association with symptoms suggesting acute onset Age of onset >3 months