INDED: A Distributed
Knowledge-Based
Learning System
Jennifer Seitzer, James P. Buckley, and Yi Pan, University of Dayton
The authors' INDED system performs rule discovery using the techniques of inductive logic programming, and it accumulates and handles knowledge using a deductive nonmonotonic reasoning engine. To save time and space, the authors run INDED in parallel on a Beowulf cluster.

1094-7167/00/$10.00 © 2000 IEEE
IEEE INTELLIGENT SYSTEMS

Sometimes defined as the nontrivial process of identifying valid, novel, potentially useful, and understandable patterns in data, knowledge discovery in databases offers powerful solutions but requires sizable quantities of time and space.1 Data mining, part of the knowledge-discovery process, attempts to reveal patterns within a database to exploit implicit information that was previously unknown.2 An IF-THEN rule (IF antecedent THEN consequent), where the antecedent and consequent are logical conjunctions of predicates (first-order logic) or propositions (propositional logic), often denotes such discovered patterns.3 Graphs and hypergraphs are also used extensively as knowledge-representation constructs because of their ability to depict causal chains or networks of implications by interconnecting the consequent of one rule to the antecedent of another.

In our system INDED (induction-deduction, pronounced "indeed"), using the language of logic programming, we use a hypergraph to represent the knowledge base from which rules are mined. Because the hypergraph gets inordinately large in INDED's serial version,4 we have devised a parallel implementation that creates smaller internal hypergraphs.

In this article, we investigate the integrity and meaning of decomposing data so that many processors can attempt to learn the same global pattern simultaneously (although locally, each discovered pattern is usually unique). Many data decompositions are fallacious and lead to nonsensical discovered rules. Some data, however, exhibits enough mutual exclusivity to render it partitionable among processors. This examination of the partitionability of data has been the underlying driving force of this work. A great deal of work has been done in parallelizing unguided discovery of association rules.5,6 The novel aspects of our work include the parallelization of both a nonmonotonic reasoning system and an inductive logic programming learner. In this article, we describe the schemes we have explored and are exploring in this pursuit. We also present our data-partitioning algorithms, which we based on data locality.

INDED's serial implementation

Our knowledge-discovery system INDED uses inductive logic programming7 as its discovery technique. To maintain a database of background knowledge, INDED houses a deduction engine that uses deductive logic programming to compute the current state (the current set of true facts) as new rules and facts are procured.

Inductive logic programming. ILP embodies a new research area in artificial intelligence that attempts to attain some machine-learning goals while using logic-programming techniques, language, and methodologies. Others have applied ILP to data mining, knowledge acquisition, and scientific discovery.7 An ILP system aims to output a rule that covers (entails) an entire set of positive observations, or examples, and excludes or does not cover a set of negative examples.8 To construct this rule, ILP uses a set of known facts and rules (knowledge) called domain or background knowledge. In essence, ILP tries to synthesize a logic program, or at least part of a logic program, using examples, background knowledge, and an entailment relation. The following definitions come from Inductive Logic Programming by Nada Lavrac and Saso Dzeroski.7

Definition 2.3 (coverage). Given background knowledge B, hypothesis H, and example set E, hypothesis H covers example e ∈ E with respect to B if B ∪ H ⊨ e.

Definition 2.4 (complete). A hypothesis H is complete with respect to background B and examples E if all positive examples are covered; that is, if for all e ∈ E+, B ∪ H ⊨ e.

Definition 2.5 (consistent). A hypothesis H is consistent with respect to background B and examples E if no negative examples are covered; that is, if for all e ∈ E–, B ∪ H ⊭ e.

Definition 2.6 (formal problem statement). Let E be a set of training examples consisting of true E+ and false E– ground facts of an unknown (target) predicate T. Let L be a description language specifying syntactic restrictions on the definition of predicate T. Let B be background knowledge defining predicates qi that may be used in the definition of T and that provide additional information about the arguments of the examples of predicate T. The ILP problem is to produce a definition H for T, expressed in L, such that H is complete and consistent with respect to the examples E and background knowledge B.

Serial architecture. INDED comprises two main computation engines. The deduction engine is a bottom-up reasoning system that computes the current state by generating a stable model, if one exists, of the current ground instantiation represented internally as a hypergraph, and by generating the well-founded model,9 if no stable model exists.10 (Although we have cited the formal definitions of these semantics, for this article, we can intuitively accept stable and well-founded models as those sets of facts generated by transitively applying modus ponens to rules.) This deduction engine is, essentially, a justification truth maintenance system that accommodates nonmonotonic updates in the form of positive or negative facts.11

The induction engine, using the current state that the deduction engine creates as the background knowledge base, along with positive examples E+ and negative examples E–, induces rules that we can then use to augment the deductive engine's hypergraph. INDED uses a standard top-down hypothesis construction algorithm (learning algorithm).7 Two user-input values that indicate sufficiency and necessity stopping criteria dictate termination.

Framework of operation. The input files to INDED that initialize the system are an extensional database and an intensional database. The EDB comprises initial ground facts (facts with no variables, only constants). For example, the EDB in Figure 1 comes from the universal family tree domain.12 The family tree is a canonical application of logic programming and exemplifies the ability to represent relations with a logic program.

father(george, catherine).
mother(catherine, mary).
married(catherine, henry).
father(henry, mary).
mother(elizabeth, henry).
married(mary, bob).

Figure 1. An extensional database used for the family tree example.

The IDB consists of universally quantified rules; we assume the syntactic constraint that each IDB contains no constants, only variables. Figure 2 displays one possible IDB used in the family tree domain.

married(X,Y) ← married(Y,X).
ancestor(X,X) ←.
ancestor(X,Z) ← ancestor(X,Y), mother(Y,Z).
ancestor(X,Z) ← ancestor(X,Y), father(Y,Z).
relative(Y,Z) ← ancestor(X,Y), ancestor(X,Z).
relative(X,Y) ← relative(Y,X).
inlaw(X,Z) ← relative(X,Y), married(Y,Z), relative(X,Z).
inlaw(X,Y) ← inlaw(Y,X).

Figure 2. A family tree intensional database.

father(george,catherine) ←.
mother(catherine,mary) ←.
married(catherine,henry) ←.
father(henry,mary) ←.
mother(elizabeth,henry) ←.
married(mary,bob) ←.
married(catherine,henry) ← married(henry,catherine).
married(bob,mary) ← married(mary,bob).
ancestor(elizabeth,mary) ← ancestor(elizabeth,george), mother(george,mary).
ancestor(elizabeth,mary) ← ancestor(elizabeth,henry), father(henry,mary).
ancestor(catherine,henry) ← ancestor(catherine,george), mother(george,henry).
relative(george,mary) ← ancestor(henry,george), ancestor(henry,mary).
inlaw(bob,elizabeth) ← relative(bob,george), married(george,elizabeth), relative(bob,elizabeth).

Figure 3. A family tree ground instantiation section.

Together, the EDB and IDB form the internal ground instantiation represented internally as the deduction engine hypergraph.
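The deduction engine's job is easy to illustrate on the family tree inputs of Figures 1 and 2. The following Python sketch runs a naive bottom-up closure (repeated modus ponens over all ground instances of the rules); it is an illustration only, since INDED's actual engine computes well-founded and stable models over a hypergraph rather than this toy fixpoint loop:

```python
from itertools import product

# Figure 1 EDB (ground facts), as (predicate, arg1, arg2) tuples
edb = {
    ("father", "george", "catherine"), ("mother", "catherine", "mary"),
    ("married", "catherine", "henry"), ("father", "henry", "mary"),
    ("mother", "elizabeth", "henry"), ("married", "mary", "bob"),
}

# Figure 2 IDB rules as (head, body); uppercase strings are variables
idb = [
    (("married", "X", "Y"), [("married", "Y", "X")]),
    (("ancestor", "X", "X"), []),
    (("ancestor", "X", "Z"), [("ancestor", "X", "Y"), ("mother", "Y", "Z")]),
    (("ancestor", "X", "Z"), [("ancestor", "X", "Y"), ("father", "Y", "Z")]),
    (("relative", "Y", "Z"), [("ancestor", "X", "Y"), ("ancestor", "X", "Z")]),
    (("relative", "X", "Y"), [("relative", "Y", "X")]),
    (("inlaw", "X", "Z"), [("relative", "X", "Y"), ("married", "Y", "Z"),
                           ("relative", "X", "Z")]),
    (("inlaw", "X", "Y"), [("inlaw", "Y", "X")]),
]

def current_state(edb, idb):
    """Naive closure: ground every rule over all constants and apply
    modus ponens until no new fact appears (a fixpoint)."""
    consts = sorted({c for fact in edb for c in fact[1:]})
    facts, changed = set(edb), True
    while changed:
        changed = False
        for head, body in idb:
            vars_ = sorted({t for atom in [head] + body
                            for t in atom[1:] if t.isupper()})
            for binding in product(consts, repeat=len(vars_)):
                env = dict(zip(vars_, binding))
                ground = lambda a: (a[0],) + tuple(env.get(t, t) for t in a[1:])
                if all(ground(atom) in facts for atom in body):
                    if ground(head) not in facts:
                        facts.add(ground(head))
                        changed = True
    return facts

state = current_state(edb, idb)
print(("ancestor", "catherine", "mary") in state)  # True
print(("inlaw", "mary", "henry") in state)         # True
```

Even on six constants, every three-variable rule already generates 216 ground instances per pass, which hints at how quickly the exhaustive instantiation grows.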
Because this process enumerates all possible combinations of instantiations of constants to variables, the ground instantiation grows exponentially. Figure 3 shows a small part of the ground instantiation used in the family tree example.

The ground rules do not necessarily make semantic sense. Currently, the instantiation process is an exhaustive, mechanical procedure that assigns all possible combinations of all constants to the variables. Part of our current research involves devising ways of limiting the instantiation process so as to produce a smaller hypergraph.13

The inference engine (also called the deduction engine) operates on the ground instantiation to deduce the current state or, equivalently, the domain background knowledge. The current state is syntactically similar to the EDB; it is a collection of facts. Here, however, the engine includes both positive and negative facts. Again, because of the huge size, Figure 4 shows a small part of the deduced domain background knowledge.

mother(catherine,mary).
father(henry,mary).
married(mary,bob).
married(bob,mary).
relative(george,mary).
inlaw(bob,elizabeth).
ancestor(mary,george).
mother(george,bob).
relative(mary,bob).
inlaw(mary,bob).
inlaw(catherine,mary).

Figure 4. Part of the family tree domain background knowledge.

With the generated domain background knowledge base, we can induce new knowledge in the form of implications. In INDED, the background knowledge serves as input to the induction engine for this purpose. The induction engine also uses a positive example set E+. This is a set of ground facts that pertain to the target predicate (the predicate that is the consequent or, equivalently, the head of the implication being learned). In the family tree example, we attempt to learn a rule that answers the question, "What is an heir?" Each fact of E+ represents a pair of constants that relate to one another through the heir(X,Y) relation and can be read "X is the heir of Y." Figure 5 displays a positive example set exemplifying the notion.

heir(mary,henry).
heir(mary,george).
heir(mary,catherine).
heir(catherine,george).

Figure 5. Positive example set to learn heir(X,Y).

Additionally, the induction engine uses a negative example set E–. This set of negated ground facts also pertains to the target predicate. These facts represent pairs that do not exhibit the relationship expressed by the target predicate. In our example, these facts indicate pairs of constants that do not depict the relationship of heir(X,Y). Figure 6 shows the negative examples employed for our family tree example.

heir(bob,george).
heir(bob,catherine).
heir(george,catherine).
heir(bob,mary).
heir(george,mary).
heir(mary,bob).
heir(catherine,henry).

Figure 6. A negative example set to learn heir(X,Y).

Ultimately, from the three input files, E+, E–, and B, the induction engine produces a set of clauses that define the target predicate. That is, for each clause (equivalently, rule or implication), the target predicate serves as the head, and the chosen predicates from the background knowledge B form the rule antecedent (body). Figure 7 displays the induced clauses that define heir(X,Y).

heir(X,Y) ← mother(Y,X).
heir(X,Y) ← ancestor(Y,X).

Figure 7. Learned rules defining heir(X,Y).

In every iteration, as control shifts from the deduction to the induction engine, INDED learns exactly one target predicate and produces one set of clauses, in the form of implications, which define the target predicate. The following framework generalizes the above integration of a bottom-up (forward-reasoning) nonmonotonic deductive system with a top-down ILP learning system. The actions reflect INDED's iterative, synergistic behavior. The following lists the steps of one iteration (see Figure 8):

• Compute the ground instantiation of logic program P, called PG, using the EDB and IDB as inputs. Store PG in a hypergraph.
• Compute the current state B of the knowledge base using well-founded models, stable models, or both of PG.
• Induce hypothesis H, using E+, E–, and B (as the domain background knowledge base) as inputs to the induction engine.
• Augment the IDB with newly learned intensional rules.

Figure 8. INDED's induction and deduction engines. (The diagram's inputs to the induction engine are the positive examples, negative examples, and background knowledge.)

Induction engine. Any ILP system's ultimate goal is to generate a set of intensional rules. Creating this set of rules requires answers to five questions. Here, each answer manifests as an algorithm in INDED's induction engine:

• How many distinct rules must be created for the same target predicate? Apply the Covering Algorithm to E+.
• How do we choose the target predicate variables? Apply the Reflexive Naming Algorithm.
• For any given rule, how many literals must we include in the rule body? Apply the Covering Algorithm to E–.
• For any given rule, which literal do we choose? Apply the Predicate Ranking Algorithm.
• For any chosen literal, how do we choose the constituent variables? Apply the Chosen Predicate Variable Naming Algorithm.

INDED's induction engine returns the set of intensional rules in the order of generation. For example, the first generated intensional rule is the first in the returned set.

Algorithm 2.7, a standard hypothesis construction algorithm (learning algorithm) used in INDED (see Figure 9), uses and combines the answers to the five questions. A generic top-down ILP hypothesis construction algorithm typically uses two nested programming loops. The outer (covering) loop attempts to cover all positive examples, while the inner (specialization) loop attempts to exclude all negative examples.

Input: Target example sets E = E+ ∪ E–, domain knowledge B, sufficiency and necessity stopping criteria
Output: Intensional rule(s) of learned hypothesis H
BEGIN ALGORITHM 2.7
while (number of positive examples covered / |E+|) ≤ sufficiency criterion
    Make new intensional rule:
        – head is the target predicate found in E = E+ ∪ E–
        – make body as follows:
            while (number of negative examples covered / |E–|) ≥ necessity criterion
                – Add a literal to the body by choosing the highest-ranked unchosen predicate symbol in the background knowledge base
                – Choose the variables of this literal
    Append the new intensional rule to the rule set H to be returned
END ALGORITHM 2.7

Figure 9. Algorithm 2.7. We have implemented this generic top-down hypothesis construction algorithm in INDED. The algorithm forms the underpinnings of other top-down ILP systems such as FOIL and Golem.7

The deductive engine. INDED's deductive-reasoning system represents the ground instantiation P of the combined EDB and IDB as a hypergraph PG. INDED represents each atom a ∈ P as a vertex with a set of incoming hyperedges, where each hyperedge corresponds to one rule body of which a is the head. Also associated with each vertex is a set of outgoing hyperedge parts, each corresponding to one (positive or negative) appearance of a in the body of some rule r ∈ P. To compute the current state, INDED first acquires the well-founded model using the Bilattice algorithm.14 If the model is not total (at least one atom is not assigned true or false), we factor out the total part (the hypergraph part housing atoms that were assigned) and assign truth values to the remaining subgraph by finding the first truth assignment that is a stable model, if such an assignment exists. The resultant set of true facts, along with the negations of the false facts, forms the background knowledge base, which INDED uses to indicate the current state in the deduction engine, or which INDED can use to induce more knowledge in the induction engine.

Memory limitations greatly limit the problems that the serial version can address. By parallelizing INDED, we aim to

• output learned rules faster and obtain higher-quality rules than does serial INDED, and
• increase the problem space by decreasing the size of the internal deduction hypergraph.

In our pursuit to parallelize INDED, we are investigating many schemes.
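The memory pressure is easy to quantify. With c constants, naive grounding emits c^v ground instances for a rule with v distinct variables, so giving each worker only a fraction of the constants shrinks its hypergraph by orders of magnitude. A back-of-envelope sketch (the rule counts and sizes below are hypothetical, not measurements from INDED):

```python
def ground_instances(num_constants, rule_var_counts):
    """Total ground rules emitted by exhaustive instantiation:
    a rule with v distinct variables yields num_constants**v instances."""
    return sum(num_constants ** v for v in rule_var_counts)

# Hypothetical program: eight rules with two or three variables each.
rule_var_counts = [2, 2, 3, 3, 3, 2, 3, 2]

serial = ground_instances(500, rule_var_counts)         # one node, 500 constants
per_node = ground_instances(500 // 8, rule_var_counts)  # constants split 8 ways

print(serial)    # 501000000
print(per_node)  # 968688
```

Under these assumed numbers, each worker's hypergraph is more than 500 times smaller than the serial one, which is exactly the "smaller internal deduction hypergraph" goal stated above.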
We have designed each scheme for implementation on a Beowulf cluster, a collection of personal computers coordinated to form a supercomputer.15 Physically, a local area network interconnects the computers. The collective execution of a set of protocols forming a portable parallel-programming package, such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine), handles software communication.

We are using the following parallelization schemes, each of which addresses one of our previously stated parallelization goals:

• large-grained-control parallel decomposition, where one cluster node runs the induction engine while another node runs the deduction engine;
• large-grained-control parallel decomposition, where we establish a pipeline of processors, each operating on a different current state as created in previous (or subsequent) pipelined iterations;
• data-parallel decomposition, where each node runs the same program with smaller input files (hence smaller internal hypergraphs); and
• a speculative parallel approach, where each node attempts to learn the same rule using a different predicate-ranking algorithm in the induction engine.

Our strategy lets the induction engine initially discover a target predicate from positive and negative examples and an initial background knowledge base. Meanwhile, the deduction engine computes the current state using the initial input files. This current state is sent to the induction engine as its background knowledge base in the subsequent iteration. INDED then feeds the learned predicate from the induction engine from one iteration into the deductive engine for use during the next iteration in its computation of the current state. The induction engine then uses that current state as its background knowledge during the subsequent iteration.

Naive and pipelined decomposition. Because the induction and deduction engines depend on each other, as shown in Figure 8, parallelizing the two engines is difficult. In this decomposition, we create a very coarse-grained system in which two nodes share the execution. One node houses the deduction engine; the other houses the induction engine.

During iteration i, the induction engine generally computes new intensional rules for the deduction engine to use in its computation of the current state in iteration i + 1. Simultaneously, during iteration i, the deduction engine computes a new current state for the induction engine to use as its background knowledge in iteration i + 1. This process repeats until the deduction engine discovers all specified target predicates. Extending this implementation, we acquire a pipelined system where the deduction engine computes state Si+1 while the induction engine uses Si to induce new rules (where i is the current iteration number).

Data-parallel decomposition with data partitioning. In this method, each worker MPI node runs INDED when invoked by the master MPI node.16 Each worker executes with a partial background knowledge base, which, as in the serial version, its deduction engine spawns. Worker nodes create the partial background knowledge bases in one of two ways.

In the first method, each worker receives the full serial IDB and a partial EDB. Using a partial EDB creates a significantly smaller (and different) hypergraph on each Beowulf worker node. In some cases, we have found the accuracy of rules obtained collectively by the processors housing the smaller hypergraphs to equal that obtained in the original serial system. Moreover, by using this parallel version, we can grapple with problems involving larger knowledge bases than those workable on the serial system. This decomposition leads to a faster execution owing to the significantly smaller internal hypergraph being built. The challenge is to determine the best way to decompose the serial EDB into smaller ones so that the obtained rules are as accurate as those learned by the serial system.

The second method consists of creating a partial background knowledge base by directly partitioning the generated background knowledge from the deduction engine. Because the induction engine organizes the background knowledge by predicate expression, and no precedences are posed between the predicate expressions, partitioning the background knowledge is quite easy. Each node receives a small background knowledge base solely comprising atoms related to one (or a small number of) predicates. For example, Figure 10 shows one such part from partitioning our background knowledge base for the family tree example.

ancestor(mary,mary).
ancestor(george,mary).

Figure 10. Background knowledge partitioning for worker nodes.

Load balancing is another interesting challenge that we have undertaken in this decomposition. To keep execution time roughly equal on all worker nodes, we are attempting to keep data files roughly equal in size. Because this is generally an NP-complete problem similar to bin packing, we employ an approximation algorithm in which a binary heap organizes each predicate grouping.

Data partitioning and data locality. In our pursuit of partitioning the EDB, we found that data transactions frequently exhibit a form of locality of reference, and we based our partitioning schemes on this observation. Cache systems ardently exploit locality of reference, where the general area of memory referenced by sequential instructions tends to be repeatedly accessed. Because locality of reference also exists in the context of knowledge discovery, we attempt to exploit it to increase the efficiency of rule mining. According to a precept of knowledge discovery, data in a knowledge base system are nonrandom and tend to cluster somewhat predictably. This tendency mimics locality of reference. Three types of locality of reference can coexist in a knowledge base system: spatial, temporal, and functional.

In spatial locality of reference, certain data items appear together physically in a database. For example, a supermarket is divided into several departments, each maintaining a section in the database. Clearly, the database groups information about the sale of children's toys in one section and stores the sale information about different vegetables in a different section. To exploit this locality, we can mine different rules on different sections of data independently and simultaneously using concurrency.

In temporal locality of reference, the data items that have been used in the recent past tend to appear in the near future. For example, if a supermarket has a sale on a particular brand of toothpaste on Monday, we will see numerous sales for this brand on the days that follow.

In functional locality of reference, we appeal to a semantic relationship between entities that can reside physically removed from one another and that the database can represent in different sections. Items exhibiting functional locality of reference, however, have a strong semantic tie that affects data transactions relating to them. We can exploit all three of these localities in distributed knowledge mining, and they help justify the schemes adopted in our implementations.

Example data set exhibiting locality. We now show an example data set that encompasses the notions of temporal and functional locality of reference. The data pertains to a supermarket enterprise and maintains information about various items sold in a sequential, date-ordered manner. Each data set segment represents a small snapshot of the data for a particular time period.

In Table 1, the data is from a store that had a coupon for milk during the indicated time period, forming a temporal locality among many of the purchased items. In particular, 30% of all items purchased were milk, and 75% of all transactions contained a milk purchase. We also demonstrate functional locality. We see here that when milk is on sale, customers tend to also purchase cereal. In the data set, we see that 50% of the purchases are either milk or cereal.

Table 1. Temporal and functional locality in data.

Global hypergraph using speculative parallelism. In this parallelization, each Beowulf node searches the space of all possible rules independently and differently. All machines have the same input files; therefore, each worker discovers from the same background knowledge base. Every rule INDED discovers is constructed by systematically appending chosen predicate expressions to an originally empty rule body. Various algorithms, each of which designates a different search strategy, rank the predicate expressions. INDED chooses the highest-ranked expressions to constitute the rule body under construction. In this parallelization of INDED, each Beowulf node employs a different ranking procedure and hence can construct very different rules. The processes execute concurrently, although asynchronously, depending on the processors' availability.

We have implemented two methods for handling the rules each worker generates. In the first, as soon as a process converges (finds a valid set of rules), it broadcasts a message to announce the procedure's end. When other processes receive the message, they terminate. So, the processor that finishes first procures the learned rule. The other method compares each worker's rules. Different processes can generate different rules owing to the use of different ranking algorithms. The master node then automatically verifies each rule by testing its coverage of a separate set of E+ and E–. Each rule is assigned a verification ratio that conveys the accuracy of the learned rule on these new (verification) examples. Syntactically, the verification examples are identical to the examples used to learn the rule. However, we reserve the verification examples for the verification process rather than use them for the learning process. Implementing both methods has let us accelerate the mining process as well as achieve better and richer solutions.

Partitioning algorithm. To retain all global dependencies among the predicates in the current state, all Beowulf nodes receive a full copy of the serial IDB. The serial EDB (the initial set of facts), therefore, is decomposed and partitioned among the nodes. The algorithm in Figure 11 transforms a large serial EDB into p smaller EDBs to be placed on p Beowulf nodes. It systematically creates sets based on constants appearing in the positive example set E+. Some facts from the serial EDB can appear on more than one processor.
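The EDB-partitioning scheme of Figure 11 can be sketched in Python. This is an illustrative rendering, not INDED's code: the round-robin spread of the no-constant set and the greedy lightest-worker assignment below stand in for the heap-based balancing described earlier.

```python
from collections import defaultdict

def partition_edb(edb, examples, p):
    """Split a serial EDB into p worker EDBs (sketch of Algorithm 3.1).
    edb, examples: iterables of (predicate, constant, ...) tuples.
    Facts are grouped by the example constants they mention, so a fact
    may be replicated on several workers."""
    example_constants = {c for e in examples for c in e[1:]}
    by_constant = defaultdict(set)   # S_c: facts mentioning example constant c
    leftovers = set()                # S_none: facts with no example constant
    for fact in edb:
        hits = [c for c in fact[1:] if c in example_constants]
        if hits:
            for c in hits:
                by_constant[c].add(fact)
        else:
            leftovers.add(fact)
    groups = list(by_constant.values()) or [set()]
    for i, fact in enumerate(sorted(leftovers)):   # distribute S_none
        groups[i % len(groups)].add(fact)
    workers = [set() for _ in range(p)]
    for group in sorted(groups, key=len, reverse=True):
        min(workers, key=len).update(group)        # biggest group to lightest node
    return workers
```

For instance, with E+ containing heir(mary, catherine), a fact such as married(mary, bob) lands in the group for the constant mary, while a fact mentioning no example constant at all is dealt out round-robin.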
Input: Number of processors p in the Beowulf cluster
       Serial extensional database (EDB)
       Positive example set E+
       Negative example set E–
Output: p individual worker-node EDBs
BEGIN ALGORITHM 3.1
For each example e ∈ E+ ∪ E– Do
    For each constant c ∈ e Do
        create an initially empty set Sc of facts
Create one (initially empty) set Snone for facts that have no constants in any example e ∈ E+ ∪ E–
For each fact f ∈ EDB Do
    For each constant c′ ∈ f Do
        If the set Sc′ exists then
            Sc′ = Sc′ ∪ {f}
    If no set exists for any constant of f then
        Snone = Snone ∪ {f}
Distribute the contents of Snone among all constant sets
Determine load balance by summing all set cardinalities to reflect the total number of parallel EDB entries K
Define min_local_load = ⌊K/p⌋
Distribute all sets Sci, where 1 ≤ i ≤ m (m = number of constants in the EDB), evenly among the processors so that each processor has an EDB of roughly equal cardinality and each node has an EDB of cardinality ≥ min_local_load as defined above.
END ALGORITHM 3.1

Figure 11. Algorithm 3.1 (EDB partitioning algorithm). This algorithm is O(n), where n is the number of facts in the extensional database.

Current status and results

We have implemented the four parallelization schemes for INDED, a large knowledge-based learning and reasoning system. The naive decomposition reduced execution time by 50%. The data-parallel implementations also experienced greatly reduced execution time. Figure 12 illustrates the consistent reduction of time as the number of nodes increased.

Figure 12. Performance of rule mining on a cluster (execution time versus number of processors).

The problem domain with which we are experimenting relates to the diagnosis of diabetes. The accuracy of the rules discovered by the cluster has varied. We attribute the variance of rule accuracy across the clusters to our partitioning algorithm. We anticipate extensive refinement of this algorithm as we continue this work.

The speculative parallel implementations produced a dramatic speedup when all ranking algorithms are tested for each execution. For this implementation, we have implemented an automatic rule verifier running on each worker. This verifier numerically quantifies the accuracy of discovered rules and, therefore, enables the master to use a quantitative method for rule evaluation.

We look forward to extensive experimentation with different partitioning algorithms for the EDB as well as for the background knowledge in the data-parallel parallelization scheme. Particularly, we intend to refine the partitioning algorithm to consider well-connected components formed by constant appearances in chains of rule antecedents and consequents. We anticipate a better decomposition of the EDB using well-connected components of a representing constant graph. Additionally, we intend to continue experimentation with the speculative-parallelization scheme, and we are enhancing and devising new predicate-ranking algorithms for use by the induction engine. Two algorithms of particular interest use set-theoretic operations for ranking predicates.

Additionally, we have found that one of the most interesting problems of parallel rule discovery is effective data partitioning, and we have presented a data-partitioning algorithm suited for parallelizing ILP discovery systems. We have also experimented in the domain of diabetes diagnosis using a Beowulf cluster and have found that computation time decreased dramatically due to the substantially smaller internal hypergraphs generated in each worker node.

Acknowledgments
We thank the students of the Learning with Reason research group for their help. In particular, we extend our appreciation to Lee Adams, Timothy Denehy, Kevin Livingston, and Madhavi Yeleswarapu. Grants from the following partially supported this work: the National Science Foundation (Grants CCR-9211621, OSR-9350540, CCR-9503882, and EIA-9806184); the Air Force Office of Scientific Research (Grant F49620-93-C-0063); the Air Force Avionics Laboratory, Wright Laboratory (Grant F33615-C-2218); the Ohio Board of Regents Investment Fund Competition; and the Ohio Board of Regents Research Challenge.

References
1. "Knowledge Discovery in Databases: An Overview," Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley, eds., AAAI Press/The MIT Press, Cambridge, Mass., 1991, pp. 1–30.
2. M.S. Chen, J. Han, and P.S. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Eng., Vol. 8, No. 6, Dec. 1996, pp. 866–883.
3. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, Vol. 1, 1986, pp. 81–106.
4. J. Seitzer, "INDED: A Symbiotic System of Induction and Deduction," Proc. 10th Midwest Artificial Intelligence and Cognitive Science Conf. (MAICS-99), AAAI Press, Menlo Park, Calif., 1999, pp. 93–99.
5. M.J. Zaki, S. Parthasarathy, and M. Ogihara, "Parallel Algorithms for Discovery of Association Rules," Data Mining and Knowledge Discovery, Vol. 1, 1997, pp. 5–35.
6. L. Shen, H. Shen, and L. Chen, "New Algorithms for Efficient Mining of Association Rules," Proc. Seventh IEEE Symp. Frontiers of Massively Parallel Computation, IEEE Computer Soc. Press, Los Alamitos, Calif., 1999, pp. 234–241.
7. N. Lavrac and S. Dzeroski, Inductive Logic Programming, Ellis Horwood, Chichester, UK, 1994.
8. S. Muggleton, ed., Inductive Logic Programming, Academic Press, San Diego, Calif., 1992.
9. A. Van Gelder, K. Ross, and J. Schlipf, "The Well-Founded Semantics for General Logic Programs," J. ACM, Vol. 38, No. 3, July 1991, pp. 620–650.
10. M. Gelfond and V. Lifschitz, "The Stable Model Semantics for Logic Programming," Proc. Fifth Logic Programming Symp., 1990, pp. 1070–1080.
11. J. Doyle, "A Truth Maintenance System," Artificial Intelligence, Vol. 12, 1979, pp. 231–272.
12. L. Sterling and E. Shapiro, The Art of Prolog, MIT Press, Cambridge, Mass., 1994.
IEEE Intelligent Systems, March/April 1999
13. T. Denehy and J. Seitzer, "A Hypergraph Representation for Deductive Reasoning Systems," Proc. 11th Midwest Artificial Intelligence and Cognitive Science Conf., AAAI Press, Menlo Park, Calif., 2000, pp. 74–77.

14. J. Seitzer, A Study of the Well-Founded and Stable Logic Programming Semantics, doctoral dissertation, Dept. of Electrical and Computer Engineering and Computer Science, Univ. of Cincinnati, Cincinnati, Ohio.

15. R. Buyya, High Performance Cluster Computing: Programming and Applications, Prentice-Hall, Upper Saddle River, N.J., 1999.

16. W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass., 1999.

Jennifer Seitzer is an assistant professor in the Department of Computer Science at the University of Dayton. Her research involves both intelligent systems and computer networking. Her projects involve work in distributed knowledge-based systems, meta-pattern discovery with an emphasis on cycle mining, and collaborative computing. She received her BS in computer science from Arizona State University and her MS and PhD (for her work in theoretical artificial intelligence) in computer science from the University of Cincinnati. She is a member of the ACM and AAAI. Contact her at the Dept. of Computer Science, Univ. of Dayton, 300 College Park, Dayton, OH 45469-2160; seitzer@cps.udayton.edu.

James P. Buckley is an assistant professor at the University of Dayton. His interests include the general area of intelligent database systems, with a specific focus on data mining, uncertainty management, and active databases. He has published numerous papers in the areas of fuzzy logic and databases, formal database semantics, data mining, and parallel knowledge-based systems. He received his ME and PhD in computer science from Tulane University. He is a member of the High-Performance Intelligent Knowledge-Based Systems research group at UD. Contact him at the Dept. of Computer Science, Univ. of Dayton, 300 College Park, Dayton, OH 45469-2160; buckley@cps.udayton.edu.

Yi Pan is an associate professor in the Department of Computer Science at Georgia State University. He was previously a faculty member in the Department of Computer Science at the University of Dayton. He is an area editor in chief of the Journal of Information, an editor of the Journal of Parallel and Distributed Computing Practices, an associate editor of the International Journal of Parallel and Distributed Systems and Networks, and a member of the editorial board of The Journal of Supercomputing. He received his BEng in computer engineering from Tsinghua University, China, and his PhD in computer science from the University of Pittsburgh. He is an IEEE Computer Society Distinguished Visitor, a senior member of the IEEE, and a member of the IEEE Computer Society. Contact him at the Dept. of Computer Science, Georgia State Univ., Atlanta, GA 30303; [email protected]; www.cs.gsu.edu/~cscyip.
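The article's closing observation, that running INDED on a Beowulf cluster cut computation time because each worker node builds a substantially smaller internal hypergraph, rests on partitioning mutually exclusive data among the workers. The following is a minimal sketch of that partitioning idea only; it is hypothetical illustration, not INDED's implementation, and the name `partition_facts` is ours:

```python
# Hypothetical sketch (not INDED's code): round-robin partitioning of a
# mutually exclusive fact base across worker nodes, so each node reasons
# over (and builds a hypergraph from) only its own slice.

def partition_facts(facts, num_workers):
    """Assign fact i to worker i mod num_workers; return one list per worker."""
    parts = [[] for _ in range(num_workers)]
    for i, fact in enumerate(facts):
        parts[i % num_workers].append(fact)
    return parts

# Toy fact base: 12 ground facts split among 4 workers.
facts = [f"p({i})" for i in range(12)]
parts = partition_facts(facts, 4)

# Each worker now holds 3 facts instead of 12, so its internal hypergraph
# is roughly a quarter of the monolithic one.
print([len(p) for p in parts])  # → [3, 3, 3, 3]
```

In an actual cluster run, each slice would be shipped to its worker (e.g., via MPI message passing, as in reference 16) rather than kept in one process; the speedup claim hinges on the slices being mutually exclusive so no worker needs another's facts.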
Source: http://grid.cs.gsu.edu/~cscyip/papers/ieeeis00.pdf