## Deamo.prof.ufu.br

MILPRIT

*∗*: A Constraint-Based Algorithm for Mining
Temporal Relational Patterns

*∗*
Sandra de Amo1, Waldecir P. Junior1, Arnaud Giacometti2
1Faculdade de Computac¸˜ao – Universidade Federal de Uberlˆandia (UFU)
Campus Santa Mˆonica, Bloco B – Uberlˆandia – MG – Brazil
2Universit´e de Tours, Laboratoire d'Informatique,
2, Place Jean Jaur es 45000 Blois, France,

*Abstract. In this paper, we consider a new kind of temporal pattern where bothinterval and punctual time representation are considered. These patterns, whichwe call temporal point-interval patterns aim at capturing how events takingplace during different time periods or at different time instants relate to eachother. The datasets where these kind of patterns may appear are temporal re-lational databases whose relations contain point or interval timestamps. Weuse a simple extension of Allen's Temporal Interval Logic as a formalism forspecifying these temporal patterns. We also present the algorithm MILPRIT∗for mining temporal point-interval patterns, which uses variants of the classi-cal level-wise search algorithms. Besides, MILPRIT∗ allows a broad spectrumof constraints to be incorporated into the mining process. An extensive set ofexperiments of MILPRIT∗ executed over synthetic and real data is presentedshowing its effectiveness for mining temporal relational patterns.*
Keywords. Temporal Data Mining, Constraint-based Mining, Interval Temporal Logic.

1. Introduction and Motivation
The problem of discovering sequential patterns in temporal data has been extensivelystudied in the past years [Srikant and Agrawal 1996, Zaki 2001, Han et al. 2004] and itsimportance is fully justified by the great number of potential application domains wheremining sequential patterns appears as a crucial issue, such as financial market (evolutionof stock market shares quotations), retailing (evolution of clients purchases), medicine(evolution of patients symptoms), local weather forecast, telecommunication (sequencesof alarms output by network switches), etc. Most of these patterns are specified by for-malisms which are, in some extent, reducible to

*Propositional Linear *Temporal Logic,where time is represented by points in a straight line. For instance, let us consider aclassical sequential pattern of the form

*s *=

*< {a, b}, {c, d} > *(where

*{a, b} *and

*{*c,d

*}*are sets of items purchased by a client). This sequential pattern can be expressed inthe Propositional Linear Temporal Logic by the formula

*Pa ∧ Pb ∧ *♦(

*Pc ∧ Pd*) wherefor each

*i ∈ {a, b, c, d}*,

*Pi *is a propositional symbol standing for "client buys item

*i*".

The symbol ♦ stands for the temporal operator

*sometimes in the future *(for a compre-hensive survey on Linear Temporal Logic, see [Emerson 1990]). The need for a more

*∗*This research has been supported by CNPq-Brazil, Project 473309/2004-1
expressive kind of temporal patterns arises for instance, when modelling Unix-users be-haviour [Jacobs and Blockeel 2001]. Consider, for instance1, the following sequence ofcommands related to latex users:

*ls, vi paper.tex, latex paper.tex, dvips paper.dvi, lprpaper.ps*. This sequence of commands can be represented as a sequence of relational(or first-order) atoms of the form:

*ls*,

*vi*(

*paper.tex*),

*latex*(

*paper.tex*),

*dvips*(

*paper.dvi*),

*lpr*(

*paper.ps*). Within such a database of sequences of relational atoms, it would be pos-sible to discover that the relational sequence pattern

*vi*(

*X*),

*latex*(

*X*) is frequent. Noticethat this pattern tell us that the sequence of commands (predicates)

*vi*,

*latex *is frequentlyrequested by users. In order to specify this kind of sequential patterns, we need a morepowerful logical formalism, the First-Order Linear Temporal Logic, since the elementsinvolved in the patterns include both predicates and their parameters.

In all the above examples illustrating the propositional and relational settings of
the sequential pattern mining problem, the events are instantaneous, that is, the time whenthey happen is represented as a point (instant) in a straight line. The sequence of eventscorresponds to a sequence of instants when these events take place. However, there aremany situations where events have a certain

*duration*, and so, the underlying time is mea-sured in terms of

*intervals *instead of points. In this paper, we propose a new tempo-ral pattern involving a hybrid representation for time and a new algorithm (MILPRIT

*∗*)for mining them. These patterns, which we call

*temporal point-interval patterns *(or pi-patterns for short), aim at capturing how events taking place in time intervals or timeinstants relate to each other. For instance, (1) in a

*medical *application, we could be inter-ested in discovering if patients who take some medicine

*X *during a certain period of time

*P *, and in some moment

*m *in

*P *undergo a stomach surgery, will present the symptom

*Z *during a period of time beginning right after the surgery and finishing as soon as theystop taking the medicine

*X*; (2) in an

*agricultural *application, we could be interested indiscovering if the use of some organic fertilizer during a period of time has an effect onthe way a plant grows

*during *and

*after *the fertilizer application. The following exampleillustrates the medical application in more details.

Example 1 (Running Example) Let us consider the database schema R =

*{Patient, Med,Symp, Hist} *and the relational database instance

*D *over R illustrated in Figure 1. Noticethat the following behavior is verified by 50% of the patients (two patients out of four inthe

*Patient *relation verified it, namely, Paul and Sarah):

*the patient takes penicillin duringa certain period of time e; during a period of time f following his taking the medicine, hefeels dizzy and undergoes a stomach surgery someday t during f*.

Temporal pi-patterns are specified as a set of atomic first order formulae where
time is represented by an interval-time variable or by a point-time variable, together witha set of temporal predicates

*{ before, meets, overlaps, during, starts, finishes }*. Ourlogical formalism for specifying pi-patterns is based on Allen's First Order Interval Logic[Allen and Ferguson 1994]. For instance, the temporal pattern described in Example 1 isspecified by the triple (

*K*,

*D*,

*T *) where:

*K *=

*Patient*(

*x*) (where

*x *is a registered patient),

*D *=

*{Med*(

*x, penicillin, e*),

*Symp*(

*x, dizziness, f *),

*Hist*(

*x, st.surgery, t*)

*} *(representingthe events which take place in the pattern),

*T *=

*{before*(

*e, f *),

*during*(

*t, f *)

*} *(representingthe relationships verified by the time parameters associated to the pattern events).

In most applications, the lack of user control for specifying the kind of patterns
1This example has been taken from [Lee and De Raedt 2002].

**Pname Mname**
**Pname Sname**
**Pname Event**
stomach surgery [5]
Charles tetracyclin [4,7]
Charles dizziness [5,6]
stomach surgery [9]
10/05/2000 15/11/2000
20/07/2001 06/09/2001 09/01/2002
23/08/2002 19/12/2002

**Figure 1. A Temporal relational database instance**
she would find interesting can lead to tedious and computationally costly post-processingphases. In this paper, we use a formalism based on regular expression in order to allowusers to specify the

*shape *of potentially interesting patterns. Our temporal mining methodMILPRIT

*∗ *is constraint-based: it incorporates constraints inside the mining process andin doing so, it accomplishes two goals: it saves a lot of calculation in the post-processingphase and it allows a non-negligent reduction in the search space of potential patterns. Thefollowing example illustrates the idea of using regular expressions to specify constraintson sequential patterns.

Example 2 (Running Example) Let us consider the same situation depicted in Example1. Users could be interested only in discovering pi-patterns where

*patients take somemedicine (maybe more than one) and at the end present some symptom*. Roughly speak-ing, we can say that these patterns fit into the regular expression format

*Med∗Symp*. In thiscase, it would be important that patterns representing situations where patients present alist of symptoms and

*after *that take some medicine, are not generated in the mining pro-cess, since such patterns do not fit in the given regular expression format.

Usually, in real world applications, it is important to allow different time gran-
ularities to be considered when evaluating pattern support, otherwise some interestingpatterns will not be mined. For instance, let us consider a database schema R =

*{R(A,T),S(B,T)}*, where

*T *is an interval-time attribute. If the values of the temporal attributesin the database are dates (day/month/year), the temporal predicate

*meets(e,f) *will be truein this database only if

*e *and

*f *are mapped into time intervals [

*d*1

*, d*2], [

*d*3

*, d*4] respec-tively, where

*d*1

*, d*2

*, d*3

*, d*4 are dates and

*d*2 =

*d*3. If there is very few pair of tuples(

*R*(

*a, *[

*d*1

*, d*2]),

*S*(

*a, *[

*d*2

*, d*4]) we are unable to mine patterns of the form

*(K(x), {R(x,e),S(x,f)}, {meets(e,f)})*. On the other hand, we can have a lot of pair of tuples (

*R*(

*a, *[

*d*1

*, d*2]),

*S*(

*a, *[

*d*3

*, d*4]) where

*d*2 and

*d*3 are different days but belong to the same week in the year.

In this case, if we relax the time granularity and consider

*week *to be the time unity,then

*d*2 and

*d*3 will be consider identical and hence, the pattern

*(K(x), {R(x,e), S(x,f)},{meets(e,f)}) *will be mined. The algorithm MILPRIT

*∗ *we present in this paper allowsusers to specify different time granularities when evaluating the pattern support.

Related Work. A lot of studies on constrained-based temporal mining, in a proposi-tional context, have been investigated in order to reduce the huge amount of patternswhich are generated in the relational mining context. [Srikant and Agrawal 1996] is a
pioneer work introducing constraints in sequential pattern mining in order to reduce ex-ecution time and the number of discovered patterns. In this work, the constraints (min-max, slide-window) are related to the way the support of patterns is counted and notto the patterns format. Recently, a lot of work on constrained-based temporal mininghave investigated formalisms to specify constraints over the pattern format. Regularexpressions have been adopted in many of these work as a very general mechanism tospecify user constraints. [Garofalakis et al. 1999] was a pioneer work in this field, in-troducing the SPIRIT family of algorithms for constraint-based sequential pattern min-ing. [Lorincza and Boulicaut 2003] and [Pei et al. 2002] extended this pioneer work, byproposing new and more efficient algorithms (RE-Hackle and Prefix-Growth respectively)for sequential pattern mining where regular expressions are used to specify useful patternsbeforehand. Such constraints are pushed inside the mining process instead of being ver-ified only in a post-processing phase. [Antunes and Oliveira 2005], following the linesof [Garofalakis et al. 1999], proposed a more general formalism, based on context-freegrammars, in order to specify constraints over sequential patterns.

In [de Amo and Furtado 2005], we tackled the constraint-based temporal min-
ing problem in the relational context, by proposing a formalism to specify con-straints over the temporal patterns we had introduced in [de Amo et al. 2004].

[Lee and De Raedt 2002], the algorithm SeqLog, based on Inductive Logic Program-ming, was introduced for discovering relational sequential patterns with constraints. In[Masson and Jacquenet 2003], the algorithm Spirit-Log, designed to mine relational tem-poral patterns was introduced. Spirit-Log extends SeqLog (where time is punctual), bypushing regular expression constraints inside the mining process. A comprehensive textabout constraint-based temporal

*relational *pattern mining following an inductive databaseapproach can be found in [Boulicaut et al. 2006]. In all these work, time is representedby

*points *in a straight line. The approach we propose in this paper is more general sinceit allows mining relational temporal patterns where time is represented by

*points *as wellas by

*intervals*. We notice that most real world applications involve both kind of timerepresentation.

In [Hoppner 2001], Allen's Propositional Interval Logic has been used for
the first time, to treat the problem of discovering association rules over time series[Allen and Ferguson 1994]. In [Lattner and Herzog 2005], an approach similar to oursis presented for temporal relational pattern mining where time is represented as intervals.

However, this approach is not constraint-based like ours. Moreover, no experimentalresults concerning the performance of the proposed method are provided, which makesdifficult a comparative analysis with our approach.

The issue of allowing user-specified different time granularities during sup-
port counting has been treated in the pioneer work of [Srikant and Agrawal 1996] and[Bettini et al. 1998], in the context of sequential pattern mining and time series. Here,patterns were specified in a formalism equivalent to propositional temporal logic. More-over, time attributes are

*point*-valued. We emphasize that our approach is more generalsince it is developed in a temporal

*relational *database context where time attributes maybe

*point*-valued as well as

*interval*-valued.

Our Contribution. We can summarize our main contributions as follows: (1) the in-troduction of a new temporal relational pattern allowing interval and point-based timerepresentation in relations, as well as a logical formalism to specify these patterns. (2) theintroduction of pattern constraints in the mining process, more precisely during the can-didate generation phase, (3) the possibility of considering different user-specified timegranularities during the support count phase, (4) the design and implementation of an al-gorithm (MILPRIT

*∗*) to mine such constrained temporal patterns and (5) an extensive setof experiments over synthetic and real datasets. This is an extended version of the paper[de Amo et al. 2007]. In that paper, we had only considered interval-based time represen-tation in relations and we had not treated user-specified time granularities. MILPRIT

*∗ *is ageneralization of the MILPRIT algorithm presented in [de Amo et al. 2007], designed tomining point-interval patterns with user-specified time granularities. The introduction ofpoint-based time representation required to consider a broader set of temporal predicatesin the generation phase. We took care to consider a minimal set of new temporal predi-cates in order to simplify the generation phase without losing completeness. MILPRIT

*∗*can be considered as a general algorithm for temporal relational pattern mining sinceit can be executed over a dataset of multiple tables with point-based and interval-basedtimestamps. It can be applied in any situation which can be modeled by a temporal re-lational database involving point-valued and interval-valued attributes. Medical applica-tions, for instance, are typical applications where both time representations (point-basedand interval-based) must be considered. In this paper, we focus on this kind of applicationwhen evaluating the efficiency of our algorithm (see Section 5.2).

Paper Organization. This paper is organized as follows. In Section 2, we formalize themain concepts related to the problem of mining pi-patterns. In Section 3, we propose aconstraint-based framework for mining pi-patterns and formalize all the concepts involvedin the presentation of the main algorithm (MILPRIT

*∗*). Section 4 is dedicated to the pre-sentation of MILPRIT

*∗*, an algorithm for mining pi-patterns with constraints. In Section5 we present and analyse the experimental results obtained by executing MILPRIT

*∗ *oversynthetic data as well as over real data related to a medical application. Finally in Section6, we present our concluding remarks and discuss some future research on the subject.

2. Problem Formalization
In this section, we introduce the main concepts related to the problem of discoveringtemporal point-interval patterns.

The Dataset. First of all, let us describe the datasets where the discovery procedurewill be executed later on. We assume a set of attributes symbols

*A*1

*, A*2

*, .*, called

*dataattributes*, and two special attribute symbols

*Tit *and

*Tpt*, called interval-time attribute andpoint-time attribute respectively. Let R =

*{K, R*1

*, ., Rn} *be a database schema such thateach relational schema

*Ri *is of the form

*Ri*(

*Ai , ., Ai , T*
*, ., Ai , T*
*pt*) or

*Ri*(

*Ai*
relational schema

*K *(called

*the reference relation*) is of the form

*K*(

*A*1

*, ., Am*). Theattributes appearing in

*K *(all data attributes) are called

*reference attributes*.

The values for data attributes

*Ai *are taken within a set Dom, called the

*data do-*
*main*. The attribute

*Tit *takes values in the set

*I *=

*{*[

*i, j*]

* i, j ∈ *N

*, i < j}*. The elements of

*I *are intervals [

*i, j*] where

*i *and

*j *are natural numbers mapping dates:

*i *is the starting dateand

*j *is the ending date of the interval [

*i, j*]. The attribute

*Tpt *takes values in the set N.

Elements of

*Tpt *are natural numbers mapping dates. Naturally, a

*point *can be viewed asan

*interval *[

*i, j*] where

*i *=

*j*. In our approach, in order to optimize the generation phaseof the algorithm MILPRIT

*∗*, we treat interval and points as distinct entities, emphasizingthat the starting and ending points of an interval are different.

A

*temporal dataset *over R is a set of temporal relations

*{k, r*1

*, ., rn} *where each

*ri *is a set of tuples (

*a*1

*, ., an , e*) over

*R*
*i *and

*k *is a set of tuples over

*K *. We will assume
that the relations

*ri *with interval-valued timestamp are

*coalesced*[B¨ohlen et al. 1996], thatis, if (

*a*1

*, ., ak, e*) and (

*a*1

*, ., ak, f*)

*∈ ri*, then

*e ∩ f *=

*∅*, i.e., intervals associated to asame tuple are maximal. The dataset depicted in Figure 1 is a temporal dataset with thefollowing schema:
Temporal pi-Patterns. Let R =

*{K, R*1

*, ., Rn} *be a database schema. Let us sup-pose four disjoints sets

*V *(data variables, denoted by

*x, y, z, x*1

*, .*),

*C *(data constants,denoted by

*a, b, c, a*1

*, .*),

*Vit *(interval-time variables, denoted by

*e, f, g, e*1

*, .*) and

*Vpt*(point-time variables, denoted by

*t, s, t*1

*, s*1

*, .*). Let us also assume the set of temporalpredicates

*PT *=

*{before, meets, overlaps, during, finishes, starts}*, as in Allen's IntervalLogic [Allen and Ferguson 1994]. A

*data atom *over the database schema R is an expres-sion of the form

*R*(

*x*1

*, ., xn, e*) where

*R ∈ *R, (

*R 6*=

*K*), each

*xi *is a data variable ora data constant and

*e *is an interval-time or point-time variable (depending on the timeattribute of

*R*). A

*temporal atom *is an expression of the form

*r*(

*e, f *), where

*r *is oneof the temporal predicates in

*PT *and

*e, f *are interval-time or point-time variables. If

*r ∈ {overlaps, meets}*, then

*e *and

*f *are interval-time variables. For the temporal predi-cates

*before, during, starts, finishes *we may have different temporal atoms:

*before*(

*e, f *),

*before*(

*t, e*),

*before*(

*e, t*),

*before*(

*t, s*),

*during*(

*e, f *),

*during*(

*t, e*),

*starts*(

*e, f *),

*starts*(

*t, e*),

*finishes*(

*e, f *),

*finishes*(

*t, e*), where

*e, f *are interval-time variables and

*t, s *are point-timevariables. Figure 2 illustrates the semantics of the temporal predicates. For instance, theformula

*before*(

*e, f *) is true if the variables

*e *and

*f *are evaluated as [4

*, *6] and [8

*, *9] respec-tively. The formula

*before*(

*t, e*) is true if

*t *is evaluated as 5 and

*e *as [6

*, *7]. The formula

*during*(

*e, f *) is true if the variables

*e *and

*f *are evaluated as [5

*, *7], and [4

*, *9] respectively.

**Interval-time x Interval-time**
**Point-time x Point-time**
**Point-time x Interval-time**
before(t,e) during(t,e) starts(t,e) finishes(t,e) before(e,t)

**Figure 2. Semantics of Temporal Predicates**
Definition 1 (Temporal pi-pattern) A

*temporal pi-pattern *(temporal point-interval pat-tern) is a triple (

*K*,

*D*,

*T *) where (1)

*D *is a set of data atoms (2)

*T *is a set of temporalatoms and (3)

*K *is a special data atom

*K*(

*x*1

*, ., xn*) whose variables

*x*1

*, ., xn *appear in

*D*. The data atom

*K *is called the

*reference query *of the pi-pattern (

*K*,

*D*,

*T *). It is essential
in the definition of a pi-pattern, since it specifies

*what *is counted. It plays the same rolein our setting as the

*atom key *in the Datalog patterns of [Dehaspe and Toivonnen 1999].

Example 3 (Running Example) Let
(

*Patient*(x)

*, D, T *) where

*D *=

*{Med*(

*x, y, e*),

*Symp*(

*x, z, f *)

*} *and

*T *=

*{before*(

*e, f *)

*}*.

Intuitively, this temporal pattern translates the fact that a patient

*x *takes a medicine

*y *(notspecified) during a period of time

*e *and during a further period of time

*f *(after she hasstopped taking the medicine

*y*) presents a symptom

*z *(not specified). Note that we arealways interested in analyzing the behaviour of patients registered in the relation

*Patient*,even though relations

*Med *and

*Symp *may contain data related to other patients (not onlythose registered in

*Patient*). For this reason,

*Patient*(

*x*) is called

*the reference query*.

An

*instantiation *of an interval-time variable

*e *(resp. a point-time variable

*t*) is a
mapping

*v *associating an interval

*v*(

*e*) = [

*i, j*] to

*e *(resp. a point

*i *to

*t*), where

*i *and

*j *arenatural numbers (mapping dates). A

*ground temporal atom *is the temporal atom obtainedby instantiating its interval-time and point-time variables.

Definition 2 (Support and frequency of a pi-pattern) Let

*D *be a temporal databaseover the schema R =

*{K, R*1

*, ., Rn} *and

*M *= (

*K*(

*x*1

*, ., xm*),

*{D*1

*, .Ds}, {T*1

*, ., Tl}*)be a temporal pi-pattern over R. The

*support *of

*M *with respect to

*D*, denoted by

*sup*(

*M*)(omitting the dataset

*D*, for the sake of simplifying the notation), is defined by

*sup*(

*M*) =

* {u∈K u *=

*QM } *where

*Q*
*M *is the

*relational calculus *query

*∃y*1

*∃y*2

*.∃yk*(

*D*1

*∧ D*2

*∧ . . ∧*
*Ds ∧ T*1

*∧ . . ∧ Tl*), and

*y*1

*, ., yk *are the data variables not appearing in the referencequery

*K*(

*x*1

*, ., xm*). The expression

*u *=

*QM *means that

*u *is an answer to the query

*QM *when executed over

*D*. Given 0

*≤ α ≤ *1, we say that the temporal pi-pattern

*M *is

*frequent *with respect to

*D *and

*α *if

*sup*(

*M*)

*≥ α*.

Example 4 (Running Example) Let us consider the temporal pi-pattern

*M *= (

*Pa-tient*(

*x*),

*{Med*(

*x, y, e*),

*Symp*(

*x, z, f *)

*}*,

*{before*(

*e, f *)

*}*). Let

*D *be the temporal databaseinstance illustrated in Figure 1. The relational query

*QM *can be specified in the relationalalgebra formalism by the expression Π

*PName*(

*Patient *o
n

*Symp*). The answer of

*QM *is

*{*(Paul), (Sarah)

*}*. Note that Charles is not an answer. Indeed, Charles has takentetracycline and was dizzy

*during *(and not

*after*) the time he was taking this medicine.

John is not an answer neither, because he has not taken any medicine at all. So, the supportof

*M *w.r.t.

*D *is 1.

In what follows, we suppose that the set of temporal atoms

*T *is

*consistent*, i.e.,
the temporal variables can be instantiated in a consistent way. Formally, a set of temporalatoms

*T *is

*consistent *if there is an instantiation

*v *of the set of temporal variables appear-ing in

*T *, such that the ground atoms obtained are simultaneously true. We denote this by

*v *=

*T *. For instance, the set of temporal atoms

*{before*(

*e, f *),

*before*(

*f, g*),

*before*(

*g, e*)

*}*is not consistent, because the first two atoms imply that

*e *comes before

*g *and so, there isno way to instantiate the temporal variables

*e, f *and

*g *in order to make the three temporalformulae simultaneously true. We also suppose that the set

*T *is

*complete*, i.e., for eachpair of temporal variables

*e, f *appearing in

*T *it is possible to infer a

*unique *temporalrelationship between

*e *and

*f *. For instance, the set

*{before*(

*e, f *),

*meets*(

*f, g*)

*} *is completebecause the relationship between

*e *and

*g *is completely determined (even though it doesnot appear explicitly in the set): the only possibility is

*before*(

*e, g*). On the other hand,

*{during*(

*e, f *),

*overlaps*(

*f, g*)

*} *is not complete since the relationship between

*e *and

*g *is not
deterministically implied by the two relationships

*during*(

*e, f *),

*overlaps*(

*f, g*): it could be(1)

*before*(

*e, g*), (2)

*meets*(

*e, g*), (3)

*overlaps*(

*e, g*), (4)

*during*(

*e, g*) or (5)

*starts*(

*e, g*). Thatis, for each one of the five temporal atoms (I) (where (I)

*∈ {*(1), (2), (3), (4) (5)

*}*) the set

*{during*(

*e, f *),

*overlaps*(

*f, g*), (I)

*} *is consistent.

Representing temporal pi-patterns as sequences. The notion of consistency and com-pleteness of a set of temporal atoms is essential in our approach. As we will see next, givena consistent and complete set of temporal atoms, its temporal variables can be ordered ina natural way. This implies that a pi-pattern can be viewed as a

*sequence of data atoms*.

And so, we can naturally think about

*regular expressions *as a formalism for specifyingspecial sets of pi-patterns. Due to space limitation, we will only give an informal defini-tion of the order relation

*<T *over the set of temporal variables appearing in a pi-pattern.

The idea can be easily formalized.

Let us consider the domain

*I *for the interval-time variables and the domain N for
the point-time variables. We can view a natural number

*n *as an interval [

*n, n*]. So, weconsider the set

*I∗ *=

*I ∪ *N as being constituted of intervals [

*i, j*], with

*i ≤ j*. Let [

*a, b*]and [

*c, d*] in

*I∗*. We say that [

*a, b*]

*< *[

*c, d*] if and only if one and only one of the followingconditions is verified: (1)

*b < d *or (2)

*b *=

*d *and

*a < c*. It can easily be shown that

*< *is a(strict) total order over the set

*I∗*.

The (strict) ordering

*< *over the interval structure

*I∗ *naturally induces an ordering

*<T *over the set of temporal variables (interval-time and point-time) appearing in the set
of temporal predicates

*T *of a pi-pattern. The following example illustrates the idea: let

*T*=

*{before(e,f), starts(f,g)}*. Then the induced ordering over the set of temporal variables

*{e, f, g} *appearing in

*T *is

*e <T f <T g*. Indeed, for all instantiation

*v *of the temporal
variables,

*v*(

*e*) = [

*ae, be*]

*, v*(

*f*) = [

*af , bf *]

*, v*(

*g*) = [

*ag, bg*], such that the predicates in

*T*are simultaneously true, the ending points

*be, bf , bg *satisfy

*be < bf < bg*. That is, for allinstantiation

*v *such that

*v *=

*T *, we have

*v*(

*e*)

*< v*(

*f *)

*< v*(

*g*).

The following Theorem guarantees that the relation

*<T *is well-defined and also
an order relation.

Theorem 1 Let

*T *be a complete and consistent set of temporal atoms. Then (1) the def-inition of

*<T *does not depend on the particular instantiation

*v *of the temporal variables
appearing in

*T *and (2)

*<T *is a strict order relation over the set of temporal variables
appearing in

*T *.

Proof (Sketch). In order to simplify the presentation, we suppose that we have onlyinterval-time variables. The case where one has interval-time and point-time variables istreated in a similar way. (1) By the fact that

*T *is consistent, we can affirm that there existsa valuation

*v *of the temporal variables appearing in

*T *such that

*v *=

*T *. If

*v *is unique,then

*<T *is well-defined, since there is a unique way to define it through the valuation

*v*.

Now, let us suppose that such valuation is not unique, that is, let us suppose that

*v *and

*v0*are two valuations of temporal variables of

*T *such that

*v *=

*T *and

*v0 *=

*T *. From the factthat

*T *is complete, we can affirm without loss of generality that there exists a temporalpredicate

*r *such that

*v *=

*r(e,f) *and

*v0 *=

*r(e,f)*. So,

*v*(

*e*)

*< v*(

*f *) if and only if

*r(e,f) *iseither

*before(e,f) *or

*during(e,f) *or

*meets(e,f) *or

*overlaps(e,f) *or

*finishes(e,f) *or

*starts(e,f)*.

In each one of these cases, from the fact that

*v0 *=

*r(e,f)*, we conclude that

*v0*(

*e*)

*< v0*(

*f *).

(2) In order to prove that

*<T *is a strict order over the temporal variables appearing in

*T *,
we must show that it satisfies the irreflexivity and transitivity properties. It is obvious that

*<T *is irreflexive. Let

*e <T f *and

*f <T g*, where

*e, f, g *are temporal variables appearing
in

*T *. Let

*r*(

*e, f *),

*r0*(

*f, g*) and

*r00*(

*e, g*) be the unique temporal relationship inferred from

*T *for the pair of variables (

*e, f *), (

*f, g*) and (

*e, g*) respectively. We can guarantee this byusing the fact that

*T *is complete. As

*T *is consistent, there exists a valuation

*v *of thetemporal variables

*e, f, g *such that

*v *=

*r*(

*e, f *)

*∧ r0*(

*f, g*)

*∧ r00*(

*e, g*). Now, using the factthat

*e <T f *and

*f <T g*, and considering all the possibilities for the temporal predicates

*r*,

*r0 *and

*r00 *it is not difficult to show that

*v*(

*e*)

*< v*(

*g*) in all cases.

The induced order

*<T *over the temporal variables appearing in a temporal pi-
pattern

*M *allows us to view

*M *as a

*sequence *of data atoms. Indeed, let

*M *= (

*K*,

*D*,

*T *),where

*D *=

*{p*1(

*x*1

*, ., x*1

*, e*
*, ., xk , e*
1)

*, . . , pk*(

*xk*
*k*)

*}*. Since we are assuming that

*T *is
complete and consistent, then, according to Theorem 1 its temporal variables are orderedby the order

*<T *. Let us suppose, without loss of generality, that

*e*1

*<T e*2

*<T . . <T ek*.

So, the set

*D *can be viewed as the

*sequence < p*1(

*x*1

*, ., x*1

*, e*
*, ., xk, e*
1)

*, . . , pk*(

*xk*
3. Temporal pi-Patterns with RestrictionsIn an application where the user has a previous idea of some specific characteristics ofthe patterns she is searching, she can be overwhelmed with uninteresting patterns. In thissection, we propose to use regular expressions as a formalism to specify a broad spectrumof constraints over pi-patterns in order to better satisfy user requirements as well as toreduce the search space of patterns.

A

*mode *over a temporal database R =

*{K, R*1

*, ., Rk} *is an expression of the
form

*R*(

*u*1

*, ., us, *#), where each

*ui *is a data variable or the (new) symbol

*∗*, # is a newsymbol, and

*R*(

*A*1

*, ., An, T *) is one of the predicates

*Ri ∈ *R,

*T ∈ {Tit, Tpt}*. We saythat a data atom

*R*(

*y*1

*, ., ys, e*) is

*in accord with *the mode

*R*(

*u*1

*, ., us, *#) if for each

*i *= 1

*, ., s *we have: (1) If

*ui *is a variable then

*yi *=

*ui*; (2) If

*ui *=

*∗*, then

*yi *is avariable or a constant ; (3)

*e *is a interval-time variable if

*T *=

*Tti *or a point-time variableif

*T *=

*Tpi*. For instance,

*Med*(

*x, ∗, *#) is a mode and the atom

*Med*(

*x, penicillin, e*) isin accord with

*Med*(

*x, ∗, *#). On the other hand, the atom

*Med*(

*y, penicillin, e*) is not inaccord with the mode

*Med*(

*x, z, *#).

Let Σ be a set of modes over R =

*{K, R*1

*, ., Rn}*. In what follows, we will
consider regular languages (sets of strings) over the alphabet Σ. These languages arespecified by regular expressions. Given a regular expression

*E *over the alphabet Σ, wedenote by

*W *(

*E*) the set of strings (words) verifying

*E*. We denote by

*P *(

*E*) the set ofprefixes of strings in

*W *(

*E*). We will need this notation later.

Definition 3 (The Search Space) Let R =

*{K, R*1

*, ., Rn} *be a database schema, Σa set of modes over R and

*E *a regular expression over Σ. The

*search space *de-fined by

*E *is the set of temporal pi-patterns (

*K*(

*x*1

*, ., xm*),

*D*,

*T *) such that

*D *=

*<p*1(

*t*1

*, ., t*1

*, e*
*, ., tn , e*
1)

*, . . , pn*(

*tn*
*n*)

*> *satisfies the following properties: (1) There ex-
ists a string

*w*1

*w*2

*.wn ∈ W *(

*E*), such that

*pi *is in accord with

*wi *for each

*i *= 1

*, ., n*;(2) For each data atom

*pi*(

*xi , ., xi , e*
*i*)

*∈ D *let

*wi *=

*pi*(

*u*1

*, ., ul*
*i*) be its associated
mode; if

*uj *=

*∗ *and

*xj *is a variable, then

*xj *has only one occurrence in

*D *(a data variablesymbol

*xj *can be used to replace a symbol

*∗ *in a mode only once). We denote by

*W*(

*E*)the search space specified by

*E*. We denote by

*P*(

*E*) the set of pi-patterns defined in
a similar way as

*W*(

*E*), by considering strings

*w*1

*w*2

*.wn ∈ P *(

*E*) instead of

*W *(

*E*) incondition (1). Patterns in

*W*(

*E*) are called

*valid*. Patterns in

*P*(

*E*) are called

*prefix valid*.

As we will see in the next section, in the generation phase of our algorithm, only
pi-patterns in

*P*(

*E*) (prefixes) will be generated. The reason why we generate prefix-validpatterns (p-valid patterns for simplify) rather than valid patterns will be clearer later.

Example 5 (Running Example) Let

*E *=

*Med*(

*x, ∗, *#)

*Med*(

*x, ∗, *#)

*∗ Symp*(

*x, ∗, *#).

The pi-pattern

*M*1 = (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*),

*Med*(

*x, tetracyclin, f*),

*Symp*(

*x, diarrhea, g*)

*}*,

*{before*(

*e, f *),

*overlaps*(

*f, g*)

*}*) belongs to

*W*(

*E*). The pattern

*M*2= (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*),

*Med*(

*x, tetracyclin, f *)

*}*,

*{before*(

*e, f *)

*}*) belongs to

*P*(

*E*). The pattern

*M*3 = (

*Patient*(

*x*),

*{Med*(

*x, y, e*),

*Symp*(

*x, y, f*)

*}*,

*{before*(

*e, f*)

*}*) doesnot belong to

*P*(

*E*), since property (2) of definition 3 is not verified. Intuitively, the reg-ular expression

*E *captures the temporal pi-patterns corresponding to a patient

*x *takingcertain medicines during successive periods and eventually presenting some symptom.

Our mining task is: Given a temporal database

*D*, a minimum support threshold

*α*,0

*≤ α ≤ *1, and a regular expression

*E *over Σ, find all temporal pi-patterns in

*W*(

*E*)which are frequent with respect to

*D *and

*α*.

4. The Algorithm MILPRIT

*∗*In this section, we present the algorithm MILPRIT

*∗ *(Mining Interval Logic Patterns withRegular expressIons consTraints) which generalizes the idea of the SPIRIT algorithmintroduced in [Garofalakis et al. 1999]. In a high level, it follows the general Aprioristrategy of [Srikant and Agrawal 1996], working in passes, and each pass producing pat-terns more

*specificic *than those produced in the previous pass. The classical Aprioristrategy uses a

*Pruning Phase*, where generated patterns which are more specific thana non-frequent pattern

*p *are pruned. This pruning strategy relies on the AntimonotonicProperty: "

*a frequent pattern cannot be more specific than a non-frequent pattern*".

At a first glance, a way of conceiving a method for mining pi-patterns, following
the Apriori strategy would be: (1) at each pass

*k*, one combines frequent and valid patternsin the set

*Fk−*1 obtained in the previous pass in order to generate the set of candidates

*Ck*.

One takes care to produce only valid patterns in this combination, pushing the regularexpression

*E *in the generation process; (2) one prunes those patterns in

*Ck *containing a
subpattern which is valid and which is not in

*k−*1

*F*
*i *(this pattern surely is not frequent!);
(3) one scans the database in order to count the support of the remaining patterns. In thismethod, the number of generated patterns is in

*direct proportion *to the restrictiveness of

*E*. In the pruning phase, however, all patterns which are more specific than a pattern

*satisfying E *and not belonging to

*Fk−*1 should be pruned. In this case, the size of the setof pruned patterns is in

*inverse proportion *to the restrictiveness of the constraint

*E*. Suchtrade-offs are due to the fact that regular expression constraints are not anti-monotone,that is, if a pattern satisfies

*E *it may have some subpattern not satisfying

*E*2.

In order to find a suitable trade-off, we use a

*relaxation *of the constraint

*E*, that
is, a less restrictive one, the prefix constraint associated to E, according to which, onlyfrequent p-valid pi-patterns will be produced in the mining process. A post-processingphase will filter the valid patterns.

2For instance, the word

*aba *verifies the regular expression

*E *=

*ab∗a *but its subword

*ba *does not verify

*E*.

The general structure of the algorithm MILPRIT

*∗ *is depicted below:
Procedure MILPRIT

*∗*(

*D*,

*α*,

*E*,

*δ*)

*D *is a temporal dataset over R =

*{K, R*1

*, ., Rn}*,

*E *is a regular expression over the alphabet of modesΣ over R,

*α *is a minimum support threshold and

*δ *is the time granularity (

*day*,

*week*,

*fortnight*,

*month*,

*bimester*,

*trimester, semester, year*). Let

*M*1 be the set of data atoms

*Ri*(

*x*1

*, ., xn , e*), where

*e *is a
interval-time or a point-time variable, depending on the sort of the temporal attribute appearing in

*Ri*.

begin1.

*F *:=

*F*1 =

*{M ∈ M*1 :

*M *is frequent and

*M ∈ P*(

*E*)

*}*2.

*k *:= 23. repeat
4.

*Ck *:= Gen(

*Fk−*1,

*E*) (

*Generation Phase*)5.

*P *:=

*{M ∈ Ck *:

*∃N ∈ P*(

*E*),such that

*N 6∈ F *and

*M *is more specific than

*N}*6.

*Ck *:=

*Ck − P *(

*Pruning Phase*)7.

*Fk *:=

*{M ∈ Ck *: sup(

*M, D, δ*)

*≥ α} *(

*Validation Phase*)8.

*F *:=

*F ∪ Fk*9.

*k *:=

*k *+ 1
10. until

*Fk−*1 =

*∅*11.

*F *:=

*{M ∈ F *:

*M ∈ W*(

*E*)

*} *(

*Post-Processing Phase*)
The function sup(

*M, D, δ*) is defined as follows: First, each date appearing in
some tuple in

*D *is associated to the corresponding time value determined by the timegranularity

*δ*. For instance, if

*δ *=

*month *and date is 20

*/*09

*/*2007 and the minimum dateappearing in

*D *is 28

*/*07

*/*2006 then 20

*/*09

*/*2007 is mapped into 15, since date 20

*/*09

*/*2007occurs in the 15th month relatively to the first month July 2006. Let

*D0 *be the databaseobtained from

*D *after this transformation. The support sup(

*M, D, δ*) is obtained by cal-culating sup(

*M, D0*) as in Definition 2.

The Generation Phase. MILPRIT

*∗ *works in passes, each pass

*k *producing patterns of"

*level*"

*k*. The "level" of a temporal pi-pattern

*M *is measured in terms of the

*refinementvector *associated to

*M *as defined below.

Definition 4 The

*refinement vector *associated to

*M *= (

*K*,

*D*,

*T *) is

*v*(

*M*) = (

*n, c*) where

*n *is the size of

*D *and

*c *is the number of constants appearing in

*D*. The

*refinement level *of

*M*, denoted by

*l*(

*M*), is defined as

*n *+

*c *where (

*n, c*) =

*v*(

*M*). For instance, let

*M *be thetemporal i-pattern (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*),

*Symp*(

*x, z, f *)

*}*,

*{before*(

*e, f *)

*}*).

Then

*v*(

*M*) = (2

*, *1) and

*l*(

*M*) = 3.

The procedure Gen(

*Fk−*1,

*E*) is designed to generate temporal pi-patterns whose
refinement level is

*k*, by

*specializing *the frequent temporal pi-patterns of

*Fk−*1 (whoserefinement level is

*k − *1) in such a way that the patterns produced are in

*P*(

*E*). The coreof procedure Gen is the two

*specialization operators *(

*ρE *and

*ρI*) defined below.

Definition 5 (Extension) Let

*E *be a regular expression and

*M*
*∈ P*(

*E*),

*M *=
(

*K*(

*x*1

*, ., xn*),

*D*,

*T *). The

*extension operator ρE *executed over

*M *is defined as fol-lows: (a) if

*v*(

*M*) = (

*n, c*) with

*c > *0, then

*ρE*(

*M*) =

*∅*; (b) if

*v*(

*M*) = (

*n, *0) then

*ρE*(

*M*) is the set of temporal pi-patterns

*M0 *= (

*K*(

*x*1

*, ., xn*)

*, D0, T 0*)

*∈ P*(

*E*) such that

*D0 *=

*D∪{pn*+1(

*z*1

*, . . , zl, en*+1)

*} *where

*pn*+1

*∈ *R,

*z*1

*, ., zl *are data variables and

*en*+1 isa interval-time or a point-time variable (according to the sort of the time attribute of

*pn*+1)
and

*T 0 *=

*T ∪ {r*1(

*e*1

*, en*+1)

*, . . , rn*(

*en, en*+1)

*} *where

*ri *are temporal predicates, and

*T 0*is complete and consistent. Clearly, for each

*i *= 1

*, ., n *the temporal predicates

*ri *aretaken according to the type of the time variables

*ei, en*+1 (point or interval) (see Figure 2).

For instance, if both

*ei *and

*en*+1 are point-time,

*ri *must be the temporal predicate

*before*.

The following example illustrates the operator

*ρE*:
Example 6 (Running Example) Let

*E *be the regular expression

*Hist(x,*,#)∗Med(x,*,#)∗*.

The pi-pattern

*M *= (

*Patient(x)*,

*{Hist(x,y,t*1

*)}*,

*∅*) belongs to

*P*(

*E*).

Let us consider the following pi-patterns:

*M*1 = (

*Patient*(

*x*),

*{Hist*(

*x, y*1

*, t*1),

*Hist*(

*x, y*2

*, t*2)

*}*,

*{before*(

*t*1

*, t*2)

*}*)

*M*2 = (

*Patient*(

*x*),

*{Hist*(

*x, y*1

*, t*1),

*Hist*(

*x, y*2

*, t*2),

*Med*(

*x, z, e*3)

*}*,

*{before*(

*t*1

*, t*2),

*meets*(

*t*2

*, e*3)

*}*)

*M*3 = (

*Patient*(

*x*),

*{Hist*(

*x, y*1

*, t*1),

*Hist*(

*x, y*2

*, t*2),

*Med*(

*x, z, e*3)

*}*,

*{before*(

*t*1

*, t*2),

*starts*(

*t*1

*, e*3),

*during*(

*t*2

*, e*3)

*}*)

*M*4 = (

*Patient*(

*x*),

*{Hist*(

*x, y*1

*, t*1),

*Hist*(

*x, y*2

*, t*2),

*Med*(

*x, z*1

*, e*3)

*}*,

*Med*(

*x, z*2

*, e*4)

*}*,

*{before*(

*t*1

*, t*2),

*starts*(

*t*1

*, e*3),

*overlaps*(

*e*3

*, e*4)

*}*)

*M*1 is in

*ρE*(

*M*), but

*M*2 is not in

*ρE*(

*M*1), because

*meets *is not a valid relationship
between

*t*2 and

*e*3. The pattern

*M*3 is in

*ρE*(

*M*1), but

*M*4 is not in

*ρE*(

*M*3), because thereis no explicit nor implicit relationship between the temporal variables

*e*4 and

*t*2, that is,the set of temporal predicates in

*M*4 is not complete.

Definition 6 (Instantiation) Let

*E *be a regular expression and

*M ∈ P*(

*E*),

*M *=(

*K*(

*x*1

*, ., xn*),

*D*,

*T *), where

*D *=

*< D*1

*, ., Dm >*. The

*instantiation operator *executedover

*M *produces the set

*ρI*(

*M*) of temporal pi-patterns

*M0 *= (

*K*(

*x*1

*, ., xn*)

*, D0, T *),where

*D0 *is obtained by replacing some variable

*yk *(

*yk 6*=

*xl *for

*l ∈ *[1

*, n*]) occurring insome

*Di *=

*p*(

*y*1

*, . . , yp, e*) by a constant

*c *in such a way that if the string of modes cor-responding to

*D *is

*w*1

*w*2

*.wm *and

*wi *=

*p*(

*u*1

*, . . , up, *#), then: (a)

*uk *=

*∗*; (b) for every

*l > k*, if

*ul *=

*∗*, then

*yl *is not a constant; (c) for every

*j > i*, if

*wj *=

*p0*(

*v*1

*, . . , vq, *#),then for every

*l ∈ *[1

*, q*], if

*vl *=

*∗ *and

*Dj *=

*p0*(

*z*1

*, . . , zq, e*), then

*zl *is not a constant.

Intuitively, condition (b) states that if some variable in a data atom is instantiated
then no other variables on the left of this one in the atom can be instantiated in further stepsof the candidate generation process. Condition (c) states that if some variable in a dataatom

*p *is instantiated then no other variables appearing in data atoms coming after

*p *inthe pattern can be instantiated in the future. These conditions are necessary for assuringoptimality of the specialization operators, as stated in Lemma 2 below. The followingexample illustrates the operator

*ρI*:
Example 7 (Running Example) Let

*E *be the regular expression

*E *=

*Med(x,*,#)∗Symp(x,*,#)∗*. Consider the following pi-patterns in

*P*(

*E*):

*M*1 = (

*Patient*(

*x*),

*{Med*(

*x, w, e*1),

*Symp*(

*x, y, e*2)

*}*,

*{before*(

*e*1

*, e*2)

*}*)

*M*2 = (

*Patient*(

*x*),

*{Med*(

*x, w, e*1),

*Symp*(

*x, dizziness, e*2)

*}*,

*{before*(

*e*1

*, e*2)

*}*)

*M*3 = (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*1),

*Symp*(

*x, y, e*2),

*}*,

*{before*(

*e*1

*, e*2)

*}*)

*M*4 = (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*1),

*Symp*(

*x, dizziness, e*2)

*}*,

*{before*(

*e*1

*, e*2)

*}*)
According to definition 6,

*M*2 and

*M*3 are in

*ρI*(

*M*1) (i.e., are valid instantiations
of

*M*1),

*M*4

*6∈ ρI*(

*M*1),

*M*4

*6∈ ρI*(

*M*2),

*M*4

*∈ ρI*(

*M*3).

Two temporal pi-patterns

*M *and

*M0 *are said to be

*equivalent *if

*M0 *is obtained
from

*M *by renaming its variables.

Theorem 2 The specialization operator

*ρ *defined as

*ρ*(

*M*) =

*ρI*(

*M*)

*∪ ρE*(

*M*) satisfiesthe following properties:

*• completeness*: every pi-pattern

*M ∈ P*(

*E*) can be obtained by successively apply-
ing

*ρ *to some pi-pattern with the refinement level 1.

*• optimality*: suppose that the regular expression

*E *verifies the following condition:

*for all distinct modes m*1

*and m*2

*appearing in E, associated to the same predicatep, the positions containing the symbol * are the same in boths modes*. Under thiscondition, the instantiation operator

*ρ *is optimal, in the sense that a pi-patterncannot be obtained by specializing two non-equivalent pi-patterns.

Proof.

(1)

*Completeness*: Let

*M *= (

*K, D*,

*T *)

*∈ P*(

*E*) and

*ν*(

*M*) = (

*n, c*). We will showby induction on

*n *that for all

*c ≥ *0,

*M *is obtained by successively applying

*ρ *to somepi-pattern

*M*1 whose refinement level is 1.

*• *Let

*n *= 1. Let us show by induction on

*c *that

*M *is obtained by successively
applying

*ρ *to some pi-pattern

*M*1 with refinement level 1. For

*c *= 0: the re-finement level of

*M *is 1 and

*M *=

*ρ*0(

*M*) (we apply

*ρ *"0 times" over

*M*). Let

*c ≥ *1 and let us suppose that the result is true for

*c − *1. Let

*p*(

*t*1

*, ., tk, en*)the last atom in

*D *where, for some

*i ∈ {*1

*, ., k}*,

*ti *is a constant

*∈ C*. Let

*m*= max

*{i *1

*≤ i ≤ k *such that

*ti ∈ C}*. Let

*M0 *be the pi-pattern obtained byreplacing

*ti *by a new variable

*y *in the atom

*p*(

*t*1

*, ., tk, en*). That is, we replacethe data atom

*p*(

*t*1

*, ., ti, ., tk, en*) by

*p*(

*t*1

*, ., y, .tk, en*). Then,

*M *is obtainedby applying

*ρI *over

*M0*. By the induction hypothesis, since

*ν*(

*M0*) = (

*n, c − *1)then

*M0 *is obtained by successively applying the specialization operator

*ρ *over api-pattern

*M*1 with refinement level 1. Then,

*M *=

*ρI*(

*M0*) is also obtained in sucha way.

*• *Let us suppose the result is true for

*n − *1 and let us show it is true for

*n ≥ *2. We will show that by induction on

*c ≥ *0. For

*c *= 0:

*M *=

*< p*1(

*t*1

*, t*1

*, ., t*1

*, e*
*, tm, ., tm , e*
is a variable, for all
1)

*, ., pm*(

*tm*
*m*)

*>*, and

*tij*
*i, j*. It is easy to see that

*M *is obtained by applying successively the operator

*ρE *to the pi-pattern

*< p*1(

*t*1

*, t*1

*, ., t*1

*, e*
1)

*> *with refinement level 1 (the proof is
achieved by induction on

*m*, using the fact that each prefix of

*M *is in

*P*(

*E*). Forthe case

*c ≥ *1, we employ the same argument used in the case

*n *= 1 and

*c ≥ *1.

(2)

*Optimality*: For lack of space the proof is given in the Appendix.

Figure 3 illustrates the completeness and optimality of

*ρ*.

*M*3 cannot be obtained
by specializing both

*M*1 and

*M*2, since they are non-equivalent pi-patterns. In fact

*ρ*executed over

*M*1 does not produce

*M*3 since according to Definition 5(a), each time api-pattern is instantiated, it cannot be extended any more.

*M*5 cannot be obtained byspecializing both

*M*3 and

*M*4, since they are non-equivalent pi-patterns. In fact

*ρ *executedover

*M*4 does not produce

*M*5 since according to Definition 6(c), each time a position ina pi-pattern is instantiated no positions at its left can be instantiated in future steps.

The Validation Phase. The support counting of the generated pi-patterns is a rathertechnical procedure and its details are omitted here. Roughly speaking, an equivalence

*M *=

*Patient*
*M *=

*Patient*
*M *= (

*Patient*(

*x*), {

*Med*(

*x,y,e *),

*Symp*(

*x,z,e *)},

*T *)
Level 3

*M *= (

*Patient*(

*x*), {

*Med*(

*x,penicillin,e *),

*Symp x,z,e*
)},

*T *)

*M *= (

*Patient*(

*x*), {

*Med*(

*x,y,e *),

*Symp*(

*x,dizziness,e *)},

*T *)

*M *= (

*Patient*(

*x*), {

*Med*(

*x,penicillin,e *),

*Symp x,dizziness,e*
**Figure 3. Candidate Generation: executing the specialization operators level by level.**
relation over the set of generated patterns is defined, depending on the sequence of modesappearing in the patterns. A SQL query

*Q*(

*C*) is associated to each equivalence class

*C*.

Thus, to each equivalente class

*C*, a database

*DC *is associated. An important point toremark here is that the database

*DC *has a unique relation, namely the answer of query

*Q*(

*C*) executed over the input database

*D*. Thus, in order to evaluate the support of thepatterns in class

*C*, the relation

*DC *is scanned once at each pass. A hash-tree structure isused for optimizing support counting in each equivalence class. The construction of thesehash-trees have been inspired on the ones used in [Srikant and Agrawal 1996].

The following result is a corollary of Theorem 2 which guarantees the soundness
and completeness of MILPRIT

*∗*:
Corollary 1 (Soundness and Completeness) Let

*P *be the set of temporal pi-patternsreturned by the algorithm MILPRIT

*∗*. Then (1) all patterns in

*P *are frequent and satisfythe constraints enforced by the regular expression

*E *and (2) all frequent and valid patternsare in

*P*.

Proof. (1) If

*P ∈ P *then

*P *is frequent since, after the validation phase, only frequentpatterns are returned. Besides, step 11 of MILPRIT

*∗ *guarantees that only valid patterns(with respect to

*E*) are returned. (2) Let

*P *be a frequent and valid pattern. Then

*P *isp-valid. By Theorem 2

*P *is obtained by applying the specialization operator

*ρ n *timesover a pi-pattern

*M *in the refinement level 1. It is easy to prove, by induction on

*n*, that

*P *is obtained during the generation phase of MILPRIT

*∗*, in the refinement level

*n *+ 1.

Since

*P *is frequent, it will be returned in step 7 of MILPRIT

*∗*. Since

*P *is valid it will bereturned in step 11 of MILPRIT

*∗*.

5. Experimental ResultsIn this section, we discuss the performance and scalability of algorithm MILPRIT

*∗*through a set of experiments using both synthetic and real-life datasets. Our experi-ments were performed on an IBM SuperWorkstation7043A-8R with 2.4GHz and 4GBof main memory, running Linux. The DBMS PostgreSQL has been used to handle theSQL queries at the validation phase, as discussed before.

5.1. Synthetic DataWe have developed a synthetic data generator which generates regular expressions andtemporal databases where pi-patterns satisfying a regular expression may appear. The
regular expressions generated have the format (

*B∗B∗.B∗*), where each

*block B*
form (

*T*1

*.Tm*), and each

*Tj *(called

*term*) has the form (

*mj *+.+

*mj *), where

*mj *are
modes. Thus, our generator can produce several types of regular expressions, by varyingthe number of blocks (

*k*), the maximum number of terms in each block (

*m*) and themaximum number of modes appearing in the terms. For instance, one regular expressionwhich fits into this format is

*E *= (

*Med(x,*,#)Symp(x,*,#)*)

*∗ *(

*Med(x,*,#)Hist(x,*,#)*)

*∗*. Thisexpression has two blocks and two terms in each block. For each term there is only onemode which can be chosen. In Table 1, we list the parameters used in the synthetic datagenerator, allowing a great variety of temporal databases and regular expressions to beproduced. For the tests presented in this paper, we use the following notation for the inputdatasets:

*Ix − P y − Mz − Us − Bt − T u − V v − Qw*, where

*x, y, z, s, t, u, v *and

*w *arevalues for the input parameters

*I, P, M, U, B, T, V *and

*Q*, respectively. The remaininginput parameters (

*S, D *and

*F *) are set as (

*S *=

*U *,

*D *=

*U *and

*F *=

*U *). The parameter

*Q*
refers to the number of valid pi-patterns which surely appear in the generated dataset.

Table 1 -

*Parameters used for the synthetic data generation.*

Parameters
Number of tables with intervals
Number of tables with points
Number of attributes per table (not temporal)
Number of tuples per table - in '000s
Size of the domain of reference attributes - in '000s
Size of the domain of non-reference attributes - in '000s
Size of the domain of temporal attributes - in '000s
Number of blocks of the regular expression
Number of terms per blocks of the regular expression
Number of modes per term of the regular expression
Minimum amount of pi-patterns - in '000s
Performance Analysis. Figure 4(a) shows the performance of MILPRIT

*∗ *with respectto support variation. The tests have been executed over the dataset

*I*2

*− P *2

*− M*2

*−U*5

*− B*2

*− T *2

*− V *1

*− Q*1. As the minimum support is increased from 0.5% to 3%, thenumber of frequent patterns at each pass decreases, as expected. Thus, the execution timeof the algorithm decreases, since few candidates are potentially frequent for high values ofminimum support. Figures 4(b), (c) and (d) show how the algorithm performs with respectto variations in the regular expression. In these tests the minimum support has been set as2%. The datasets used in these tests were

*I*2

*−P *2

*−M*2

*−U*5

*−Bt−T *2

*−V *1

*−Q*1, where

*t ∈ {*1

*, *2

*, *3

*, *4

*, *5

*}*,

*I*2

*− P *2

*− M*2

*− U*5

*− B*2

*− T u − V *1

*− Q*1, where

*u ∈ {*1

*, *2

*, *3

*, *4

*, *5

*}*and

*I*2

*− P *2

*− M*2

*− U*5

*− B*2

*− T *2

*− V v − Q*1, where

*v ∈ {*1

*, *2

*, *3

*, *4

*, *5

*}*. As thevalues of parameters

*B, T *or

*V *(respectively the number of blocks, the number of termsin each block and the number of choices for modes in each term) increase, the numberof

*long *patterns potentially frequent increases as well. Thus, the execution time of thealgorithm increases too, since longer patterns have to been tested at the validation phase.

Notice that the number of attributes of the table which is scanned to evaluate the supportfor these long patterns increases as the size of the patterns increases.

Figure 5(a) shows the relationship between the number of generated and pruned
pi-patterns during the execution of MILPRIT

*∗ *over the dataset

*I*2

*− P *2

*− M*2

*− U*5

*−B*2

*− T *2

*− V *1

*− Q*1 with minimum support 0.5%. Note that the number of pruned

*pth's *is quite expressive, and consequently, a smaller amount of pi-patterns are evaluatedat the validation phase. Figure 5(b) shows the execution time associated to each phase

**P ****2-***M ***2-***U ***5-***B ***2-***N ***2-***V ***1-***Q ***1**
**I****2-***P ***2-***M ***2-***U ***5-***B ***x-***N ***2-***V ***1-***Q ***1**
**tio **30000

**Minimum Support (%)**
**Number of Blocks**
**I****2-***P ***2-***M ***2-***U ***5-***B ***2-***N ***x-**
**I2-P2-M2-U2-B2-T V ****1-***Q ***1**
**I****2-***P ***2-***M ***2-***U ***5-***B ***2-***N ***2-***V ***x-***Q ***1**
**Number of Terms**
**Number of Modes**
**Figure 4. Performance results**
(generation-pruning-validation) at each refinement level. The tests also have been exe-cuted over the dataset

*I*2

*− P *2

*− M*2

*− U*5

*− B*2

*− T *2

*− V *1

*− Q*1 with minimumsupport 0.5%. The execution time at the validation phase, especially at higher levels, isquite high. This can be explained by the fact that calculating the support of long patternsat high refinement levels involves the execution of SQL queries with a high number of

*joins*.

**I****2-***P ***2-***M ***2-***U ***5-***B ***2-***N ***2-***V ***1-***Q ***1**
**I****2-***P ***2-***M ***2-***U ***5-***B ***2-***N ***2-***V ***1-***Q ***1**
**tte **25000

**Figure 5. (a) Number of generated and pruned pi-patterns, (b) Amount of time spent in**

each phase at different levels
Scalability For the scalability experiments, the parameters related to the database havebeen changed. In these experiments, we used the default parameters for the regular ex-pression and fixed the minimum support as 2%. We first show how MILPRIT

*∗ *performsas the number of tuples per table (

*U*) increases. The datasets considered here were

*I*2

*−P *2

*− M*2

*− Us − B*2

*− T *2

*− V *1

*− Q*1, where

*s ∈ {*20000

*, *40000

*, *60000

*, *80000

*, *100000

*}*.

Figure 6(a) shows the results of these tests. Clearly, the higher the number of tuples inthe database tables, the longer the execution time of the algorithm, since the validationphase (the support evaluation) is the most computationnally costly. It can be noticedthat MILPRIT

*∗ *scales almost linearly with respect to the number of tuples in the dataset.

Figure 6(b) shows how MILPRIT

*∗ *scales up as the number of attributes per table (

*M*) in-creases. We show the results for the datasets

*I*2

*− P *2

*− Mz − U*5

*− B*2

*− T *2

*− V *1

*− Q*1,where

*z ∈ {*2

*, *3

*, *4

*, *5

*}*. As the number of attributes in the base tables increases, the num-ber of attributes of the table which is scanned in the validation phase3 increases as well.

Consequently the execution time increases. Figure 6(c) shows how the execution timeincreases as the number of generated patterns increases. This experiment consisted inrunning the algorithm over datasets

*I*2

*− P *2

*− M*2

*− U*5

*− B*2

*− T *2

*− V *1

*− Qw*, for

*w ∈ {*1000

*, *1500

*, *2000

*, *2500

*, *3000

*, *3500

*, *4000

*}*. In these three scalability tests, we cannotice that MILPRIT

*∗ *scales up almost linearly with respect to the corresponding param-eters (number of tuples per table, number of attributes per table and number of interestingpatterns appearing in the dataset).

**I****2-***P ***2-***M ***2-***U ***x-***B ***2-***N ***2-***V ***1-***Q ***1**
**I****2-***P ***2-***M ***x-***U ***5-***B ***2-***N ***2-***V ***1-***Q ***1**
**Tuples - in '000**
**Nº of Attributes per Table**
**I****2-***P ***2-***M ***2-***U ***5-***B ***2-***N ***2-***V ***1-***Q ***x**
**Frequent Patterns - in '000**
**Figure 6. Scalability results**
The

*AMDI*4 is part of a project in our laboratory, which aims at building a system support-ing an indexed atlas of digital mammograms. This system intends to be used by radiolo-gists as well as radiology students, in order to help them in breast cancer diagnosis. The

*AMDI *database 5 stores, for each patient, a series of digital mammograms, taken duringits lifetime. Besides image data, some important temporal data is stored, related to thepatient habits and life style: in which periods of time she smoked, or took contraceptivedrugs, for instance. The data used in our tests were supplied by Instituto Victorio Valeri deDiagn´osticos M´edicos, Ribeir˜ao Preto-SP, Brazil. For the time being, the

*AMDI *databasestores information about 1365 patients. In the future, as the system will be available on-line, it will be possible for radiologists to insert new patient's records and images. The

*AMDI *relational schema is R =

*{Patient*(

*PName*),

*Contraceptive*(

*PName*,

*CName*,

*Tit*),
3This table is obtained as the result of executing a SQL query over the temporal dataset.

4The

*AMDI *system is avaliable on www.lcc.ufu.br/amdi.

5The

*AMDI *database is avaliable on www.lsi.ufu.br.

*Antidepressant*(

*PName*,

*AName*,

*Tit*),

*Hormone*(

*PName*,

*HName*,

*Tit*),

*Tobacco *(

*PName*,

*Amount*,

*Tit*),

*Drink*(

*PName*,

*Type*,

*Tit*),

*FirstPregnancy*(

*PName*,

*Tpt*),

*Menarche*(

*PName*,

*Tpt*)

*Menopause*(

*PName*,

*Tpt*),

*Birads*(

*PName*,

*Category*,

*Tpt*)

*}*. The relation Birads storesinformation related to the BIRADS classification, standard published by ACR (AmericanCollege of Radiology)6. The attribute

*Category *may assume five different values, eachone corresponding to a mammogram class. MILPRIT

*∗ *has been executed over the

*AMDI*database in order to find temporal pi-patterns relating the evolution of a breast cancertumor and the patient's habits and life style. For the experiment carried out over thisdatabase, we considered the time granularity

*δ *=

*semester*.

Some of the patterns found were:

*P*2 = (

*Patient(x)*,

*{(x,20*,

*e*1),

*Birads*(

*x, *2

*, s*2)

*}*,

*{before*(

*e*1

*, s*2)

*}*)

*P*3 = (

*Patient(x)*,

*{FirstPregnancy*(

*x,YES*,

*e*1),

*Menopause(x,YES*,

*f*2),

*Birads*(

*x, *2

*, s*3)

*}*,

*{before*(

*e*1

*, f*2),

*before*(

*e*1

*, s*3),

*before*(

*f*2

*, s*3)

*}*)
The pi-pattern

*P*1 can be interpreted as follows:

*patients start taking the antide-*
*pressant fluoxetine as soon as they enter the menopause*. This pi-pattern has been verifiedby 25 patients (support = 1.8%). The pi-pattern

*P*2 has been verified by 14 patients (sup-port = 1%) and can be interpreted as follows:

*patients which smoke 20 cigarettes perday stop smoking after their mammogram is classified with Birads 2*. The pi-pattern

*P*3can be interpreted as follows:

*patients who have already had children presents mammo-grams with Birads 2 after entering the menopause*, this pi-pattern has been verified in 177patients (support = 12%).

We notice that, for the time being, MILPRIT

*∗ *has not mined expressive patterns
relating patient's lifestyle and cancer occurrence as we expected in the beginning of theprojet. Clearly, this is due to the fact that in the current stage of the

*AMDI *database, withonly 1365 records, the Birads classification of the mammograms is almost unchanged.

Moreover, there are no cases of mammograms with Birads 5 in our database. We hopethat as the

*AMDI *database gets more populated and diversified, MILPRIT

*∗ *will be able todiscover more interesting patterns.

6. Conclusion and Further WorkIn this paper we introduced a first-order (or relational) temporal pattern mining frameworkwhere time can be viewed as

*intervals *as well as

*points *in a straight line. We also proposeda formalism to specify a broad class of constraints over such temporal patterns. We de-veloped and implemented the algorithm MILPRIT

*∗ *designed to mining temporal patternswith constraints. For the time being, our temporal pattern formulation does not include apattern where

*several *data atoms are related to the same temporal variable. For instance,we does not consider patterns like (

*Patient*(

*x*),

*{Med*(

*x, penicillin, e*),

*Med*(

*x, xanax, f *)

*}*,

*{equal*(

*e, f *)

*}*).

Our future research focuses on three main targets: (1) the development of a sec-
ond algorithm for mining pi-patterns based on the technique PrefixSpan [Han et al. 2004]
6Breast Imaging Reporting and Data System BIRADS. American College of Radiology, 2004.

which presents a much better performance when compared to Apriori-based techniquesin the sequence pattern mining context, (2) the refinement of the method in order to in-corporate more general temporal patterns and (3) the development of a tool for temporalrelational pattern mining.

Allen, J. F. and Ferguson, G. (1994). Actions and events in interval temporal logic. Tech-
nical Report TR521.

Antunes, C. and Oliveira, A. L. (2005). Constraint relaxations for discovering unknown
sequential patterns. In

*Knowledge Discovery in Inductive Databases, LNCS 3377*,pages 11–32. Springer Berlin, Heidelberg.

Bettini, C., Wang, X. S., Jajodia, S., and Lin, J.-L. (1998). Discovering frequent event pat-
terns with multiple granularities in time sequences.

*IEEE Transactions on KnowledgeData Engineering*, 10(2):222–237.

B¨ohlen, M. H., Snodgrass, R. T., and Soo, M. D. (1996). Coalescing in temporal
databases. In

*VLDB 1996*, pages 180–191.

Boulicaut, J.-F., De Raedt, L., and (Eds.), H. M. (2006). Constraint-based mining and
inductive databases. In

*European Workshop on Inductive Databases*, volume 3848 ofLecture Notes in Artificial Intelligence.

de Amo, S. and Furtado, D. (2005). First-order temporal pattern mining with regular
expression constraints. In

*20th Brazilian Symposium on Databases*, pages 280–294.

de Amo, S., Giacometti, A., Furtado, D. A., and Laurent, D. (2004). An apriori-based
approach for first-order temporal pattern mining. In

*19th Brazilian Symposium onDatabases*, pages 48–62.

de Amo, S., Giacometti, A., and Pereira-Jr., W. (2007). Mining first-order temporal in-
terval patterns with regular expression constraints. In

*9th International Conferenceon Data Warehousing and Knowledge Discovery (DaWaK 2007), LNCS 4654*, pages459–469.

Dehaspe, L. and Toivonnen, H. (1999). Discovery of frequent datalog patterns.

*Data*
*Mining and Knowledge Discovery*, 3:7–36.

Emerson, E. A. (1990). Temporal and modal logic. In van Leeuwen, J., editor,

*In*
*Handbook of Theoretical Computer Science, Volume B: Formal Models and Seman-tics*, pages 995–1072. Elsevier Science Publishers.

Garofalakis, M., Rastogi, R., and Shim, K. (1999). Spirit: Sequential pattern mining with
regular expression constraints. In

*The VLDB Journal*, pages 223–234.

Han, J., Pei, J., Mortazavi-Asl, B., Pinto, H., and Dayal, U. (2004). Mining sequential pat-
terns by pattern-growth: The prefix-span approach.

*IEEE Transactions on Knowledgeand Data Engineering*, 16(11):1424 – 1440.

Hoppner, F. (2001). Discovery of temporal patterns – learning rules about the qualitative
behavior of time series. In

*Proceedings of the 5th European Conference on Principlesand Practice of Knowledge Discovery in Databases*, pages 192–203.

Jacobs, N. and Blockeel, H. (2001). From shell logs to shell scripts. In Rouveirol, C. and
Sebag, M., editors,

*Proceedings of the 11th Int. Conf. on Inductive Logic Programming*,volume 2157 of

*LNAI*, pages 80–90.

Lattner, A. and Herzog, O. (2005). Mining temporal patterns from relational data. In

*LWA*
*2005*, pages 184–189.

Lee, S. D. and De Raedt, L. (2002). Constraint based mining of first order sequences in
seqlog. In

*KDID'02*, pages 76–92.

Lorincza, H. A. and Boulicaut, J.-F. (2003). Mining frequent sequential patterns under
regular expressions: a highly adaptive strategy for pushing constraints. In

*ICDM'03*,pages 316–320.

Masson, C. and Jacquenet, F. (2003). Mining frequent logical sequences with SPIRIT-
LoG. In

*Proceedings of the 12th International Conference on Inductive Logic Pro-gramming*, volume 2583 of Lecture Notes in Computer Science, pages 166–181.

Springer.

Pei, J., Han, J., and Wang, W. (2002). Mining sequential patterns with constraints in large
databases. In

*Proceedings of the 2002 ACM CIKM Conference*, pages 18–25.

Srikant, R. and Agrawal, R. (1996). Mining sequential patterns: Generalizations and
performance improvements. In

*EDBT'96*, volume 1057, pages 3–17, Avignon, France.

Springer-Verlag.

Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences.

*Ma-*
*chine Learning*, 42(1-2):31–60.

Appendix: Proof of Theorem 2 -

*Optimality*.

Let

*M *= (

*K, D, T *)

*∈ P*(

*E*)

*,M *= (

*K, D, T *) and D =

*{p*1(

*t*1

*, ., t*1

*, e*
*, ., tn , e*
1)

*, ., pn*(

*tn*
*n*)

*} *and let

*ν*(

*M *) = (

*n, c*).

1. Case 1: if

*c *= 0 then

*M *must be generated by extension, that is,

*M *=

*ρE*(

*M*1),
for some pi-pattern

*M*1. Since the atoms in

*D *are ordered according to the partialorder introduced in Section 2, then

*M0 *= (

*K,D0*,

*T 0*)

*∈ P*(

*E*) with

*• D0 *=

*{p*1(

*t*1

*, ., t*1

*, e*
*, ., tn−*1

*, e*
1)

*, ., pn−*1(

*tn−*1

*• T 0 *=

*{r*(

*ei, ej*)

*∈ T i, j < n}*
2. Case 2: if

*c > *0: Let us suppose that

*M *=

*ρ*(

*M*1) =

*ρ*(

*M*2). Notice that, since

*c > *0, it is not possible that

*M *is obtained by extension from

*M*1 or

*M*2. Then,

*M *=

*ρI*(

*M*1) =

*ρI*(

*M*2). Let

*M*1 = (

*K,D*1,

*T *1) and

*M*2 = (

*K,D*2,

*T *2).

*• D*1 =

*{p*1(

*x*1

*, ., x*1

*, e*
*, ., xn , e*
1)

*, ., pn*(

*xn*
*• D*2 =

*{p*1(

*y*1

*, ., y*1

*, e*
*, ., yn , e*
1)

*, ., pn*(

*yn*
The strings of modes associated to

*M*1 and

*M*2 are respectively7:

*• W *1 =

*p*1(

*u*1

*, ., u*1 )

*, ., p*
*• W *2 =

*p*1(

*v*1

*, ., v*1 )

*, ., p*
(I) Since

*M *=

*ρI*(

*M*1) then there exists

*i*1

*, j*1 (uniquely determined) such that

*ui*1

*j*1=

*∗*,

*xi*1 is a variable and

*ti*1 is a constant. Moreover, for all

*i, j*, if

*i 6*=

*i*
*j 6*=

*j*1 then

*xi *=

*ti *.

(II) Since

*M *=

*ρI*(

*M*2) then there exists

*i*2

*, j*2 (uniquely determined) such that

*ui*2 =

*∗*,

*xi*2 is a variable and

*ti*2 is a constant. Moreover, for all

*i, j*, if

*i 6*=

*i*
*j 6*=

*j*2 then

*xi *=

*ti *.

We must consider three possibilities, namely:
(a)

*i*1 =

*i*2 and

*j*1

*< j*2: in that case, both

*ti*1 and

*ti*2 are constants (according
to points (I) and (II)). Moreover, since

*M *=

*ρI*(

*M*1) and

*j*1

*< j*2 then

*M *is obtained by instantiating a variable in position

*j*1 in the

*i*1-th atom.

But in this

*i*1-th atom, a variable in position

*j*2

*> j*1 is already instantiated.

Contradiction (see Definition 6).

(b)

*i*1

*< i*2: in that case

*vi*2 =

*ui*1 =

*∗ *and

*xi*1

*, yi*2 are constants. That is
impossible, since by Definition 6, a variable

*xi*1 cannot be instantiated if it
appears in the

*i*1-th atom and a variable

*xi*2 in the

*i*
2-th atom (

*i*2

*> i*1) has
already been instantiated before.

(c)

*i*1 =

*i*2 and

*j*1 =

*j*2: in that case we have

*ui*1 =

*vi*2 =

*∗ *and

*xi*1 ,

*yi*2 are
variables. So,

*M*1 and

*M*2 are equivalent (

*M*1 can be obtained from

*M*2 byrenaming its variable

*xi*1 as

*yi*2).

7For the sake of simplifying the notation, we omitted the symbol # at the end of the modes.

Source: http://www.deamo.prof.ufu.br/arquivos/IJDWM2008.pdf

Sexual Offender Treatment ISSN 1862-2941 Legal and Ethical issues when usingAntiandrogenic Pharmacotherapy with SexOffenders Karen HarrisonUniversity of the West of England - Bristol [Sexual Offender Treatment, Volume 3 (2008), Issue 2] The treatment of sex offenders and more specifically the treatment of high-risk sex offenders is asubject of great importance for practitioners, professionals, policymakers and the public at large.Whilst treatment is thought to largely centre upon cognitive-behavioural methods and otherpsychotherapy techniques, in more recent years the use of pharmacotherapy has also begun togain ground. Current debate often centres upon how effective such treatment is; with bothsupporters and opponents of its use existing. This article, however, does not specifically look atwhether pharmacotherapy as a method of treatment with sex offenders actually works, but ratherlooks at the legal and ethical issues surrounding its use. In particular it considers issues such aswhether the treatment should be voluntary or mandatory; whether it should indeed even beclassified as treatment or should instead be seen as punishment and finally whether it should beused with convicted offenders or made freely available to all.

ELECTROSPINNING OF POLY(VINYL ALCOHOL)/CHITOSAN FIBERS FOR WOUND DRESSING APPLICATIONS MS.SUNISA PANBOON A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (MATERIALS SCIENCE) DEPARTMENT OF INDUSTRIAL CHEMISTRY GRADUATE COLLEGE KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK ACADEMIC YEAR 2005 ISBN 974-19-0476-2