close

Вход

Забыли?

вход по аккаунту

?

SCC.2017.64

код для вставкиСкачать
2017 IEEE 14th International Conference on Services Computing
The Mining of Activity Dependence Relation based on Business Process Models
Guangchang Hu, Budan Wu, Junliang Chen
State Key laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, China
relations. Therefore, both the design stage and the run stage
of business process require the support of dependence
relations. Moreover, there are other uses, such as checking
the logical consistency of existing models and reasoning new
relations, etc. Therefore, the roles of activity dependence
relations are as follows:
Process recommendation: recommend the next activity,
process fragment, or a process in imperative modelling;
or recommend a set of relations as constraints in
declarative modelling at design stage [10];
Automatic modelling: generate a complete process
model automatically at design stage or run stage;
Dynamic adaptation: compose a process fragment to
handle the ad hoc change at run stage;
Dynamic execution: compute and execute the next
activity without any process models at run stage;
Model detection: check the logical consistency of a
process model;
Relation reasoning: infer the indirect relations.
However, there is no formal description of dependence
relations in most business domain, where these relations or
business rules exist in the minds of domain experts or in the
unstructured documents [11]. It restricts the development of
process recommendation, process adaptation, automatic
modelling, etc. In order to solve this problem, we propose a
method to obtain the dependence relations automatically
which are implicit in the process models. This method is
different from data mining based on the tables of database
[12] and process mining based on the event logs of process
instances [13], which is based on the process models of a
business domain. We call it dependence relation mining, or
relation mining for short, which will generate dependence
relations by decomposing the existing process models in a
business domain from the control flow dimension. And it
can be used as a supplement to process mining. Therefore,
the research of relation mining has important significance
and practical application value.
Fig. 1 shows the main idea of relation mining and the
basic uses in process modelling or service orchestration. We
separately mine the process models and obtain the
dependence relations which are divided by each business
domain. With these dependence relations, a set of relations
that belong to same domain can orchestrate a new process
model of this domain, e.g. Model 1; and a set of relations
that belong to different domains can orchestrate a new
process model across domains, e.g. Model 2, where the
orchestration approach has been proposed in our previous
study on process flexibility and adaptation [14].
Abstract—With the development of process recommendation,
dynamic adaptation and automatic modeling, the requirement
of explicit and formalized expression of activity dependence
relation in the business domain is becoming more and more
urgent. However, these relations more exist in the minds of
domain experts or in the unstructured documents, which leads
process modeling and adaptation are a time-consuming and
error-prone process. To solve this problem, a relation mining
method is proposed for obtaining activity dependence relations.
The formal description of these relations is defined in control
flow perspective, which is expressed as serial-dependence
relations and parallel-dependence relations in the form of three
tuples after analyzing all the control flow patterns. And a
mining algorithm is proposed for mining these two types of
relations based on the process model. The correctness and
performance of this algorithm are verified by a large number
of experiments, and the experimental results show this method
can quickly and accurately extract all the activity dependence
relations from the existing process models in a business domain.
Keywords-Activity dependence relation; business process
management; information system; pattern discovery; process
mining; workflow pattern
I.
INTRODUCTION
Nowadays, there exists a growing interest in aligning
information systems in a process-oriented way, especially in
process-aware information systems (PAISs) [1-4]. In
business process management (BPM), a service or business
is usually described in the form of business process which
defines the context and the logical relationships between
activities and also specifies both the order of invocations
(control flow) and the rules for data transfer (data flow) [5-7].
The business process is modelled as process model at design
stage and executed as process instance at run stage. In the
aspect of modeling, when a new business requirement
appears, the corresponding process model is modelled to
meet this requirement by the domain experts and IT
developers in a manual manner. However, manual modelling
is a time-consuming, labor-consuming and error-prone way,
while recommendation or automatic modelling is a good way
which provides the next activity or the whole process based
on dependence relations of activities (we call it activity
dependence relations, or dependence relations for short). In
the aspect of adaptation, when an ad hoc change occurs, the
affected process instance needs to be adaptively adjusted at
run stage [8, 9]. However, the alternative processes are
difficult to be predefined for the ad hoc change, where the
changed part can be replaced by a new process fragment
which is dynamically combined based on dependence
2474-2473/17 $31.00 © 2017 IEEE
DOI 10.1109/SCC.2017.64
450
modifying the ECA rule at run time. However, this mapping
is a simple decomposition, where the generated rules are
more applicable to local modification of process instances,
but it cannot build a complex control flow structure or a
complete process model. Bae in [25] proposes a block
detection algorithm to detect blocks from a given process
model for executing the process model dynamically, which
can discover iterative pattern, serial pattern and parallel
pattern. These patterns are similar to the activity dependence
relations defined in this paper. However, this algorithm only
identifies specific structures, and it cannot distinguish
between conditional branch and parallel branch.
In our previous research on process flexibility and
adaptation [14, 26], we found that there was a lack of
dependence relations of activities. This seriously affected the
efficiency and automation degree of process adaptation.
Relation mining can be regarded as another aspect of process
mining, which focuses on analyzing the causal dependence
relationships among activities in one specific business
domain. These relations provide basic support for process
recommendation, adaptation and automatic modelling, etc.
This paper is organized as follows: section II summarizes
the related work, section III gives the definitions of two
kinds of dependence relations, section IV proposes the
mining algorithm, section V shows the experiments of this
algorithm and analyses its performance, and section VI
includes conclusions and future work.
Domain X
Domain Y
Domain Z
Process model
Model 1
Model 2
New process model
Dependency relation
Figure 1. The diagram of dependence relation mining
II.
RELATED WORK
III.
In fact, relation mining is a pattern discovery technique to
obtain the dependence relations between activities in a
business domain. It is similar to the association rule of data
mining and the order relation of process mining.
In data mining [15, 16], association rule learning is the
most famous pattern discovery technology [17]. The apriori
algorithm is used to find the frequent itemsets in the sample,
but it does not consider the order of itemsets. Sequence
mining analyses the order of itemsets [18], but it can only
find the sequential pattern [17, 19]. Episodic mining can find
the partial order relations which are defined by some
frequent episodic, but there are problems in dealing with
concurrency, and it cannot fine the patterns such as selected
pattern, loop pattern [20]. Other similar studies are used to
analyze the event sequence [21], where the processed object
is a standard format or network structure, but they cannot
handle the structure containing conditional branch or parallel
branch, etc. For example, text mining is used to discover the
sequence of letters and words from a text file; DNA
sequence analysis is used to determine the order of the four
bases from a strand of DNA; web analytics is used to analyze
the behavior of visitors from the event logs.
In process mining, process discovery extracts process
knowledge from the event logs. The algorithm is used to
find specific patterns such as sequential-pattern, XORpattern and AND-pattern [22-24], which are identified by
defining four order relations. But there is problem in dealing
with noise, incompleteness and complex paths, and these
four order relations cannot represent OR pattern which can
select multiple branches compared to the XOR pattern.
Although other improved algorithms reduce the effects of
noise and incompleteness, they still cannot obtain all of the
control flow patterns and branching conditions, e.g. heredity
mining and regional comprehensive analysis.
In the similar research based on process model, Jose in [7]
proposes a pattern identification method to map a process
model to ECA rules, which increases process flexibility by
DEPENDENCE RELATION REPRESENTATION
In the control flow perspective, there are 20 workflow
patterns and 43 workflow control-flow patterns [27-29],
which have been widely used in many modeling languages
such as BPMN, Petri net, BPEL, jBPM and EPCs, etc [3033]. In order to simplify and standardize the representation of
dependence relations, this section further refine these
patterns and divide them into 6 workflow meta-patterns
(WMP1-WMP6), as shown in Table I. And we use these 6
meta-patterns to define two types of dependence relations:
one is serial-dependence relation which describes one-to-one
(WMP1-4) relationships; the other is parallel-dependence
relation which describes many-to-one (WMP5) and one-tomany (WMP6) relationships.
Definition 1. (Serial-dependence Relation, RS)
A serial-dependence relation is a tuple RS=(At, Ax, CS),
where:
At and Ax are two adjacent activities (or events) in logic,
where At,AxTE and t,xN*.
At and Ax constitute a ordered relation, where <At,
Ax>(T×T)(T×E)(E×T).
CS is the execution condition of Ax when At is completed.
And the value of CS is C0, Ci or Ex which represent ever
true, an expression or an event. It is represented as
CS=C0CiEx, where CiC, ExE and i,xN*.
T, C and E respectively represent activity set, condition
set and event set in a business domain.
Therefore, the RS describes the serial relationship of two
activities which are directly related to each other, where At,
Ax and CS are called RS-Former, RS-Latter and RS-Condition.
In the BPMN symbol, the RS-Former and RS-Latter of an RS
are two adjacent activities which are directly connected by a
sequence flow or indirectly connected by multiple sequence
flows across one or more gateways, and its RS-Condition is
the superposition of all related conditions of sequence flows.
451
TABLE I.
THE MAPPING OF WORKFLOW PATTERN, WORKFLOW CONTROL-FLOW PATTERN AND WORKFLOW META-PATTERN
Workflow Pattern
WP1: Sequence
WP2: Parallel Split
WP3: Synchronization
WP4: Exclusive Choice
WP5: Simple Merge
WP6: Multiple Choice
WP7: Synchronizing Merge
WP8: Multiple Merge
WP9: Discriminator
(include N-out-of-M)
WP10: Arbitrary Cycles
WP11: Implicit Termination
WP12-WP15: Multiple Instance Patterns
WP16-WP18: State-based Patterns
WP19-WP20: Cancellation Patterns
TABLE II.
Dependence Relation
RS=(At, Ax, CS)
(CS=C0CiEx)
RP=(At, UA, CP)
(CP=CaCb)
Workflow Control-flow Pattern
WCP1: Sequence
WCP2: Parallel Split
WCP3: Synchronization
WCP33: Generalized AND-Join
WCP4: Exclusive Choice
WCP5: Simple Merge
WCP6: Multi-Choice
WCP7: Structured Synchronizing Merge
WCP37: Local Synchronizing Merge
WCP38: General Synchronizing Merge
WCP8: Multi-Merge
WCP9: Structured Discriminator
WCP28: Blocking Discriminator
WCP29: Cancelling Discriminator
WCP30: Structured Partial Join
WCP31: Blocking Partial Join
WCP32: Cancelling Partial Join
WCP10: Arbitrary Cycles
WCP21: Structured Loop
WCP22: Recursion
WCP23: Transient Trigger
WCP24: Persistent Trigger
WCP11: Implicit Termination
WCP43: Explicit Termination
WCP12-WCP15, WCP34-WCP36, WCP41-WCP42
WCP16-18, WCP39-WCP40
WCP19-WCP20, WCP25-WCP27
Workflow Meta-Pattern
WMP1: (At, Ax, C0)
WMP1: (At, Ax, C0) and WMP5: (At, UA, Ca), AxUA
WMP1: (Ax, At, C0) and WMP6: (At, UA, Cb), AxUA
WMP2: (At, Ax, Ci)
WMP1: (At, Ax, C0)
WMP2: (At, Ax, Ci) or WMP3: (At, Ex, Ex)
WMP1: (At, Ax, C0)
WMP1: (At, Ax, C0)
WMP1: (At, Ax, C0)
WMP1: (At, Ax, C0) and WMP2: (Ax, At, Ci); or
WMP4: (At, At, Ci)
WMP1: (At, Ex, C0) or WMP3: (At, Ex, Ex)
WMP1: (At, Ax, C0)
WMP1: (At, Ax, C0) and/or WMP2: (At, Ax, Ci)
WMP1: (At, Ex, C0)
THE MAPPING BETWEEN DEPENDENCE RELATION AND WORKFLOW META-PATTERN
Workflow Meta-Pattern
WMP1: (At, Ax, C0)
WMP2: (At, Ax, Ci)
WMP3: (At, Ex, Ex)
WMP4: (At, At, Ci)
WMP5: (At, UA, Ca)
WMP6: (At, UA, Cb)
Workflow Pattern
WP1-WP3, WP5, WP7-WP20
WP4, WP6, WP10, WP16-WP18
WP6, WP11
WP10
WP2
WP3
BPMN symbol
Sequence flow, AND-Split, *-Join
XOR/Inclusive/Complex-Split
Event-based
XOR/Inclusive/Complex-Split
AND-Split
AND-Join
the execution condition. Moreover, if there is an RP (e.g.
RP=(At, UA, Ca)) and UA={Ax, Ay}), it must exist n (n=|UA|)
RS (e.g. RS=(At, Ax, C0) and RS=(At, Ay, C0)). Table II shows
the corresponding relationships between dependence relation,
workflow meta-pattern, workflow pattern and BPMN [34].
The "*-Join" represents all the join part of gateways, such as
AND-Join, XOR-Join, Inclusive-Join, and Complex-Join. In
addition, the notations of different modeling languages can
be mapped to each other.
Definition 2. (Parallel-dependence Relation, RP)
A parallel-dependence relation is a tuple RP=(At, UA, CP),
where:
At is an activity (or event), where AtTE and tN*.
UA is a set of activities which are executed in a parallel
way with respect to At.
CP is the execution condition and its value is Ca or Cb,
which is represented as CP=CaCb. The Ca represents the
activities in UA need be executed after At is completed
(that is AxUA, meet <At, Ax>). And the Cb represents
the activities in UA need to be completed before At is
executed (that is AxUA, meet <Ax, At>).
T and E respectively represent activity set and event set
in a business domain.
Therefore, the RP describes the parallel relationship of a
set of activities with respect to a particular activity, where At,
UA and CP are respectively called RP-Former, RP-Latter and
RP-Condition. In the BPMN symbol, the RP-Former and RPLatter of an RP are directly connected by parallel gateway
and sequence flows, and its RP-Condition is decided by the
AND-Split (CP=Ca) or AND-Join (CP=Cb).
The activity dependence relation defines the execution
order and condition of business activities in a domain. The
different types of dependence relation are distinguished by
IV.
DEPENDENCE RELATION MINING
Normally, a business process is modelled by one or more
workflow meta-patterns in the control flow perspective. That
is, a process model explicitly or implicitly describes a set of
dependence relations. There are a large number of process
models have been validated in each business domain, which
basically cover the basic or core business of this domain.
Therefore, we can obtain these dependence relations in a
business domain by mining their process models. This
section proposes a method for extracting dependence
relations from one process model in the control flow
perspective, which is called DRM (Dependence Relation
Mining, DRM). Taking the process model defined by BPMN
2.0 as an example, we discuss the DRM algorithm and its
two sub-algorithms in this section.
452
A. DRM algorithm
The goal of DRM algorithm is to generate the activity
dependence relations which have been included in a process
model, where its input is a process model. This algorithm
includes two sub-algorithms which are SDRM (Serial
Dependence Relation Mining, SDRM) sub-algorithm and
PDRM (Parallel Dependence Relation Mining, PDRM) subalgorithm. The basic idea of DRM algorithm is to deal with
the elements (that is flow objects in BPMN) in a process
model one by one, where: if the current element is an activity
or event, SDRM will be invoked and generates one or more
RS; if the current element is AND-Split or AND-Join, PDRM
will be invoked and generates one or more RP.
items of this QUEUE one by one until it is empty. The main
process for handling the current head item "Lt" is as follows:
if Lt is an activity or event, the SDRM sub-algorithm is
invoked and generates all the RS with Lt as the RS-Former; if
Lt is an AND-Split or AND-Join, the PDRM sub-algorithm is
invoked and generates all the RP with Lt as the RP-Former; if
Lt is the other type of elements (e.g. other gateways), it is
deleted from the QUEUE. Moreover, when the invoked subalgorithm is returned, DRM algorithm will delete the current
head item from the QUEUE and begin to deal with the next
head item.
End DRM algorithm
Y
Delete the head item
and clear all arc labels
QUEUE empty?
N
Algorithm 1: DRM algorithm
INPUT: process model (M)
OUTPUT: activity dependence relation (RS and RP)
Description:
Lt: an element in M which is an activity, event or gateway;
Flag: a completed mark of RP-Latter;
aFlag, bFlag: a completed mark of RP-Former.
Algorithm:
1
For each element Lt from M do
2
If (Lt is not a gateway and Lt is not an end event) then
3
Select the arc which starts at Lt and ends at element Lx;
4
Initialize CS=C0;
5
SDRM(Lx);
6
Else if (Lt is an AND-Split or AND-Join) then
7
Initialize UA=, Flag=0, aFlag=bFlag=1;
8
PDRM(Lt);
9
Else if (Lt is an end event) then
10
Continue;
11
End if
12
End for
Put all the elements of M
in QUEUE
Assign the head item
of QUEUE to Lt
Start DRM algorithm
Lt is gateway?
PDRM
sub-algorithm
Y
N
Y
AND-Split/Join?
N
Output <At, Lx, CS>
Push Lt into STACK
and Set: At=Lt, CS=C0
Assign the top item
of STACK to Lt
Push Lx into STACK
and Set Lt=Lx
Select an unlabeled arc
L t Ci L x
Set CS=CSCi
Y
N
N
Lx is gateway?
Exist?
Label Lt Ci Lx
Set CS=CS+Ci
N
STACK empty?
Y
Delete the top item
of STACK
SDRM
Y
Figure 2. The flowchart of DRM algorithm
As shown in algorithm 1, DRM algorithm is mainly to
invoke the corresponding sub-algorithms by identifying the
type of elements. Since there is only one arc (that is sequence
flow in BPMN) out of each activity or event (but not end
event), we just need to select this arc when the current
element is an activity, start event or middle event. In all
types of gateways, other gateways are ignored except for the
parallel gateway which will generate RP. Moreover, this
algorithm implements a series of initialization before
invoking the sub-algorithms, where the RS-Condition is
initialized to C0 before invoking SDRM sub-algorithm and
the RP-Latter is initialized to (empty set) before invoking
PDRM sub-algorithm. Because AND-Split/Join has more
than one branch, we need to set up a flag "Flag" to mark
whether all the branches have been accessed. For example,
"Flag==1" represents all the arcs starting at the current
element have been accessed. Although gateways and
activities are often placed alternately in a process model, but
there is a direct connection between the AND-Split/Join and
other gateways. So, we need to set up two flags, "aFlag" and
"bFlag", to mark whether all the RP-Formers have been
found. For example, "aFlag==1" represents all the precursors
(activities) of AND-Split have been found, and "bFlag==1"
represents all the successors (activities) of AND-Join have
been found.
The flowchart of DRM algorithm is shown in Fig. 2.
Suppose we use a queue "QUEUE" to store all the elements
of a process model. So, DRM algorithm deals with the head
The detailed process of SDRM sub-algorithm is shown
in the lower dashed boxes of Fig. 2. We use a stack
"STACK" to store Lt and other elements which are
connected by those arcs that start at Lt. In the SDRM subalgorithm, the RS-Condition need to be generated in addition
to search for the RS-Latter. Because an RS may involve one
or more arcs (e.g. across multiple XOR-Split), the RSCondition is the superposition of all the conditions of
relevant arcs (e.g. the conditions of sequence flows).
Moreover, because one Lt may generate more than one RS,
the RS-Condition of different RS which has been superposed
may require rollback. We use "+" and "" to represent the
superposition operation and rollback operation, where:
Superposition: C0+C0C0, C0+CiCi, Ci+CjCi&&Cj;
Rollback: C0C0C0, CiC0Ci, Ci&&CjCjCi.
Moreover, we have not given the detailed process of the
PDRM sub-algorithm in Fig. 2, which is similar to the
SDRM sub-algorithm except for dealing with the RPConditions. According to the different types of Lt, the
process of PDRM sub-algorithm is divided into two cases.
When Lt is AND-Split, this sub-algorithm searches the RPLatter in the direction of the relevant arcs and searches the
RP-Former in the opposite direction of the relevant arcs, and
the RP-Condition is Ca. When Lt is AND-Join, this subalgorithm searches the RP-Latter in the opposite direction of
the relevant arcs and searches the RP-Former in the direction
of the relevant arcs, and the RP-Condition is Cb.
453
B. DRM sub-algorithm
As shown in algorithm 2, SDRM sub-algorithm is mainly
to search for each RS-Latter and the corresponding RSCondition according to the current head item of QUEUE
which is the RS-Former. For each head item Lt (Lt is an
activity or event), a multi-tree is formed by Lt and other
related elements which are connected by Lt. The root node is
Lt in this multi-tree. The leaf nodes are activities or events,
and the intermediate nodes are other types of elements.
As shown in algorithm 3, PDRM sub-algorithm is mainly
to search for each RS-Latter and RS-Former according to the
current head item of QUEUE which is an AND-Split/Join.
For each head item Lt (Lt is an AND-Split or AND-Join), two
multi-trees are formed by Lt and other related elements
which are connected by Lt. The root node is Lt in these two
multi-trees. The leaf nodes are activities or events, and the
intermediate nodes are other types of elements.
Algorithm 3: PDRM Sub-algorithm
Description:
Flag==1: RP-Latter search has been completed;
aFlag==1: RP-Former search has been completed from AND-Split;
bFlag==1: RP-Former search has been completed from AND-Join.
Algorithm:
1
If ((Lt is AND-Split)||(aFlag==0)) then
2
If (Flag==0) then
3
For each element Ly which has an arc from Lt to Ly do
4
If (Ly is AND-Split) then
5
PDRM(Ly);
6
Else if (Ly is not a gateway) then
7
Add Ly into UA;
8
End if
9
End for
10
Set Flag=1;
11
Else if (aFlag==0) then
12
For each element Lz which has an arc from Lz to Lt do
13
If (Lz is a gateway and Lz is not AND-Split) then
14
aFlag=0; PDRM(Lz); aFlag=1;
15
Else if (Lz is not a Gateway) then
16
Generate RP=(Lz, UA, Ca);
17
End if
18
End for
19
End if
20 Else if ((Lt is AND-Join)||(bFlag==0)) then
21
If (Flag==0) then
22
For each element Ly which has an arc from Ly to Lt do
23
If (Ly is AND-Join) then
24
PDRM(Ly);
25
Else if (Ly is not a gateway) then
26
Add Ly into UA;
27
End if
28
End for
29
Set Flag=1;
30
Else if (bFlag==0) then
31
For each element Lz which has an arc from Lt to Lz do
32
If (Lz is a gateway and Lz is not AND-Join) then
33
bFlag=0; PDRM(Lz); bFlag=1;
34
Else if (Lz is not a gateway) then
35
Generate RP=(Lz, UA, Cb);
36
End if
37
End for
38
End if
39 End if
Algorithm 2: SDRM Sub-algorithm
Description:
CS=CS+Ci: superposition of the condition of the current arc;
CS=CSCi: rollback of the condition of the current arc.
Algorithm:
1
If (Lx is a gateway) then
2
For each element Ly which has an arc from Lx to Ly with Ci do
3
If (Lx is an event-based gateway) then
4
Set Ci=Ly;
5
End if
6
Set CS=CS+Ci;
7
SDRM(Ly);
8
Set CS=CSCi;
9
End for
10 Else
11
Generate RS=(Lt, Lx, CS);
12 End if
The basic idea of SDRM sub-algorithm is to search the
multi-tree in a depth first search manner, and each path from
the root node to the leaf node can generate an RS, where: the
root node and leaf node are the RS-Former and RS-Latter, and
the superposition of all the conditions of edges in this path is
the RS-Condition. For example, there are three RS which are
(At, Ax, Ci), (At, Ay, Cj&&Ck) and (At, Az, Cj&&Cp) in Fig. 3,
where the head item Lt is At and two XOR-Split correspond
to Am and An in the multi-tree.
Ci
At
Ax
Ck
At
Am
Search
direction
An
Am
Ay
Ci
Cj
Cj
Ax
Cp
Az
Process model
An
Ck
Multi-tree
Cp
Ay
Az
Figure 3. The corresponding relationship about XOR-Split
Search direction
At
Am
Ap
An
Ax
At
Am
Ay
Az
Process model
Ap
An
Ax
Ay
Az
The basic idea of PDRM sub-algorithm is to search these
two multi-trees separately in a depth first search manner, and
all the leaf nodes of one tree and each leaf node of another
tree can generate an RP, where the former is RS-Latter, the
latter is RS-Former. When Lt is AND-Split, it searches for all
leaf nodes from Lt according to the direction of edges, and it
searches for each leaf node from Lt according to the opposite
direction of edges. When Lt is AND-Join, it searches for all
leaf nodes from Lt according to the opposite direction of
edges, and it searches for each leaf node from Lt according to
the direction of edges. For example, there are four RP which
are (At, {Ax, Ay, Az}, Ca), (Ap, {Ax, Ay, Az}, Ca), (At, {Ax, Ay,
Az}, Cb), (Ap, {Ax, Ay, Az}, Cb) in Fig. 4 and Fig. 5, where
An
Multi-tree
Figure 4. The corresponding relationship about AND-Split
Ax
Ax
An
At
Am
Ap
Ay
Az
Process model
Ay
An
Az
Am
An
Search direction
Multi-tree
At
Ap
Figure 5. The corresponding relationship about AND-Join
454
the head item Lt is An and the XOR-Join correspond to Am in
these multi-trees.
Generally, there exist two or more gateways that are
directly connected in a nested way. When these gateways
are the same type, direct nesting of multiple gateways is
equivalent to a single gateway which has the same number
of branches. The PDRM sub-algorithm can handle these
structures, where it only needs to handle the outermost
AND-Split/Join. Meanwhile, when these gateways are the
different type, direct nesting of multiple gateways can
transform the structures which have the equivalent effects.
The PDRM sub-algorithm can handle these structures by
simply modifying. Therefore, this algorithm can easily deal
with complex nested structures.
Moreover, two gateways are in pairs except for eventbased gateway in a standard model structure. However,
there are some non-standard cases in the existing process
model, which are not affect the description and execution of
process model. For example, there is a XOR-Split but no
XOR-Join, or there is multiple XOR-Split but only one
XOR-Join in a process model. The DRM algorithm can be
applied to these non-standard structures.
Initial treatment & operation planning (T8), Operative
treatment (T9); Clinical Suspicion of Cruciate Rupture = no
(C1), Clinical Suspicion of Cruciate Rupture = yes (C2),
Cruciate Rupture = no or Operation Indicated = no (C3),
Cruciate Rupture = yes and Operation Indicated = yes (C4).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec...
<process id="PROCESS_1" isClosed="false" isExecutable="true" processType="None">
<task completionQuantity="1" id="_2" isForCompensation="false" name="A2" startQuantity="1">
<incoming>_21</incoming>
<outgoing>_23</outgoing>
</task>
<exclusiveGateway gatewayDirection="Diverging" id="_12" name="XOR-Split 1">
</exclusiveGateway>
<parallelGateway gatewayDirection="Converging" id="_14" name="AND-Join 3">
</parallelGateway>
<startEvent id="_10" isInterrupting="true" name="Es" parallelMultiple="false">
</startEvent>
<endEvent id="_11" name="Ee">
</endEvent>
<sequenceFlow id="_21" name="Condition 1" sourceRef="_12" targetRef="_2">
<conditionExpression><![CDATA[C1]]></conditionExpression>
</sequenceFlow>
</process>
XML file
</definitions>
C1
T2
T1
C2
Ee
C4
T6
A10
0
A1
0
A12
2
A13
0
0
0
The experiments of dependence relation mining are
mainly used to analyze the number of dependence relations
and the time of mining process, where the former and the
latter are called relation number and mining time for short.
In this section, we first give the mapping method of XML
parsing before verifying the DRM algorithm. Then, we
verify the correctness of DRM algorithm by mining the
process model of cruciate rupture treatment [35] and
compare the relation number and mining time of different
process models by mining 20 process models [36] provided
by OMG organization. And last, we analyze the complexity
of DRM algorithm and evaluate the performance of DRM
algorithms based on the process models with different scales.
T8
T9
BPD view
0
A2
DRM EXPERIMENT
T3
T7
T5
1
V.
C3
T4
Es
A4
A5
0
0
A6
0
A17
3
A14
0
A3
0
A11
0
0
A7
0
A16
A15
0
4
A8
0
A9
Direct graph
Figure 6. The XML parsing of treatment model
Taking this model as an example, the mapping between
XML file, BPD view and directed graph is shown in Fig. 6.
The mapping relations are as follows:
Tag "<task>" represents this element is an activity,
which is corresponding to A2 in the direct graph. All the
elements are uniquely identified by the attribute "id",
and there is a one-to-one correspondence between
element ID and vertex ID.
Tag "<startEvent>" and "<endEvent>" represent the start
and end event of process model, which respectively
correspond to A10 and A11 in the direct graph.
Tag "<exclusiveGateway>" and "<parallelGateway>"
represent the exclusive and parallel gateway. The split
part and join part are distinguished by the attribute
"gatewayDirection". In the directed graph, these vertexes
which are corresponding to gateways include A12 (XORSplit), A14 (AND-Join), and A17 (XOR-Join), etc.
Tag "<sequenceFlow>" represents the arc between
elements, and its start point and end point are given by
the attribute "sourceRef" and "targetRef". If the
"sourceRef" is XOR-Split, Inclusive-Split or ComplexSplit, the arc has tag "<conditionExpression>" which
gives the execution condition. And this condition is
corresponding to the weight of edge in the directed
graph. For example, the weight of the edge <A12, A2> is
1. Moreover, when the condition of arc is C0, the weight
of edge is 0 in the direct graph.
A. XML parsing
In our experiments, the process models are defined by
BPMN notation. In BPMN 2.0, process model is a formal
model with the ".bpmn" or ".bpmn2" extension which is
described in XML format (XML file) and displayed in a
graphical way (BPD view). We need to parse the XML file
of one process model before executing the DRM algorithm.
The purpose of XML parsing is to extract all the elements
(flow objects), arcs (sequence flows) as well as their
conditions in the process model and map them into a directed
graph. These elements and arcs are mapped to the vertexes
and edges of a directed graph, and the condition of each arc
is mapped to the weight of the corresponding edge.
Therefore, the DRM algorithm will generate all the activity
dependence relations of one process model by traversing the
directed graph corresponding to this process model.
The events, activities and execution conditions included
in treatment model are Patient admission (Es), Patient
discharge (Ee); Anamnesis & clinical examination (T1),
Conventional therapy (T2), Documentation (T3), Sonography
(T4), X-ray (T5), MRT (T6), Non-operative therapy (T7),
B. Comparison on mining different models
By analyzing the treatment model in Fig. 6, there are 17
elements and 19 activity dependence relations which contain
16 RS and 3 RP. In order to verify the correctness of DRM
455
algorithm, we perform the DRM algorithm on this model.
The experiment result is shown in Fig. 7, and it is consistent
with the analysis results of this model. Therefore, this
algorithm can accurately extract activity dependence
relations from process models.
time" and "PDRM time". The experiment results show that
the "Relation number" and "Mining time" generally increase
with the "Element number", and the "RS number" accounts
for most of the "Relation number", where the "SDRM time"
accounts for most of the "DRM time".
Through analysis of these 21 models and experiment
results, the average number of activity dependence relations
generated by one process model is 19.52, where the average
number of elements is 20.05. And the average time of
mining one process model is 30.45ms, where the average
time of XML parsing and DRM algorithm are 20.36ms and
10.09ms. Moreover, the relation number and mining time of
the treatment model (M11) are 19 and 28.53ms, where the
time of DRM algorithm is 9.41ms. Therefore, for most of
the process models, mining one process model will consume
about 30ms, and the relation number is approximately equal
to the element number in the model.
There are a few inconsistent cases in Fig. 8. For example,
the element number of M17 is greater than M16, but both the
relation number and mining time of M17 is less than M16.
This is because there are other factors affecting DRM results
in addition to the element number, such as element type and
logical structure. In the following section, we will verify the
performance of DRM algorithm by comparing different
element number, element type and logical structure in one
process model.
The activity dependence relations of process model “cruciate
rupture treatment” in business domain “Medical field”:
Serial-dependency Relation (RS):
(Es, T1, C0), (T1, T2, C1), (T1, T4, C2), (T1, T5, C2), (T1, T6, C2),
(T2, T3, C0), (T4, T7, C3), (T4, T8, C4), (T5, T7, C3), (T5, T8, C4),
(T6, T7, C3), (T6, T8, C4), (T7, T3, C0), (T8, T9, C0), (T9, T3, C0),
(T3, Ee, C0).
Parallel-dependency Relation (RP):
(T1, {T4, T5, T6}, Ca); (T7, {T4, T5, T6}, Cb), (T8, {T4, T5, T6}, Cb).
Figure 7. The DRM result of treatment model
In addition to the treatment model, we choose other 20
process models to compare the relation number and mining
time among different process models, which are provided by
the OMG and BPI [36]. These 21 models are numbered as
M1-M21 according to the number of elements contained in
each model. Among them, the smallest model (M1) and
largest model (M21) contains 11 and 50 elements, and the ID
of treatment model (containing 17 elements) is M11. We use
MATLAB R2014b to carry out DRM experiments on these
21 process models, where the experiment environment is the
dual core 2.20GHz, 4G memory, and 64 bit system.
C. DRM algorithm performance
Before verifying the performance of DRM algorithm, we
first analyse the time complexity and space complexity of
this algorithm. In fact, the DRM algorithm adopts the idea
of depth first search, and each element and its connected
elements are considered to be a multi-tree. According to the
complexity of depth first search [37], the time complexity
and space complexity of DRM algorithm are O(nbl) and
O(nbl), where: n is the element number of process model
(also the number of multi-trees); b and l are the maximum
branch number and leaf depth of all multi-trees (the size of
multi-tree). In particular, the time complexity and space
complexity of DRM algorithm are O(n) when the process
model does not have any branch structure (that is b=l=1),
where this process model has (n1) RS and each RSCondition is C0.
In order to evaluate the performance of DRM algorithm,
we design a series of model samples with different scales
for experiment, where these samples involve the different
number, type and logical structure of elements. Fig. 9 shows
five basic structures which are formed by one or more
workflow meta-patterns (that is activity dependence
relation), where: x is the branch number of gateway; m is the
repeated times of one or a group of workflow meta-patterns,
|RS| and |RP| are the number of RS and RP included in the
structure. We use different values of m and x to form
multiple model samples for verifying the performance of
DRM algorithm, which are shown in Table III. When
comparing one of these parameters, the other parameters are
set to a fixed value, where the element number of the largest
sample is 100 (nmax=100).
Figure 8. The comparison experiment of these 21 process models
Fig. 8 shows the parameters of these 21 models and
experiment results, where the abscissa is the process model
ID and the ordinate is the number or time. The "Element
number" represents the number of elements in each model,
which contains "Activity number" and "Gateway number".
The "Relation number" represents the number of activity
dependence relations which contains the "RS number" and
"RP number". The "Mining time" represents the time of the
whole mining process which contains the "XML time" and
"DRM time". And the "DRM time" is divided into "SDRM
456
TABLE III.
Model
SM1
SM2
SM3
SM4
SM5
THE MODEL SAMPLES WITH DIFFERENT SCALES
mmin
1
2
2
Type
WMP1
WMP2A
WMP2B
WMP5A
WMP5B
mmax
99
33
33
xmin
2
2
2
2
xmax
98
2
98
2
nmin
2
4
7
4
7
In addition to the fast mining time, the DRM method can
handle process models with standard, non-standard and
complex nested structure. This method generates the activity
dependence relations of a business domain which has a wide
application in process modelling, such as recommendation,
dynamic adaptation, automatic modelling and so on. Table
IV shows the difference between this method and other
methods which are mentioned in Section II. Therefore, the
DRM method can effectively solve the problem of lack of
relations among business activities in each domain.
nmax
100
100
100
100
100
Fig. 10 shows the trends of model samples with different
scales, where the abscissa is the value of m or x and the
ordinate is the algorithm time. The experiment results show
that the XML time and DRM time increase linearly and
nonlinear with the increase of sample scale. XML time is
mainly related to the number of elements and execution
conditions, but not with the type or structure of elements.
DRM time is related to the number, type and structure of
elements, where the DRM time of parallel structure is
longer due to the existence of PDRM time. Although DRM
time will increase with the increase of the scale of process
model, the DRM algorithm can complete the mining of a
process model in a short time (millisecond) for most of the
process models.
VI.
D. Comparison with relevant methods
Generally, a business domain usually contains multiple
process models. For example, medical treatment domain has
98 models, and automobile manufacturing domain has 49
models [38, 39]. It will take a longer time for mining all the
process models for a domain. Take multimedia conference
system [40, 41] as an example, it has 30 process models
which include 60 events and 51 activities. The experiment
results show the total relation number is 228, where the RS
number and RP number are 226 and 2. And the total mining
time is 449.67ms, where XML time and DRM time are
321.76ms and 127.91ms. Although the number of process
models varies in different domains, the total mining time
may range from a few milliseconds to a few seconds.
C1
A1
A2
An
At
m=1,2,...
C2
Cx
C1
A1
A2
At
Cx
Ax
(a) C0: |RS|=m; b=l=1, n=m+1. (b) Cx: |RS|=x; b=x, l=2, n=x+2.
C2
A1
ACKNOWLEDGMENT
This research is supported by the National 973 Programs
(Grant No.2013CB329102), the National Natural Science
Foundation of China (Grant No. 61003067, 61132001);
Project of New Generation Broad band Wireless Network
under Grant No. 2014ZX03006003; and the technology
development and experiment of innovative network
architecture (CNGI-12-03-007).
C1’
C2 ’
A2
Ax
CONCLUSION
Activity dependence relations are the foundation and
precondition for many business applications. In this paper,
we propose a method to obtain this kind of relations from
the existing process models in a business domain. This
method can quickly and accurately extract these relations
from one process model. However, some relations generated
from different models in the same domain may be redundant,
inconsistent or conflicting, which requires a set of strategies
to manage these relations in a domain. In the future work,
we will present a set of management strategies and develop
a relation mining and management tool. And we will further
analyze the relations of multimedia conference system to
provide an automated recommendation method for process
modeling and process flexibility.
Cx’
m=2,3,...
A 1’
A 2’
At
A2
Ax’
(c) Cx: |RS|=mx; b=x, l=2, n=m(x+1)+1.
C1
A1
At
Ax
C2
Cx
A1
A2
Ax
C 1’
C2 ’
Cx’
m=2,3,...
A 1’
A 2’
Ax’
(d) C0: |RP|=1; b=x, l=2, n=x+2. (e) C0: |RS|=mx, |RP|=m; b=x, l=2, n=m(x+1)+1.
Figure 9. The various models with different scales
Figure 10. The DRM time of model samples with different scales
TABLE IV.
Relevant method
Dependence relation mining
Association rule learning
Process discovery
Pattern identification
Block detection
THE COMPARISON WITH RELEVANT METHODS
Processing object
Process models in a domain
Tables in database
Process instance in event logs
One process model
One process model
Objective
Mining activity relation
Mining frequent itemsets
Discovering process model
Mapping ECA rule
Detecting model structure
457
Application
Process modelling
Behavior analysis
Model discovery
Process changes
Automatic execution
Reusability
All models of a domain
None
None
All instances of the model
All instances of the model
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21] WMPvd A. Process Mining Discovery, Conformance and
Enhancement of Business Processes[J]. 2010.
[22] Van Dongen B F, De Medeiros A K A, Wen L. Process mining:
Overview and outlook of petri net discovery algorithms[M]//
Transactions on Petri Nets and Other Models of Concurrency II.
Springer Berlin Heidelberg, 2009: 225-242.
[23] Van der Aalst W, Weijters T, Maruster L. Workflow mining:
Discovering process models from event logs[J]. IEEE Transactions
on Knowledge and Data Engineering, 2004, 16(9): 1128-1142.
[24] van der Aalst W M P, Stahl C. Modeling business processes: a petri
net-oriented approach[M]. MIT press, 2011.
[25] Bae J, Bae H, Kang S H, et al. Automatic control of workflow
processes using ECA rules[J]. IEEE transactions on knowledge and
data engineering, 2004, 16(8): 1010-1023.
[26] Hu G, Wu B, Zou H, et al. Dynamic integration mechanism of
business process for Internet of Things[C]//Services Computing
(SCC), 2014 IEEE International Conference on. IEEE, 2014: 852-853.
[27] van Der Aalst W M P, Ter Hofstede A H M, Kiepuszewski B, et al.
Workflow patterns[J]. Distributed and parallel databases, 2003, 14(1):
5-51.
[28] Börger E. Approaches to modeling business processes: a critical
analysis of BPMN, workflow patterns and YAWL[J]. Software &
Systems Modeling, 2012, 11(3): 305-318.
[29] Russell N, Ter Hofstede A H M, Van Der Aalst W M P, et al.
Workflow control-flow patterns: A revised view[J]. BPM Center
Report BPM-06-22, BPMcenter. org, 2006: 06-22.
[30] Curtis B, Kellner M I, Over J. Process modeling[J]. Communications
of the ACM, 1992, 35(9): 75-90.
[31] Dijkman R M, Dumas M, Ouyang C. Semantics and analysis of
business process models in BPMN[J]. Information and Software
technology, 2008, 50(12): 1281-1294.
[32] Papazoglou M. Web services: principles and technology[M]. Pearson
Education, 2008.
[33] Allweyer T. BPMN 2.0: introduction to the standard for business
process modeling[M]. BoD–Books on Demand, 2016.
[34] Notation (BPMN) Poster[J]. Faculty of Electrical Engineering and
Computer Science, Institute of Informatics, University of
Maribor,[Available at http://www. itposter. net/itPosters/bpmn/bpmn.
htm](January 1st 2009), 2007.
[35] Lenz R, Reichert M. IT support for healthcare processes–premises,
challenges, perspectives[J]. Data & Knowledge Engineering,
2007,61(1):39-58.
[36] BPI. Business Process Model[EB/OL]. https://www.businessprocess
incubator.com /category/type/templates/.
[37] Russell S J, Norvig P. Artificial intelligence: a modern approach (The
Third Edition)[J]. 2012.
[38] Weber B, Reichert M, Rinderle-ma S. Change Patterns and Change
Support
Features:
Enhancing
Flexibility
in
ProcessAwareInformation Systems[J]. Data & Knowledge Engineering,
2008,66(3):438-466.
[39] Weber B, Sadiq S W, Reichert M. Beyond rigidity - dynamic process
lifecycle support[J]. Computer Science - Research and Development,
2009,23(2):47-65.
[40] Bo C, Jie G, Junliang C, et al. A preliminary practice for BPEL based
multimedia
conference
web
services
orchestration[C].
Telecommunications, 2008. AICT'08. Fourth Advanced International
Conference on, 2008. IEEE.
[41] Bo C, Junliang C, Min D. Petri net based formal analysis for
multimedia conferencing services orchestration[J]. Expert Systems
with Applications, 2012,39(1):696-705.
Dumas M, Van der Aalst W M, Ter Hofstede A H. Process-aware
information systems: bridging people and software through process
technology[M]. John Wiley & Sons, 2005.
Reichert M, Weber B. Process-Aware Information Systems[M].
Enabling Flexibility in Process-Aware Information Systems. Springer
Berlin Heidelberg, 2012: 9-42.
Weske M, Hündling D W I J, Schuschel D W I H. Process-oriented
Information Systems[J].
Reichert M, Rinderle-Ma S, Dadam P. Flexibility in process-aware
information systems[M]//Transactions on Petri Nets and Other
Models of Concurrency II. Springer Berlin Heidelberg, 2009: 115135.
Van Der Aalst W M P, Ter Hofstede A H M, Weske M. Business
process management: A survey[C]//International conference on
business process management. Springer Berlin Heidelberg, 2003: 112.
Weske M. Business process management architectures[M]//Business
Process Management. Springer Berlin Heidelberg, 2012: 333-371.
Bernal J F M, Morisio M. Towards a dynamic rule-based business
process[J]. International Journal of Web and Grid Services, 2010,
6(4): 385-399.
Van Der Aalst W M P. Business process management: a
comprehensive survey[J]. ISRN Software Engineering, 2013, 2013.
Rinderle S, Reichert M, Dadam P. Correctness criteria for dynamic
changes in workflow systems - a survey.[J]. Data Knowl. Eng.,
2004,50(1):9-34.
De Giacomo G, Dumas M, Maggi F M, et al. Declarative process
modeling in BPMN[C]//International Conference on Advanced
Information Systems Engineering. Springer International Publishing,
2015: 84-100.
Zur Muehlen M, Indulska M, Kamp G. Business process and business
rule modeling languages for compliance management: a
representational analysis[C]//Tutorials, posters, panels and industrial
contributions at the 26th international conference on Conceptual
modeling-Volume 83. Australian Computer Society, Inc., 2007: 127132.
Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to
knowledge discovery in databases[J]. AI magazine, 1996, 17(3): 37.
van der Aalst W M P, Reijers H A, Weijters A J M M, et al. Business
process mining: An industrial application[J]. Information Systems,
2007, 32(5): 713-732.
Hu G, Wu B, Chen J. Dynamic adaptation of business process based
on context changes: a rule-oriented approach[C]//International
Conference on Service-Oriented Computing. Springer International
Publishing, 2013: 492-504.
Berry M J, Linoff G. Data mining techniques: for marketing, sales,
and customer support[M]. John Wiley & Sons, Inc., 1997.
Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. ieee
transactions on knowledge and data engineering, 2014, 26(1): 97-107.
Han J, Pei J, Kamber M. Data mining: concepts and techniques[M].
Elsevier, 2011.
Perego R, Orlando S, Palmerini P. Enhancing the apriori algorithm
for frequent set counting[C]//International Conference on Data
Warehousing and Knowledge Discovery. Springer Berlin Heidelberg,
2001: 71-82.
Parthasarathy S, Zaki M J, Ogihara M, et al. Incremental and
interactive sequence mining[C]//Proceedings of the eighth
international conference on Information and knowledge management.
ACM, 1999: 251-258.
Chattamvelli R. Data mining algorithms[M]. Alpha science
international, 2011.
458
Документ
Категория
Без категории
Просмотров
3
Размер файла
545 Кб
Теги
2017, scc
1/--страниц
Пожаловаться на содержимое документа