2017 IEEE 14th International Conference on Services Computing The Mining of Activity Dependence Relation based on Business Process Models Guangchang Hu, Budan Wu, Junliang Chen State Key laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications, Beijing, China relations. Therefore, both the design stage and the run stage of business process require the support of dependence relations. Moreover, there are other uses, such as checking the logical consistency of existing models and reasoning new relations, etc. Therefore, the roles of activity dependence relations are as follows: Process recommendation: recommend the next activity, process fragment, or a process in imperative modelling; or recommend a set of relations as constraints in declarative modelling at design stage [10]; Automatic modelling: generate a complete process model automatically at design stage or run stage; Dynamic adaptation: compose a process fragment to handle the ad hoc change at run stage; Dynamic execution: compute and execute the next activity without any process models at run stage; Model detection: check the logical consistency of a process model; Relation reasoning: infer the indirect relations. However, there is no formal description of dependence relations in most business domain, where these relations or business rules exist in the minds of domain experts or in the unstructured documents [11]. It restricts the development of process recommendation, process adaptation, automatic modelling, etc. In order to solve this problem, we propose a method to obtain the dependence relations automatically which are implicit in the process models. This method is different from data mining based on the tables of database [12] and process mining based on the event logs of process instances [13], which is based on the process models of a business domain. We call it dependence relation mining, or relation mining for short, which will generate dependence relations by decomposing the existing process models in a business domain from the control flow dimension. And it can be used as a supplement to process mining. Therefore, the research of relation mining has important significance and practical application value. Fig. 1 shows the main idea of relation mining and the basic uses in process modelling or service orchestration. We separately mine the process models and obtain the dependence relations which are divided by each business domain. With these dependence relations, a set of relations that belong to same domain can orchestrate a new process model of this domain, e.g. Model 1; and a set of relations that belong to different domains can orchestrate a new process model across domains, e.g. Model 2, where the orchestration approach has been proposed in our previous study on process flexibility and adaptation [14]. Abstract—With the development of process recommendation, dynamic adaptation and automatic modeling, the requirement of explicit and formalized expression of activity dependence relation in the business domain is becoming more and more urgent. However, these relations more exist in the minds of domain experts or in the unstructured documents, which leads process modeling and adaptation are a time-consuming and error-prone process. To solve this problem, a relation mining method is proposed for obtaining activity dependence relations. The formal description of these relations is defined in control flow perspective, which is expressed as serial-dependence relations and parallel-dependence relations in the form of three tuples after analyzing all the control flow patterns. And a mining algorithm is proposed for mining these two types of relations based on the process model. The correctness and performance of this algorithm are verified by a large number of experiments, and the experimental results show this method can quickly and accurately extract all the activity dependence relations from the existing process models in a business domain. Keywords-Activity dependence relation; business process management; information system; pattern discovery; process mining; workflow pattern I. INTRODUCTION Nowadays, there exists a growing interest in aligning information systems in a process-oriented way, especially in process-aware information systems (PAISs) [1-4]. In business process management (BPM), a service or business is usually described in the form of business process which defines the context and the logical relationships between activities and also specifies both the order of invocations (control flow) and the rules for data transfer (data flow) [5-7]. The business process is modelled as process model at design stage and executed as process instance at run stage. In the aspect of modeling, when a new business requirement appears, the corresponding process model is modelled to meet this requirement by the domain experts and IT developers in a manual manner. However, manual modelling is a time-consuming, labor-consuming and error-prone way, while recommendation or automatic modelling is a good way which provides the next activity or the whole process based on dependence relations of activities (we call it activity dependence relations, or dependence relations for short). In the aspect of adaptation, when an ad hoc change occurs, the affected process instance needs to be adaptively adjusted at run stage [8, 9]. However, the alternative processes are difficult to be predefined for the ad hoc change, where the changed part can be replaced by a new process fragment which is dynamically combined based on dependence 2474-2473/17 $31.00 © 2017 IEEE DOI 10.1109/SCC.2017.64 450 modifying the ECA rule at run time. However, this mapping is a simple decomposition, where the generated rules are more applicable to local modification of process instances, but it cannot build a complex control flow structure or a complete process model. Bae in [25] proposes a block detection algorithm to detect blocks from a given process model for executing the process model dynamically, which can discover iterative pattern, serial pattern and parallel pattern. These patterns are similar to the activity dependence relations defined in this paper. However, this algorithm only identifies specific structures, and it cannot distinguish between conditional branch and parallel branch. In our previous research on process flexibility and adaptation [14, 26], we found that there was a lack of dependence relations of activities. This seriously affected the efficiency and automation degree of process adaptation. Relation mining can be regarded as another aspect of process mining, which focuses on analyzing the causal dependence relationships among activities in one specific business domain. These relations provide basic support for process recommendation, adaptation and automatic modelling, etc. This paper is organized as follows: section II summarizes the related work, section III gives the definitions of two kinds of dependence relations, section IV proposes the mining algorithm, section V shows the experiments of this algorithm and analyses its performance, and section VI includes conclusions and future work. Domain X Domain Y Domain Z Process model Model 1 Model 2 New process model Dependency relation Figure 1. The diagram of dependence relation mining II. RELATED WORK III. In fact, relation mining is a pattern discovery technique to obtain the dependence relations between activities in a business domain. It is similar to the association rule of data mining and the order relation of process mining. In data mining [15, 16], association rule learning is the most famous pattern discovery technology [17]. The apriori algorithm is used to find the frequent itemsets in the sample, but it does not consider the order of itemsets. Sequence mining analyses the order of itemsets [18], but it can only find the sequential pattern [17, 19]. Episodic mining can find the partial order relations which are defined by some frequent episodic, but there are problems in dealing with concurrency, and it cannot fine the patterns such as selected pattern, loop pattern [20]. Other similar studies are used to analyze the event sequence [21], where the processed object is a standard format or network structure, but they cannot handle the structure containing conditional branch or parallel branch, etc. For example, text mining is used to discover the sequence of letters and words from a text file; DNA sequence analysis is used to determine the order of the four bases from a strand of DNA; web analytics is used to analyze the behavior of visitors from the event logs. In process mining, process discovery extracts process knowledge from the event logs. The algorithm is used to find specific patterns such as sequential-pattern, XORpattern and AND-pattern [22-24], which are identified by defining four order relations. But there is problem in dealing with noise, incompleteness and complex paths, and these four order relations cannot represent OR pattern which can select multiple branches compared to the XOR pattern. Although other improved algorithms reduce the effects of noise and incompleteness, they still cannot obtain all of the control flow patterns and branching conditions, e.g. heredity mining and regional comprehensive analysis. In the similar research based on process model, Jose in [7] proposes a pattern identification method to map a process model to ECA rules, which increases process flexibility by DEPENDENCE RELATION REPRESENTATION In the control flow perspective, there are 20 workflow patterns and 43 workflow control-flow patterns [27-29], which have been widely used in many modeling languages such as BPMN, Petri net, BPEL, jBPM and EPCs, etc [3033]. In order to simplify and standardize the representation of dependence relations, this section further refine these patterns and divide them into 6 workflow meta-patterns (WMP1-WMP6), as shown in Table I. And we use these 6 meta-patterns to define two types of dependence relations: one is serial-dependence relation which describes one-to-one (WMP1-4) relationships; the other is parallel-dependence relation which describes many-to-one (WMP5) and one-tomany (WMP6) relationships. Definition 1. (Serial-dependence Relation, RS) A serial-dependence relation is a tuple RS=(At, Ax, CS), where: At and Ax are two adjacent activities (or events) in logic, where At,AxTE and t,xN*. At and Ax constitute a ordered relation, where <At, Ax>(T×T)(T×E)(E×T). CS is the execution condition of Ax when At is completed. And the value of CS is C0, Ci or Ex which represent ever true, an expression or an event. It is represented as CS=C0CiEx, where CiC, ExE and i,xN*. T, C and E respectively represent activity set, condition set and event set in a business domain. Therefore, the RS describes the serial relationship of two activities which are directly related to each other, where At, Ax and CS are called RS-Former, RS-Latter and RS-Condition. In the BPMN symbol, the RS-Former and RS-Latter of an RS are two adjacent activities which are directly connected by a sequence flow or indirectly connected by multiple sequence flows across one or more gateways, and its RS-Condition is the superposition of all related conditions of sequence flows. 451 TABLE I. THE MAPPING OF WORKFLOW PATTERN, WORKFLOW CONTROL-FLOW PATTERN AND WORKFLOW META-PATTERN Workflow Pattern WP1: Sequence WP2: Parallel Split WP3: Synchronization WP4: Exclusive Choice WP5: Simple Merge WP6: Multiple Choice WP7: Synchronizing Merge WP8: Multiple Merge WP9: Discriminator (include N-out-of-M) WP10: Arbitrary Cycles WP11: Implicit Termination WP12-WP15: Multiple Instance Patterns WP16-WP18: State-based Patterns WP19-WP20: Cancellation Patterns TABLE II. Dependence Relation RS=(At, Ax, CS) (CS=C0CiEx) RP=(At, UA, CP) (CP=CaCb) Workflow Control-flow Pattern WCP1: Sequence WCP2: Parallel Split WCP3: Synchronization WCP33: Generalized AND-Join WCP4: Exclusive Choice WCP5: Simple Merge WCP6: Multi-Choice WCP7: Structured Synchronizing Merge WCP37: Local Synchronizing Merge WCP38: General Synchronizing Merge WCP8: Multi-Merge WCP9: Structured Discriminator WCP28: Blocking Discriminator WCP29: Cancelling Discriminator WCP30: Structured Partial Join WCP31: Blocking Partial Join WCP32: Cancelling Partial Join WCP10: Arbitrary Cycles WCP21: Structured Loop WCP22: Recursion WCP23: Transient Trigger WCP24: Persistent Trigger WCP11: Implicit Termination WCP43: Explicit Termination WCP12-WCP15, WCP34-WCP36, WCP41-WCP42 WCP16-18, WCP39-WCP40 WCP19-WCP20, WCP25-WCP27 Workflow Meta-Pattern WMP1: (At, Ax, C0) WMP1: (At, Ax, C0) and WMP5: (At, UA, Ca), AxUA WMP1: (Ax, At, C0) and WMP6: (At, UA, Cb), AxUA WMP2: (At, Ax, Ci) WMP1: (At, Ax, C0) WMP2: (At, Ax, Ci) or WMP3: (At, Ex, Ex) WMP1: (At, Ax, C0) WMP1: (At, Ax, C0) WMP1: (At, Ax, C0) WMP1: (At, Ax, C0) and WMP2: (Ax, At, Ci); or WMP4: (At, At, Ci) WMP1: (At, Ex, C0) or WMP3: (At, Ex, Ex) WMP1: (At, Ax, C0) WMP1: (At, Ax, C0) and/or WMP2: (At, Ax, Ci) WMP1: (At, Ex, C0) THE MAPPING BETWEEN DEPENDENCE RELATION AND WORKFLOW META-PATTERN Workflow Meta-Pattern WMP1: (At, Ax, C0) WMP2: (At, Ax, Ci) WMP3: (At, Ex, Ex) WMP4: (At, At, Ci) WMP5: (At, UA, Ca) WMP6: (At, UA, Cb) Workflow Pattern WP1-WP3, WP5, WP7-WP20 WP4, WP6, WP10, WP16-WP18 WP6, WP11 WP10 WP2 WP3 BPMN symbol Sequence flow, AND-Split, *-Join XOR/Inclusive/Complex-Split Event-based XOR/Inclusive/Complex-Split AND-Split AND-Join the execution condition. Moreover, if there is an RP (e.g. RP=(At, UA, Ca)) and UA={Ax, Ay}), it must exist n (n=|UA|) RS (e.g. RS=(At, Ax, C0) and RS=(At, Ay, C0)). Table II shows the corresponding relationships between dependence relation, workflow meta-pattern, workflow pattern and BPMN [34]. The "*-Join" represents all the join part of gateways, such as AND-Join, XOR-Join, Inclusive-Join, and Complex-Join. In addition, the notations of different modeling languages can be mapped to each other. Definition 2. (Parallel-dependence Relation, RP) A parallel-dependence relation is a tuple RP=(At, UA, CP), where: At is an activity (or event), where AtTE and tN*. UA is a set of activities which are executed in a parallel way with respect to At. CP is the execution condition and its value is Ca or Cb, which is represented as CP=CaCb. The Ca represents the activities in UA need be executed after At is completed (that is AxUA, meet <At, Ax>). And the Cb represents the activities in UA need to be completed before At is executed (that is AxUA, meet <Ax, At>). T and E respectively represent activity set and event set in a business domain. Therefore, the RP describes the parallel relationship of a set of activities with respect to a particular activity, where At, UA and CP are respectively called RP-Former, RP-Latter and RP-Condition. In the BPMN symbol, the RP-Former and RPLatter of an RP are directly connected by parallel gateway and sequence flows, and its RP-Condition is decided by the AND-Split (CP=Ca) or AND-Join (CP=Cb). The activity dependence relation defines the execution order and condition of business activities in a domain. The different types of dependence relation are distinguished by IV. DEPENDENCE RELATION MINING Normally, a business process is modelled by one or more workflow meta-patterns in the control flow perspective. That is, a process model explicitly or implicitly describes a set of dependence relations. There are a large number of process models have been validated in each business domain, which basically cover the basic or core business of this domain. Therefore, we can obtain these dependence relations in a business domain by mining their process models. This section proposes a method for extracting dependence relations from one process model in the control flow perspective, which is called DRM (Dependence Relation Mining, DRM). Taking the process model defined by BPMN 2.0 as an example, we discuss the DRM algorithm and its two sub-algorithms in this section. 452 A. DRM algorithm The goal of DRM algorithm is to generate the activity dependence relations which have been included in a process model, where its input is a process model. This algorithm includes two sub-algorithms which are SDRM (Serial Dependence Relation Mining, SDRM) sub-algorithm and PDRM (Parallel Dependence Relation Mining, PDRM) subalgorithm. The basic idea of DRM algorithm is to deal with the elements (that is flow objects in BPMN) in a process model one by one, where: if the current element is an activity or event, SDRM will be invoked and generates one or more RS; if the current element is AND-Split or AND-Join, PDRM will be invoked and generates one or more RP. items of this QUEUE one by one until it is empty. The main process for handling the current head item "Lt" is as follows: if Lt is an activity or event, the SDRM sub-algorithm is invoked and generates all the RS with Lt as the RS-Former; if Lt is an AND-Split or AND-Join, the PDRM sub-algorithm is invoked and generates all the RP with Lt as the RP-Former; if Lt is the other type of elements (e.g. other gateways), it is deleted from the QUEUE. Moreover, when the invoked subalgorithm is returned, DRM algorithm will delete the current head item from the QUEUE and begin to deal with the next head item. End DRM algorithm Y Delete the head item and clear all arc labels QUEUE empty? N Algorithm 1: DRM algorithm INPUT: process model (M) OUTPUT: activity dependence relation (RS and RP) Description: Lt: an element in M which is an activity, event or gateway; Flag: a completed mark of RP-Latter; aFlag, bFlag: a completed mark of RP-Former. Algorithm: 1 For each element Lt from M do 2 If (Lt is not a gateway and Lt is not an end event) then 3 Select the arc which starts at Lt and ends at element Lx; 4 Initialize CS=C0; 5 SDRM(Lx); 6 Else if (Lt is an AND-Split or AND-Join) then 7 Initialize UA=, Flag=0, aFlag=bFlag=1; 8 PDRM(Lt); 9 Else if (Lt is an end event) then 10 Continue; 11 End if 12 End for Put all the elements of M in QUEUE Assign the head item of QUEUE to Lt Start DRM algorithm Lt is gateway? PDRM sub-algorithm Y N Y AND-Split/Join? N Output <At, Lx, CS> Push Lt into STACK and Set: At=Lt, CS=C0 Assign the top item of STACK to Lt Push Lx into STACK and Set Lt=Lx Select an unlabeled arc L t Ci L x Set CS=CSCi Y N N Lx is gateway? Exist? Label Lt Ci Lx Set CS=CS+Ci N STACK empty? Y Delete the top item of STACK SDRM Y Figure 2. The flowchart of DRM algorithm As shown in algorithm 1, DRM algorithm is mainly to invoke the corresponding sub-algorithms by identifying the type of elements. Since there is only one arc (that is sequence flow in BPMN) out of each activity or event (but not end event), we just need to select this arc when the current element is an activity, start event or middle event. In all types of gateways, other gateways are ignored except for the parallel gateway which will generate RP. Moreover, this algorithm implements a series of initialization before invoking the sub-algorithms, where the RS-Condition is initialized to C0 before invoking SDRM sub-algorithm and the RP-Latter is initialized to (empty set) before invoking PDRM sub-algorithm. Because AND-Split/Join has more than one branch, we need to set up a flag "Flag" to mark whether all the branches have been accessed. For example, "Flag==1" represents all the arcs starting at the current element have been accessed. Although gateways and activities are often placed alternately in a process model, but there is a direct connection between the AND-Split/Join and other gateways. So, we need to set up two flags, "aFlag" and "bFlag", to mark whether all the RP-Formers have been found. For example, "aFlag==1" represents all the precursors (activities) of AND-Split have been found, and "bFlag==1" represents all the successors (activities) of AND-Join have been found. The flowchart of DRM algorithm is shown in Fig. 2. Suppose we use a queue "QUEUE" to store all the elements of a process model. So, DRM algorithm deals with the head The detailed process of SDRM sub-algorithm is shown in the lower dashed boxes of Fig. 2. We use a stack "STACK" to store Lt and other elements which are connected by those arcs that start at Lt. In the SDRM subalgorithm, the RS-Condition need to be generated in addition to search for the RS-Latter. Because an RS may involve one or more arcs (e.g. across multiple XOR-Split), the RSCondition is the superposition of all the conditions of relevant arcs (e.g. the conditions of sequence flows). Moreover, because one Lt may generate more than one RS, the RS-Condition of different RS which has been superposed may require rollback. We use "+" and "" to represent the superposition operation and rollback operation, where: Superposition: C0+C0C0, C0+CiCi, Ci+CjCi&&Cj; Rollback: C0C0C0, CiC0Ci, Ci&&CjCjCi. Moreover, we have not given the detailed process of the PDRM sub-algorithm in Fig. 2, which is similar to the SDRM sub-algorithm except for dealing with the RPConditions. According to the different types of Lt, the process of PDRM sub-algorithm is divided into two cases. When Lt is AND-Split, this sub-algorithm searches the RPLatter in the direction of the relevant arcs and searches the RP-Former in the opposite direction of the relevant arcs, and the RP-Condition is Ca. When Lt is AND-Join, this subalgorithm searches the RP-Latter in the opposite direction of the relevant arcs and searches the RP-Former in the direction of the relevant arcs, and the RP-Condition is Cb. 453 B. DRM sub-algorithm As shown in algorithm 2, SDRM sub-algorithm is mainly to search for each RS-Latter and the corresponding RSCondition according to the current head item of QUEUE which is the RS-Former. For each head item Lt (Lt is an activity or event), a multi-tree is formed by Lt and other related elements which are connected by Lt. The root node is Lt in this multi-tree. The leaf nodes are activities or events, and the intermediate nodes are other types of elements. As shown in algorithm 3, PDRM sub-algorithm is mainly to search for each RS-Latter and RS-Former according to the current head item of QUEUE which is an AND-Split/Join. For each head item Lt (Lt is an AND-Split or AND-Join), two multi-trees are formed by Lt and other related elements which are connected by Lt. The root node is Lt in these two multi-trees. The leaf nodes are activities or events, and the intermediate nodes are other types of elements. Algorithm 3: PDRM Sub-algorithm Description: Flag==1: RP-Latter search has been completed; aFlag==1: RP-Former search has been completed from AND-Split; bFlag==1: RP-Former search has been completed from AND-Join. Algorithm: 1 If ((Lt is AND-Split)||(aFlag==0)) then 2 If (Flag==0) then 3 For each element Ly which has an arc from Lt to Ly do 4 If (Ly is AND-Split) then 5 PDRM(Ly); 6 Else if (Ly is not a gateway) then 7 Add Ly into UA; 8 End if 9 End for 10 Set Flag=1; 11 Else if (aFlag==0) then 12 For each element Lz which has an arc from Lz to Lt do 13 If (Lz is a gateway and Lz is not AND-Split) then 14 aFlag=0; PDRM(Lz); aFlag=1; 15 Else if (Lz is not a Gateway) then 16 Generate RP=(Lz, UA, Ca); 17 End if 18 End for 19 End if 20 Else if ((Lt is AND-Join)||(bFlag==0)) then 21 If (Flag==0) then 22 For each element Ly which has an arc from Ly to Lt do 23 If (Ly is AND-Join) then 24 PDRM(Ly); 25 Else if (Ly is not a gateway) then 26 Add Ly into UA; 27 End if 28 End for 29 Set Flag=1; 30 Else if (bFlag==0) then 31 For each element Lz which has an arc from Lt to Lz do 32 If (Lz is a gateway and Lz is not AND-Join) then 33 bFlag=0; PDRM(Lz); bFlag=1; 34 Else if (Lz is not a gateway) then 35 Generate RP=(Lz, UA, Cb); 36 End if 37 End for 38 End if 39 End if Algorithm 2: SDRM Sub-algorithm Description: CS=CS+Ci: superposition of the condition of the current arc; CS=CSCi: rollback of the condition of the current arc. Algorithm: 1 If (Lx is a gateway) then 2 For each element Ly which has an arc from Lx to Ly with Ci do 3 If (Lx is an event-based gateway) then 4 Set Ci=Ly; 5 End if 6 Set CS=CS+Ci; 7 SDRM(Ly); 8 Set CS=CSCi; 9 End for 10 Else 11 Generate RS=(Lt, Lx, CS); 12 End if The basic idea of SDRM sub-algorithm is to search the multi-tree in a depth first search manner, and each path from the root node to the leaf node can generate an RS, where: the root node and leaf node are the RS-Former and RS-Latter, and the superposition of all the conditions of edges in this path is the RS-Condition. For example, there are three RS which are (At, Ax, Ci), (At, Ay, Cj&&Ck) and (At, Az, Cj&&Cp) in Fig. 3, where the head item Lt is At and two XOR-Split correspond to Am and An in the multi-tree. Ci At Ax Ck At Am Search direction An Am Ay Ci Cj Cj Ax Cp Az Process model An Ck Multi-tree Cp Ay Az Figure 3. The corresponding relationship about XOR-Split Search direction At Am Ap An Ax At Am Ay Az Process model Ap An Ax Ay Az The basic idea of PDRM sub-algorithm is to search these two multi-trees separately in a depth first search manner, and all the leaf nodes of one tree and each leaf node of another tree can generate an RP, where the former is RS-Latter, the latter is RS-Former. When Lt is AND-Split, it searches for all leaf nodes from Lt according to the direction of edges, and it searches for each leaf node from Lt according to the opposite direction of edges. When Lt is AND-Join, it searches for all leaf nodes from Lt according to the opposite direction of edges, and it searches for each leaf node from Lt according to the direction of edges. For example, there are four RP which are (At, {Ax, Ay, Az}, Ca), (Ap, {Ax, Ay, Az}, Ca), (At, {Ax, Ay, Az}, Cb), (Ap, {Ax, Ay, Az}, Cb) in Fig. 4 and Fig. 5, where An Multi-tree Figure 4. The corresponding relationship about AND-Split Ax Ax An At Am Ap Ay Az Process model Ay An Az Am An Search direction Multi-tree At Ap Figure 5. The corresponding relationship about AND-Join 454 the head item Lt is An and the XOR-Join correspond to Am in these multi-trees. Generally, there exist two or more gateways that are directly connected in a nested way. When these gateways are the same type, direct nesting of multiple gateways is equivalent to a single gateway which has the same number of branches. The PDRM sub-algorithm can handle these structures, where it only needs to handle the outermost AND-Split/Join. Meanwhile, when these gateways are the different type, direct nesting of multiple gateways can transform the structures which have the equivalent effects. The PDRM sub-algorithm can handle these structures by simply modifying. Therefore, this algorithm can easily deal with complex nested structures. Moreover, two gateways are in pairs except for eventbased gateway in a standard model structure. However, there are some non-standard cases in the existing process model, which are not affect the description and execution of process model. For example, there is a XOR-Split but no XOR-Join, or there is multiple XOR-Split but only one XOR-Join in a process model. The DRM algorithm can be applied to these non-standard structures. Initial treatment & operation planning (T8), Operative treatment (T9); Clinical Suspicion of Cruciate Rupture = no (C1), Clinical Suspicion of Cruciate Rupture = yes (C2), Cruciate Rupture = no or Operation Indicated = no (C3), Cruciate Rupture = yes and Operation Indicated = yes (C4). <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec... <process id="PROCESS_1" isClosed="false" isExecutable="true" processType="None"> <task completionQuantity="1" id="_2" isForCompensation="false" name="A2" startQuantity="1"> <incoming>_21</incoming> <outgoing>_23</outgoing> </task> <exclusiveGateway gatewayDirection="Diverging" id="_12" name="XOR-Split 1"> </exclusiveGateway> <parallelGateway gatewayDirection="Converging" id="_14" name="AND-Join 3"> </parallelGateway> <startEvent id="_10" isInterrupting="true" name="Es" parallelMultiple="false"> </startEvent> <endEvent id="_11" name="Ee"> </endEvent> <sequenceFlow id="_21" name="Condition 1" sourceRef="_12" targetRef="_2"> <conditionExpression><![CDATA[C1]]></conditionExpression> </sequenceFlow> </process> XML file </definitions> C1 T2 T1 C2 Ee C4 T6 A10 0 A1 0 A12 2 A13 0 0 0 The experiments of dependence relation mining are mainly used to analyze the number of dependence relations and the time of mining process, where the former and the latter are called relation number and mining time for short. In this section, we first give the mapping method of XML parsing before verifying the DRM algorithm. Then, we verify the correctness of DRM algorithm by mining the process model of cruciate rupture treatment [35] and compare the relation number and mining time of different process models by mining 20 process models [36] provided by OMG organization. And last, we analyze the complexity of DRM algorithm and evaluate the performance of DRM algorithms based on the process models with different scales. T8 T9 BPD view 0 A2 DRM EXPERIMENT T3 T7 T5 1 V. C3 T4 Es A4 A5 0 0 A6 0 A17 3 A14 0 A3 0 A11 0 0 A7 0 A16 A15 0 4 A8 0 A9 Direct graph Figure 6. The XML parsing of treatment model Taking this model as an example, the mapping between XML file, BPD view and directed graph is shown in Fig. 6. The mapping relations are as follows: Tag "<task>" represents this element is an activity, which is corresponding to A2 in the direct graph. All the elements are uniquely identified by the attribute "id", and there is a one-to-one correspondence between element ID and vertex ID. Tag "<startEvent>" and "<endEvent>" represent the start and end event of process model, which respectively correspond to A10 and A11 in the direct graph. Tag "<exclusiveGateway>" and "<parallelGateway>" represent the exclusive and parallel gateway. The split part and join part are distinguished by the attribute "gatewayDirection". In the directed graph, these vertexes which are corresponding to gateways include A12 (XORSplit), A14 (AND-Join), and A17 (XOR-Join), etc. Tag "<sequenceFlow>" represents the arc between elements, and its start point and end point are given by the attribute "sourceRef" and "targetRef". If the "sourceRef" is XOR-Split, Inclusive-Split or ComplexSplit, the arc has tag "<conditionExpression>" which gives the execution condition. And this condition is corresponding to the weight of edge in the directed graph. For example, the weight of the edge <A12, A2> is 1. Moreover, when the condition of arc is C0, the weight of edge is 0 in the direct graph. A. XML parsing In our experiments, the process models are defined by BPMN notation. In BPMN 2.0, process model is a formal model with the ".bpmn" or ".bpmn2" extension which is described in XML format (XML file) and displayed in a graphical way (BPD view). We need to parse the XML file of one process model before executing the DRM algorithm. The purpose of XML parsing is to extract all the elements (flow objects), arcs (sequence flows) as well as their conditions in the process model and map them into a directed graph. These elements and arcs are mapped to the vertexes and edges of a directed graph, and the condition of each arc is mapped to the weight of the corresponding edge. Therefore, the DRM algorithm will generate all the activity dependence relations of one process model by traversing the directed graph corresponding to this process model. The events, activities and execution conditions included in treatment model are Patient admission (Es), Patient discharge (Ee); Anamnesis & clinical examination (T1), Conventional therapy (T2), Documentation (T3), Sonography (T4), X-ray (T5), MRT (T6), Non-operative therapy (T7), B. Comparison on mining different models By analyzing the treatment model in Fig. 6, there are 17 elements and 19 activity dependence relations which contain 16 RS and 3 RP. In order to verify the correctness of DRM 455 algorithm, we perform the DRM algorithm on this model. The experiment result is shown in Fig. 7, and it is consistent with the analysis results of this model. Therefore, this algorithm can accurately extract activity dependence relations from process models. time" and "PDRM time". The experiment results show that the "Relation number" and "Mining time" generally increase with the "Element number", and the "RS number" accounts for most of the "Relation number", where the "SDRM time" accounts for most of the "DRM time". Through analysis of these 21 models and experiment results, the average number of activity dependence relations generated by one process model is 19.52, where the average number of elements is 20.05. And the average time of mining one process model is 30.45ms, where the average time of XML parsing and DRM algorithm are 20.36ms and 10.09ms. Moreover, the relation number and mining time of the treatment model (M11) are 19 and 28.53ms, where the time of DRM algorithm is 9.41ms. Therefore, for most of the process models, mining one process model will consume about 30ms, and the relation number is approximately equal to the element number in the model. There are a few inconsistent cases in Fig. 8. For example, the element number of M17 is greater than M16, but both the relation number and mining time of M17 is less than M16. This is because there are other factors affecting DRM results in addition to the element number, such as element type and logical structure. In the following section, we will verify the performance of DRM algorithm by comparing different element number, element type and logical structure in one process model. The activity dependence relations of process model “cruciate rupture treatment” in business domain “Medical field”: Serial-dependency Relation (RS): (Es, T1, C0), (T1, T2, C1), (T1, T4, C2), (T1, T5, C2), (T1, T6, C2), (T2, T3, C0), (T4, T7, C3), (T4, T8, C4), (T5, T7, C3), (T5, T8, C4), (T6, T7, C3), (T6, T8, C4), (T7, T3, C0), (T8, T9, C0), (T9, T3, C0), (T3, Ee, C0). Parallel-dependency Relation (RP): (T1, {T4, T5, T6}, Ca); (T7, {T4, T5, T6}, Cb), (T8, {T4, T5, T6}, Cb). Figure 7. The DRM result of treatment model In addition to the treatment model, we choose other 20 process models to compare the relation number and mining time among different process models, which are provided by the OMG and BPI [36]. These 21 models are numbered as M1-M21 according to the number of elements contained in each model. Among them, the smallest model (M1) and largest model (M21) contains 11 and 50 elements, and the ID of treatment model (containing 17 elements) is M11. We use MATLAB R2014b to carry out DRM experiments on these 21 process models, where the experiment environment is the dual core 2.20GHz, 4G memory, and 64 bit system. C. DRM algorithm performance Before verifying the performance of DRM algorithm, we first analyse the time complexity and space complexity of this algorithm. In fact, the DRM algorithm adopts the idea of depth first search, and each element and its connected elements are considered to be a multi-tree. According to the complexity of depth first search [37], the time complexity and space complexity of DRM algorithm are O(nbl) and O(nbl), where: n is the element number of process model (also the number of multi-trees); b and l are the maximum branch number and leaf depth of all multi-trees (the size of multi-tree). In particular, the time complexity and space complexity of DRM algorithm are O(n) when the process model does not have any branch structure (that is b=l=1), where this process model has (n1) RS and each RSCondition is C0. In order to evaluate the performance of DRM algorithm, we design a series of model samples with different scales for experiment, where these samples involve the different number, type and logical structure of elements. Fig. 9 shows five basic structures which are formed by one or more workflow meta-patterns (that is activity dependence relation), where: x is the branch number of gateway; m is the repeated times of one or a group of workflow meta-patterns, |RS| and |RP| are the number of RS and RP included in the structure. We use different values of m and x to form multiple model samples for verifying the performance of DRM algorithm, which are shown in Table III. When comparing one of these parameters, the other parameters are set to a fixed value, where the element number of the largest sample is 100 (nmax=100). Figure 8. The comparison experiment of these 21 process models Fig. 8 shows the parameters of these 21 models and experiment results, where the abscissa is the process model ID and the ordinate is the number or time. The "Element number" represents the number of elements in each model, which contains "Activity number" and "Gateway number". The "Relation number" represents the number of activity dependence relations which contains the "RS number" and "RP number". The "Mining time" represents the time of the whole mining process which contains the "XML time" and "DRM time". And the "DRM time" is divided into "SDRM 456 TABLE III. Model SM1 SM2 SM3 SM4 SM5 THE MODEL SAMPLES WITH DIFFERENT SCALES mmin 1 2 2 Type WMP1 WMP2A WMP2B WMP5A WMP5B mmax 99 33 33 xmin 2 2 2 2 xmax 98 2 98 2 nmin 2 4 7 4 7 In addition to the fast mining time, the DRM method can handle process models with standard, non-standard and complex nested structure. This method generates the activity dependence relations of a business domain which has a wide application in process modelling, such as recommendation, dynamic adaptation, automatic modelling and so on. Table IV shows the difference between this method and other methods which are mentioned in Section II. Therefore, the DRM method can effectively solve the problem of lack of relations among business activities in each domain. nmax 100 100 100 100 100 Fig. 10 shows the trends of model samples with different scales, where the abscissa is the value of m or x and the ordinate is the algorithm time. The experiment results show that the XML time and DRM time increase linearly and nonlinear with the increase of sample scale. XML time is mainly related to the number of elements and execution conditions, but not with the type or structure of elements. DRM time is related to the number, type and structure of elements, where the DRM time of parallel structure is longer due to the existence of PDRM time. Although DRM time will increase with the increase of the scale of process model, the DRM algorithm can complete the mining of a process model in a short time (millisecond) for most of the process models. VI. D. Comparison with relevant methods Generally, a business domain usually contains multiple process models. For example, medical treatment domain has 98 models, and automobile manufacturing domain has 49 models [38, 39]. It will take a longer time for mining all the process models for a domain. Take multimedia conference system [40, 41] as an example, it has 30 process models which include 60 events and 51 activities. The experiment results show the total relation number is 228, where the RS number and RP number are 226 and 2. And the total mining time is 449.67ms, where XML time and DRM time are 321.76ms and 127.91ms. Although the number of process models varies in different domains, the total mining time may range from a few milliseconds to a few seconds. C1 A1 A2 An At m=1,2,... C2 Cx C1 A1 A2 At Cx Ax (a) C0: |RS|=m; b=l=1, n=m+1. (b) Cx: |RS|=x; b=x, l=2, n=x+2. C2 A1 ACKNOWLEDGMENT This research is supported by the National 973 Programs (Grant No.2013CB329102), the National Natural Science Foundation of China (Grant No. 61003067, 61132001); Project of New Generation Broad band Wireless Network under Grant No. 2014ZX03006003; and the technology development and experiment of innovative network architecture (CNGI-12-03-007). C1’ C2 ’ A2 Ax CONCLUSION Activity dependence relations are the foundation and precondition for many business applications. In this paper, we propose a method to obtain this kind of relations from the existing process models in a business domain. This method can quickly and accurately extract these relations from one process model. However, some relations generated from different models in the same domain may be redundant, inconsistent or conflicting, which requires a set of strategies to manage these relations in a domain. In the future work, we will present a set of management strategies and develop a relation mining and management tool. And we will further analyze the relations of multimedia conference system to provide an automated recommendation method for process modeling and process flexibility. Cx’ m=2,3,... A 1’ A 2’ At A2 Ax’ (c) Cx: |RS|=mx; b=x, l=2, n=m(x+1)+1. C1 A1 At Ax C2 Cx A1 A2 Ax C 1’ C2 ’ Cx’ m=2,3,... A 1’ A 2’ Ax’ (d) C0: |RP|=1; b=x, l=2, n=x+2. (e) C0: |RS|=mx, |RP|=m; b=x, l=2, n=m(x+1)+1. Figure 9. The various models with different scales Figure 10. The DRM time of model samples with different scales TABLE IV. Relevant method Dependence relation mining Association rule learning Process discovery Pattern identification Block detection THE COMPARISON WITH RELEVANT METHODS Processing object Process models in a domain Tables in database Process instance in event logs One process model One process model Objective Mining activity relation Mining frequent itemsets Discovering process model Mapping ECA rule Detecting model structure 457 Application Process modelling Behavior analysis Model discovery Process changes Automatic execution Reusability All models of a domain None None All instances of the model All instances of the model REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] WMPvd A. Process Mining Discovery, Conformance and Enhancement of Business Processes[J]. 2010. [22] Van Dongen B F, De Medeiros A K A, Wen L. Process mining: Overview and outlook of petri net discovery algorithms[M]// Transactions on Petri Nets and Other Models of Concurrency II. Springer Berlin Heidelberg, 2009: 225-242. [23] Van der Aalst W, Weijters T, Maruster L. Workflow mining: Discovering process models from event logs[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9): 1128-1142. [24] van der Aalst W M P, Stahl C. Modeling business processes: a petri net-oriented approach[M]. MIT press, 2011. [25] Bae J, Bae H, Kang S H, et al. Automatic control of workflow processes using ECA rules[J]. IEEE transactions on knowledge and data engineering, 2004, 16(8): 1010-1023. [26] Hu G, Wu B, Zou H, et al. Dynamic integration mechanism of business process for Internet of Things[C]//Services Computing (SCC), 2014 IEEE International Conference on. IEEE, 2014: 852-853. [27] van Der Aalst W M P, Ter Hofstede A H M, Kiepuszewski B, et al. Workflow patterns[J]. Distributed and parallel databases, 2003, 14(1): 5-51. [28] Börger E. Approaches to modeling business processes: a critical analysis of BPMN, workflow patterns and YAWL[J]. Software & Systems Modeling, 2012, 11(3): 305-318. [29] Russell N, Ter Hofstede A H M, Van Der Aalst W M P, et al. Workflow control-flow patterns: A revised view[J]. BPM Center Report BPM-06-22, BPMcenter. org, 2006: 06-22. [30] Curtis B, Kellner M I, Over J. Process modeling[J]. Communications of the ACM, 1992, 35(9): 75-90. [31] Dijkman R M, Dumas M, Ouyang C. Semantics and analysis of business process models in BPMN[J]. Information and Software technology, 2008, 50(12): 1281-1294. [32] Papazoglou M. Web services: principles and technology[M]. Pearson Education, 2008. [33] Allweyer T. BPMN 2.0: introduction to the standard for business process modeling[M]. BoD–Books on Demand, 2016. [34] Notation (BPMN) Poster[J]. Faculty of Electrical Engineering and Computer Science, Institute of Informatics, University of Maribor,[Available at http://www. itposter. net/itPosters/bpmn/bpmn. htm](January 1st 2009), 2007. [35] Lenz R, Reichert M. IT support for healthcare processes–premises, challenges, perspectives[J]. Data & Knowledge Engineering, 2007,61(1):39-58. [36] BPI. Business Process Model[EB/OL]. https://www.businessprocess incubator.com /category/type/templates/. [37] Russell S J, Norvig P. Artificial intelligence: a modern approach (The Third Edition)[J]. 2012. [38] Weber B, Reichert M, Rinderle-ma S. Change Patterns and Change Support Features: Enhancing Flexibility in ProcessAwareInformation Systems[J]. Data & Knowledge Engineering, 2008,66(3):438-466. [39] Weber B, Sadiq S W, Reichert M. Beyond rigidity - dynamic process lifecycle support[J]. Computer Science - Research and Development, 2009,23(2):47-65. [40] Bo C, Jie G, Junliang C, et al. A preliminary practice for BPEL based multimedia conference web services orchestration[C]. Telecommunications, 2008. AICT'08. Fourth Advanced International Conference on, 2008. IEEE. [41] Bo C, Junliang C, Min D. Petri net based formal analysis for multimedia conferencing services orchestration[J]. Expert Systems with Applications, 2012,39(1):696-705. Dumas M, Van der Aalst W M, Ter Hofstede A H. Process-aware information systems: bridging people and software through process technology[M]. John Wiley & Sons, 2005. Reichert M, Weber B. Process-Aware Information Systems[M]. Enabling Flexibility in Process-Aware Information Systems. Springer Berlin Heidelberg, 2012: 9-42. Weske M, Hündling D W I J, Schuschel D W I H. Process-oriented Information Systems[J]. Reichert M, Rinderle-Ma S, Dadam P. Flexibility in process-aware information systems[M]//Transactions on Petri Nets and Other Models of Concurrency II. Springer Berlin Heidelberg, 2009: 115135. Van Der Aalst W M P, Ter Hofstede A H M, Weske M. Business process management: A survey[C]//International conference on business process management. Springer Berlin Heidelberg, 2003: 112. Weske M. Business process management architectures[M]//Business Process Management. Springer Berlin Heidelberg, 2012: 333-371. Bernal J F M, Morisio M. Towards a dynamic rule-based business process[J]. International Journal of Web and Grid Services, 2010, 6(4): 385-399. Van Der Aalst W M P. Business process management: a comprehensive survey[J]. ISRN Software Engineering, 2013, 2013. Rinderle S, Reichert M, Dadam P. Correctness criteria for dynamic changes in workflow systems - a survey.[J]. Data Knowl. Eng., 2004,50(1):9-34. De Giacomo G, Dumas M, Maggi F M, et al. Declarative process modeling in BPMN[C]//International Conference on Advanced Information Systems Engineering. Springer International Publishing, 2015: 84-100. Zur Muehlen M, Indulska M, Kamp G. Business process and business rule modeling languages for compliance management: a representational analysis[C]//Tutorials, posters, panels and industrial contributions at the 26th international conference on Conceptual modeling-Volume 83. Australian Computer Society, Inc., 2007: 127132. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases[J]. AI magazine, 1996, 17(3): 37. van der Aalst W M P, Reijers H A, Weijters A J M M, et al. Business process mining: An industrial application[J]. Information Systems, 2007, 32(5): 713-732. Hu G, Wu B, Chen J. Dynamic adaptation of business process based on context changes: a rule-oriented approach[C]//International Conference on Service-Oriented Computing. Springer International Publishing, 2013: 492-504. Berry M J, Linoff G. Data mining techniques: for marketing, sales, and customer support[M]. John Wiley & Sons, Inc., 1997. Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. ieee transactions on knowledge and data engineering, 2014, 26(1): 97-107. Han J, Pei J, Kamber M. Data mining: concepts and techniques[M]. Elsevier, 2011. Perego R, Orlando S, Palmerini P. Enhancing the apriori algorithm for frequent set counting[C]//International Conference on Data Warehousing and Knowledge Discovery. Springer Berlin Heidelberg, 2001: 71-82. Parthasarathy S, Zaki M J, Ogihara M, et al. Incremental and interactive sequence mining[C]//Proceedings of the eighth international conference on Information and knowledge management. ACM, 1999: 251-258. Chattamvelli R. Data mining algorithms[M]. Alpha science international, 2011. 458

1/--страниц