2017 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM)

Data Mining in the Diagnostics of Oil Extraction Equipment

Tagirova K.F., Vulfin A.M., Ramazanov A.R.
Department of Computer Science and Robotics
Ufa State Aviation Technical University
Russia, Ufa

Abstract—We present the key steps in developing a dynamometer card (dynamogram) classification algorithm: data processing, feature generation and selection, construction of a neural network classifier, and assessment of its performance. To estimate whether complex defects (subclasses) can be singled out, we analyzed the structure of the input pattern sample using clustering algorithms. We explored the capabilities of a diagnostic algorithm for submersible equipment based on recognition of preprocessed dynamometer cards that characterize the current state of the object. The key steps in implementing the dynamometer card classification algorithm are feature generation, feature selection and classifier building. Classifiers based on single neural networks and hierarchical neural network committees with different architectures demonstrated an accuracy of 80-94% in recognizing equipment condition classes. The reliability of recognition of equipment condition classes is 78-98% for data that were not included in the training set.

Keywords—submersible equipment; neural network classifiers; data mining

I. INTRODUCTION

The most common method of oil production from marginal wells is the use of a downhole sucker-rod pump (DSRP). The operation of most wells equipped with a DSRP is monitored by portable and stationary dynamographs. Diagnostics of DSRP oil production is often based on dynamometer card analysis.
Analysis of the dependence of the force F at the rod string suspension point on the polished rod stroke S makes it possible to classify the equipment condition as normal duty or as one of the malfunction classes, using the F(S) signature and a set of features. The closed curve of a dynamometer card contains sections describing each phase of the polished rod stroke. Diagnostic algorithms that classify the current state of the unit can process the dynamometer card as a whole by applying feature generation and feature selection, or they can analyze separate parts and characteristic points of the card.

A generalized approach to developing a diagnostic algorithm for DSRP submersible equipment from dynamometry data using data mining methodology was considered in previous communications [1-8]. These methods are based on recognition of preprocessed dynamometer cards using single neural networks and hierarchical neural network committees with different architectures, and they recognize equipment condition classes with an accuracy of 80-94%.

II. DYNAMOMETRY DATA ANALYSIS AND SELECTION OF ESSENTIAL DYNAMOMETER CARD FEATURES

A previous paper discusses the capabilities of neural network classifiers based on a multilayer perceptron. These classifiers use the following parameters as an input vector:

• NC1: discrete samples of force as a function of stroke, $x = F_d(s_d)$, taken at regular intervals in quantities of 16, 32, 64, 128, 256 or 512. The samples were normalized to obtain $s_d$, $F_d$ describing the dynamometer card in the range $s_n \in [-1; 1]$, $F_n \in [-1; 1]$; the normalized sample set was then spline-interpolated and resampled to build sets $s_d$, $F_d$ of length $l = 2^k$, $k = 4, 5, \ldots, 9$.
• NC2: samples $h = \mathrm{haar}(F_d(s_d))$ taken as a concatenation of the approximation and detail vectors for several levels of the discrete Haar wavelet transform decomposition.
• NC3: samples $db = \mathrm{db4}(F_d(s_d))$ taken as a concatenation of the approximation and detail vectors for all levels, using a first-order Daubechies wavelet.
• NC4: samples taken from the vector $x = F_d(s_d)$ using principal component analysis.
• NC5: samples taken from the vector $h = \mathrm{haar}(F_d(s_d))$ using principal component analysis.
• NC6: samples taken from the vector $db = \mathrm{db4}(F_d(s_d))$ using principal component analysis.

Guided by the classification results obtained with a sliding-window procedure with 10 runs, it was found that feature generation using the Haar wavelet transform is the most suitable for building a classifier: it uses a vector of 128 approximation and detail coefficients for several levels of decomposition. The original multilayer perceptron NC2 (with the 128-32-8 neuron architecture) can be replaced by a much smaller network with the 8-8-8 architecture, which significantly accelerates training and improves the generalization of the classifier because the number of trainable weights decreases.

Hierarchical neural network classifiers based on committees of multilayer perceptrons were discussed in earlier work and showed the same results. Implementing the proposed diagnostic algorithms as part of the hardware and software complex of a DSRP control station implies limited computational resources of the controller, so a single network is preferable.

978-1-5090-5648-4/17/$31.00 ©2017 IEEE

The problem of submersible equipment diagnostics was formulated as a problem of classifying patterns described by a set of process variables (dynamometry data). Each class characterizes the equipment condition as normal duty or as one of the malfunctions. The generalized approach to developing methods and algorithms for submersible equipment diagnosis is based on data mining methodology.

Thus, dynamometer card preprocessing and classifier building consist of the following steps:

1) Scaling the rod stroke and force dependences to the range $s_n \in [-1; 1]$, $F_n \in [-1; 1]$.
2) Filtering the normalized samples with a median filter and interpolating with a B-spline to suppress high-frequency noise related to the operation of the stroke and force sensors, and preparing the data to obtain a set of equidistant samples.
3) Obtaining the required number of discrete samples $s_d$, $F_d$ at regular time intervals, with set length $l = 2^k$, $k = 4, 5, \ldots, 9$.
4) Unfolding the curves F(t) and S(t) in time for independent analysis of the sample sets (Fig. 1).
5) Multilevel discrete decomposition of the sample set $x = F_d(t)$ on the basis of Haar wavelet functions forms the coefficient vector $[ca_j, cd_j, cd_{j-1}, \ldots, cd_1]$, where $ca$ and $cd$ are the approximation and detail coefficients for levels $j, j-1, \ldots$. This vector is required to generate general-purpose dynamometer card features related to the different classes [3, 10]. As shown in previous research, this approach yields a set of features that increases classifier accuracy on the test set compared with features obtained from a Fourier transform. The quality of signal approximation is estimated from the $ca_j$ coefficients and the number of nonzero $ca$, $cd$ coefficients, and can be analyzed visually from the given chart. The result is a feature vector from which the most informative features have to be selected using principal component analysis.
6) Selecting the most important features from the vector of Haar wavelet coefficients using principal component analysis (PCA). The first ten principal components account for more than 90% of the total variance, so 8-10 principal components are enough to describe dynamometer cards compactly and to build the classifier. The selection of the number of principal components and of the parameters of the neural network classifier based on a multilayer perceptron is described in [2, 3, 11, 12]. The original multilayer perceptron with an input dimension of 128 is replaced by a network taking an input vector of 8 features.
7) Based on the PCA results, the feature vector is formed in a new space of lower dimension for subsequent classifier training.

Fig. 1. a) Normalized polished rod stroke as a function of time, S(n); b) normalized force at the rod string suspension point as a function of time, F(n).

Thus, the procedure for selecting the parameters of feature generation and feature selection (wavelet decomposition for feature generation, the type of wavelet, the depth of decomposition, the coefficients that encode essential dynamometer card features, and principal component analysis for feature selection) has been described completely. The question of the classifier type remains. Multilayer neural networks were used in [3, 9], and the number of hidden layers and neurons was substantiated, but there was no comparison with other types of classifiers. This paper analyzes the capabilities of classifiers based on a support vector machine ensemble and a decision tree ensemble.

Therefore, the main steps in implementing the dynamometer card classification algorithm are:

1) Feature generation: selecting the features that describe the signal most informatively.
2) Feature selection: locating the features that have the best classification characteristics for the set of dynamometer card samples.
3) Building the classifier.

Classifiers are based on single neural networks and hierarchical neural network committees with different architectures. They have demonstrated an accuracy of 80-94% in recognizing equipment condition classes [1-3].

A. Experiment

The equipment condition classes in the small, medium and extra datasets are shown in Tables 1, 2 and 3 respectively.

TABLE 1. Equipment condition classes in small training dataset

| No. | Equipment condition class | Number of samples |
|---|---|---|
| 1 | Gas influence | 32 |
| 2 | High plunger fit | 51 |
| 3 | Plunger extension out of the pump | 21 |
| 4 | Degree of dirt-choking pump | 52 |
| 5 | Significant friction forces | 34 |
| | Total: | 190 |

TABLE 2. Equipment condition classes in medium training dataset

| No. | Equipment condition class | Number of samples |
|---|---|---|
| 1 | Gas influence | 32 |
| 2 | High plunger fit | 24 |
| 3 | Plunger extension out of the pump | 21 |
| 4 | Degree of dirt-choking pump | 52 |
| 5 | Plunger jamming | 51 |
| 6 | Significant friction forces | 34 |
| | Total: | 214 |

TABLE 3. Equipment condition classes in extra training dataset

| No. | Equipment condition class | Number of samples |
|---|---|---|
| 1 | Large paraffin deposit | 21 |
| 2 | Gas influence | 51 |
| 3 | High plunger fit | 24 |
| 4 | Plunger extension out of the pump | 21 |
| 5 | Degree of dirt-choking pump | 32 |
| 6 | Plunger jamming | 25 |
| 7 | Significant friction forces | 54 |
| 8 | Low level, no pumping | 52 |
| | Total: | 280 |

Three sets of dynamometer cards were used for testing. Each class contains more than 20 samples, which improves the representativeness of the classes and of the set as a whole. In contrast to earlier work, the number of samples is not asymmetric across classes.

K-fold cross-validation was used to estimate the generalization abilities of the classifiers. For this purpose, the mean value and the standard deviation of the classification accuracy estimates are found over k partitions of the available samples into a training set and a test set.

The generalization abilities of the classifiers were compared using, as an input vector, the samples selected by PCA from $h = \mathrm{haar}(F_d(s_d))$, a concatenation of the approximation and detail vectors for all levels of the discrete Haar wavelet transform decomposition [14, 15, 16, 17]. The classifiers compared are:

• Neural network classifier (NK1): multilayer perceptron [2, 18];
• Support vector machine ensemble (MK2);
• Decision tree ensemble (DK3).

B. Support vector machine in the problem of classification into several classes

A single support vector machine (SVM) solves the problem of building an optimal separating hyperplane for binary classification. A one-vs-all classifier can be built from an ensemble of binary classifiers. Each SVM separates samples into those belonging to class i (output encoded as 1) and those not belonging to this class (-1). The matrix of outputs shown in Table 4 is built and analyzed to obtain the classification result for a sample.

TABLE 4. Matrix of ensemble classifiers output values

| Sample | Classifier 1 | Classifier 2 | Classifier i | Classifier m |
|---|---|---|---|---|
| 1 | -1 | -1 | 1 | -1 |
| 2 | -1 | -1 | 1 | -1 |
| j | … | … | … | … |
| n | 1 | -1 | -1 | -1 |

A sample is assigned to the class whose index corresponds to the element of the output vector with the maximum value.

The difficulty in using an SVM lies in constructing the rectifying spaces, or kernels, that are most relevant to a specific task. Several standard kernels were tested for dynamometer card classification. These kernels reduce the SVM to:

• Polynomial separating hyperplanes (linear and quadratic),
• Two-layer neural networks,
• Potential functions (radial basis networks or radial basis functions).

An ensemble of SVM-RBF classifiers with the Gaussian kernel showed the best classification results. Quadratic programming was used to select the parameters of the separating hyperplanes.

The other classifier type is the decision tree ensemble, which is built by the method of continual improvement of the classification result.
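The one-vs-all decision rule described above (assign each sample to the classifier with the largest output) can be illustrated with a minimal sketch. The function names `rbf_kernel` and `decode_one_vs_all`, the `gamma` parameter and the example rows are illustrative assumptions, not taken from the paper; the sketch only decodes a matrix laid out like Table 4 and does not train any SVM.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decode_one_vs_all(outputs):
    """For each sample (row), return the 1-based index of the classifier
    with the largest output; ties go to the lowest index."""
    labels = []
    for row in outputs:
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(best + 1)
    return labels


# Rows follow the layout of Table 4: one row per sample,
# one column per binary classifier (+1 = "belongs to class i", -1 = "does not").
outputs = [
    [-1, -1, 1, -1],
    [-1, -1, 1, -1],
    [1, -1, -1, -1],
]
predicted = decode_one_vs_all(outputs)
print(predicted)
```

In practice the binary classifiers would return real-valued decision scores rather than hard ±1 labels, which makes ties unlikely; the tie-breaking rule here is only a convention of this sketch.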
For the decision tree ensemble, a variant of the boosting algorithm was used, because arbitrarily complex compositions can be built from weak classifiers. Such compositions can demonstrate good results in classification problems if they are configured correctly. The number of classifiers is 10, and the ensemble is formed by boosting. The results of the k-fold cross-validation procedure with k = 10 are shown in Tables 5 and 6.

TABLE 5. Results of classifier training

| Classifier type | Training dataset | Correctly recognized samples from training dataset, mean ± standard deviation, % | Correctly recognized samples from test dataset, mean ± standard deviation, % |
|---|---|---|---|
| Multilayer perceptron | small | 94.743 ± 3.621 | 95.196 ± 4.710 |
| | medium | 91.742 ± 1.719 | 90.763 ± 6.903 |
| | extra | 80.792 ± 1.776 | 78.175 ± 9.072 |
| SVM-RBF ensemble | small | 98.363 ± 0.369 | 80.910 ± 1.616 |
| | medium | 97.611 ± 0.744 | 70.734 ± 2.649 |
| | extra | 93.691 ± 0.675 | 67.259 ± 2.845 |
| Decision tree ensemble + boosting | small | 100.000 ± 0.000 | 78.293 ± 3.276 |
| | medium | 99.948 ± 0.164 | 72.273 ± 4.460 |
| | extra | 97.936 ± 0.251 | 69.104 ± 1.626 |

TABLE 6. Assessments of precision, recall and their F-score

| Classifier type | Training dataset | Precision | Recall | F-score |
|---|---|---|---|---|
| Multilayer perceptron | small | 0.9420 | 0.9345 | 0.9382 |
| | medium | 0.9120 | 0.9059 | 0.9089 |
| | extra | 0.7747 | 0.7656 | 0.7701 |
| SVM-RBF ensemble | small | 0.8021 | 0.7688 | 0.7851 |
| | medium | 0.7535 | 0.6880 | 0.7192 |
| | extra | 0.6258 | 0.5446 | 0.5824 |
| Decision tree ensemble + boosting | small | | | |
| | medium | | | |
| | extra | | | |

The quality of classifier performance is assessed with the concepts of precision, recall and F-measure.
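The "mean ± standard deviation" entries of Table 5 come from k-fold cross-validation. A minimal sketch of how such an estimate can be computed is given below. The nearest-mean classifier, the toy one-dimensional data, the contiguous (unshuffled) folds and the use of the sample standard deviation are all assumptions made for illustration; the paper does not specify these details, and its classifiers are the perceptron, SVM and tree ensembles.

```python
import statistics


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_nearest_mean(xs, ys):
    """Hypothetical stand-in classifier: predict the class whose mean feature is closest."""
    means = {c: statistics.mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def cross_validate(xs, ys, k, train_fn):
    """Return mean and sample standard deviation of test accuracy (in %) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = train_fn(train_x, train_y)
        hits = sum(predict(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(100.0 * hits / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy, well-separated 1-D data with interleaved labels so every fold sees both classes.
xs = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.05, 0.95, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_acc, std_acc = cross_validate(xs, ys, k=5, train_fn=train_nearest_mean)
print(f"{mean_acc:.3f} ± {std_acc:.3f}")
```

With real dynamometer card features the folds would normally be shuffled or stratified by class before splitting; that choice is left out here to keep the sketch deterministic.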
Precision of classification within a class is the share of samples that actually belong to the class among all samples assigned to the class by the classifier:

$$Pr(c_i) = \frac{p_{in}}{p_{in} + n_{out}},$$

where $p_{in}$ is the number of samples of class $c_i$ that the classifier assigned to the class, $p_{out}$ is the number of samples of the same class that the classifier did not assign to it, and $n_{out}$ is the number of samples assigned to the class that actually belong to other classes.

Recall of classification within a class is the share of samples assigned to the class by the classifier among all samples that belong to the class:

$$Rc(c_i) = \frac{p_{in}}{p_{in} + p_{out}}.$$

The balanced F1-measure is the harmonic mean of precision and recall:

$$F_1(c_i) = \frac{2 \cdot Pr(c_i) \cdot Rc(c_i)}{Pr(c_i) + Rc(c_i)}.$$

As shown in Tables 5 and 6, the multilayer perceptron gives the best results on the small dataset, which has a limited and symmetric number of samples: its classification accuracy on the test set is much higher than that of the other architectures. On the other hand, the decision tree ensemble and the SVM reach higher accuracy on the training set than the multilayer perceptron. These results indicate good generalization of the neural network solutions and an overtraining effect in the ensemble structures, which are unduly sensitive to noise in the training set. This agrees well with reports that the SVM is unstable with respect to noise in the source data, because outliers significantly influence the construction of the separating hyperplane. To suppress this effect, it is necessary to investigate the relevance vector machine (RVM) or further dataset refinement techniques; the latter involve the considerable technical difficulty of accumulating a full-size dataset covering all submersible equipment conditions. As for ensembles of weak classifiers built with continual improvement algorithms, they are difficult to configure, and their architecture is difficult to select, on limited datasets.
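The per-class precision, recall and F1 definitions given above can be computed from lists of true and predicted labels, as in the following minimal sketch. The function names, the toy label lists and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration and are not stated in the paper.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def class_metrics(true_labels, predicted_labels, cls):
    """Return (precision, recall, F1) for one class using the counts p_in, p_out, n_out."""
    pairs = list(zip(true_labels, predicted_labels))
    p_in = sum(t == cls and p == cls for t, p in pairs)   # class samples assigned to the class
    p_out = sum(t == cls and p != cls for t, p in pairs)  # class samples missed by the classifier
    n_out = sum(t != cls and p == cls for t, p in pairs)  # other-class samples assigned to the class
    precision = p_in / (p_in + n_out) if p_in + n_out else 0.0
    recall = p_in / (p_in + p_out) if p_in + p_out else 0.0
    return precision, recall, f1_measure(precision, recall)


# Toy labels for three hypothetical condition classes.
true_labels = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted_labels = [1, 1, 1, 2, 2, 2, 1, 3, 3, 2]
for cls in (1, 2, 3):
    pr, rc, f1 = class_metrics(true_labels, predicted_labels, cls)
    print(cls, round(pr, 4), round(rc, 4), round(f1, 4))
```

As a consistency check on the formula, the precision 0.9420 and recall 0.9345 reported in Table 6 for the multilayer perceptron on the small dataset give `f1_measure(0.9420, 0.9345)` of approximately 0.9382, matching the tabulated F-score.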
In the experiments reported here, the overtraining effect of the ensembles could not be overcome.

III. RESULTS AND DISCUSSIONS

The conclusions drawn from [1-3, 19, 20] about promising directions for further improving the approaches to building an automated dynamometer card classifier for DSRP submersible equipment diagnosis are:

1) Expert labeling of the equipment condition class should include the expert's degree of certainty that the sample belongs to a particular class (or information about the results of repairing or disassembling the equipment);
2) At the dynamometer card preprocessing stage, it is necessary to identify a set of characteristic points of the curve, to split the F(S) curve into a number of regions that characterize the processes taking place in the unit, and to apply the described feature generation and selection procedures to particular regions;
3) A neuro-fuzzy classifier is promising because it treats the description of particular parts of the dynamometer card curve as a set of features selected by a plant engineer or derived from the analysis of numerical models of the operation of the submersible part of the equipment.

IV. CONCLUSIONS

This paper addresses the selection of the classifier type by comparing the generalization abilities of trained systems. It analyzes the capabilities of classifiers based on a support vector machine ensemble, a multilayer perceptron and decision tree ensembles. The capabilities of a diagnostic algorithm for DSRP submersible equipment based on recognition of preprocessed dynamometer cards that characterize the current state of the object were explored. The reliability of recognition of equipment condition classes is 78-98% for data that were not part of the training set.

REFERENCES

[1] R.A. Badamshin, B.G. Il’yasov, K.F. Tagirova, I.V. Dunaev, “Neural networks in the pump equipment diagnosis problem,” Neurocomputers: development and application, no. 10, pp. 66-69, 2007.
[2] K.F. Tagirova, A.M. Vulfin, “Algorithms of neural network data processing in the problem of oil-production enterprise deep-pumping equipment diagnosis,” Automation, telemechanization and communication in oil industry, no. 12, pp. 28-32, 2013.
[3] A.M. Vulfin, K.F. Tagirova, “Enhancement of accuracy of deep-pumping equipment based on data mining,” Optical Memory and Neural Networks, vol. 24, no. 1, pp. 28-35, 2015.
[4] A.M. Vulfin, A.I. Frid, “Diagnosis of the oil-production enterprise equipment conditions on the basis of data mining,” in Proc. Information Technologies for Intelligent Decision Making Support, ITIDS’2014, Ufa, 2014, pp. 14-18.
[5] A.M. Vulfin, A.I. Frid, “Safety Increasing of Oil Companies Engineering Networks Operation with Use of Artificial Intelligence Systems,” in Proc. of the 16th international scientific conference CSIT’2014, Sheffield, vol. 3, pp. 53-58, 2014.
[6] K.F. Tagirova, A.M. Vulfin, A.R. Ramazanov, A.A. Fatkhulov, “Modified algorithm of determining the current DSRP operating parameters according to the dynamometry data,” Automation, telemechanization and communication in oil industry, no. 12, pp. 37-41, 2015.
[7] K.F. Tagirova, A.M. Vulfin, G.T. Bulgakova, A.R. Ramazanov, A.A. Fatkhulov, “Coordinated control system of oil-wells group based on the hierarchic system of dynamic models,” in Proc. Mathematical modeling and computer technologies in the field development processes, Ufa, 2016, pp. 31.
[8] A.M. Vulfin, A.I. Frid, V.M. Giniyatullin, “Data processing system for oil production engineering network diagnosing on the basis of data mining,” Bashkir chemical journal, vol. 19, no. 4, pp. 72-78, 2012.
[9] V.I. Vasil’ev, B.G. Il’yasov, Intelligent control systems. Theory and practice: a training manual, Radiotekhnika, 2009.
[10] S.A. Terekhov, “Wavelets and neural networks,” in Proc. of the III All-Russian scientific and technical conference Neuroinformatics-2001: lectures of Neuroinformatics. Part 2, Moscow, 2001, pp. 94-116.
[11] S.A. Ayvazyan, V.M. Bukhtshtaber, I.S. Enyukov, Practical statistics: classification and dimension reduction, Moscow: Finance and statistics, 1989.
[12] I.T. Jolliffe, Principal Component Analysis. Springer Series in statistics, 2002.
[13] K.V. Vorontsov, Machine learning. Lecture course. [Online]. Available: http://www.machinelearning.ru/wiki/index.php.
[14] A.B. Stepanov, “Application of neural networks in the synthesis of wavelets for continuous wavelet transform,” Scientific and technical SPbSTU journal. Physics and mathematics, no. 1, pp. 39-44, 2013.
[15] M. Thuillard, A review of wavelet networks, wavenets, fuzzy wavenets and their applications. Advances in computational intelligence and learning, Netherlands: Springer, 2002.
[16] V.N. Kopenkov, “Efficient algorithms for local discrete wavelet transform with the Haar basis,” Computer optics, vol. 32, no. 1, pp. 78-84, 2008.
[17] J. Cordova, W. Yu, “Two types of Haar wavelet neural networks for nonlinear system identification,” Neural Process Lett., no. 35, pp. 283-300, 2005.
[18] S. Haykin, Neural Networks. A comprehensive foundation, Pearson Education, 2005.
[19] R.Yu. Mansafov, “New approach to DSRP operating diagnosis by dynamometer card,” Engineering practice, no. 9, pp. 92-99, 2010.
[20] I.G. Belov, Research of the deep well pump operating by using dynamometer, Moscow: Gostoptechizdat, 1960.