Impact of Geospatial Classification Method on Interpretation of Intelligent Compaction Data

Mehran Mazari, Cesar Tirado, Soheil Nazarian, and Raed Aldouri

M. Mazari, Civil Engineering Department, California State University, Los Angeles, 5151 State University Drive, Los Angeles, CA 90032. C. Tirado, Metallurgy Building, Room M-105D; S. Nazarian, Engineering Building, Room A-207, Center for Transportation Infrastructure Systems; and R. Aldouri, Center for Regional Geospatial Service, Engineering Building, Room A-219, University of Texas at El Paso, 500 West University Avenue, El Paso, TX 79968. Corresponding author: M. Mazari, [email protected]

Transportation Research Record: Journal of the Transportation Research Board, No. 2657, 2017, pp. 37–46. http://dx.doi.org/10.3141/2657-05

Intelligent compaction is an emerging technology in the management of pavement layers, more specifically, of unbound geomaterial layers. Different types of intelligent compaction measurement values (ICMVs) are available on the basis of the configuration of the roller, vibration mechanism, and data collection and reduction algorithms. The spatial distribution of the estimated ICMVs is usually displayed as a color-coded map, with the ICMVs categorized into a number of classes with specific color codes. The number of classes, as well as the values of the breaks between classes, significantly affect the perception of compaction quality during the quality management process. In this study, three sets of ICMV data collected as a part of a field investigation were subjected to geostatistical analyses to evaluate different classification scenarios and their impact on the interpretation of the data. The classification techniques were evaluated on the basis of the information theory concept of minimizing the information loss ratio. The effect of the ICMV distribution on the selection of the classification method was also studied. An optimization technique was developed to find the optimal class breaks that minimize the information loss ratio. The optimization algorithm returned the best results, followed by the natural breaks and quantile methods, which are suited to the skewness of the ICMV distribution. The identification of less-stiff areas by using the methods presented will assist highway agencies to improve process control approaches and further evaluate construction quality criteria. Although the concepts discussed can apply to any compacted geomaterial layer, the conclusions apply to the type of compacted soil in this particular test section.

Intelligent compaction (IC) is an emerging technology for monitoring the compaction of base and soil layers and for managing compaction data to improve the quality of compacted layers. The advantages of IC are reported as improved quality and uniformity, reduced overcompaction and undercompaction, better identification of less-stiff spots, and increased lifetime of the roller (1, 2). IC is a specific terminology for a wider concept belonging to continuous compaction control, initiated by the Swedish Highway Administration in 1974. In 1975, Geodynamik was founded to continue the development of roller-mounted compaction meters (3). Later, Geodynamik and Dynapac introduced the compaction meter value to monitor the roller-integrated compaction process. For the following five years, a number of roller manufacturers began offering systems that used compaction meter values. In 1982, Bomag introduced the omega value (which was a measure of compaction energy and time) and Terrameter. With the introduction of mechanistic and performance-related soil properties, Bomag launched the vibration modulus as a measure of dynamic soil stiffness. In 1999, Ammann introduced the soil stiffness parameter, followed by the compaction control value of SAKAI in 2004. The IC systems have been under continuous development since then.

The IC data, in the generic form of intelligent compaction measurement values (ICMV), are best described and interpreted as color-coded maps. These maps, which are also known as choropleth or thematic maps, display the geo-referenced or spatial data on a map in which each class is separated by color. An understanding of the spatial pattern of the ICMV data depends on the optimal selection of both the number of classes and the values of the breaks between classes. The use of more than three colors is common in many geospatial and cartographic analyses. In the case of IC data, assigning the three colors to be gray, light gray, and dark gray was considered practical. The ultimate goal of using the ICMV color-coded maps is to identify less-stiff areas (often marked as red spots but appearing as dark gray in this paper) and to improve the uniformity of compaction throughout the construction area. Supplemental spot tests could then be suggested within the identified less-stiff areas to investigate the mechanistic and the moisture and density properties of the compacted geomaterials as part of the quality control process.

A number of studies have been dedicated to investigating different geospatial classification techniques. Brewer and Pickle evaluated the impact of a few classification methods on the interpretation of the geo-referenced data (4). They recommended the quantile method, followed by the natural breaks method and a modified version of the equal intervals method, to describe the spatial data effectively. Osaragi evaluated the performance of several classification methods on different sets of population data using the principles of information theory and data entropy (5). He concluded that the distribution of the geospatial data affected the selection of the optimal classification approach. Xiao et al. highlighted the effects of geospatial data uncertainty on the classification method (6). They defined a minimum classification uncertainty level and an overall classification robustness factor. The latter term was defined as the proportion of data that met a minimum uncertainty threshold. Xiao et al. concluded that a smaller number of classes would yield a more robust classification. Sun and Wong introduced a modified natural breaks method that ensured the statistical differences among the values that defined the class breaks (7). Jiang proposed a new classification method for geospatial data with heavy-tailed or positively skewed distributions, called the head/tail breaks method (8). In that method, the number of classes and class breaks was determined on the basis of the hierarchical levels among data. Jiang concluded that the head/tail method was superior to the natural breaks algorithm for geo-referenced data with heavy-tailed distributions.

The main objective of this research was to evaluate different classification methods and their impact on the interpretation of geospatial IC data. Three sets of ICMV data, collected along a test section, were evaluated after being subjected to different classification algorithms. The impact of the ICMV data distribution on the selection of the class breaks was also studied. The following sections of the paper first describe the principles of ICMV, followed by a description of the process of data collection, the analysis, and the evaluation of classification methods.

Intelligent Compaction Measurement Values

The concept of correlating the stiffness of the compacted layer to the excitation frequency initiated the use of accelerometers to monitor the compaction process (3). This idea was further improved and became the basis of measurement for some of the IC roller vendors.
The acceleration-based ICMV is generically defined as given in Equation 1:

$$\mathrm{ICMV} = C \times \frac{A_4}{A_2} \tag{1}$$

where

$A_2$ = acceleration of the forcing component of the vibration,
$A_4$ = acceleration of the first harmonic of the vibration, and
$C$ = a constant value.

This type of ICMV takes only the forcing frequency and first harmonic into account. However, if the compacted layer becomes stiffer, other vibration harmonics could also be identified during the compaction process (3).

Field Investigation

Field investigation was carried out along US-67 near Cleburne, Texas. The premapping of the existing embankment layer was followed by the placement, compaction, and mapping of a 12-in. clayey layer. A 500-ft-long and 25-ft-wide test section was selected to perform the IC data collection. Three IC systems were employed to collect the data during the premapping of the existing embankment layer. Two of these IC systems were installed on the first roller; one was an original equipment manufacturer system and the other one was a retrofit kit. The third system was an original equipment manufacturer IC system on the second roller. Once the process of field data collection was completed, the ICMV data were exported in comma-separated values format for further geospatial and geostatistical analyses.

Figure 1a illustrates the cumulative distributions of the ICMV data collected by the three IC systems. IC Systems 1 and 2 provided similar cumulative distributions of ICMVs, whereas IC System 3 yielded a slightly different pattern. The ICMV data are further demonstrated in the box plot format in Figure 1b. IC System 1 shows a lower median and range of collected data; IC System 2 shows the lowest interquartile range (IQR) among the three systems. (The IQR is defined as the difference between the third and first quartiles.) On the basis of the outlier analysis of IC System 2, ICMVs that are greater than 56.9 or less than 8.1 (defined as 1.5 × IQR above the third quartile or below the first quartile) can be categorized as suspected outliers, illustrated in Figure 1b as bold dashes. The only applicable outliers (defined as 3 × IQR above the third quartile or below the first quartile) are shown as bold triangles in Figure 1b.

FIGURE 1 Illustration of (a) cumulative distribution of ICMVs and (b) box plots of ICMVs from IC systems.

Statistical Analysis of ICMV Data

The descriptive statistics of the ICMVs from different IC systems are summarized in Table 1. The mean ICMVs from the three systems are close to one another. However, the standard deviation, and, as a result, the coefficient of variation, of the third set is significantly greater than that of the other two. This could be because of the different algorithm and mechanism of obtaining the ICMVs in that IC system. The coefficient of variation of IC System 2 is the lowest of the three systems. The skewness of the third data set is significantly greater than that of the first and second data sets. This means that the third set of ICMV data contains a longer tail in the distribution curve and is positively skewed compared with the other two sets of data. The skewness of the collected ICMVs can affect the selection of the class breaks during the classification of the data and also affect the interpretation of the results. Furthermore, the kurtosis statistic, which is similar to the skewness and measures the heavy tails, shows that the ICMV data from System 3 have the heaviest-tailed distribution (9).

TABLE 1 Descriptive Statistics of ICMV Data

System        Mean   Median   Mode   SD     CV (%)   Skewness   Kurtosis
IC System 1   29.6   28.6     25.8   12.0   40.5     0.48       0.15
IC System 2   27.6   28.1     28.7   10.6   38.4     0.26       0.83
IC System 3   26.8   22.0     16.0   18.3   68.3     1.95       3.86

Note: CV = coefficient of variation.

Figure 2 compares the distributions of the three sets of ICMV data. The mean (µ) and standard deviation (σ) values are also illustrated in that figure, appearing as vertical dashed and dash-dot lines, respectively. As in Table 1, the distribution of the ICMV data from IC System 3 shows a long tail compared with the other two systems. This fact is well illustrated in Figure 2c as the distribution shows a positive skewness (right-hand tail). The following sections highlight the impact of the distribution characteristics on the selection of the classification method.

FIGURE 2 Distribution of ICMV data from IC systems: (a) IC System 1, (b) IC System 2, and (c) IC System 3.

Evaluating the Classification Methods

Information and data classification are terms that apply to a wide range of attributes. In the case of geospatial and geo-referenced data (e.g., IC data), the choice of the classification method could significantly affect the interpretation of the collected information. In almost every classification problem, two main issues should be addressed: (a) the optimal number of classes to represent the geospatial data and (b) the break values that define the boundaries among different classes of data (5). The most common classification methods used in the analysis of geospatial data will be applied to the IC data collected by System 1, to compare the results and seek insight in selecting the most suitable classification method.

Figure 3 illustrates implications of the ICMV classification. The class breaks and the domain of each class are demonstrated on the rank-size distribution, where the ICMV data are first sorted from the greatest value to the least one; then the rank of each ICMV among the total number of data is estimated (Figure 3a), and the cumulative distribution graph is drawn (Figure 3b). The selection of the class breaks significantly affects the number of data points within each class and consequently impacts the interpretation of the produced color-coded map.

One approach to finding the most suitable classification method is to minimize Akaike's information criterion (AIC) among different classification techniques (10). The use of the AIC is popular in the field of information theory. It involves minimizing the Kullback-Leibler (KL) distance between the proposed model and the expected targets. The KL distance has been used in information theory to measure the difference between two probability distributions; it shows the amount of information loss or gain when migrating from one probability distribution to another (11). Although the AIC statistic is mainly applied to the selection of the best model in a set of information models, it could also be applied to the classification of geospatial data. For the $x_i$ geospatial data (i.e., the ICMV data in this study), the AIC statistic can be estimated from the following equation (5):

$$\mathrm{AIC} = -2 \sum_{k=1}^{m} \sum_{i \in G_k} x_i \log \hat{q}_k + 2(m - 1) \tag{2}$$

where

$x_i$ = geospatial data,
$m$ = number of classes,
$\hat{q}_k = \sum_{i \in G_k} x_i / (X n_k)$,
$G_k$ = kth class of the data,
$n_k$ = number of data in class $G_k$,
$X$ = global sum of $x_i$ data, and
$n$ = total number of features.

The AIC statistic was calculated for four popular classification methods in this study. Figure 4 illustrates the results of these classification methods as color-coded maps along with the frequency distributions of the ICMV data from IC System 1.
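As a minimal sketch of how Equation 2 might be evaluated, the following Python function computes the AIC for a data set and a given set of interior class breaks. The function name `aic_for_breaks`, the convention that a value equal to a break falls in the lower class, and the gamma-distributed sample are illustrative assumptions; they are not the study's field data or the software used by the authors.

```python
import numpy as np

def aic_for_breaks(x, breaks):
    """AIC of Equation 2 for data x split at the given interior class breaks."""
    x = np.asarray(x, dtype=float)
    breaks = np.sort(np.asarray(breaks, dtype=float))
    # Class index of each value; a value equal to a break goes to the lower class
    # (an assumed convention, not stated in the paper).
    labels = np.searchsorted(breaks, x, side="left")
    m = breaks.size + 1                  # number of classes
    X = x.sum()                          # global sum of the data
    log_term = 0.0
    for k in range(m):
        members = x[labels == k]
        if members.size == 0:            # an empty class contributes nothing
            continue
        q_k = members.sum() / (X * members.size)
        # q_k is constant within a class, so the inner sum factors out.
        log_term += members.sum() * np.log(q_k)
    return -2.0 * log_term + 2.0 * (m - 1)

# Hypothetical, positively skewed stand-in for ICMV data (not the field data).
rng = np.random.default_rng(0)
icmv = rng.gamma(shape=6.0, scale=5.0, size=5000)

tercile_breaks = np.quantile(icmv, [1 / 3, 2 / 3])
print("AIC, three quantile classes:", aic_for_breaks(icmv, tercile_breaks))
print("AIC, single class:", aic_for_breaks(icmv, []))
```

Because the data term scales with the global sum of the data, AIC values computed this way are only comparable across classification schemes applied to the same data set.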
The four methods are the quantile, natural breaks, geometrical intervals, and equal intervals methods. In the quantile method, each class contains an equal number of data points. The natural breaks classification method (also called the Jenks natural breaks classification method) partitions the data into a specified number of classes by minimizing the variance within each class and maximizing the variance between classes (12). The method uses the goodness of variance fit (GVF) to optimize the classification accuracy. The GVF criterion is defined as

$$\mathrm{GVF} = 100 - \frac{\sum \left(x_{i \in G_k} - \bar{G}_k\right)^2}{\sum \left(x_i - \bar{X}\right)^2} \times 100 \tag{3}$$

where

$x_i$ = a geo-referenced feature,
$x_{i \in G_k}$ = representation of features in class $G_k$,
$\bar{G}_k$ = average of features in class $G_k$, and
$\bar{X}$ = average of all features.

The geometrical intervals method identifies the class breaks on the basis of the intervals that represent a geometric series. In other words, this method generates geometric intervals by minimizing the sum of squares of the number of features in each group. In contrast, the equal intervals method classifies the features in different groups in a way that the width of each class is the same. The width of the interval is defined as the range of the data divided by the number of classes (13).

FIGURE 3 Distribution of ICMV data and sample class breaks in (a) rank-size distribution and (b) cumulative distribution.

FIGURE 4 Different classification methods: (a) quantile, (b) natural breaks, (c) geometrical intervals, and (d) equal intervals for geospatial analysis of IC data from System 1.

The graphs on the right-hand side of Figure 4 show the class break values with respect to the mean (µ) and standard deviation (σ) of the collected ICMV data. The shaded areas in these graphs represent the three colors used in the maps to differentiate the areas with various ranges of ICMVs. Observing the differences among the four color-coded maps in Figure 4, one can interpret less-stiff areas differently. For example, Figure 4c indicates that most of the areas within the test section are either well compacted (gray) or acceptable (light gray). On the other hand, Figure 4d gives a very different impression, that the majority of the test section is either less stiff (dark gray) or barely acceptable. However, the acceptable compaction criteria in this study were drawn on the basis of the existing compaction data during the construction process to improve the compaction uniformity; they are not mechanistic-based target values. A comprehensive and yet complicated process needs to be conducted to establish the target ICMVs.

The results of the AIC estimations are summarized in Table 2. The natural breaks method provides the lowest AIC. Although the AIC could be employed as an indicator of a classification method's performance, it might not be the only criterion to evaluate the effectiveness of the classification method. Another approach to find the best breaks between classes is to estimate the ratio of the information loss in each classification method. This concept is also correlated to the KL divergence criterion (11).
Assuming that P is the true distribution of data and Q is the target distribution, the KL divergence is the natural distance function between P and Q. This measure is mainly referred to as information gain ratio, which could be translated to the loss of information during the interpretation of the geospatial data for a given classification method. The information loss in terms of a KL divergence criterion is estimated from Equation 4a, where L is the rate of information loss:

$$L = \sum_{i=1}^{n} p_i \log(p_i) - \sum_{i=1}^{n} p_i \log(\hat{q}_k) \tag{4a}$$

where

$$p_i = \frac{x_i}{X}; \qquad \hat{q}_k = \frac{\sum_{i \in G_k} x_i}{X n_k} \qquad \text{with} \qquad X = \sum_{i=1}^{n} x_i \tag{4b}$$

The results of such an analysis are also included in Table 2 for different data classification methods applied to the collected ICMV data. Based on AIC and L, the natural breaks method, followed by the quantile method, is the most efficient technique for classifying the geo-referenced IC data. Both AIC and L are lowest for these two classification methods. On the other hand, the geometrical intervals and equal intervals methods returned higher AIC and L values, meaning that those methods may not be optimal for the representation and interpretation of these IC data. The advantage of the natural breaks method is that it can identify the trend between different classes of data and determine the class breaks on the basis of the numerical values.

Table 2 also summarizes the percentage of area within the test section identified by each classifying color, as illustrated earlier in Figure 4. As an example, the natural breaks method shows 30% of the test section as relatively less stiff (dark gray), while the geometrical intervals method indicates the same condition for only 15% of the section. This difference can significantly affect the interpretation of the collected IC data. The following section provides an algorithm that can optimize the classification of the ICMV data on the basis of the concept of information loss.
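The information loss of Equation 4 can be sketched in a few lines of Python, together with two of the simpler break-selection rules (quantile and equal intervals). The function names, the convention that a value equal to a break falls in the lower class, the requirement that all values be positive, and the gamma-distributed sample are assumptions made for illustration. The function returns the raw value of L; how the paper converts it to the percentages in Table 2 is not stated here.

```python
import numpy as np

def quantile_breaks(x, m):
    """Interior breaks that put (approximately) equal counts in each of m classes."""
    return np.quantile(x, np.arange(1, m) / m)

def equal_interval_breaks(x, m):
    """Interior breaks that split the data range into m classes of equal width."""
    return np.linspace(np.min(x), np.max(x), m + 1)[1:-1]

def information_loss(x, breaks):
    """Rate of information loss L (Equation 4a) for positive data x."""
    x = np.asarray(x, dtype=float)
    breaks = np.sort(np.asarray(breaks, dtype=float))
    labels = np.searchsorted(breaks, x, side="left")   # class index per value
    X = x.sum()
    p = x / X                                          # p_i = x_i / X
    class_sum = np.bincount(labels, weights=x, minlength=breaks.size + 1)
    class_n = np.bincount(labels, minlength=breaks.size + 1)
    # q_hat for the class of each value: class sum / (X * n_k)
    q_hat = class_sum[labels] / (X * class_n[labels])
    return float(np.sum(p * np.log(p)) - np.sum(p * np.log(q_hat)))

# Hypothetical, positively skewed stand-in for ICMV data (not the field data).
rng = np.random.default_rng(0)
icmv = rng.gamma(shape=6.0, scale=5.0, size=5000)

for name, rule in [("quantile", quantile_breaks),
                   ("equal intervals", equal_interval_breaks)]:
    b = rule(icmv, 3)
    print(f"{name}: breaks = {np.round(b, 1)}, L = {information_loss(icmv, b):.5f}")
```

L is zero when every class contains identical values and grows as the values within a class become more dispersed, which is why break placement matters more for heavy-tailed data.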
Optimization of Class Break Values

An optimization algorithm was developed to estimate the class breaks on the basis of the minimization of the information loss ratio, L. This method simulates the process of biological evolution through a series of computation stages and within a specific population of solutions. The offspring solutions are then introduced to the population of parent solutions, along with some mutations to provide new solutions. The convergence of the general model is defined as the minimization of a fitness function. This procedure is stopped during the evolution as soon as the fitness function cannot be improved any further. In this optimization study, the convergence factor was selected as 0.0001, the mutation rate as 0.075, and the population size as 100. The calculated AIC for the optimized class breaks is 2.2532 × 10^6, which is the lowest of the AIC values reported in Table 2. The ratio of the information loss, L, is 0.6866%, which is also the smallest among the methods in Table 2. The optimized set of class breaks minimizes both the AIC and L criteria when classifying the collected geo-referenced ICMV data.

Figure 5 illustrates the color-coded map and distribution of ICMV data for IC System 1 on the basis of the optimized class breaks. The results of this optimized classification are close to the ones from the natural breaks and quantile methods in Figure 4, a and b. In other words, the natural breaks and quantile methods, in that order, are reasonable approximations of the optimized classification approach for this set of ICMV data.
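The evolutionary search described above might be sketched as follows. Only the population size (100), mutation rate (0.075), and convergence factor (0.0001) come from the text; the selection rule, the blend crossover, the elitism, the `patience` stopping rule, and all function names are assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np

def information_loss(x, breaks):
    """Rate of information loss L (Equation 4a) for positive data x."""
    labels = np.searchsorted(np.sort(breaks), x, side="left")
    X = x.sum()
    p = x / X
    class_sum = np.bincount(labels, weights=x, minlength=len(breaks) + 1)
    class_n = np.bincount(labels, minlength=len(breaks) + 1)
    q_hat = class_sum[labels] / (X * class_n[labels])
    return float(np.sum(p * np.log(p)) - np.sum(p * np.log(q_hat)))

def optimize_breaks(x, n_classes=3, pop_size=100, mutation_rate=0.075,
                    convergence=1e-4, max_gen=100, patience=10, seed=0):
    """Evolutionary search for class breaks that minimize L (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # Each candidate solution is a sorted vector of interior breaks.
    pop = np.sort(rng.uniform(lo, hi, size=(pop_size, n_classes - 1)), axis=1)
    best, best_fit, stall = None, np.inf, 0
    for _ in range(max_gen):
        fit = np.array([information_loss(x, b) for b in pop])
        order = np.argsort(fit)
        pop, fit = pop[order], fit[order]
        # Count generations without an improvement larger than the convergence factor.
        stall = 0 if fit[0] < best_fit - convergence else stall + 1
        if fit[0] < best_fit:
            best, best_fit = pop[0].copy(), fit[0]
        if stall >= patience:
            break
        parents = pop[: pop_size // 2]                  # keep the fitter half
        i = rng.integers(0, len(parents), size=pop_size - 1)
        j = rng.integers(0, len(parents), size=pop_size - 1)
        w = rng.uniform(size=(pop_size - 1, 1))
        children = w * parents[i] + (1 - w) * parents[j]    # blend crossover
        mutate = rng.uniform(size=children.shape) < mutation_rate
        children = np.where(mutate, rng.uniform(lo, hi, size=children.shape), children)
        # Elitism: the best solution so far always survives.
        pop = np.sort(np.vstack([best[None, :], children]), axis=1)
    return best, best_fit

# Hypothetical, positively skewed stand-in for ICMV data (not the field data).
rng = np.random.default_rng(1)
icmv = rng.gamma(shape=6.0, scale=5.0, size=2000)
breaks, loss = optimize_breaks(icmv)
print("optimized breaks:", np.round(breaks, 1), "L =", round(loss, 5))
```

With only two breaks to place, an exhaustive grid search would also be feasible; the evolutionary form is shown because it mirrors the procedure described in the text and extends to more classes.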
Although the classification techniques discussed in this section are well suited to the collected IC data, the selection of the best class breaks depends on the distribution of ICMVs. As an example, the skewness of the data and the ratio of the head to the tail in the best-fit distribution may affect the selection of the classification technique. Furthermore, not all IC data fit into a normal distribution, as has been observed in a number of field studies (2, 14). The following section discusses the impact of ICMV distribution on the selection of the classification method.

TABLE 2 Comparison of Performance of Geospatial Data Classification Methods for ICMV Data from System 1

Classification Method    AIC (× 10^6)   L (%)    Marked as Gray (%)   Marked as Light Gray (%)   Marked as Dark Gray (%)
Quantile                 2.2535         0.7424   56                   13                         31
Natural breaks           2.2533         0.7169   50                   20                         30
Geometrical intervals    2.2695         0.9591   60                   25                         15
Equal intervals          2.2552         1.0457   32                   31                         37

Note: L = rate of information loss.

FIGURE 5 Optimized values of class breaks for ICMV data (System 1).

Impact of ICMV Distribution on Selection of Data Classification Algorithm

To understand the impact of ICMV distribution on the selection of the classification technique, the two best-performing classification methods were applied to the data obtained from IC Systems 2 and 3. Figure 6 illustrates the result of such analyses for the natural breaks and quantile methods. The class breaks for the ICMVs from System 2, shown in Figure 6, a and b, are relatively close. However, the selection of a properly optimized classification method is more critical for the data collected by System 3, because the ICMVs represent a heavy-tailed distribution (Figure 2c).
Comparing Figure 6, c and d, the range of class breaks for System 3 can be seen to be significantly different between the quantile and natural breaks classification algorithms. The reason for such a noticeable difference is that the quantile method classifies the data on the basis of the number of features in each class while the natural breaks method finds the actual break points between the values. As shown in Figure 6, the natural breaks method performs more efficiently than the quantile method on the basis of the information loss ratio, L. In the case of the heavy-tailed data, the interpretation of the ICMV trends and identification of less-stiff areas are very sensitive to the selection of the classification method. Furthermore, the minimum value of L for the ICMVs from System 3 is still greater than those values for IC System 2. The anomalies in the collected ICMVs could be traced to the properties of the IC system components, such as the vibration sensor, the GPS unit, and the data acquisition process. As a result, there is a need for a uniform calibration system for the IC rollers' components.

Not only can the classification approach for geo-referenced IC data affect the understanding of the results but also the statistical significance of spatially distributed IC data can be of major concern for the compaction quality management process. The following section analyzes the spatial significance of the IC data in this study.

Spatial Significance Analysis of IC Data

Because of the variable range of ICMVs and uncertainties associated with the process of establishing the breaks between classes and selecting a classification method, it would be beneficial to understand the statistical significance of the geo-referenced IC data. The spatial analysis techniques are used to identify the spatial pattern and the “hot spots” in the geo-referenced data.
These methods estimate the local correlation of a geospatial data point with respect to its neighbors within a specific distance. Of these methods, the Getis-Ord process is more popular because it compares the local averages with the global average; the other methods compare each observation only with its neighbors (15). The results of Getis-Ord hot spot analysis reveal the locations that have unusual patterns as well as the significance of the differences between local and global averages within the area of interest. This method basically tests the null hypothesis, which assumes that there is no significant correlation between each data point and its neighbors up to a specified distance. Thereafter, a statistical significance test is carried out to accept or reject the null hypothesis using the z-score and p-value. The z-score defines whether to accept or reject the null hypothesis. The p-value shows the probability of falsely rejecting the null hypothesis during the statistical significance testing. Both of these statistics require the assumption of normal distribution of data. The range of acceptable z-scores and p-values is dependent on the confidence level. For example, at a 95% confidence level, the range of z-scores is −1.96 to 1.96 and the corresponding p-value is .05.

In practical terms, if a large ICMV is not surrounded by other large ICMVs, it is not considered to be statistically significant. On the other hand, if a large ICMV is surrounded by other large ICMVs, it will be marked as statistically significant and can be a hot spot. At different confidence levels, the z-score and p-value could be used to identify the spatial significance of collected ICMVs as compared with the global average of all neighbors. Figure 7 illustrates the results of spatial significance analysis in terms of the estimated p-values. For example, at the 99% confidence level, if the p-value is less than .01 (i.e., the z-score is either less than −2.58 or more than 2.58), the collected ICMV data are significantly different from the rest of the test section. Figure 7a illustrates the distribution of p-values at this confidence level. The dark gray areas have a spatially significant difference with respect to the rest of the test section; the gray areas tend not to show a significant difference. At lower confidence levels of 95% and 90% (Figure 7, b and c), the area covered by spatially significant ICMVs increases. Furthermore, the dark gray zones in Figure 7, b and c, are similar to those observed in Figures 4a and 5b. The advantage of this type of analysis is that it is independent of the range and distribution of ICMV data. However, the confidence level significantly affects the interpretation of the collected IC data.

FIGURE 6 Impact of ICMV distribution on data classification: (a) System 2, quantile (L = 1.1344%); (b) System 2, natural breaks (L = 0.9606%); (c) System 3, quantile (L = 2.2823%); and (d) System 3, natural breaks (L = 1.9287%).

FIGURE 7 Spatial significance of ICMV data at different confidence levels: (a) 99% confidence level, (b) 95% confidence level, and (c) 90% confidence level (dark gray = spatially significant; gray = not spatially significant).
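The hot spot test described in this section can be sketched with the widely used Getis-Ord Gi* statistic, which returns a z-score for each point. The binary distance-band weights, the inclusion of each point in its own neighborhood, the two-sided normal p-value, and the synthetic 10 × 10 grid below are assumptions for illustration; the paper does not state which software or weighting scheme was used.

```python
from math import erfc, sqrt
import numpy as np

def gi_star(coords, values, distance):
    """Getis-Ord Gi* z-scores with binary weights for neighbors within `distance`."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size
    # Pairwise distances; each point counts as its own neighbor (the "star" variant).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= distance).astype(float)
    mean = x.mean()
    s = np.sqrt(np.mean(x ** 2) - mean ** 2)        # global standard deviation
    w_sum = w.sum(axis=1)
    numerator = w @ x - mean * w_sum                # local sum minus its expectation
    # The denominator is zero if a neighborhood covers every point; choose a
    # distance band smaller than the extent of the section to avoid that case.
    denominator = s * np.sqrt((n * (w ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-values for z-scores under a standard normal distribution."""
    return np.array([erfc(abs(v) / sqrt(2.0)) for v in np.atleast_1d(z)])

# Hypothetical 10 x 10 grid of ICMVs with one stiff 3 x 3 patch (not the field data).
ii, jj = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([ii.ravel(), jj.ravel()])
values = np.full(100, 20.0)
values[((ii >= 4) & (ii <= 6) & (jj >= 4) & (jj <= 6)).ravel()] = 60.0

z = gi_star(coords, values, distance=1.5)
p = two_sided_p(z)
for level, alpha in [("99%", 0.01), ("95%", 0.05), ("90%", 0.10)]:
    print(f"{level} confidence: {(p < alpha).sum()} of {p.size} points significant")
```

Lowering the confidence level widens the set of points flagged as significant, which mirrors the growth of the dark gray area from Figure 7a to Figure 7c.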
Summary and Conclusions

To understand compaction quality and uniformity, one may classify the IC data and represent the results as a color-coded map. Although the process of classification is straightforward, it has a significant impact on the color-coded maps and consequently adds bias to the quality management process. To understand this phenomenon, four basic classification techniques were evaluated using three sets of IC data. The following conclusions can be drawn from the analyses performed in this study:

• The natural breaks algorithm and the quantile method seem to perform more efficiently than the geometrical intervals and equal intervals techniques. Because the quantile algorithm divides the data on the basis of the number of features in each class, its performance might not be optimal for different types of ICMV distributions. On the other hand, the natural breaks method identifies the variance of each class with respect to the mean and determines the location of class breaks on the basis of numerical values of features.
• To improve the objectivity of the representation of data, an algorithm was introduced to optimize the values of the class breaks systematically. The performance of the optimization algorithm in terms of information loss ratio was superior compared with the other classification techniques.
• The distribution of the IC data significantly impacts the selection of the classification algorithm and the subsequent interpretation of the color-coded maps. The heavy-tailed ICMV distribution is very sensitive to the selection of class breaks.
• The statistical significance of the ICMVs within a test section can be represented as the results of statistical hypothesis testing in terms of z-score and p-value at different confidence levels.
• Overall, the interpretation of IC data depends significantly on the classification technique to generate the color-coded maps. A solution could fit different sets of ICMVs. A comprehensive statistical analysis of the collected IC data is crucial in the identification of less-stiff areas and improving the uniformity of compaction.
• Compared with conventional quality control and quality assurance processes, the use of ICMV color maps simplifies the process for identification of less-stiff areas. However, either the modulus- or density-based quality management approaches could still be employed to evaluate the compaction quality of less-stiff zones. The processes discussed in this paper emphasize the importance of systematic and objective ICMV classification methods in the quality control work, considering the type of geomaterials.
• It is understood that as compaction proceeds, the dispersion of the ICMV data decreases. However, the IC data in this study were collected only after the completion of the compaction process, known as mapping. Even though the mapping data show less uncertainty compared with the pass-by-pass data, the identification of less-stiff areas is still dependent on the classification criteria. The methods and analyses reviewed in this study allow quality management agencies to understand the uncertainty and nonuniformity of compaction data better.

Acknowledgments

This study was carried out as part of the FHWA EDC-2 project in cooperation with the Texas Department of Transportation. The authors thank Jimmy Si, Richard Izzo, Antonio Nieves, Mike Arasteh, and the study panel for help and advice throughout this study.

References

1. Mooney, M. A., R. V. Rinehart, N. W. Facas, O. M. Musimbi, D. White, and P. K. R. Vennapusa. NCHRP Report No. 676: Intelligent Soil Compaction Systems. National Cooperative Highway Research Program, Transportation Research Board, Washington, D.C., 2010.
2. White, D. J., M. J. Thompson, P. K. R. Vennapusa, and J. A. Siekmeier. Implementing Intelligent Compaction Specification on Minnesota TH-64: Synopsis of Measurement Values, Data Management, and Geostatistical Analysis. Transportation Research Record: Journal of the Transportation Research Board, No. 2045, 2008, pp. 1–9. http://dx.doi.org/10.3141/2045-01.
3. Anderegg, R., and K. Kaufmann. Intelligent Compaction with Vibratory Rollers: Feedback Control Systems in Automatic Compaction and Compaction Control. Transportation Research Record: Journal of the Transportation Research Board, No. 1868, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 124–134. http://dx.doi.org/10.3141/1868-13.
4. Brewer, C. A., and L. Pickle. Evaluation of Methods for Classifying Epidemiological Data on Choropleth Maps in Series. Annals of the Association of American Geographers, Vol. 92, No. 4, 2002, pp. 662–681. https://doi.org/10.1111/1467-8306.00310.
5. Osaragi, T. Classification Methods for Spatial Data Representation. Center for Advanced Spatial Analysis, University College London, United Kingdom, 2002.
6. Xiao, N., C. A. Calder, and M. P. Armstrong. Assessing the Effect of Attribute Uncertainty on the Robustness of Choropleth Map Classification. International Journal of Geographical Information Science, Vol. 21, No. 2, 2007, pp. 121–144. https://doi.org/10.1080/13658810600894307.
7. Sun, M., and D. W. S. Wong. Incorporating Data Quality Information in Mapping the American Community Survey Data. Cartography and Geographic Information Science, Vol. 37, No. 4, 2010, pp. 285–299. https://doi.org/10.1559/152304010793454363.
8. Jiang, B. Head/Tail Breaks: A New Classification Scheme for Data with a Heavy-Tailed Distribution. Professional Geographer, Vol. 65, No. 3, 2013, pp. 482–494. https://doi.org/10.1080/00330124.2012.700499.
9. Westfall, P. H. Kurtosis as Peakedness, 1905–2014, RIP. American Statistician, Vol. 68, No. 3, 2014, pp. 191–195. https://doi.org/10.1080/00031305.2014.917055.
10. Akaike, H. Akaike’s Information Criterion. In International Encyclopedia of Statistical Science (M. Lovric, ed.), Chapter 12.
Springer-Verlag, Berlin and Heidelberg, 2011. https://doi.org/10.1007/978-3-642-04898-2 _110. 11. Kullback, S., and R. A. Leibler. On Information and Sufficiency. Annals of Mathematical Statistics, Vol. 22, No. 1, 1951, pp. 79–86. https:// doi.org/10.1214/aoms/1177729694. 12. Jenks, G. F., and F. C. Caspall. Error on Choroplethic Maps: Definition, Measurement, Reduction. Annals of the Association of American Geographers, Vol. 61, No. 2, 1971, pp. 217–244. https://doi.org/10.1111 /j.1467-8306.1971.tb00779.x. 13. De Smith, M. J., M. F. Goodchild, and P. Longley. Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. Troubador Publishing Ltd., Leicester, United Kingdom, 2007. 14. Mazari, M., J. Beltran, R. Aldouri, G. Chang, J. Si, and S. Nazarian. Evaluation and Harmonization of Intelligent Compaction Systems. International Conference on Transportation and Development, 2016, pp. 838–846. https://doi.org/10.1061/9780784479926.076. 15. Getis, A., and J. K. Ord. The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, Vol. 24, No. 3, 1992, pp. 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x. The contents of this paper reflect the authors’ opinions and do not necessarily reflect the policies and findings of FHWA. The Standing Committee on Geotechnical Instrumentation and Modeling peer-reviewed this paper.