## Abstract

Key parameters describing the spatial variability of soil properties based on the random field theory are the scale of fluctuation (SOF) and coefficient of variation (COV). To characterize the spatial variability of soil properties, reducing the impact of these errors and uncertainties is necessary. To accomplish this, for the five main layers of soil, we collected 18 cone penetration test (CPT) data from a highly heterogeneous region in Lianyungang New Airport, Jiangsu Province, China, and used the control variable method to analyze the influence of the estimation method of tendency, its function type and outliers. The results show that, compared with the ordinary least square method (OLSM), the least absolute deviation method (LADM) can more truly reflect the trend component of CPT parameters in the vertical direction, and the influence of other factors on SOF and COV is also studied, such as outliers and estimation functions of trend components. On this basis, a reasonable calculation process of SOF and COV is summarized, which provides a reference for the calculation of SOF and COV in vertical direction in the future. By comparing the SOF calculated by different models, the results show that the squared exponential (SQX) model has the highest SOF in 68.3% of the evaluation, and the single exponential (SNX) model has the lowest SOF in 64.4% of the evaluation. Moreover, we compared the SOF and COV of cone tip resistance (*q*_{c}) and sleeve friction (*f*_{s}), which showed that SOF of *q*_{c} and COV of *q*_{c} is lower than that of *f*_{s} in 54.4 and 73.3% of all evaluations, respectively.

## 1. Introduction

Soil is a product of natural deposits, and because of the complexity of sedimentary history, soil properties demonstrate evident variability in space [1-5]. The spatial variability of soil properties is one of several important sources of uncertainty and cannot be eliminated, thereby impacting geotechnical reliability and failure probability greatly [6-9]. The traditional deterministic method does not consider the spatial variability of soil properties and just provides a simple parameter (such as safety factor), as such, geotechnical problems are challenging to resolve effectively [10-14]. In 1977, Vanmarcke [15] proposed random field theory to identify the spatial variability of soil properties, which improved the cognition of geotechnical problems under the background of probability [16, 17]. Statistical analysis and random field simulation, as well as three-dimensional dynamic large deformation finite element (LDFE) analysis, are also good methods to study soil spatial variability [18-20]. Among all random fields, the stationary random field is widely studied because of its practicality, it is a method that can accurately characterize the spatial variability of soil properties with less data [21-23]. For the stationary random field model, accurately describing the spatial variability of soil properties mainly depends on three key parameters, the mean value ($\mu $), scale of fluctuation (SOF), and coefficient of variation (COV) [24, 25]. However, these parameters cannot be obtained directly and are inevitably affected by measurement errors, data deficiencies, and transformation uncertainties [26, 27]. In recent years, both horizontal and vertical spatial variability have been widely studied [28, 29]. Among them, horizontal SOF and COV simultaneously face three challenges: measurement error, data insufficiency, and conversion uncertainty. Compared with the horizontal direction, there is often more data in the vertical direction, but the spatial variability in the vertical direction is much greater than that in the horizontal direction, so the study of the spatial variability in the vertical direction is still an important topic [30, 31]. To better study vertical spatial variability, reducing the influence of measurement error and transformation uncertainty is important. Note that two main ways to reduce the influence of measurement error exist: (i) high-precision measurement data, which are selected for constructing models of spatial variability of soil properties, such as cone penetration test (CPT) data [32-37]; (ii) reasonable data analysis, although the CPT data have high accuracy, measurement error is still inevitable. Therefore, it is crucial to reduce the impact of measurement error through subsequent reasonable data analysis. However, this has rarely been considered in previous studies. In the stationary random field model, the transformation of the mean value is clear, but the transformation process of SOF and COV is affected by many factors, including the trend of soil properties. The impact of soil property trends on SOF and COV has also been studied. For example, Tan et al. [38] reported that CPT data has an evident trend along the depth, and the trend components of CPT data significantly impact the calculation results of SOF. Sasanian et al. [39] demonstrated that compared with the linear function, removing the trend components of soil properties by quadratic function reduces the SOF in almost all cases. Oguz et al. [40] compared the SOF corresponding to the constant mean method and the trend method of data processing, and found that the trend method eliminates longer fluctuations. Phoon and Kulhawy [41] indicated that COV presented in the previous studies did not eliminate the potential trend in soil properties; such that, the reported COV may be much higher than the actual value. Nevertheless, few studies have reported the impact of estimation methods of trend components of soil properties on SOF and COV.

Presently, ordinary least square method (OLSM) is often used to estimate the trend components of soil properties [42-48]. However, OLSM is extremely sensitive to unusual data [49]. This is because the OLSM seeks the best function match of the data by minimizing the sum of squares of errors; such that under the effect of squares, the contribution of unusual data in function fitting increases. Due to the test precision and complex soil deposition processes, unusual data are common in soil property measurements. If the OLSM is used to estimate the trend components of soil properties, it is easily affected by unusual data, such that ill-conditioned trend estimation will be directly generated. In this case, least absolute deviation method (LADM) enables the best function matching of data by minimizing the sum of the absolute value of the error, which effectively accounts for the defect of the fitting principle of OLSM.

In this investigation, to identify soil layers, we collected CPT data from a highly heterogeneous region in Jiangsu Province, China. Taking CPT data as an example, we first compared the effectiveness of OLSM and LADM in estimating soil properties. Then, we analyzed the influence of the estimation method of trend components, function type of estimated trend components, and outliers on the SOF and COV via the control variable method. As such, based on the full consideration of the influences, we propose a program for calculating SOF and COV. Furthermore, we discuss the spatial variability characteristics of five major soil layers in the region.

## 2. Methodology of Spatial Variability Analysis

### 2.1. Trend Removal

Before establishing the stationary random field model, the trend components of soil properties can be removed to meet the requirements of weak stationarity, where the mean and variance are preset as constant, and the covariance function is only related to the distance between two points. Hereinafter, the measured value $g(z)$ can be divided into a trend component $t(z)$ that can be explained physically, and a residual $\epsilon (z)$ presenting the inherent variability, as shown in equation (1) [50]:

where $z$ is the measured depth.

Trend removal is not a simple task because it directly affects the quality of residuals reflecting the inherent variability of soil [42]. Curve fitting method with OLSM is usually used to realize this objective because of its simplicity. However, concerning the existence of absolute values, the LADM has plagued the mathematical community for over 200 y [51], but it has made a major breakthrough in recent years. The procedure of LADM mainly includes (i) introducing a perturbed solution (Δ*a*) and continuous iteration to solve the multivariate function and (ii) introducing an optimal parameter $a\u2217$ to minimize the sum of absolute values of errors and thereby to further improve the solution accuracy. The fitting function $f(x,a\u2217)$ includes *n* points ($x1$,$x2$,···, $xn$) within $m$ ($m$ is the number of data points) points to make $yi\u2212f(xi,a\u2217)=0$ ($i\u2208n$) true. According to step (i), finding these points and substituting them into the equation achieves the only optimal solution of LADM. The two steps are adopted by an optimization criterion. Note that the optimal solution (*a*) is an accurate solution applied in engineering. More details are present in Gu [52].

### 2.2. SOF and COV Calculation Method

After the trend component is effectively estimated, the COV can be calculated using equation (2) [53]:

where *n* is the number of data points; $zi$ is the $z$ value at point $i$; $g(zi)$ is the measured value of soil properties at $zi$; $f(zi)$ is the fitting function value of the trend component at $zi$.

There are two main calculation methods of SOF: (i) the autocorrelation function method based on random field theory, and (ii) the semi-variogram method based on geostatistics theory. In this investigation, the semi-variogram model is solved by residuals, and then transformed into the corresponding autocorrelation function model to realize the solution of SOF. In random field theory, the SOF (symbol is $\delta $) can be calculated by equation (3):

where $\rho (h)$ is the autocorrelation function, which is defined as the ratio of the covariance function to variance, as expressed by equation (4):

where $C(h)$ is the covariance function; $\mu $ and $\sigma 2$ are the mean and variance of property $X$, respectively; $Xi$ is the property value of $X$at $i$; and $E(\u22c5\u22c5\u22c5)$ is the expected value.

In geostatistical theory, the variogram is defined as the variance of the difference between the property values of X at points $i$ and $i+h$, half of which is called the semi-variogram $\gamma (h)$, as expressed by equation (5):

According to the second-order stability assumption, for any *h*, there is $E(Xi)=E(Xi+h)$, and $\gamma (h)$ can be expressed by equation (6):

Expand the square term of equation (6) and $\gamma (h)$ can be expressed using equation (7):

Combining equation (4), equation (7) can be simplified as follows:

equation (8) reflects the relationship between the autocorrelation function and semi-variation function.

The estimated value $\gamma \u2217(h)$ from the semi-variogram of the sample was calculated using equation (9) [54]:

where, $N(h)$ is the number of data pairs separated by sample spacing $h$ and $xi$ is the property value of sample $x$ at point $i$.

The mathematical expressions of the popular semi-variogram models and the corresponding estimations of the SOF are shown in Table 1. To study the influencing factors of SOF for control variables, OLSM is still used for $\gamma \u2217(h)$ fitting, and the fitting range is half of the maximum spacing of samples [48].

## 3. Description of Site

The project site is located in Guanyun County, Lianyungang City, Jiangsu Province, China (Figure 1). The site belongs to the coastal facies alluvial proluvial landform, with flat terrain and wide distribution of farmland and ditches. The ground elevation is approximately 2.09–4.10 m (refer to the elevation of the Yellow Sea), the relative elevation difference is 1.25 m, and the overall terrain is relatively open. The main types of groundwater in the study area are diving and confined water, among which the diving is mainly deposited in the upper silt layer, the water level is buried at a depth of 0.2–1.4 m, the elevation is generally about 1 m, and the water volume is small. Through sampling from more than 40 drill holes in the field and laboratory testing, we determined that the predominant soil types within a depth of 15 m in the study area were clay, soft and silty clays, and contained calcareous and ferromanganese nodules. According to standard GB 50021-94 [55], we determined the grade of the foundation of this site as grade I. In this study, we used 18 CPT data in the site.

## 4. Analysis of Influencing Factors

### 4.1. Soil Layer Division

Observed data of different soil layers usually do not have the same trend, and if the trend of observed data of all soil layers is retained, the SOF will be significantly magnified [56]. Therefore, establishing the stationary random field model hierarchically would enable the accurate evaluation of the spatial variability characteristics. According to standard TB 10018-2018E [57], the site was divided into different layers in the CPT profile. It should be noted that numerous thin soil layers were present in the site, which is detrimental to effectively describe the spatial variability characteristics of the properties. Additionally, if the layer is too thin compared to the probe size, the CPT data will be affected by the overlying and underlying layers as by the thin layer itself; consequently, the results will be unrepresentative of the layer. Therefore, thin layers cannot be described as individual layers using the CPT [53]. According to the standard TB 10018-2018E [57], a layer with thickness greater than 200 mm is layered separately, otherwise it is merged into a thicker layer. To confirm the consistency and reliability of the statistics, considering the 1st CPT borehole as an example, the five main soil layers in the site are shown in Figure 2. The distance between the *q*_{c} and *f*_{s} readings is 10 cm, and the reading interval of other CPT borehole data is also 10 cm.

### 4.2. Influence of Estimation Method of Trend Component

Figure 3 shows the estimation results of the trend components of *q*_{c} and *f*_{s} of the five main layers in the 1st CPT borehole based on both OLSM and LADM. The function curves obtained by OLSM in the profiles of *q*_{c} and *f*_{s} of the five main layers, which is closer to the data with greater square than that obtained by LADM. Especially for the greatly fluctuating data, whereby its appearance will be more apparent. We can further see that the linear function curve and quadratic function curve obtained by OLSM evidently deviate from most of the normal data in the profile of *q*_{c} of clay-1, meaning that the function obtained by OLSM cannot adequately describe the trend of *q*_{c} along the depth in clay-1. This confirms that the OLSM increases the contribution of large fluctuation data, influencing the estimation results of trend components and even resulting in the ill-conditioned estimation results of trend components.

In terms of quantitative evaluation, Figure 4 shows the corresponding mean absolute error (MAE) when using the OLSM and LADM to estimate the trend components of *q*_{c} and *f*_{s} of the five main layers of the 1st CPT borehole. The MAE value is an important metric to evaluate the prediction accuracy of a model, reflecting the average absolute difference between predicted and actual values. The lower the MAE value, the better the model fit. The MAE can be calculated using equation (10).

where $xi$ is the measured value on site and $f(xi)$ is the estimated value.

In the profiles of *q*_{c} and *f*_{s} of the five main soil layers, the MAE by LADM is smaller than that by OLSM at all times. It confirms that the function obtained by LADM can better characterize the *q*_{c} and *f*_{s} of the five soil layers in the 1st CPT borehole.

Figures 5-7 illustrate SOF calculated using the single exponential (SNX), squared exponential (SQX), and smoothed particle hydrodynamics (SPH) models for the 1st CPT borehole, respectively. In the profiles of *q*_{c} and *f*_{s} of the five main soil layers, SOF calculated by the SNX and SQX models is affected by the estimation method of the trend component. For 58.3% of all evaluations, SOF calculated by SNX, SQX, and SPH for estimating the trend component by OLSM is lower than that by LADM. This is attributed to the increased contribution of large fluctuation data when using OLSM to estimate the trend component, such that the error of trend component estimation is introduced into the residual reflecting the inherent variability of the soil, which destroys the original autocorrelation structure of the soil and leads to the decrease of SOF. Nevertheless, in 15% of all evaluations, it is the opposite. This may be because the OLSM cannot truly estimate the trend component, resulting in a certain trend in the residuals.

Figure 8 shows COV of the five main soil layers of the first CPT borehole. We can see that COV is also affected by the estimation method of the trend component. For 70% of all evaluations, the calculated COV by OLSM is lower than that by LADM. However, in 15% of all evaluations, it is the opposite. The uncertainty of the calculated COV is determined by the COV calculation formula, and as can be seen from equation. (2), COV is only related to the function of estimating the trend component.

Better trend component regression can obtain residuals that more truly reflect soil autocorrelation structure [44]. Since LADM can effectively avoid misleading large fluctuation data and better fit the trend of the soil properties along the depth, we conclude that the calculated SOF and COV can more representatively describe the spatial variability of soil properties. Unless otherwise specified below, the LADM was used to estimate the trend components of *q*_{c} and *f*_{s}.

### 4.3. Influence of the Function Type of Estimated Trend Component

To further clarify the influence of the function type of estimated trend component on the SOF and COV, SOF, COV, and MAE of *q*_{c} and *f*_{s} of the five main layers of the 1st CPT borehole are shown in Tables 2 and 3. From the results, the nonlinear calculation results are smaller than most of the linear ones, which is similar to Sasanian et al. [58] and Guo et al. [59]. It can also be seen from the table that the nonlinear MAE is smaller than the linear MAE, and a smaller MAE means a better fit, which again proves that the quadratic type can better estimate the trend component [60]. In addition, based on experience, it can also be seen that the SOF calculated by quadratic detrending is more reasonable. The reason why the SOF is large due to linear detrending is that the linear function is not completely detrending, resulting in the residual still containing a trend [44]. At the same time, in a few cases, the SOF is small due to the influence of multiple factors, which leads to the misestimation of the trend component of linear detrending, which destroys the autocorrelation structure of the residuals. To control the impact of the trend component estimation on the SOF and COV, selecting the quadratic function is essential.

### 4.4. Influence of Outliers

Despite many factors leading to outliers in CPT data, including man-made errors, temperature, and so forth, outliers in the data can be eliminated in combination with the formation conditions when analyzing the in situ data referring to the standard GB 50021-94 [55]. When establishing the stationary random field model of soil properties, the outlier usually corresponds to an abnormal residual, resulting in an abnormally large variance at the outliers and difficulty in satisfying the requirements of weak stationarity of random field theory. Moreover, the reliability of geotechnical engineering depends more on the spatial average characteristics than the soil characteristics of a specific point in space, providing a favorable premise to deal with outliers.

Taking the 1st CPT borehole as an example, the quadratic function by the LADM is selected for removing the trend components of *q*_{c} and *f*_{s} to obtain the residuals, where the outliers in the residuals are identified by the 3$\sigma $ (triple standard deviation) rule. Note that if the outliers still exist in the new residuals, the above steps should be repeated until all outliers are eliminated. The outlier identification is shown in Figure 3. The quadratic function curve obtained by LADM with and without outliers is very similar, indicating the good robustness of the LADM. The calculation results of SOF and COV before and after removing outliers (inTable 3), indicates that both SOF and COV are affected by outliers. The presence of outliers can overestimate COV, and the presence of outliers has an impact on SOF because the presence of outliers can lead to misestimation of the trend component of the data, which can affect the calculation results, especially in clay-1 and silty clay-1. Additionally, the outlier influence on SOF by the SNX, SQX, and SPH models is different, in that the influence of outliers on the SOF calculated by SNX model is considerably greater than that of the SQX and SPH models in silty clay-1. This is similar to the results of Tan et al.’s study [38].

The residuals after removing the trend component and processing the outliers could also be identified as weak stationary. When the *q*_{c} and *f*_{s} of the five major soil layers of the first CPT Borehole were taken as examples, the best functions by LADM were selected to remove the trend component and perform outlier processing to obtain residuals. Kendall’s Tau test was conducted for residuals shown in Table 4. The absolute values of Kendall’s Tau test statistics for the 1st CPT borehole were all smaller than the critical values, indicating weak stationarity at the significance level of 0.05(in Table 5).

## 5. Evaluation of Spatial Variability Characteristics

Through the above analysis, the calculation process of SOF and COV is summarized as follows:

Identify the units of different soil types in the CPT profile according to the standard TB 10018-2018E [57]. Divide soil unit with thickness greater than 200 mm into layers.

Use LADM to estimate the trend components of

*q*_{c}and*f*_{s}to obtain the quadratic functions.Remove the trend components by the best functions to obtain the residuals, and use the 3$\sigma $ rule to identify the outliers in the residuals. Eliminate the data with the abnormal residual to obtain the new

*q*_{c}or*f*_{s}, if there are outliers in the residuals. Repeat steps 2 and 3 until all outliers are eliminated.Perform Kendall’s tau test on the residuals after step 3, and the weak stationarity is evaluated when the significance level is 0.05.

Calculate the SOF using the SNX, SQX, and SPH models, and then the COV by equation.2.

Figure 9 demonstrates the statistical results of SOF. In 68.3% of all evaluations, the SQX model exhibited the highest SOF; in 64.4% of all evaluations, the SNX model exhibited the smallest SOF. Additionally, in 54.4% of all evaluations, the SOF of *q*_{c} is smaller than that of *f*_{s}, which is similar to Guo et al. [59]. Figure 10 illustrates the statistical results of COV. In 73.3% of all evaluations, the COV of *q*_{c} is smaller than that of *f*_{s}. To reflect the spatial variability characteristics of the site, SOF and COV are summarized more clearly in Table 6 and Table 7, respectively. The average SOF of silty clay-2 was the largest, while the average COV was the smallest, indicating that the average soil properties of silty clay-2 along the depth were more uniform than the other soil layers. The average SOF and COV of soft soil were the smallest except for silty clay-2, indicating that compared with those of the other soil layers, the average soil properties of soft soil along depth had the highest rate of vibration and the lowest amplitude around the trend, except silty clay-2. The average COV of clay-1 was the largest, and the average SOF of clay-1 was the smallest except for that of soft soil, indicating that compared with that of other soil layers, the average soil properties of clay-1 along depth had the highest amplitude and rate of vibration around the trend, except for that of soft soil.

## 6. Conclusion

First, we simultaneously discussed the effectiveness of the OLSM and LADM in estimating the trend component of soil properties, influence of the estimation method and function type of the trend component, and the effect of the outlier on the SOF and COV. Subsequently, we proposed a calculation program for SOF and COV to evaluate the spatial variability characteristics of the five main soil layers in the studied region. The main conclusions drawn are as follows:

Compared with the OLSM, LADM enables better estimation of the trend component of soil properties. The calculated SOF and COV using LADM can describe the spatial variability more representatively.

Both SOF and COV are affected by the estimation method of trend component, function type of the estimated trend component and outliers.

In most cases, the SOF of

*f*is relatively large compared with the_{s}*q*. Compared with SQX and SPH models, the COV of the SOF estimated by SNX model is higher._{c}In this case, the average soil properties of silty clay-2 along depth were the most uniform, the average soil properties of soft soil along depth had the highest rate of vibration and the lowest amplitude around the trend, except for silty clay-2, and the average soil properties of clay-1 along depth had the highest amplitude and rate of vibration around the trend, except for soft soil.

## Data Availability

Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request (data in the graph and the code of calculation, etc.).

## Conflicts of Interest

The authors declare that they have no conflicts of interest.

## Acknowledgments

This study is supported by the National Natural Science Foundation of China (No. 42172307) and Natural Science Foundation of Anhui Province (No. 2008085MD118).