Accurately predicting the development height of the water-conducting fracture zone (HW) is imperative for safe mining in coal mines, as well as for the protection of water resources and the environment. At present, relatively few fine-scale zoning studies focus specifically on predicting the HW under high-intensity mining conditions in western China. In view of this, this paper takes the Yushen mining area as an example and studies the relationship between the HW and the coal seam mining height, coal seam mining depth, hard rock scale factor, and working face slope length. It then proposes a method to determine the development height of the HW based on a multiple nonlinear regression model optimized using the entropy weight method (EWM-MNR). To assess the reliability of this model, random forest regression (RFR) and support vector machine regression (SVR) models were constructed for comparison. The results of the EWM-MNR model were in better agreement with the measured values. Finally, the model was used to predict the development height of the water-conducting fracture zone in the 112201 working face of the Xiaobaodang coal mine. The research results provide a theoretical reference for water damage control and mine ecological protection in the Yushen mining area and other similar high-intensity mining areas.

With the increase in energy demand and mining intensity, changes in the geological conditions of the coal seam roof overburden and the development of mining-induced fissures are the direct cause of damage to key underground aquifers and the root cause of ecological degradation in mining areas. Coal mining destroys the overlying rock strata, forming a water-conducting fracture zone that consists of the caved and fractured zones. Once this zone connects with an aquifer, overlying old goaf water, or a surface water body during coal seam mining, it forms a water channel and can cause water damage accidents, along with ecological and environmental damage such as soil erosion, vegetation death, and ground collapse [1–4].

In recent years, with developments in mining equipment and coal mining technology, high-intensity mining methods have received increasing attention. As the centre of coal resource development shifts to northwestern China, the fragile ecological and geological environment and the water shortages of this region demand attention [5–7]. Large-scale high-intensity mining will inevitably raise issues of green and safe coal mining, in addition to groundwater resource protection [8]. Therefore, the accurate prediction of HW remains an important research issue. For several years, researchers have worked on predicting the development height of water-conducting fracture zones and have achieved certain results. According to previous research, the main influencing factors of HW are the coal seam mining height (M), coal seam inclination angle (A), mining depth (D), working face width (W), advancing speed (S), and proportion coefficient of hard rock (b) [9, 10].

Currently, the methods used to determine the development height of water-conducting fracture zones are mainly field measurements [11–13], empirical equation calculations [14], theoretical analyses [15, 16], numerical simulation methods such as UDEC [17, 18], FLAC [19, 20], PFC [21], and RFPA [22], and physically similar material simulations [23–25]. Among them, field measurements have the highest accuracy but are time consuming, laborious, and costly. The theoretical calculation method is too idealized and deviates greatly from the actual complex geological conditions. Physically similar material simulations require highly accurate material proportioning, which is difficult to achieve for complex geological conditions. The accuracy of numerical simulations depends heavily on the geological parameters used to build the model, which limits the reliability of the results. At present, HW calculation in China is mainly based on the “Code for the Preservation and Pressing of Coal Pillars in Buildings, Water Bodies, Railways, and Main Wells and Roadways.” The empirical formulas proposed in this specification consider only the mining height of the coal seam and the hardness of the overlying strata, and they apply to single-seam mining heights of no more than 3 m; they are therefore insufficient to reflect the combined effect of multiple influencing factors, as shown in Table 1.

In recent years, some machine learning methods, such as decision trees (DT), support vector machines (SVM), random forest regression (RFR), artificial neural networks (ANN), and multiple regression analysis (MNR), have gradually become mainstream for predicting the development height of hydraulic fracture zones and have improved the accuracy of prediction to a certain extent. For example, He et al. [26] predicted the height of an HW under longwall mining conditions using a multiple regression approach, which effectively reflected the relationship between the HW and different mining conditions. Further, Zhao and Wu [27] proposed a prediction method for HW based on RFR. However, the significant diversity and complexity of geological conditions in China’s mining areas lead to differing degrees of influence of mining on the development height of HW in different regions; the large range of study areas used in previous prediction models have reduced the prediction accuracy to a certain extent, and the adaptability of the prediction methods is low. Therefore, the current prediction model needs to be improved to carry out fine zoning predictions with improved accuracy [26, 28].

In this study, the relationship between HW and M, W, D, and b was analysed, and the EWM-MNR prediction model was established, taking the Yushen mining area in the north of Ordos as the study area.

To verify the validity of the EWM-MNR model, it was applied to the Xiaobaodang coal mine in the Yushen mining area. In addition, RFR and SVR models were constructed to facilitate a comparison with the EWM-MNR model. The results show that the EWM-MNR model has good prediction performance, and the prediction results are in good agreement with the field measured data, which verifies the feasibility and accuracy of the proposed method.

2.1. Overview of the Study Area

The Yushen mining area is located in the middle of the Jurassic coalfield in northern Shaanxi and is one of the main mining areas of the northern Shaanxi coal base. The total area of the mining area is about 5,265 km², and it borders the Maowusu Desert and the Loess Plateau. It has an average annual rainfall of about 400 mm and an average annual evaporation of about 2,000 mm, with a fragile ecological environment and a shortage of water resources (Figure 1(a)).

The coal seams in the Yushen mining area have favourable conditions: excellent coal quality, large reserves, a simple geological structure, and a seam dip angle of <10°. The burial depth of the main mining seam in the area is generally greater than 100 m and increases towards the west, where it exceeds 500 m; the average recoverable coal thickness is 6.50 m. The surface water system in the area is relatively developed and primarily comprises the Kuye River and its tributaries in the northeast, the Tuwei River and its tributaries in the middle, and the Yuxi River and its tributaries in the southwest. According to the groundwater occurrence conditions and hydraulic characteristics, the water-bearing rock formations in the study area can be divided into two categories: the Sala Wusu group aquifer and the sandstone aquifer, as shown in Figure 1(b).

The Sala Wusu group aquifer is the only groundwater resource with a large-scale water supply and ecological significance in the study area under natural conditions. The region bears multiple strategic responsibilities for energy supply, water conservation, and ecological protection and restoration.

2.2. Measured Data Collection

This study examines the relationship between the HW and M, W, D, and b after coal seam mining in the Yushen mining area (Figure 2). Among these factors, the proportion coefficient of hard rock (b) refers to the ratio of the total thickness of hard rock strata to the estimated height of the water-conducting fracture zone above the coal seam [26]. It is calculated as follows:
(1) $b = \dfrac{h}{28M}$,
where b is the proportion coefficient of hard rock, M is the mining height, and h is the sum of the thicknesses of the hard rock strata within the statistical height.
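As a minimal sketch of Equation (1), the following Python function computes b from a list of hard rock layer thicknesses, assuming the equation reads b = h/(28M) with h the summed thickness. The function name and the layer thicknesses are hypothetical and are not taken from Table 2.

```python
def hard_rock_proportion(hard_layer_thicknesses_m, mining_height_m):
    """Return b = h / (28 * M), where h is the summed hard rock thickness (m)."""
    if mining_height_m <= 0:
        raise ValueError("mining height must be positive")
    total_hard_thickness = sum(hard_layer_thicknesses_m)  # h in Equation (1)
    return total_hard_thickness / (28.0 * mining_height_m)


# Hypothetical hard rock layers (m) above a 6 m seam; illustrative values only.
b_example = hard_rock_proportion([35.0, 42.5, 48.5], mining_height_m=6.0)
print(f"b = {b_example:.2f}")
```

Under this reading, the statistical height is 28 times the mining height, so a thicker extracted seam enlarges the interval over which hard rock is counted.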

By consulting the literature [29, 30] and conducting field investigations, 20 sets of measured data for the HW and other relevant factors were collected from coal mines in the Yushen mining area, as shown in Table 2.

Regression analysis is a method that establishes mathematical relationships between statistical observations, explaining changes in the dependent variable by changes in the independent variable. It helps predict the possible values of the dependent variable by using the values of the independent variable [31]. The MNR model is expressed as shown in Equation (2).
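(2) $y = \alpha_0 + \alpha_1 f(x_1) + \alpha_2 f(x_2) + \cdots + \alpha_n f(x_n) + \beta, \quad \beta \sim N(0, \sigma^2)$,
where $y$ is the dependent variable, corresponding to HW; $x_1, x_2, \ldots, x_n$ are the independent variables, corresponding to M, D, W, and b, respectively; $\alpha_0, \alpha_1, \ldots, \alpha_n$ are the regression coefficients; and $\beta$ is the random error.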
The least-squares estimation method was used to solve for the regression coefficients; the procedure for which is as follows:
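(3) $f(x_i) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \alpha_0 - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_n x_{ni})^2$,
where $x_{1i}, x_{2i}, \ldots, x_{ni}$ are the independent variables and $y_i$ is the dependent variable, all of which are known observations. To find the regression coefficients, the partial derivatives of Equation (3) with respect to $\alpha_1, \alpha_2, \ldots, \alpha_n$ are taken and set to zero, which gives the following set of equations for the regression coefficients:
(4) $\begin{cases} L_{11}\alpha_1 + L_{12}\alpha_2 + \cdots + L_{1n}\alpha_n = L_{1y}, \\ L_{21}\alpha_1 + L_{22}\alpha_2 + \cdots + L_{2n}\alpha_n = L_{2y}, \\ \quad\vdots \\ L_{n1}\alpha_1 + L_{n2}\alpha_2 + \cdots + L_{nn}\alpha_n = L_{ny}, \end{cases}$
where $L_{ij} = L_{ji} = \sum_{k} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$ and $L_{iy} = \sum_{k} (x_{ik} - \bar{x}_i)(y_k - \bar{y})$, with the sums taken over all observations. Since $L_{i1}, L_{i2}, \ldots, L_{in}, L_{iy}$ $(i = 1, 2, \ldots, n)$ can be computed from the known observations, Equation (4) is a system of n equations in the n unknowns $\alpha_1, \alpha_2, \ldots, \alpha_n$. These can be solved using the determinant method or the elimination method, which then yields the value of $\alpha_0$.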
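The least-squares procedure of Equations (3) and (4) can be sketched in Python with NumPy. This is an illustration on synthetic data rather than the authors' SPSS workflow: the function name is hypothetical, and the intercept is recovered from the sample means, which is the standard result for centred normal equations.

```python
import numpy as np


def fit_least_squares(X, y):
    """Solve the centred normal equations L @ alpha = L_y, then recover alpha_0."""
    X = np.asarray(X, dtype=float)  # shape (observations, variables)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                 # deviations from the column means
    yc = y - y_mean
    L = Xc.T @ Xc                   # L_ij: sums of cross-products
    L_y = Xc.T @ yc                 # L_iy
    alpha = np.linalg.solve(L, L_y)     # elimination on the linear system
    alpha_0 = y_mean - x_mean @ alpha   # intercept from the means
    return alpha_0, alpha


# Synthetic, noise-free data (not the measurements in Table 2).
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 10.0, size=(20, 3))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.7, 0.3])
alpha_0_demo, alpha_demo = fit_least_squares(X_demo, y_demo)
print(alpha_0_demo, alpha_demo)
```

For a nonlinear model of the form in Equation (2), the columns of `X` would hold the transformed terms f(x) instead of the raw variables.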
In this study, after establishing the MNR model, the entropy weighting method, from information entropy theory, is used to calculate and determine the weight coefficients of each factor to improve the regression coefficients of the proposed MNR model, which is an objective weighting method. The specific calculation steps are as follows:
(5) $E_j = -\dfrac{1}{\ln n} \sum_{i=1}^{n} P_{ij} \ln P_{ij}, \quad P_{ij} = \dfrac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}$,
where $E_j$ denotes the entropy value of the jth influencing factor, and $P_{ij}$ is the proportion contributed by the ith data set to the jth factor. $x_{ij}$ is the value of the jth factor of group i in the data of Table 2. There are 20 data sets and four influencing factors in this study; therefore, the maximum value of i is 20, and the maximum value of j is 4. In addition, we specify that, when $P_{ij} = 0$ or $P_{ij} = 1$, $P_{ij} \ln P_{ij} = 0$.
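The weight coefficients of each influencing factor are calculated using the formula
(6) $w_j = \dfrac{1 - E_j}{\sum_{j=1}^{n} (1 - E_j)}$,
where $w_j$ denotes the weight coefficient of the jth influencing factor, and the sum runs over all influencing factors (four in this study).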
Therefore, the multiple nonlinear regression model based on the entropy method proposed in this study is as follows:
(7) $y = \alpha_0 + w_1\alpha_1 f(x_1) + w_2\alpha_2 f(x_2) + \cdots + w_n\alpha_n f(x_n) + \beta$.
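The entropy weighting of Equations (5) and (6) can be sketched as follows, assuming the standard form of the entropy with a leading minus sign. The function name and the small data matrix are hypothetical; the input must be positive with non-zero column sums.

```python
import numpy as np


def entropy_weights(data):
    """Return entropy-method weights for the columns of a positive data matrix."""
    data = np.asarray(data, dtype=float)  # rows: data sets, columns: factors
    n_samples = data.shape[0]
    P = data / data.sum(axis=0)           # P_ij: share of each sample per factor
    plogp = np.zeros_like(P)
    positive = P > 0                      # convention: P * ln(P) = 0 when P = 0
    plogp[positive] = P[positive] * np.log(P[positive])
    E = -plogp.sum(axis=0) / np.log(n_samples)  # entropy E_j of each factor
    divergence = 1.0 - E
    return divergence / divergence.sum()        # weights sum to one


# Hypothetical matrix: the first column is constant, the second varies.
demo = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
weights = entropy_weights(demo)
print(weights)
```

A factor whose values are identical across all data sets has maximum entropy and therefore receives almost no weight, which is why the method is described as objective: the weights depend only on how much each factor varies in the data.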

In this study, a multiple nonlinear regression model was used to determine the relationship between the development height of the hydraulic fracture zone and other influencing factors; finally, the corresponding relationship equation was derived.

To establish the prediction model, the data in Table 2 were divided into two groups: 80% of the data was used as training samples to establish the prediction model, and 20% of the data was used as test samples to test the prediction model. Nos. 1–16 in Table 2 are used as training samples, and Nos. 17–20 are used as test samples.

Finally, the accuracy of the prediction model established in this study was verified through a comparative analysis and field engineering applications.

4.1. EWM-MNR Model

To explore the correlation between HW and M, W, D, and b, a single-factor regression analysis was conducted using SPSS software to establish 11 basic models for HW and each influencing factor, as shown in Figure 3. There is a relatively significant direct relationship between HW and each regression variable except b.

To eliminate the interference of M on this factor of b, the ratio of HW to coal seam mining height (H/M) is introduced. The larger the value of H/M, the larger the value of HW under the same coal seam mining height conditions.

Figure 3 shows the relationship between HW and M, W, D, and b. As shown in Figures 3(a)–3(c), HW increases with M, D, and W, respectively; however, the rate of increase decreases and HW tends to level off. As shown in Figure 3(d), there is no direct correlation between HW and b. Figure 3(e) shows that H/M is positively correlated with b, and the rate of increase gradually rises once the proportion coefficient of hard rock reaches about 0.65.

Table 3 presents the coefficient of determination (R²) and significance (sig) of each of the 11 single-factor models of HW. Based on the R² and sig of each model, the optimal relationship between HW and each single-factor regression variable was obtained, as shown in Equation (8).
(8) $\begin{cases} H_W = 223.44 - \dfrac{522.71}{M}, \\ H_W = e^{5.49 - 208.46/W}, \\ H_W = e^{5.40 - 165.84/D}, \\ H_W = (96.12b^3 + 52.12b^2 + 52.12b)M + 20.06. \end{cases}$
Based on the results of the single-factor regression analysis, the EWM-MNR model was finally proposed, as shown in Equation (9).
(9) $H_W = 0.03e^{5.49 - 208.46/W} + 0.44e^{5.40 - 165.84/D} + (61.56b^3 - 81.40b^2 + 50.50b)M - \dfrac{38.08}{M} - 3.09$,
where HW is the predicted value of the height of the hydraulic fracture zone, M is the coal seam mining height, D is the coal seam mining depth, W is the slope length of the working face, and b is the proportion coefficient of hard rock.
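To illustrate how Equation (9) would be evaluated, the sketch below assumes the equation reads as two weighted exponential terms in W and D, a cubic in b multiplied by M, an inverse term in M, and a constant. That grouping of terms is an assumption about the typeset formula, the function name is hypothetical, and the input values are illustrative only.

```python
import math


def predict_hw(M, D, W, b):
    """Evaluate the assumed reading of Equation (9); all lengths in metres."""
    term_w = 0.03 * math.exp(5.49 - 208.46 / W)               # working face width
    term_d = 0.44 * math.exp(5.40 - 165.84 / D)               # mining depth
    term_b = (61.56 * b**3 - 81.40 * b**2 + 50.50 * b) * M    # hard rock and mining height
    term_m = -38.08 / M                                       # inverse mining-height term
    return term_w + term_d + term_b + term_m - 3.09


# Illustrative inputs: M = 6 m and W = 350 m echo the 112201 working face,
# while D = 300 m and b = 0.75 are hypothetical values chosen for the example.
hw_example = predict_hw(M=6.0, D=300.0, W=350.0, b=0.75)
print(f"Predicted HW: {hw_example:.1f} m")
```

Under this reading, the prediction grows with depth and face width through the saturating exponential terms, while the hard rock coefficient acts through the term scaled by the mining height.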

4.2. Error Analysis

Assessing the accuracy of a predictive model is an important step before the model is applied. To further verify the accuracy of the EWM-MNR model, RFR and SVR models were established in a Python environment using the same training samples. Figure 4(a) compares the predicted values obtained from the empirical formulas in Table 1 and from the EWM-MNR, RFR, and SVR models. The reliability of each model was evaluated using two metrics: the coefficient of determination (R²) and the root mean square error (RMSE).
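A comparison of this kind could be set up as in the sketch below. The paper states only that a Python environment was used, so the choice of scikit-learn, the hyperparameters, and the feature scaling are all assumptions, and the data are synthetic stand-ins for Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic stand-in for Table 2: columns are M (m), D (m), W (m), and b.
X = np.column_stack([
    rng.uniform(2.0, 10.0, 20),
    rng.uniform(100.0, 500.0, 20),
    rng.uniform(150.0, 350.0, 20),
    rng.uniform(0.3, 0.9, 20),
])
# Made-up response used only to make the example runnable.
y = 20.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0.0, 5.0, 20)

# Same 16/4 split as the paper: first 16 rows for training, last 4 for testing.
X_train, y_train = X[:16], y[:16]
X_test, y_test = X[16:], y[16:]

rfr = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
svr = make_pipeline(
    StandardScaler(),  # SVR is sensitive to feature scale
    SVR(kernel="rbf", C=100.0, epsilon=1.0),
).fit(X_train, y_train)

rfr_pred = rfr.predict(X_test)
svr_pred = svr.predict(X_test)
print(rfr_pred, svr_pred)
```

With only 16 training samples, both models are sensitive to their hyperparameters, so any real comparison would need the settings used by the authors.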

Among these metrics, the RMSE reflects the difference between the measured and predicted values. The smaller the RMSE, the better the performance of the model. The magnitude of R² indicates how closely the independent variables are related to the dependent variable and ranges from 0 to 1. The closer R² is to 1, the better the fit of the prediction model, and vice versa. These two indicators are defined as follows:
(10) $\mathrm{RMSE}(H_W, \hat{H}_W) = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (H_W - \hat{H}_W)^2}$,
(11) $R^2(H_W, \hat{H}_W) = 1 - \dfrac{\sum_{i=1}^{n} (H_W - \hat{H}_W)^2 / (n - p - 1)}{\sum_{i=1}^{n} (H_W - \bar{H}_W)^2 / (n - 1)}$,
where $H_W$, $\bar{H}_W$, and $\hat{H}_W$ are the measured value, measured average value, and predicted value of the height of the water-conducting fracture zone, respectively; n is the number of samples; and p is the number of independent variables.
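The two metrics in Equations (10) and (11) can be sketched in Python as follows, reading Equation (11) as the adjusted coefficient of determination with n samples and p predictors. The function names and the sample heights are hypothetical and are not the values behind Table 4.

```python
import numpy as np


def rmse(measured, predicted):
    """Root mean square error between measured and predicted heights."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))


def adjusted_r2(measured, predicted, n_predictors):
    """Coefficient of determination with the degrees-of-freedom correction."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = measured.size
    ss_res = np.sum((measured - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((measured - measured.mean()) ** 2)  # total sum of squares
    return float(1.0 - (ss_res / (n - n_predictors - 1)) / (ss_tot / (n - 1)))


# Hypothetical measured and predicted heights (m), for illustration only.
measured_demo = [100.0, 120.0, 140.0, 160.0, 180.0, 200.0]
predicted_demo = [102.0, 118.0, 141.0, 158.0, 183.0, 199.0]
rmse_demo = rmse(measured_demo, predicted_demo)
r2_demo = adjusted_r2(measured_demo, predicted_demo, n_predictors=4)
print(rmse_demo, r2_demo)
```

The correction requires n > p + 1, so with four predictors at least six samples are needed for the metric to be defined.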

The RMSE and R² of the different models were calculated according to Equations (10) and (11), respectively, and the calculation results are shown in Table 4.

The EWM-MNR model has R² values of 0.97 and 0.96 for the training and validation samples, with RMSEs of 5.51 and 5.09, respectively. The RFR model has R² values of 0.73 and 0.89 for the training and validation samples, with RMSEs of 15.29 and 6.90, respectively. The SVR model has R² values of 0.82 and 0.85 for the training and validation samples, with RMSEs of 12.59 and 8.10, respectively.

Figure 4(b) shows the residual values for the different methods. It can be seen that the error values for the EWM-MNR model range from -12.70 m to 7.01 m, with an average absolute error value of 4.45 m. The error values of the RFR model range from -37.53 m to 37.91 m, with an average absolute error value of 9.13 m.

The error values for the SVR model range from -33.13 m to 25.43 m, with an average absolute error value of 8.31 m. The error values for the first medium-hard formula in Table 1 range from -116.92 m to -3.25 m, with an average absolute error value of 64.57 m. The error values for the second medium-hard formula in Table 1 range from -104.73 m to 0.26 m, with an average absolute error value of 58.57 m.

The abovementioned results show that the predicted values of the EWM-MNR model proposed in this study are very close to the measured values of the training and validation samples, with lower RMSE and higher R² values, which indicates a better prediction performance than the RFR and SVR models. This shows that the model is more suitable for HW prediction under high-intensity mining conditions in the Yushen mining area. In addition, the prediction model proposed in this study will be continuously updated in the future to make it more widely applicable.

5.1. Overview of Working Face 112201 and Two Investigation Drillholes

The 112201 working face of the Xiaobaodang No. 1 Coal Mine in the Yushen mining area is the first mining face of this coal mine. The working face is 4660 m long and 350 m wide, the dip angle of the coal seam is 1°, the mining height is 6 m, and the coal is extracted using the fully mechanized longwall mining method. The 2-2 coal seam in the working face is located at the top of the fourth section of the Yan’an Group and is the thickest recoverable coal seam in the area. The ground elevation is 1283–1330 m, the burial depth of the 2-2 coal seam is 300–400 m, and the floor elevation of the coal seam is +930 m to +970 m. To accurately detect the height of the water-conducting fracture zone, two holes were drilled in the mined-out area of the 112201 working face. Figure 5 shows the locations of drill holes D1 and D2.

5.2. Application of the Prediction Model

Predicting the height of the WCFZ before mining is necessary. To better verify the accuracy of the prediction model, the EWM-MNR model, RFR model, and SVR model were used to predict the development height of the hydraulic conductivity fracture zone of D1 and D2 drill holes before the recovery of the 112201 working face. The prediction results of the different models are shown in Table 5.

5.3. Field Measurements

During mining, the height of WCFZ at the 112201 working face was observed using a combination of drilling fluid loss monitoring and downhole color TV records.

Figure 6 shows the flushing fluid leakage and in-hole TV detection results during the construction of different boreholes. From the D1 flushing fluid loss in Figure 6(a), it can be seen that the top boundary of the HW is 134.80 m, and the in-hole TV detection results show that 138.30 m is the top boundary of the HW in this borehole. The D2 flushing fluid loss in Figure 6(b) shows that the top boundary of the HW is 143.58 m, and the in-hole TV detection result shows that 142.18 m is the top boundary of the HW in this borehole.

When recording the flushing fluid leakage in segments during drilling, there is an error in terms of observation lag, which leads to certain deviations in the recorded flushing fluid leakage location. The TV detection in the borehole involves the continuous real-time observation of the entire borehole section, and it can intuitively locate and quantitatively describe the fracture development and distribution position inside the rock body with high accuracy. Therefore, the HW of the in-hole TV detection was finally adopted.

The measured height of the HW is calculated as follows:
(12) $H = E_1 - E_2, \quad H_W = H - H_1$,
where $H$ is the burial depth of the coal seam, $E_1$ is the ground elevation of the borehole, $E_2$ is the elevation of the roof of the corresponding coal seam, and $H_1$ is the vertical distance from the top boundary of the HW to the ground surface.

For borehole D1, the depth $H_1$ of the HW top boundary is 123.30 m; for borehole D2, it is 142.18 m. The $E_1$ of borehole D1 is 1281.63 m, and the corresponding $E_2$ is 981.26 m; substituting these values into Equation (12) gives, for borehole D1, $H_W = 1281.63 - 981.26 - 123.30 = 177.07$ m. The $E_1$ of borehole D2 is 1288.76 m, and the corresponding $E_2$ is 987.80 m; substituting these values into Equation (12) gives, for borehole D2, $H_W = 1288.76 - 987.80 - 142.18 = 158.78$ m. A comparison of the field measured results of the D1 and D2 boreholes with the prediction results of different models is shown in Table 6.
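The substitution into Equation (12) can be checked with a few lines of Python. The function name is hypothetical; the elevations and depths are the borehole values quoted above.

```python
def measured_hw(ground_elevation_m, roof_elevation_m, depth_to_hw_top_m):
    """Measured HW height from borehole elevations, following Equation (12)."""
    burial_depth = ground_elevation_m - roof_elevation_m  # H = E1 - E2
    return burial_depth - depth_to_hw_top_m               # HW = H - H1


hw_d1 = measured_hw(1281.63, 981.26, 123.30)
hw_d2 = measured_hw(1288.76, 987.80, 142.18)
print(f"D1: {hw_d1:.2f} m, D2: {hw_d2:.2f} m")
```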

The comparison between the field measured results and those from the different models shows that the relative errors between the predicted and field measured values are 7.10% and 1.00% for the EWM-MNR model proposed in this paper, 23.75% and 15.06% for the RFR model, and 20.00% and 8.52% for the SVR model. This indicates that the results of the proposed EWM-MNR prediction model agree more closely with the field measurements than those of the other prediction models.
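The relative errors of the EWM-MNR model can be reproduced from the predicted and measured heights reported for the two boreholes (164.51 m against 177.07 m for D1, and 157.16 m against 158.78 m for D2). The sketch assumes that the error is taken relative to the measured value; the function name is hypothetical.

```python
def relative_error_percent(predicted, measured):
    """Absolute difference as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100.0


err_d1 = relative_error_percent(164.51, 177.07)
err_d2 = relative_error_percent(157.16, 158.78)
print(f"D1: {err_d1:.2f}%, D2: {err_d2:.2f}%")
```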

To ensure the safe operation of coal mines, this study proposed an HW prediction method based on the EWM-MNR model that is applicable to the high-intensity mining conditions in the Yushen mining area, and then applied the model to the Xiaobaodang coal mine in the Yushen mining area. The main conclusions of this study are as follows:

  • (1) The EWM-MNR model was proposed using 20 sets of measured data from the Yushen mining area, and the indicators affecting HW in this model mainly include four regression variables: M, D, W, and b. This model improves the prediction accuracy and stability of the HW.

  • (2) The RFR and SVR models were built from the same training data for comparison with the EWM-MNR model. The EWM-MNR model had lower RMSEs for the training and test samples, at 5.51 and 5.09, and higher R² values, at 0.97 and 0.96. In contrast, the R² values of the RFR model were 0.73 and 0.89, with RMSEs of 15.29 and 6.90, respectively. The corresponding R² values for the SVR model were 0.82 and 0.85, and the RMSEs were 12.59 and 8.10.

  • (3) The prediction model proposed in this paper was applied to the 112201 working face of the Xiaobaodang coal mine. The predicted and field measured values for the HW were 164.51 m and 177.07 m for hole D1, respectively, representing a relative error of 7.10%; the predicted and field measured values for hole D2 were 157.16 m and 158.78 m, respectively, reflecting a relative error of 1.00%. The accuracy and applicability of the prediction model in the high-intensity mining area were further verified.

  • (4) The field measurement results of the 112201 working face show that the EWM-MNR model proposed in this paper can better predict the HW under high-intensity mining conditions. The prediction model has important guiding significance for the synergistic issues of water damage control and groundwater resource protection in western high-intensity mining areas.

The experimental test data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that there are no conflicts of interest regarding the publication of this article.

D. F, E. H, and X. X conceptualized and designed the study. D. F and P. H contributed to the critical interpretation of the data. Manuscript drafting and critical revision prior to submission were performed by all authors.

This study was sponsored by the National Natural Science Foundation of China (No. 42177174), the Basic Research Program of Natural Science of Shaanxi Province (2020ZY-JC-03), and the Shaanxi Province Joint Fund Project (2021JLM-09).

Exclusive Licensee GeoScienceWorld. Distributed under a Creative Commons Attribution License (CC BY 4.0).