## Abstract

Accurate forecasting of the oil field production rate is a crucial indicator of an oil field's successful development, but because of complicated reservoir conditions and the unknown underground environment, achieving high forecasting accuracy remains a persistent challenge. To find a fast and accurate method for forecasting the production rate, this paper proposes a hybrid model, the Simulated Annealing Long Short-Term Memory network (SA-LSTM), based on the daily oil production rate of tight reservoirs together with in situ injection and production rate data in fractures. The forecasting results are also compared with the output of a numerical simulation model. LSTM can effectively learn time-sequence problems, while SA can optimize the hyperparameters (learning rate, batch size, and decay rate) of the LSTM to achieve higher accuracy. By feeding the optimized hyperparameters into the LSTM model, the daily oil production rate can be forecasted well. After training and prediction on existing production data, three different methods were used to forecast daily oil production for the next 300 days, and the results were validated against numerical simulations to compare the forecasts of LSTM and SA-LSTM. The results show that SA-LSTM predicts daily oil production more efficiently and accurately. The fitting accuracies of the three methods are as follows: numerical reservoir simulation (96.2%), LSTM (98.1%), and SA-LSTM (98.7%). Using the same SA-LSTM model, we input the daily oil production data of twenty wells in the same block and made production predictions, with remarkable results.

## 1. Introduction

Forecasting of the oil and gas production rate is one of the most important and effective indicators for measuring the success of reservoir development, and it plays a crucial role in dynamically predicting the production rate during development. However, due to the geological factors of the reservoir and the construction factors during development, production rate forecasting has become more complex, and the dynamic characteristics cannot be well described, which affects subsequent forecasting [1-4]. There are various methods for forecasting the oil and gas production rate, including the Arps method based on the production decline law, analytical model methods based on the permeability law and the material balance equation, and numerical simulation methods based on geological models constructed from geological data [5, 6]. Conventional dynamic production forecasting generally uses numerical simulation, which can comprehensively consider various geological factors, wellbore interference, and the impact of multiphase flow on well production. However, for unconventional reservoirs such as tight oil reservoirs, accurately fitting the production history and forecasting future production is challenging because of lithology, lateral connectivity, vertical distribution, rapid changes in microstructure, and strong reservoir heterogeneity [7, 8].

With the continuous development of artificial intelligence technology, artificial intelligence has gradually come to play a very important role in oil and gas reservoir development, making it possible to solve some of its complex problems using machine learning techniques [9-12]. The recurrent neural network (RNN) was proposed for oil and gas production rate forecasting, but its limitations were soon exposed. An RNN can only connect the production data of the immediately preceding sequence to the current data. When the interval between the production data and the forecast point is very small, the RNN learns and matches well. However, in complex situations, such as when water injection and energy supplementation are performed during production and change the production volume, the relevant information lies many neural units apart, and the RNN loses its learning ability and fails to make accurate and effective forecasts. In essence, an RNN is suited to short-sequence production data; once the sequence grows, it becomes difficult for an RNN to carry information from early production data to later production data. Once the inaccuracy of RNN in long-sequence production forecasting was established, a variant of RNN, the Long Short-Term Memory (LSTM) network, was proposed. LSTM was originally proposed by Hochreiter and Schmidhuber [13, 14] and was improved and popularized by Graves [15], making it widely used in various applications. By deliberately designing against the problem of long-term dependence, LSTM can effectively store the important information of the production phase through deep learning on the actual patterns of long and short production sequences. Many scholars have studied LSTM for production forecasting. For example, Zha et al.
[16] forecasted the gas field production rate based on the gas and water production of the field using a hybrid CNN-LSTM model, which strengthens the feature extraction ability of CNN and uses LSTM to learn and predict time series production data. However, their data are monthly, the data volume is small, and no hyperparameter tuning was performed, which risks model overfitting. Ning et al. [17] constructed ARIMA, LSTM, and Prophet models. After forecasting and comparing well output, they found that ARIMA and LSTM outperformed Prophet, demonstrating their superiority in predicting oil production. Although ARIMA can predict production from a time series, it has a major drawback: it can only capture linear relationships, not nonlinear ones [18, 19]. Many variables in the production process cause the actual production rate to fluctuate strongly and exhibit nonlinear behavior. Similarly, their data are monthly values for each year, which are too stable to match actual production; the data volume is small, the hyperparameters were not effectively tuned, and the model risks overfitting. Aranguren et al. [20] developed a shale reservoir production forecasting model based on LSTM and observed that increasing the complexity of the LSTM model does not increase forecasting accuracy, but they did not explain a specific way to improve the model structure. Huang et al. [21] used an LSTM model to optimize the production forecast of a carbonate reservoir WAG drive and compared the computation time of the LSTM model with that of conventional reservoir numerical simulation. However, their LSTM model exhibited underfitting during forecasting.
For parameter optimization, they changed only a single hyperparameter at a time and did not achieve joint optimization of the hyperparameters. Therefore, a hyperparameter optimization algorithm is important for the LSTM model.

There are many types of algorithms for hyperparameter optimization of artificial intelligence models, and common optimization algorithms include genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA). However, each optimization algorithm targets a different optimization range. For example, GA and PSO algorithms can optimize the model globally, but this optimization range can cause GA to easily fall into local extreme solutions [22], while the poor local search capability of PSO algorithm results in low search accuracy and ineffective determination of the global optimal solution [23]. The emergence of SA solves the problem of local optimization of the model. SA algorithm can achieve the characteristic of a large search range in the early stage and a small search range in the later stage [24], which ensures that the SA algorithm can avoid falling into local optima in the early stage and focuses on the reliability of local search in the later stage.

This paper will introduce a production rate forecasting model for tight oil reservoirs. Considering the heterogeneity of the reservoir and the fluctuations in actual production rate, both conventional reservoir numerical simulation methods and the SA-LSTM time series model prediction method are used to forecast the actual daily oil production rate of a single well. The contributions of this paper are as follows:

The LSTM model is used to predict the actual oil production rate of single wells in tight oil reservoirs.

Using SA to optimize the hyperparameters of the LSTM model in order to improve its accuracy in predicting the actual daily oil production rate.

Comparison between conventional numerical simulation method and SA-LSTM forecasting model.

This paper will be divided into four parts. In the methodology section, we will introduce the basic principles of the LSTM models and SA algorithms and provide a detailed description of the process of constructing the SA-LSTM model. The data description section will provide a detailed description of the data source and the statistical description of the data. The results and discussion section will present the reservoir numerical simulation results, SA optimization distribution, and SA-LSTM model forecasting results and compare the advantages and disadvantages of the conventional reservoir numerical simulation method with the SA-LSTM forecasting model. The conclusion will be given in the final part of the paper.

## 2. Methodology

### 2.1. Long Short-Term Memory

LSTM is a special type of RNN. Conventional RNNs propagate weights through multiplication, which leads to weights approaching zero as data propagates over long periods of time. LSTM networks, however, have a “gate” structure that allows neurons to selectively delete or add information. This gate structure can effectively control the passage of relevant information. Data weights are propagated through a combination of multiplication and addition. LSTM networks include three types of gates: the forget gate, the input gate, and the output gate [25-28]. The forget gate and input gate are applied to the neural cell of the LSTM network. The forget gate controls whether information is retained or forgotten in the cellular state of the previous time step, determining how much information from the previous time step can be passed through the forget gate to the next time step. In the process of oil and gas production forecasting, the forget gate can assist the neural cell in deciding which past production data should be forgotten, especially those that have no impact on the current prediction. This helps reduce the influence of past production on the current prediction. On the other hand, the input gate controls the degree of update to the neural cell based on the input data at the current time step, determining how much information from the current input data will be added to the neural cell. In oil and gas production forecasting, the input gate can control the importance of the current production data to the neural cell. If the current production data is highly important for predicting future production changes, the input gate can allow more information to flow into the neural cell in order to better capture these influences. Through the regulation of the forget gate and the input gate, the LSTM model can retain crucial information in long-term time series and automatically learn and adapt to long-term dependencies in the data [17].

Compared with RNN neuron cells, the LSTM neuron structure in Figure 1 reveals the complexity of its internal design.

There are two kinds of neuron cell states in LSTM: $c_t$ and $h_t$. The cell state, $c_t$, traverses the entire cell, ensuring the complete flow of unchanged information throughout the entire neural network. The memory state $c_t$ stores learned relevant information as long-term memory and passes it on through updates. On the other hand, $h_t$ represents the hidden state of the previous cell and is also known as the output state. It can be seen as the short-term memory of the previous neural cell for the entire LSTM model. In Figure 1, the region marked by the red dashed line represents the forget gate in LSTM. The forget gate determines which pieces of information need to be discarded by a single cell. It is activated by detecting the connection between $h_{t-1}$ and the input $x_t$ at time $t$ and outputs a vector ranging from 0 to 1 through a sigmoid unit. When the output vector's value approaches 0, it indicates that the input information is irrelevant and should not be retained (equation (1)).

In the equation, $w_f$ and $b_f$ are the weight matrix and bias vector of the forget gate, respectively; $\sigma$ represents the sigmoid activation function; $h_{t-1}$ is the hidden information of the previous neural unit; and $x_t$ represents the input information.

After forgetting irrelevant information, the next step is to add new information to the neuron cell state through the input gate (yellow dotted line area in Figure 1) and to use $h_{t-1}$ and $x_t$ to obtain the candidate cell information $\tilde{c}_t$ through a $\tanh$ layer (equation (3)).

After determining the new cell information $\tilde{c}_t$, we update the old cell information $c_{t-1}$ output from the previous neural unit. The irrelevant part of the old cell information is forgotten through the forget gate, and the new cell information $\tilde{c}_t$ (the green dotted line area in Figure 1) is added through the input gate, yielding the new cell information $c_t$ (equation (4)).

After the cell state is updated, the output gate (blue dotted line area in Figure 1) determines the output: $h_{t-1}$ and $x_t$ are fed into the output gate (equation (5)), the judgment condition is obtained through the sigmoid layer, and the result is multiplied element-wise with the $\tanh$ of the cell state to produce the output passed to the next neuron cell (equation (6)).

A complete LSTM model is formed by this series of updates to the cell state through the forget gate, input gate, and output gate. Because the cell information $c_t$ is constantly updated, the LSTM can filter out irrelevant information through the forget gate while admitting new information through the input gate; the new information is stored in memory for a long period after a simple linear calculation.
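As a concreteness check, the gate updates in equations (1)-(6) can be sketched as one forward step of an LSTM cell in NumPy. This is a minimal illustration, not the paper's implementation; the function names, weight shapes, and random initialization are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step: forget gate, input gate, cell update, output gate."""
    z = np.concatenate([h_prev, x_t])      # connection of h_{t-1} and x_t
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate, eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"]) # candidate cell state, eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde     # cell-state update, eq. (4)
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate, eq. (5)
    h_t = o_t * np.tanh(c_t)               # hidden state, eq. (6)
    return h_t, c_t

# Tiny example: 1 input feature, hidden size 2, random weights (assumed)
rng = np.random.default_rng(0)
hidden, feats = 2, 1
W = {k: rng.standard_normal((hidden, hidden + feats)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.5, 0.6, 0.55]:                 # a short daily-rate sequence
    h, c = lstm_cell_step(np.array([x]), h, c, W, b)
```

Note how $c_t$ is carried forward by addition rather than repeated multiplication, which is what lets gradients survive long sequences.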

### 2.2. Simulated Annealing

Compared with the other four algorithms (Table 1), the SA algorithm has the following advantages:

#### 2.2.1. Global Optimization

SA is known for its ability to evade local optimizations, making it suitable for finding global solutions in complex and multimodal optimization problems. This is particularly advantageous when compared with gradient descent and PSO, which can fall into local minima.

| Optimization algorithm type | Advantages | Disadvantages |
|---|---|---|
| Gradient Descent | Easy to implement | Prone to get stuck in local minima |
| Genetic Algorithm | Provides a diverse set of solutions | Slow convergence speed |
| Simulated Annealing | Avoids local minima | Less effective for high-dimensional problems |
| Particle Swarm Optimization | Suited to multimodal problems | Sensitive to initial parameters |
| Ant Colony Optimization | Adapts to dynamic environments | May not always find the global optimum |


#### 2.2.2. Probabilistic Exploration

The probabilistic exploration mechanism of SA allows it to explore a wide range of solutions and avoid premature convergence. This is in contrast to GAs, which may converge more quickly to a subset of the solution space but may miss other viable solutions.

#### 2.2.3. Fewer Hyperparameters

SA generally requires fewer hyperparameters to adjust compared with GAs and PSO. This simplicity can be an advantage in practice.

In summary, SA is particularly advantageous when dealing with complex optimization problems, where escaping local optimality is crucial. Its probabilistic exploration and simplicity in hyperparameter tuning make it a valuable optimization method, especially in cases where other algorithms may struggle to find a global solution.

During the training process, the weights are randomly initialized, meaning that they are not biased toward any particular input. The changes in the hyperparameters of the model significantly affect the degree of loss during training. The learning rate determines the magnitude of weight updates during training. A high learning rate can make the model converge to optimal weights in less time, but an excessively high learning rate can cause the model to jump around and fail to reach the optimal point accurately. A low learning rate can effectively control the training process to reach the optimal point. The choice of learning rate has a significant impact on the model’s accuracy. The decay rate determines the speed at which the learning rate decays with each update. As we know, the learning rate gradually decreases with the number of iterations. The decreasing learning rate can accelerate the training process. Therefore, it is necessary to determine an optimal decay rate for the optimized learning rate. In machine learning, there is a parameter called batch size, which determines the number of data samples used for training in each iteration. The optimal batch size determines whether the training weight values tend toward accurate values. To minimize the loss and reach the global optimum within an interval during the training process, the SA algorithm is used to optimize these three hyperparameters: learning rate, decay rate, and batch size.
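The relationship between learning rate and decay rate described above can be sketched as an exponential decay schedule; the function name and exact schedule form are illustrative assumptions, not the paper's implementation.

```python
# Illustrative exponential learning-rate decay: lr_t = lr_0 * decay_rate ** step.
# The initial value 0.001 and decay 0.9 match Table 2's initial settings.
def decayed_lr(lr0: float, decay_rate: float, step: int) -> float:
    return lr0 * decay_rate ** step

lrs = [decayed_lr(0.001, 0.9, s) for s in range(4)]
# each step multiplies the learning rate by the decay rate, so updates
# shrink over time and training settles near the optimum
```

A decay rate closer to 1 keeps the learning rate high for longer; a smaller decay rate shrinks the step size quickly, which is the trade-off the SA search over 0.9-0.99 explores.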

The SA algorithm consists of two parts: the annealing process and the Metropolis algorithm, which correspond to the outer and inner loops of the algorithm, respectively. First, the outer loop, the annealing process, interprets the entire optimization process as a solid-state annealing process. The solid is initially heated to a high temperature $T_0$ and then cooled according to the selected cooling factor $\alpha$. When the temperature reaches the final temperature $T_f$, the entire annealing process ends. The inner loop is the Metropolis algorithm, in which $L$ iterations are performed at each temperature and the minimum energy, or optimal value, at that temperature is found [29-32].

The basic SA process (Figure 2) can be briefly expressed as follows:

Given a temperature $T$, the current state is the hyperparameter value $x$. A neighborhood is set around $x$, and a new state $x'$ is generated within it. The energies of the two states, that is, the corresponding target model loss values, are $fitness(x)$ and $fitness(x')$.

When $fitness(x')$ is less than $fitness(x)$, $x'$ is accepted as the current state, and the annealing calculation continues.

When $fitness(x')$ is greater than $fitness(x)$, the probability $p$ (equation (7)) is compared with $r$ to determine whether the new state is accepted as the current state, where $r$ is a random number in (0, 1) and $T$ is the current temperature.

After the current state is determined, the algorithm continues cooling and iterating along the same path until the equilibrium state is reached and the optimal value within the specified range is found.
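The steps above can be sketched as a minimal SA loop with the outer cooling schedule and inner Metropolis iterations; the cooling factor, neighborhood move, and parameter values are illustrative assumptions rather than the paper's exact settings.

```python
import math
import random

def simulated_annealing(fitness, x0, lo, hi, T0=1.0, Tf=1e-3, alpha=0.9, L=50, seed=0):
    """Minimal SA sketch: outer annealing loop, inner Metropolis loop."""
    rng = random.Random(seed)
    x, fx = x0, fitness(x0)
    best, fbest = x, fx
    T = T0
    while T > Tf:                          # outer loop: cool from T0 toward Tf
        for _ in range(L):                 # inner loop: L Metropolis iterations
            # new state x' from a neighborhood of x, clamped to [lo, hi]
            x_new = min(hi, max(lo, x + rng.uniform(-0.1, 0.1) * (hi - lo)))
            f_new = fitness(x_new)
            # accept if better, else with probability p = exp(-(Δfitness)/T)
            if f_new < fx or rng.random() < math.exp(-(f_new - fx) / T):
                x, fx = x_new, f_new
                if fx < fbest:
                    best, fbest = x, fx
        T *= alpha                         # cooling factor
    return best, fbest

# Toy objective with minimum at x = 0.3 (a stand-in for the model's loss)
x_opt, f_opt = simulated_annealing(lambda x: (x - 0.3) ** 2, x0=0.9, lo=0.0, hi=1.0)
```

The early high-temperature phase accepts many uphill moves (wide search), while the late low-temperature phase accepts almost none (local refinement), which is the behavior the text attributes to SA.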

## 3. Model Parameters

### 3.1. Simulated Annealing Long Short-Term Memory

#### 3.1.1. Forecasting Model

Figure 3 shows the entire process of daily oil production rate forecasting, including the screening of model data, the optimization of hyperparameters by SA, and LSTM.

#### 3.1.2. Source and Processing of Production Rate Data

The data used in this study are the daily production data of a single horizontally fractured well with asynchronous inter-fracture injection and production in an ultra-low permeability reservoir block in Changqing, China. After the data are organized, equation (8) is used to normalize the daily production rate data; normalized data can effectively accelerate the convergence of the model.

After normalization, the production data can be effectively partitioned by date. The daily oil production data of a single well in the reservoir were divided into training and validation sets at a ratio of 5:1, and training and prediction were carried out. After training, the model was saved. The first 100 days of standardized production data are then supplied as the input data, and the model forecasts the daily oil production of the next 500 days.
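A minimal sketch of this preprocessing, assuming equation (8) is min-max normalization (a common choice; the paper's exact formula may differ) together with the 5:1 split:

```python
import numpy as np

def normalize(series):
    """Min-max scale a series to [0, 1]; return scale bounds for inversion."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), lo, hi

# Illustrative daily oil rates (not real field data)
daily_oil = np.array([12.0, 11.5, 11.8, 10.9, 11.2, 10.7])
scaled, lo, hi = normalize(daily_oil)

# 5:1 split between training and validation sets
split = len(scaled) * 5 // 6
train, valid = scaled[:split], scaled[split:]
```

Predictions made in the scaled space are mapped back to physical rates with `pred * (hi - lo) + lo`.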

#### 3.1.3. Model Parameters

To build an LSTM model and apply it to the predicting and training of real-world data, it is necessary to define the basic parameters of the internal neural network model of the LSTM. The initial parameters are as follows (Table 2):

| Parameter name | Initial parameter value |
|---|---|
| Number of LSTM layers | 2 |
| Neurons in the first LSTM layer | 32 |
| Neurons in the second LSTM layer | 64 |
| Number of dense layers | 2 |
| Neurons in the first dense layer | 32 |
| Neurons in the second dense layer | 1 |
| Dropout rate | 0.4 |
| Epochs | 50 |
| Initial learning rate | 0.001 |
| Initial decay value of Adam optimizer | 0.9 |
| Initial batch size | 16 |


After establishing the LSTM model, the SA algorithm is used to optimize three hyperparameters (learning rate, decay, and batch size) in order to find the values that give the best forecasting performance. According to the optimization criteria, the learning rate is optimized in the range of 1e^{-6} to 1e^{-3} to speed up model convergence and improve training accuracy while preventing overfitting (a larger decay coefficient can effectively reduce the complexity of the model) and avoiding the gradient explosion and unbalanced training caused by excessively large parameters. By comparing the training and prediction loss when the decay rate is 0.1, 0.3, 0.5, 0.7, 0.9, and 0.99, Figure 4 shows that decay rates below 0.9 lead to training overfitting, so a higher decay coefficient between 0.9 and 0.99 is selected. The whole decay schedule is an exponential decreasing process; a value of 0.9 means that the first-moment estimate is updated with a moving-average factor of 0.9 in each optimization step. The decay is therefore optimized in the range of 0.9-0.99, and the batch size is optimized in the range of 10-100. During optimization, the loss value in both training and forecasting is used as the evaluation criterion to determine the feedback of the hyperparameters on the model.
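The three search ranges can be encoded as bounds for the SA neighborhood move; the perturbation scale and function names below are assumptions for illustration, not the paper's exact scheme.

```python
import random

# Search ranges stated in the text; batch size is treated as an integer.
BOUNDS = {
    "learning_rate": (1e-6, 1e-3),
    "decay": (0.9, 0.99),
    "batch_size": (10, 100),
}

def neighbor(params, rng, scale=0.1):
    """Perturb each hyperparameter within its bounds (SA neighborhood move)."""
    new = {}
    for name, (lo, hi) in BOUNDS.items():
        step = rng.uniform(-scale, scale) * (hi - lo)
        new[name] = min(hi, max(lo, params[name] + step))
    new["batch_size"] = int(round(new["batch_size"]))
    return new

rng = random.Random(42)
current = {"learning_rate": 5e-4, "decay": 0.95, "batch_size": 16}
candidate = neighbor(current, rng)
```

Each candidate would then be scored by training the LSTM and reading off the training and forecasting loss, which serves as the `fitness` value in the SA loop.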

To evaluate the performance of the model before and after optimization, the three optimal hyperparameters are plugged into the LSTM model, and the loss function of equation (9), expressed as equation (10), is used.

In the equation, $k$ is the total number of samples in the test set, $x_{predicted}$ is the predicted value at the forecasting point, and $x_{true}$ is the true value at the forecasting point.
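Assuming equations (9) and (10) are the mean squared error and its square root, as the MSE/RMSE columns of Table 3 suggest, a minimal sketch is:

```python
import math

def mse(true, pred):
    """Mean squared error over k samples."""
    k = len(true)
    return sum((p - t) ** 2 for t, p in zip(true, pred)) / k

def rmse(true, pred):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(true, pred))

# Illustrative values
y_true = [10.0, 11.0, 12.0]
y_pred = [10.1, 10.8, 12.2]
```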

#### 3.1.4. Forecasting

The three optimized hyperparameters are plugged into the LSTM model to train and predict the model. After the model training is completed, the model is used for cyclic forecasting to obtain the daily production rate data for the later stable production rate period.
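The cyclic forecasting step can be sketched as follows, with a stand-in one-step predictor in place of the trained SA-LSTM; the window length and function names are illustrative assumptions.

```python
def cyclic_forecast(history, model_predict, horizon, window=100):
    """Roll the model forward: each prediction is appended to the input
    window and fed back in for the next step."""
    series = list(history)
    for _ in range(horizon):
        window_data = series[-window:]
        series.append(model_predict(window_data))
    return series[len(history):]

# Dummy one-step predictor (mean of the window), illustrative only
dummy = lambda w: sum(w) / len(w)
future = cyclic_forecast([1.0, 2.0, 3.0], dummy, horizon=5, window=3)
```

Because each step consumes its own earlier predictions, forecast errors compound with the horizon, which is one reason hyperparameter tuning matters for long-range forecasts.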

### 3.2. Numerical Simulation

The target reservoir is a typical ultra-low permeability reservoir, and the production data are the daily production data of a single horizontally fractured well with asynchronous inter-fracture injection and production in an ultra-low permeability reservoir block in Changqing, China. The average oil saturation of the block is 52%, the average permeability is 0.4 mD, and the average porosity of the reservoir is 0.1. Because of the low formation pressure, horizontal wells with repeated fracturing and asynchronous injection and production between fractures and packers are used.

However, existing numerical simulation software cannot model asynchronous inter-fracture injection and production for a single horizontal well. Therefore, a single virtual vertical well is used to replace a single injection or production fracture, and multiple virtual vertical wells are made equivalent to one horizontal well. The post-fracturing fracture geometry is simulated by discretely refining the grid of the multiple virtual vertical wells with LGR (local grid refinement). A numerical simulation model of the reservoir is constructed through these operations.

By adjusting the water content and pressure of the single well in the numerical simulation model based on the daily production data of the single horizontal well, the total daily production of the virtual vertical well group is fitted to the inter-fracture production of the horizontal well (Figure 5). This fitted model is then used to predict the production rate during the later stable production period.

## 4. Comparisons and Discussion

### 4.1. Hyperparametric Optimization

To optimize the LSTM for the best production rate forecasting, the SA optimization algorithm is used to optimize the three hyperparameters: learning rate, decay, and batch size.

#### 4.1.1. Learning Rate Optimization

Keeping the values of the hyperparameters decay and batch size constant, we randomly perturbed the learning rate and calculated its corresponding loss value in the model. From Figure 6, it can be seen that as the number of iterations increases, the model convergence rate is slow in the early stages. However, when the number of iterations exceeds 50, the model convergence rate increases, and it approaches complete convergence when the number of iterations reaches 150. Upon examining the learning rate values, it can be inferred that within the optimization range, the model convergence rate is faster when the learning rate is between 0.0006 and 0.0008. Learning rates outside of this range were excluded because they consistently had higher loss values during the optimization process. As the SA temperature decreases, the learning rate values remained within the range of 0.0002–0.0008. The optimal learning rate was selected from this range.

As the iterations proceed, the loss values under different learning rates are calculated. Figure 7 shows that the change in loss value decreases as the learning rate increases in Figure 6, indicating an inverse relationship between the two.

As a result, after global optimization, it was found that when the learning rate is 0.0007, the corresponding iteration number is 232, and the minimum loss value is 0.00167.

#### 4.1.2. Decay Optimization

Keeping the batch size at its baseline value, the hyperparameter learning rate was set to 0.0007. To find the optimal value for the decay hyperparameter, different values were tested by calculating the model’s loss. As the iterations continued, the model converged almost completely by the 100th iteration (Figure 8). According to the principles of SA for selecting valid data, the optimization of the decay hyperparameter was mainly focused on the range between 0.9 and 0.94.

The model’s loss values under different effective decay values can be seen in Figure 9. Within the range of 0.9–0.94, the loss value decreases as the decay increases.

When the learning rate is 0.0007, the optimal decay value that can minimize the model loss is 0.924, and the corresponding loss value is 0.00161.

#### 4.1.3. Batch Size Optimization

With the optimized hyperparameters of learning rate and decay, different batch sizes were tested by randomly perturbing within a controlled range while keeping the other hyperparameters constant. The best-performing batch sizes were retained. From Figure 10, it can be observed that the model converges completely by the 150th iteration, and the optimal batch size is uniformly distributed. Initially, during the high-temperature phase of the SA, larger batch sizes are acceptable as new solutions due to the slower convergence rate. However, as the temperature gradually decreases, excessively large batch sizes cannot be accepted as new solutions, and the optimal range of batch size is narrowed down to between 10 and 20.

With the optimal values for learning rate and decay determined as mentioned above (Figure 11), the algorithm accepts the global optimal batch size of 15 based on the minimum loss value.

### 4.2. Compare Model

After applying the three optimized hyperparameters to the LSTM model, we compare the optimized model with the unoptimized one. In order to evaluate the performance of the two models, we compare their training and validation loss as well as forecasting accuracy.

The training and validation loss values before and after optimization are shown in Figure 12. The left panel shows the training loss of the LSTM model before hyperparameter optimization; the loss value gradually approaches 0 as the number of iterations increases. The right panel shows the LSTM training loss after hyperparameter optimization. In general, the validation set loss (true value loss) is lower than the training set loss. Compared with the results before optimization, the optimized model needs only a few iterations to reach the pre-optimization loss level, which demonstrates the importance of hyperparameter optimization for LSTM.

Table 3(a) and Table 3(b) show the loss levels of the model before and after optimization for training and forecasting data. It can be observed that the optimized model has smaller loss levels than before, both in terms of training and forecasting. Specifically, the training loss of the optimized LSTM model is much smaller than before, which leads to a lower loss in the subsequent forecasting process. Based on the difference in loss values, it can be concluded that the LSTM model with hyperparameter optimization can more effectively predict the daily oil production rate.

| | MSE (Mean Square Error) | RMSE (Root Mean Square Error) |
|---|---|---|
| (a) Before optimization | | |
| Training | 0.00172 | 0.0415 |
| Predict | 0.00086 | 0.0294 |
| (b) After optimization | | |
| Training | 0.00105 | 0.0324 |
| Predict | 0.00064 | 0.0253 |


### 4.3. Numerical Simulation History Fitting

In order to predict future production rate, the numerical reservoir simulation model (Figure 13) was modified based on the existing production rate data to ensure that the daily oil production rate of multiple virtual vertical wells was equivalent to that of a single-fractured horizontal well with synchronous injection and production rate. The production rate strategy, reservoir properties around the well, and interwell connectivity were adjusted based on the actual injection water volume, and a significant amount of time was spent fitting the historical production rate data.

After fitting the historical production rate data (Figure 14), the fitting accuracy was verified to reflect the accuracy and effectiveness of the numerical reservoir simulation model. According to equation (11), the fitting accuracy was calculated to be 96.2%.

In the equation, $n$ is the total number of daily oil production rate data points, $m_{\mathrm{valid}}$ is the actual daily oil production rate, and $m_{\mathrm{simulated}}$ is the daily oil production rate fitted by the numerical simulation.
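Equation (11) is not reproduced in this excerpt; a common form consistent with the variables defined above is one minus the mean absolute relative error, expressed as a percentage. The sketch below assumes that form and should be read as an illustration, not the paper's exact equation:

```python
def fitting_accuracy(m_valid, m_simulated):
    """Fitting accuracy as one minus the mean absolute relative error,
    in percent. This is an assumed form of equation (11): the actual
    equation is not reproduced in this excerpt.
    """
    n = len(m_valid)
    rel_err = sum(abs(v - s) / v for v, s in zip(m_valid, m_simulated)) / n
    return (1.0 - rel_err) * 100.0

# Illustrative values: simulated rates within a few percent of the
# actual rates give an accuracy in the mid-90s, as in the paper.
actual = [100.0, 95.0, 90.0, 88.0]
simulated = [97.0, 96.0, 88.5, 86.0]
accuracy = fitting_accuracy(actual, simulated)
```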

Several issues were encountered during fitting. When the actual oil production rate peaks, adjusting the production schedule and permeability still cannot reproduce the actual maximum value, and data points with large fluctuations cannot be fitted perfectly. Moreover, as a consequence of the fitting, the daily production rate may rise slightly in the final production stage, which can lead to inaccurate forecasts and prevent the subsequent production rate forecast from reaching a stable decline.

## 5. Comparison of Fitting and Forecasting Results

To verify the effectiveness of the LSTM time series model in predicting daily oil production rate, the actual production rate data, the LSTM forecasting results, and the SA-LSTM forecasting results were compared. The fitting rate was calculated with the same fitting-accuracy equation used for the numerical simulation.

Figure 15 shows that, trained on the first 100 days of data, the LSTM model fits the 500-day production series well. The optimized SA-LSTM model forecasts the entire daily oil production series markedly better than the single LSTM model, and it also captures the data points where the daily oil production rate is highly variable. Using the fitting-rate formula from the numerical simulation, the single LSTM model achieves a fitting rate of 98.1%, while the SA-LSTM model achieves a higher fitting rate of 98.7%.

Under the same computing conditions, the time costs of the three methods for forecasting daily oil production are compared in Table 4. Numerical simulation requires tedious geological modeling and continuous adjustment of reservoir properties to predict daily oil production, which takes as long as 13.5 hours, with constant debugging and modification required throughout. In contrast, the LSTM and SA-LSTM methods require far less time, and their implementation in code allows automated forecasting. The whole SA-LSTM workflow takes only 0.1 hours longer than the single LSTM model, because each hyperparameter evaluation is short and suitable hyperparameters are found quickly within the search range. Numerical simulation, by contrast, requires large amounts of geological and reservoir data as well as complex model parameters to be continuously debugged in the professional numerical simulation software tNavigator, which takes much longer than LSTM and SA-LSTM.

| Forecasting method | Training/numerical model building time (hours) | Forecast/production history fitting time (hours) | Total time (hours) |
|---|---|---|---|
| Numerical simulation | 12.5 | 1.0 | 13.5 |
| LSTM | 0.2 | 0.1 | 0.3 |
| SA-LSTM | 0.3 | 0.1 | 0.4 |


To validate the applicability of the SA-LSTM model for forecasting production in tight oil reservoirs, daily oil production data from the remaining twenty production wells in the same tight oil reservoir block were fed into the model described above. The training and forecasting results are shown in Figure 16. The results indicate that the LSTM model with SA-optimized hyperparameters is effective in predicting daily oil production, particularly for tight reservoirs, with prediction accuracies ranging from 92% to 98%. Compared with reservoir numerical simulation, the SA-LSTM model is more time-efficient and offers a convenient forecasting workflow.

## 6. Conclusion

The application of artificial intelligence models in the petroleum industry has become widespread, and LSTM is a model well suited to data that vary with time. This paper uses LSTM and SA-optimized LSTM to predict daily oil production rate. SA can search for globally optimal solutions and tune the hyperparameters, making the LSTM model more accurate and reliable. On this basis, the two models are validated and compared against reservoir numerical simulation. In summary:

1. After extensive debugging and attribute modification, the numerical simulation accuracy for the conventional reservoir model reached 96.2%; however, this process required a significant amount of time.

2. The SA search can effectively discard poor parameters and find the parameters best suited to the LSTM model. The method is simple to operate and searches quickly, yielding globally optimal hyperparameters.

3. Comparison and validation show that the SA-LSTM model fits and predicts the daily oil production of tight oil reservoirs with high accuracy. On predictions for twenty tight oil reservoir production wells, its accuracy ranges from 92% to 98%.

## Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## Conflicts of Interest

The authors declare that they have no conflicts of interest.

## Acknowledgments

This study was supported by the CNOOC (China) Co., Ltd.’s major project "Key Technologies for Significantly Enhanced Oil Recovery in Offshore Oilfields" (KJGG2021-0501).