## Abstract

A procedure to automatically correlate well logs measured in boreholes that are located in continental siliciclastic basins by using two different methods is shown. The first method is applied to the parametric layers that were determined in each borehole starting from the values of their geophysical parameters and consists of correlating, by cross-association, the columns formed by these layers. The second method consists of cross-correlating the geophysical stretches or units, which are established as sets of layers with similar characteristics that are sufficiently different from the average values in the adjacent stretches. The evaluation of the correlation results requires showing the criteria that are used for determining the parametric layers that are obtained from the well logs, the result of which is called segmentation in this study. This evaluation also requires to show the techniques that are used to determine the geophysical stretches by a process that is called stretching in this study. The reason for using different correlation methods is that cross-association of layers provides high resolution but relatively smaller spatial extent, whereas cross-correlations of geophysical stretches provide higher spatial extent but lower resolution. Thus, the cross-association results have been used both to assess the correlations in boreholes that are relatively close ($distances<10\u2009km$) and to support the establishment of the stretch correlation criteria. The developed methodology is applied to a set of boreholes located in the Duero Basin (Spain). From the results obtained, an evaluation of the correlations with respect to the distances between boreholes was carried out. Furthermore, it is shown that the correlations between geophysical stretches enable identifying the correspondences between these and the tectono-sedimentary sequences (activation-relaxation of a tectonic phase) that are established in the literature.

## 1. Introduction

The correlations between the lithological results that are obtained from boreholes in a multilayered siliciclastic basin are a key aspect for the identification of large geological units that may have similarities in their main characteristics as reservoirs and the determination of sedimentary sequences. On many occasions, the distance between boreholes means that some of the characteristics of the encountered layers do not remain constant laterally, producing changes in thickness, disappearance of levels and differences in sedimentary episodes, and even the repetition or inversion of sequences and the presence of geological discontinuities [1, 2], which cause the correlation process to be difficult.

In these basins, there are not so marked disconformities as in marine basins or coastal basins, with continental and marine environments [3, 4], and the well logs signatures are not as characteristic and/or isolated. Nor are the sequences so noticeably repetitive, because in the multilayered siliciclastic basins the main difference between strata lies in their granulometry, providing the vertical textural (grading) trends [5], and their clay content. In marine basins, the sequence patterns within each tectonic interval have much greater lateral continuity [6] and the concept of parasequence, as originally intended by Van Wagoner et al. in 1990 [7], is more related to its usage to successions of genetically related beds bounded by flooding surfaces. Although many studies discussed the information provided by well logs, in general, the correlation processes are mainly supported by geological information, being especially important the information provided by geochronology and biostratigraphic data of fossils from cores [8, 9]. Some basin studies have used well-log data for correlation and reservoir modeling [10, 11]. Most studies associated with siliciclastic environments [7, 12–14] show a very complete stratigraphic correlation and interpretation based on the combined core, seismic, outcrop, and well log data. Instead, this study focuses on achieving the broadest possible stratigraphic interpretation only starting from the techniques for correlating well logs that are obtained in such multilayered siliciclastic basins.

In well logging, to correlate can be defined as finding a correspondence in the attributes, sequence, and position between the different parts of the well logs obtained in two boreholes. To obtain a correlation of the lithostratigraphic attributes, lithological classification (or more ambitiously, belonging to the same lithological unit) should be added to this definition. The attributes that are used to determine these correspondences can be represented by the shapes of the different logs (e.g., cross-correlation and pattern recognition), their parametric values (equal resistivity, gamma ray emissions, or other parameters), or the lithological assignments that result from interpreting of the well logs (e.g., a geophysical column). These differences divide the well log correlation techniques into two types, depending on whether continuous curves or series of discrete results are compared, either the numerical values of one or several measured parameters or the values that are assigned to the different lithologies existing in the basin.

Correlations of well logs, as continuous or quasicontinuous curves, have been used both manually and automatically in many works [15, 16]. However, the authors have not chosen this technique because the geological variations of the strata that are found in several continental siliciclastic basins of the Iberian Peninsula have shown that they have a reduced spatial range. For example, the interval of a short normal resistivity log (S.N.R.) containing sand lenses with lateral extents of several hundreds of metres in an alluvial fan system may exhibit different geometries depending on the part of the fan in which the interval is located.

For the above reason, to achieve an adequate sedimentary interpretation, in this study, a prior subdivision of the well logs (continuous curves) into different levels is conducted by delimiting the depths between these levels. Figure 1 shows the results of the segmentation process in an example log interval. Although there are multiple well logs processes that should be carried out prior to the correlations (e.g., corrections for environmental factors and corrections of baseline trends), in this work, only the discretisation processes carried out are described due to their implications for the correlation techniques.

The achievements of the correlation techniques depend in part on the objective and the scope that are pursued by the correlation process. In this work, we differentiate between two types of delimitation within the logs: segmentation of the logs into geophysical parametric layers (hereafter layers) and stretching into geophysical units (hereafter stretches), as will be detailed in Section 3.

In many publications on well logging to date, the term zonation has been used interchangeably for these two delimitation types. Most authors agree that the previously developed processes of well logs zonation are fundamental [1], although both are intermingled in the literature, with multiple references to both segmentation methods [17–20] and subdivision of entire well logs in layers’ groups [1, 21–24].

The process of segmenting well logs consists of determining, within the continuous curve of a parameter, the discontinuities that correspond to strata with different values of the measured parameter(s), i.e., the different parametric layers of the borehole. In mathematical terms, the continuous curve is the result of the convolution of the measurement system with a discrete series of different geophysical events, i.e., the layers, so that the depth limits of this series can be obtained by carrying out this process in an inverse manner, i.e., by deconvolution. However, in this study, we chose to use a more classic technique, namely, determining the layers by means of inflection points.

For the stretch determinations, two types of statistical studies were carried out, which define zones from the analysis of the mean values or from its variances. The criterion of the variance of the mean value was employed by Webster and Wong [25], who used Student’s $t$-test by successively dividing each log in two and comparing each half, which was later generalised by Webster [26] for other parameters by using the Mahalanobis distance. Later, Shaw and Cubit [27] modified this method by using a predefined window along the log. The criterion for the changes in variance was employed by Testerman [21] and Gill [22] so that it was a minimum value within zones and a maximum value between zones and used a regression model for the different selected stretches until a maximum number of zones or a predetermined difference in the variance between stretches was reached. Later, Hawkins and Merriam [28] modified the method of stretch selection by using a predefined window to yield an optimal number of zones, and these same authors together with Hawkins and Ten Krooden [29] extended the method for several well logs.

It can be argued that layer-by-layer correlations make the most geological sense when used between nearby boreholes, although correlations between “guide layers” such as volcanic ash or granite wash [30] can be used for correlations over longer distances. These layers show high outliers which may also correspond to the presence of organic matter rather than clay content [31]. However, dividing a log into stretches or units is considered to be the most effective method for determining the well log correlations between distant boreholes (up to 25 km) within large continental siliciclastic basins. From a geological point of view, although stretch correlations have larger spatial extents than layer correlations, they have several limitations. First, a depositional system can laterally vary in its sedimentation energy and show decreased values of many of the geophysical parameters, which are intrinsically related to grain size. Second, delimiting the stretches of a borehole from the well log data is of a very relative nature, given that their thicknesses are not determined in a generic or prior manner, and different solutions can be obtained depending on the methods used. For these reasons, layer correlations have been used in this study as a criterion to support decisions on the correlations of stretches.

The methodology that is developed to conduct correlations between well logs in continental multilayer basins is aimed for this process to be carried out automatically, without the need to create an expert system, so that it requires minimum operator intervention and is not limited to the specific characteristics of a basin. The main advantage of automatic correlation is the clarity and unification of the criteria. For example, in the optical correlation developed by Kerzner [32], the fundamental criterion was the creation of correlation pairs that considered a limit in the depth differences and slope similarities of the compared points. Automatic and semiautomatic correlation methods for well logs have been developed, especially from the 1980s to date [33], because of their interest in optimising this interpretation process without sacrificing good results, but these are two objectives that cannot be easily combined.

There are very different physicomathematical algorithms for correlating well logs including Markovian chains [34–36], slotting [37–39], and pattern recognition [40–42].

In recent decades, these and other techniques have been implemented in expert systems [1, 43–46] because of the complexity of the geological correlations over large distances. Sedimentary analysis includes some criteria that may only be carried out automatically by expert systems. These expert systems require decisions to be made by the operator based on an in-depth knowledge of the geology of the medium under study, which then governs their application. Furthermore, many of these recent procedures include approximate reasoning criteria, such as fuzzy logic and Bayesian inference, in which the correspondences are not based on equality but on similarity or resemblance (i.e., “practically the same” or “approximately the same”). However, in this study, we chose to start from more classic techniques such as layer cross-associations (which include similarity criteria) and stretch cross-correlations (which include certain decisions based on numerical criteria) and present in detail the steps followed for which the highest degrees of correlation were achieved.

The cross-association methods were developed, which consist of considering the lithological columns of two boreholes as if they were two numerical vectors that were obtained from a series of previously created codes to indicate their lithologies. Then, a matrix is produced whose values are the results of the distances, according to a defined metric, between each pair of elements. Finally, a pseudodiagonal alignment is selected so that it passes through the minimum values of each row. The pairs in this alignment represent the correlation pairs between the two geological columns [47].

The term cross-association was first used in geological correlations in 1965 [48] and was developed by several authors [49–54].

Cross-association methods are widely used and have the advantage of achieving an optimal correlation with the presence of voids and absence of layers. Several approaches can be found which combine algorithms that consider deformation with dynamic warping depth with artificial intelligence techniques, pattern recognition, expert systems, neural networks, or combinations of several of them [55–61].

This method, when applied to log correlations, has been used since the 1960s and began with correlations in the original space domain [62–64] and continued with the frequency-transformed domain [15, 63–66]. The use of the first domain yields good results when there are no significant differences in layer thickness that are correlatable with different sedimentation rates [16]. The variations in these thicknesses make it necessary to use iterative processes by performing a depth transformation factor in one of the compared wells; in the attempts made by the authors, although this procedure increases some points of the cross-correlation curve between the well logs, its values are mostly low. In the frequency domain, these problems are overcome, but there are problems with layer absences in the compared sequences (by extension, truncation due to erosion, or faults), which Srivardhan [67] tried to solve by applying a multiscale analysis. However, in this domain, it is not certain that the relationship found is binding to affirm the existence of correlation, since in large multilayered basins such as the studied basin, the diversity, and high number of layers produce very similar spectra in all of the boreholes considered globally.

In this study, first, the techniques that are developed for the automatic delimitation of the layers traversed in the boreholes are presented, as well as for the determination of stretches. Second, the specific techniques that are developed for the correlations between layers by cross-association and the correlations between stretches by cross-correlation are presented, whose criteria have been developed based on the results of the cross-associations.

The description of the developed algorithms for automatic layers’ delimitation will allow readers to evaluate the scope or equivalence of these delimitations based on physical-mathematical criteria with other units of a geological nature. It should also be noted that the algorithms developed for correlating geophysical layers can be applied equally to lithological columns obtained by other geological criteria. Besides, the detail of the methodology can also be used to develop correlation software subroutines, as they are simple mathematical developments.

This methodology has been applied to correlate 27 boreholes in the Duero Basin (Iberian Peninsula), for which the segmentation and stretching processes were previously carried out. The final interpretation is obtained through a joint interpretation of the results obtained in the different boreholes, the results of which are presented together with the corresponding equivalences between some geophysical stretches and the different tectono-sedimentary sequences (activation-relaxation of the different tectonic phases).

## 2. Geological and Hydrogeological Setting

The study area is located within the Iberian Peninsula on the left bank of the Duero River in the southern sector of the central siliciclastic Paleogene of the Duero River. The Duero river basin has been extensively studied by several authors [68, 69]. Regionally and hydrologically, this area is known as the “Región de los Arenales,” where an average siliciclastic thickness of approximately 500 metres can be determined [70]. Figure 2 shows the limits of the ~$80\xd745$ km study area, together with a regional geological map of the Duero Basin.

Hydrogeologically, the region has superficial aquifers that consist of sand layers (less than 5 metres thick) between sandy clays, which for hydrogeological purposes, can be considered as a free, continuous, and heterogeneous aquifer mantle. The deep aquifers consist of lenticular layers of sands or gravels enclosed in a more or less semipermeable matrix, while the whole region behaves as a large, heterogeneous, anisotropic aquifer that is confined or semiconfined depending on the area.

To evaluate the accuracies of the differentiated geophysical stretches to the expected sedimentary model in this area of the basin, a geological framework of the basin was extracted, which is necessary from any point of view to obtain accurate conclusions, and it was necessary to consult the literature for a more in-depth geological approach to the problem.

In the Tertiary of this sedimentary basin, five major sedimentary cycles can be distinguished, which are related to tectonic reactivation processes [71] at the basin margins. These synsedimentary tectonics control the Tertiary sedimentation in the bands [72] and cause the frequent presence of progressive unconformities [73]. During the early evolution of the basin, there were different subbasins that were relatively separate from each other, in which well-differentiated environments were generated [74]; even hydrogeologically, subzones can be differentiated [75]. Thus, in the study area, two sectors have been established (e.g., Salamanca Sector and Zamora Sector) with different stratigraphic characteristics throughout the first three cycles [76]. In the Neogene, a common source area was reached in this location, which could then be considered a single basin, although with a diversity of lithofacies based on the positions and energies of the environments in this basin [77]. A summary of the characteristics of these environments is presented in Table 1.

It should be noted that one of the objectives of this study is to associate the log correlations with this geological subdivision in the southern Duero Basin.

## 3. Materials and Methods

### 3.1. Boreholes Selected

The boreholes used for this study are part of a campaign that was carried out by the Spanish Institute for Agrarian Reform and Development (I.R.Y.D.A.) and the Dirección General de Estructuras Agrarias de la Comunidad de Castilla y León for the irrigation water supply in Castilla y León Region. Most of the boreholes were drilled by rotation with reverse mud circulation, although some were drilled with direct circulation. Different types of muds were used for the drilling of these boreholes, namely, organic (Revert), bentonite, and natural muds with conductivities between 860 and 2260 *μ*S/cm.

Among the boreholes analysed by the authors in this part of the basin, a total of 27 boreholes were selected, both because of their geographical locations (an appropriate geographical distribution to establish a correlation profile) and because they reached great depths (i.e., more than 400 metres). Figure 2 shows the location diagram of the selected boreholes, whose depths are shown in Table 2.

A northwest-southeast profile (P-P$\u2032$) was plotted on this map to subsequently produce a correlation section. The selected boreholes present a strong cluster to the northwest, which disperses toward the east in the form of a fan, which will allow to carry out a detailed evaluation of the extent of the correlation.

The logged parameters were natural gamma radiation (G.R.), spontaneous potential (S.P.), normal resistivity (with electrode spacing $AM=0.4\u2009and\u2009AM=1.6\u2009m$), and single point resistance (S.P.R.), while lateral resistivities (with electrode spacing $AO=1.8\u2009m$) and temperature logs were only measured in some boreholes. The cross-association will be applied to lithological layer, as previously stated, and the cross-correlation will be carried out on the stretches obtained from resistivity logs.

For better control of the geophysical results, lithological column surveys were conducted by using the drilling cuttings. For the lithologies that were present down to the depths reached in the boreholes (approximately 500 metres), those established visually consisted mainly of gravels, sands, silts, and clays, with some mixed lithologies such as clayey gravels and clayey sands. The marl levels and some mixed lithologies (loamy sands) were also differentiated by hydrochloric acid effervescence in the case of low effervescence, and the most remarkable aspect was the scarce presence of these lacustrine lithologies. The grain-size curves were obtained on cuttings extracted from six of the studied boreholes.

### 3.2. Previous Processing

In this work, a distinction is made between the delineation of parametric or geophysical layers, a process referred to here as segmentation, and the delineation of stretches (geophysical units), referred to here as stretching. The parametric layers are the intervals of a log that correspond to unique values of various physical parameters; e.g., a clay layer has a low resistivity value (on average close to 10 *Ω*·m), relatively high radioactive emissivity, and a zero spontaneous potential anomaly value. In this study, geophysical stretches are sets of layers that are characterised by maintaining, within each of them, one or more characteristics that are more or less uniform among themselves and differentiate them from the adjacent stretches. These characteristics include the values of one or more physical parameters (related to the dominant lithology), degrees of variation (related to the homogeneity), and their thicknesses and sequences (related to the formation process). These stretches, as a set of levels or layers, may or may not correspond to geological units or facies depending on the lithological characteristics of an area. It has been decided to use a specific name to differentiate it from others with a more depositional character such as the tracts established by Brown and Fisher [85]. Determining these stretches is of great importance in hydrogeological studies and in the management of the resources of a basin, as well as in the information they represent for the location and planning of boreholes in an area. It should be noted that the term unit has not been used here because of the specific connotations it has from a geological point of view.

There are different methods to carry out the zonation process, in its generic meaning, among which we should mention the Walsh transform. It was used on well logs by Lanning and Johnson [86] and Aliouane and Ouadfeul [87] and consists of decomposing each of the well logs into a sum of rectangular functions in a manner similar to Fourier decomposition and then using filters to find the significant changes. This method is especially applicable to the delimitation of layers. It is also worth mentioning spectral shifts, which were successfully applied to generic signals by Baseville and Benveniste [88, 89]. This method consists of analysing the possible changes in the frequency content curve of the well log (obtained, e.g., by means of the fast Fourier transform) as one moves along the log. This method, in addition to being tedious in terms of time, has not provided good results in the attempts made in this work in terms of segmentation but, as will be seen below, provides good results in the zonation process.

Given that two different methods have been used for the correlations of the obtained well logs, one using the layers and the other between the stretches, the criteria that were used for segmentation and stretching are detailed below.

#### 3.2.1. Segmentation

As defined in the introduction, the geophysical layers must necessarily be qualified with the parameters that define them (e.g., geoelectric and geoseismic), which in some cases may correspond to different lithological levels or layers. In our case, the minimum combination of parameters chosen is aimed at determining the column or distribution of lithological layers of the borehole with depth, but depending on the lithology in question, certain equivalences may not be resolved or some other limits may be found (e.g., the case of saline water versus fresh water in the permeable levels).

The technique for delimiting the layers or levels (segments) of each log consisted of determining the inflection points. This process was carried out by using of the points of the null second derivative and was subsequently modified and/or cancelled according to a series of criteria and both were dependent on the parameter in question and on transition patterns. Figure 3 shows the curves that resulted from this process for one of the parameters measured in one of the studied boreholes, in which the resolution of the classic technique used can be evaluated when the programming criteria are appropriate.

Once the depth values of the successive segments have been determined, an apparent value for the logged parameter is assigned to each segment. The initial value assigned, in the case of segments where there is a local maximum or minimum, is this extreme value. In the case of a thin layer where an extreme is not reached, the maximum value within the segment is taken if it is a segment with positive curvature, or the minimum value is taken if the segment has a negative curvature. This value is then corrected but is still within the apparent value concept, by means of criteria that depend on the parameter in question (e.g., for the lateral device multiplied by 0.8 in the case of resistive layers).

Once the different parameters had been separately segmented, the next process was to decide which of the segments established in the set of well logs could represent the lithology. This step also required the consideration of several factors. On the one hand, the different spatial characteristics of the different parameters, e.g., the gamma ray and single point resistance sondes, have a more limited outreach than the 0.4 AM normal resistivity sonde. Environmental borehole effects such as borehole diameter and the fact that the measurements are not as accurate as one would like, especially with respect to measurement depth, also were considered. In short, a series of more or less complex steps were used in the programming, but were without particular mathematical difficulties. These processes are described in detail in Díaz-Curiel et al. [90].

The final step in the “geophysical” layer determination process was the lithological assignment. The technique used for this step was the elaboration of a cluster by using the Manhattan distance instead of the Euclidean distance. To evaluate the lithological extent of the “geophysical” layers obtained, the key is to know the parameters that are introduced in the cluster and the limits that are used to discriminate the lithologies. The parameters used were the porosity, which was obtained from the actual resistivity value of the formation, and the clay content, which was obtained from the gamma ray log. It should be noted that both processes require complex processing of the measured resistivity values and gamma ray log.

For the limits used to discriminate the lithologies, Table 3 shows the differentiated lithologies and values used for the two characteristics.

These values were used in all of the studied boreholes, with the result that the percentages of the determined lithologies correspond to sands, clayey sands, and clays in similar proportions of 20%. The remaining lithologies are silts (4-12%) and gravels (3-8%), although in some boreholes, there were radioactive marls and sands. In Table 1, some regional lithological characteristics of the area are cited, and as mentioned above, the lithological columns of all the boreholes were taken.

#### 3.2.2. Stretching

Two types of statistical criteria have been applied to the already segmented well logs, which define stretches on the basis of the analysis of the mean values or on the basis of their variances so that they were at a minimum within the zones and at a maximum between them [83]. These methods were modified by using a window whose width was chosen according to the results obtained.

To determine the stretches of the well logs, a statistical analysis of the initial log and the segmented resistivity curve is carried out. The type of window that was chosen for this study has a fixed width that is equal to 14 times the average thickness of the borehole layers (in the well logs studied, this corresponds to an average thickness of 25 metres). This choice was made after testing both fixed windows of different thicknesses (between 5 and 50 metres) and accumulated windows. Tests were also carried out by studying, for each point, the differences in values between opposing fixed windows.

Once the type of study window has been chosen, a variance study of the segmented log is first carried out. Next, a frequency analysis of the curve is conducted by calculating the average thickness of the layers (equivalent to the wavelength and inverse of the number of zeros) along the segmented log. Finally, a study of the mean value variable of the initial log is carried out, for which the log is first strongly smoothed and then segmented following steps that are analogous to those described for segmentation.

These processes present problems at the extremes of the curve, so to avoid losing information at the edges, the width of the study window was modified for the variance and frequency calculations, which causes it to decrease to zero at the extremes. For the mean value calculations, the influence of the window width at the extremes of the log is even greater, so a decreasing potential filter type whose coefficients are a function of this width was developed.

Since the studied variables are very diverse, it does not make sense to search for common limits for all of the variables but rather to use as a starting point the limits that were found for the mean value variable and to add the limits that were determined by the variance and frequency variables in those depths where particularly significant variations in these two variables were observed.

Once the final boundaries of the stretches have been determined, the variances, frequencies, and mean values are recalculated for each stretch, while this time, windows of width equal to the thickness of each stretch are used.

The characterisation of stretches is carried out by assigning the mean value of each of the variables studied. The mean value variable is normalised from 0 to 100% with respect to the total range in the initial log to minimise the influence of the borehole characteristics (e.g., diameter and drilling mud). A brief description of many elements influencing the well logging will include in the Section 4.3. The variance variable is normalised to between 0 and 1, and the average thicknesses of the layers in each stretch are not normalised.

Figure 4 shows an example of the automatically defined stretches in one of the studied boreholes. In this figure, we can see how, among the analysed variables, studying the amplitudes best defines the log stretches. However, the thickness and variance variables have been used for support when characterising the stretches that were determined for each borehole. It should also be noted that although the thickness study (equivalent to the frequency study) does not determine physically meaningful stretches, it may nevertheless produce a subdivision with a certain sedimentological character.

### 3.3. Correlation

#### 3.3.1. Correlation of Layers: Cross-Association

The matrix is then traversed from the $DKmax,Jmax$ value to the first row or first column and follows the path of the minimum values. The alignment followed provides the correlation pairs selecting those for which $dKJ=0$. Several modifications were made to this generic procedure:

- (i)
Creation of lithological codes: the code assigned to each lithology must have a certain mathematical meaning to be able to evaluate and classify the different lithologies. These codes must consider the predominant lithology and have a certain sense of order. The starting codes used are shown in Table 4.

By using a 2-digit code, mixed and pure lithologies can be indicated, e.g., code 22 for gravels and code 53 for sandy clays.

- (ii)
Introduction of a similarity coefficient or equivalence of the layers (fuzzy logic): for a complete study of the differences between the columns, different equivalence criteria between the layers were chosen and were more permissive each time. These equivalence criteria are very important in this type of correlation because given the exclusion of cross correlations, the appearance of new correlations may exclude some previous correlations. The coefficients used are shown in Table 5.

- (iii)
Analysis of the last row of the correlation matrix: the first problem that is encountered when following Smith and Waterman [50] criterion is that the number of correlations is so low compared to the number of layers such that the choice of the minimum value in the last row cancels out the possible correlations between the last layers. This problem was solved by taking two starting points: one is from Smith and Waterman [50], and the other reflects a correlation in the last rows and choosing from these two criteria the starting point that achieves the greatest number of correlations.

- (iv)
Selection of the correlation ascendant: the second problem that is encountered when studying the correlation matrix is that in many cases, the values of the matrix do not reflect an isolated correlation to the extent that the minimum value path passes through that point. We have therefore imposed the condition of deviation from the minimum path if there is an equivalence between layers.

- (v)
Selection of correlation pairs: from all of the correlation pairs found, only those that involve two equivalent layers are selected, which are always within the chosen equivalence criterion.

In addition, the following restrictions were imposed: If the two compared layers include both substrates and organic matter, correlation always occurred. The depth displacement of a layer cannot be greater than a maximum percentage of its depth; otherwise, mathematical correlations are included that have no real meaning.

In order to evaluate the scope and reaching of the developed cross-association algorithms, Figure 5 shows the correlation pairs that were established between boreholes GU-2 and GU-7 (thick lines in the middle column). The horizontal lines to the left and right represent the automatically determined layers for each borehole, while the lithologies are not shown due to the scale of the graph. The obtained correlation pairs do not imply that there are no other possible pairs to correlate (it would be a limitation of the cross-association technique), but show different correlation dips along the depth of both boreholes.

#### 3.3.2. Stretch Correlation: Cross-Correlation

By following the above, this process is aimed at dividing each log into a series of stretches characterised by geophysics by one or more of the following properties: the mean value of the logged parameter, the frequency of the curve, and the difference between the maximum and minimum values of the log in each stretch.

It should not be ignored that for correct determinations of these stretches, as well as for the lithological assignments, the interpreter must be aware of the influence that certain drilling characteristics, such as mud viscosity or borehole diameter, have on the resolution of the geophysical parameters.

Once the type of study window has been chosen following the process described in Section 3.2.2, a variance study of the segmented log is first carried out. Then, a frequency analysis of the curve is carried out by calculating the average thickness of the layers (equivalent to the wavelength and inverse to the number of zeros) along the segmented well log. Finally, a study of the mean amplitudes of the initial well log is carried out, for which the well log is first strongly smoothed and then segmented following the same steps as those described in Section 3.2.1.

Given that the variables studied are of a very diverse nature, it does not make sense to search for common limits for all of them but rather to take as a starting point the limits found for the mean value variable and define the limits for the variance and frequency in a personalised manner at those points where significant variations in these two variables are observed.

Once the stretch limits have been determined, the variance, frequency, and mean values are recalculated for each of them, this time with the window defined by the width of the stretch. The mean values are normalised from 0 to 100% between the extreme values of the initial log to avoid the effects mentioned in the first paragraph of this section. The variance values, on the other hand, are normalised to between 0 and 1.

All of these processes have problems at the curve extremes, so to avoid losing the information at the edges, the width of the study window has been modified for the variance and frequency calculations, so it decreases to zero at the extremes. For the mean amplitude calculations, the effect of the window type at the edges is much greater, so a power function filter was developed. In this case, characterising the stretches is not carried out through the creation of zone codes but instead by using the values of each of the statistical variables studied.

Given that the number of stretches in a borehole is not very large (less than 20), this process is carried out individually based on the characteristics of these stretches and the results of a cross-correlation process of the stretches that were defined for the resistivity (S.N.R.) and G.R. logs. These parameters were chosen because they have been shown to be most representative for carrying out this process.

To achieve more significant degrees of correlation, it is necessary to deform these stretches in such a way that the thicknesses of the compared stretches coincide. To this end, for each stretch in the first borehole, the depths of all of the points of each stretch in the second borehole are modified proportionally to their position within the stretch, and subsequently, the values for the initial equidistant depths are defined by means of linear interpolations.

Thus, cross-correlations are performed between all of the S.N.R. and G.N. stretches of the first borehole with those of the same parameters in the second borehole, which is somewhat tedious in terms of time. The results are presented in the form of a matrix where each cell $n$ of row $m$ represents the minimum value of the cross-correlation between the stretch $m$ of the first borehole and stretch $n$ of the second borehole.

Analogously, a dissimilarity matrix was defined by using the differences in the mean variance and amplitude values of each stretch. The points of the matrix with the lowest dissimilarities indicate the most coincident stretches (the selection of the final correlation is always performed to avoid cross-correlations).

Figure 6 shows the correlation pairs that were established between the LAN and MO-1 boreholes. In the middle column, the correlation pairs for the stretches that were determined in each of the boreholes are shown.

Summarizing, all processing steps are reflected in the flowchart shown in Figure 7.

## 4. Results and Discussion

### 4.1. Results Obtained: Correlation-Distance Assessment

The final correlation is the result of the combined information that is obtained from the two-by-two comparison of all of the boreholes, both between layers and between the geophysical stretches that were defined from the automatic analysis of all of the well logs. Figure 8 shows a section with the projection on the traced profile of the different determined pairs of correlations of stretches.

Although some correlation horizons can be followed throughout the study area, three subregions can be distinguished according to the location of the profile in Figure 2. A northwestern subregion with boreholes GU and TO. A central subregion that is represented by AR and MAD boreholes has the lowest degree of correlation. The last subregion, located to the southeast, shows very high correlation percentages, despite its greater length.

An evaluation of the degrees of correlation of the stretches with distance was also carried out to determine the percentages of stretches for which a correspondence was found with respect to the total number of stretches by using the 17 pairs that were chosen to cover the study area in a representative manner. The aim of this analysis was to evaluate the extent of the correlations in a stratigraphic system such as the one studied, although it should be kept in mind that the layer correlations are implicit in part in this result. Figure 9 shows the percentages of the correlated stretches between the stretches of the different boreholes versus the distances between them, together with two curves that represent the envelopes of these percentages.

The increase in separation with distance between the two envelopes shown in Figure 9 can be associated with either belonging or not to one of the three subregions mentioned above. Thus, for the boreholes in the southeast subregion, which are 35 km apart, such as CO-1 and LAN, a correlation of 38% is found, while the others belonging to different subregions, such as MAD and ADA, which are 42 km apart, show much lower percentages (22%). Thus, it can be considered that the upper curve represents the resulting correlation percentages between the stretches of boreholes that are located in the same subregion, while the lower curve corresponds to the case where the compared boreholes are located in different subregions.

### 4.2. Tectono-Sedimentary Interpretation: Correlation of Sedimentary Sequences

The stretching process that was carried out enables a very exhaustive correlation but does not differentiate the depths of the sedimentary cycles that are related to the tectonic phases of the basin. Thus, the method generates the division of each borehole into many stretches (12-20).

To find the key relationship between the geophysical stretches that are defined with the relevant geological data, the evolutionary trend between them was analysed, which looked for cyclical behaviours in the series.

To this end, it was previously proposed [91] to analyse the energetic trends of the sedimentary environment on the basis of the predominant lithologies of each stretch and the trend of the sequences (positive or negative) that they form. To differentiate from more conventional geological sequences that formed by stretches will be thereinafter named macrosequences. As can be guessed, the aim was to check whether there is a correspondence, both in the number of cycles generated and in their main trends, with the recognised tectono-sedimentary cycles.

As in the investigation of the sedimentary environments and palaeogeographic evolution of a basin, the analysis of trends in a series and their macrosequences allows to discern the variations in the energy of the sedimentary environment along with the tectonic and palaeoclimatic implications they suggest.

The sequencing results indicate more consolidated groupings. Each macrosequence contains between 3 and 5 stretches that mark a negative-positive sequential evolution (i.e., a progressive increase in the energy of the medium and subsequent decrease in the same). Thus, from the analysis of the studied boreholes, a clear differentiation into four macrosequences can be obtained, although their characteristics do not coincide laterally with each other.

A high degree of similarity appeared to be determined between the expected characteristics of the second cycle (e.g., abundant limonites and clays) and fourth cycle (e.g., arkoses and clays with a lenticular and heterogeneous distribution) of the Zamora Sector with those of the same macrosequences in boreholes GU-1 to GU-10, which are located in the same sector.

For the case of the ADA and LAN boreholes, the distributions and general characteristics of the macrosequences clearly differ from those expected in the Zamora Sector (which coincide with their large distance from each other) except in the penultimate macrosequence, which could be similar to that of GU-4 (equivalent to cycle IV), and coincide from this cycle onwards as the basin homogenises. In general, the characteristics of this sector are more similar to those expected in the Salamanca Sector, although not excessively so; this could coincide with the fact that there is not sufficient information on the first two cycles down to the drilled depth, to which it could be added that it is located in a sector that was not clearly defined in the examined literature.

Once the sequencing process had been conducted for all of the studied boreholes, they were correlated. Figure 10 shows the sequencing that was carried out for the analysed boreholes, with the resulting correlation between the defined macrosequences.

The increased thickness of Cycle II towards the centre of the basin and its decreased thickness towards the edges are evident, together with a new lateral change towards clayey and even marly facies. In this line, a clear and complex lateral change of facies between Cycle II and Cycle III towards the centre of the basin can be pointed out, and the latter practically disappears (it would be advisable to analyse in detail the complex interrelations between both cycles). This lateral facies change detrimentally affects the hydrogeological characteristics in this central area (even Cycle IV-1 shows a significant reduction in the proportion of coarse siliciclastic components in this area).

Likewise, in the Zamora Sector, the lithological characteristics that were deduced geophysically from the boreholes are quite similar to those that are expected geologically, which fade towards the south. The higher presence of coarse-grained siliciclastic components in Cycle III of this sector points to the need to improve the hydrogeological knowledge of this sector due to its expected high water potential. On the other hand, the homogenisation of the basin from Cycle IV onwards is corroborated, although once again, its water qualities worsen slightly towards the centre of the basin.

On the other hand, the increased thickness of Cycle III towards the Eastern Zone, together with the increase in the proportion of coarse siliciclastic components, points to an improvement in the expected water productivity in this region. It would be advisable to increase the borehole depths to cross the productive stretches of Cycle I and Cycle II expected in this area.

### 4.3. Discussion

The correlations among the layers in continental siliciclastic basins, although graphically very convincing, seem to lack geological sense for many of the pairs found due to the continuous alternations of most of the lithologies present (especially the four dominant lithologies). Nevertheless, this correlation provides two important contributions to the final correlation. On the one hand, the general correlation dips and, on the other hand, the discrimination in the case of ambiguities, especially in the presence of less abundant lithologies (mainly gravel levels). In this way, the validity of the results that are provided by this correlation system will be greater when there is improved differentiation in the lithological series in the boreholes.

As mentioned above, the various borehole characteristics that affect the well logs must generally be corrected to obtain the correct parametric values of the layers. One such effect is the baseline shift that is traditionally analysed in the spontaneous potential log, although it is also found in some SPR logs. Since the major application of SP log in water capture boreholes is the estimation of formation water quality [92], the correction for that effect has not been taken into account for the correlation between boreholes. Another effect to consider is the influence of mud conductivity on the resistivity logs, used to obtain the porosity of the formations by analysing sondes with different radii of investigation [47]. From the resistivity logs corrected by the mud conductivity together with the borehole diameter, the porosity of the layers can be estimated as long as they allow using the actual formation factor, which can be determined if the equation established in Díaz-Curiel et al. [93] is used. With both criteria, the effect of different degrees of invasion due to different values of mud viscosity is also corrected. It is also necessary to correct for the effects of mud density together with borehole diameter in the GR logs and then perform a statistical analysis to estimate the clay content of the different layers. The current equations to perform both steps are clearly stated in Díaz-Curiel et al. [94] and Díaz-Curiel et al. [95]. Considering that the clay content and porosity data, obtained after the above-mentioned corrections, have been used for the lithological assignment, the correlation between layers by cross-association is not influenced by these effects. Regarding the correlation between stretches, we must remember that for these, the mean value was normalised from 0 to 100% with respect to the total range in the initial log to minimise the influence of the borehole characteristics, and the variance variable is normalised to between 0 and 1. The average thicknesses of the layers in each stretch were not normalised because they are not influenced by the borehole characteristics.

The scope of the correlations carried out increases the percentage of success that is obtained in the correlation between the stretches of the well logs, which is 40% for distances of 20 km, that are always within the same region or sector studied, from which the risk of failure begins to increase. The absence of the end of Cycle IV in boreholes GU-1 to GU-10 of Cycle I in the ADA borehole can be explained by the lack of recording of the first tens of metres in one borehole and depth reached in the other borehole, respectively. The spatial reductions in the correlations between well logs in a continental multilayer basin of these dimensions indicate that such correlations should be limited to regions that at least maintain some sedimentological correspondence with the different source areas.

## 5. Conclusions

The methodology proposed presents a contribution of the automatic processing of well logs to the geological knowledge of siliciclastic basins and provides an initial abstraction of the whole, which facilitates subsequent hydrogeological studies to optimise the management of water resources in these basins (e.g., location, exploitation, and planning of boreholes). This information also has another practical component, in the sense that the different hydraulic transmissivities between the more permeable stretches and the less permeable stretches can cause differences in the hydraulic heads of the permeable stretches. In this respect, the influences with respect to the source areas must be considered.

The processes conducted on several well logs of the south part of the Duero Basin have shown an interesting similarity between the five tectono-sedimentary cycles that are recognised in the basin and their characteristics based on their positions with respect to the two different source areas, namely, the northwest and the south, with the macrosequences are differentiated on the basis of the analysis of the energy trends of the environment.

The correlations of macrosequences over large distances resemble or coincide with the geological model of the basin, partly because such a sequential trend analysis has been supported by recognised geological investigations and data and fits to a lesser or greater extent the expected spatial model of the basin through a coherent correlation of the different units.

According to the similarities of these macrosequences to the tectono-sedimentary cycles, since their characteristics are broadly similar to those foreseen according to the sectors that were recognised in this area (Salamanca Sector and, even more so, the Zamora Sector), it can be observed, however, that the further the boreholes are to the east, the more this similarity becomes indistinct, so that perhaps the existence of a new Sector (e.g., Ávila Sector) should be proposed on the basis of these data.

All of this leads us to affirm that determinations of the macrosequences from the well logs in multilayered siliciclastic environments such as the one studied here provide information on boreholes that are of a highly geological nature. In this work, we highlight its application for the paleogeographic analysis of basins for locating possible geological cycles and aquifer formations in boreholes that are used for hydrogeological purposes through stretching.

The characteristics of the methods used suggest that the process can be generalised to other basins where different periods of tectonic activation have occurred.

The confirmation of the tectono-sedimentary assignments established and their fits with the expected sedimentary model of the basin from the correlation of long-distance sections will require additional geological data (e.g., palynological and paleomagnetic). Since the described sequencing processes were performed manually, the implementation of an “expert system” that is capable of automatically performing this process is a future line of research.

## Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

## Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

## Acknowledgments

Part of this work was supported by the Regional Government of Madrid (CARESOIL-CM project grant number P2018/EMT-4317).