## ABSTRACT

We have evaluated arrival-time picking algorithms for downhole microseismic data. The picking algorithms that we considered may be classified as window-based single-level methods (e.g., energy-ratio [ER] methods), nonwindow-based single-level methods (e.g., Akaike information criterion), multilevel- or array-based methods (e.g., crosscorrelation approaches), and hybrid methods that combine a number of single-level methods (e.g., Akazawa’s method). We have determined the key parameters for each algorithm and developed recommendations for optimal parameter selection based on our analysis and experience. We evaluated the performance of these algorithms with the use of field examples from a downhole microseismic data set recorded in western Canada as well as with pseudo-synthetic microseismic data generated by adding 100 realizations of Gaussian noise to high signal-to-noise ratio microseismic waveforms. ER-based algorithms were found to be more efficient in terms of computational speed and were therefore recommended for real-time microseismic data processing. Based on the performance on pseudo-synthetic and field data sets, we found statistical, hybrid, and multilevel crosscorrelation methods to be more efficient in terms of accuracy and precision. Pick errors for S-waves are reduced significantly when data are preconditioned by applying a transformation into ray-centered coordinates.

## INTRODUCTION

Arrival-time picking is a fundamental step in the processing of downhole microseismic data for phase identification and later processing, such as the accurate determination of a microseismic hypocenter (e.g., Maxwell et al., 2010). Any errors and/or misidentifications of these arrival times may have significant effects on the hypocenters and thus the interpretation of such results, particularly if a systematic bias is present. With the ever-increasing size of microseismic data volumes, the task of manually picking arrival times in a timely fashion is impossible and automated methods must be used. Numerous automated algorithms have been proposed for arrival-time picking, operating in either the time or frequency domain and for single- or multicomponent data. Some examples of commonly used algorithms include the short- and long-time average ratio (STA/LTA) (Allen, 1978; Baer and Kradolfer, 1987; Earle and Shearer, 1994; Withers et al., 1998), modified energy ratio (MER) (Han et al., 2009; Gaci, 2014), modified Coppens’ method (MCM) (Sabbione and Velis, 2010), Akaike information criterion (AIC) (Takanami and Kitagawa, 1991; Sleeman and Van Eck, 1999; Leonard, 2000; Zhang et al., 2003; Diehl et al., 2009), algorithms based on fractals (Boschetti et al., 1996; Jiao and Moon, 2000), crosscorrelation (Molyneux and Schmitt, 1999; Raymer et al., 2008; De Meersman et al., 2009), neural networks (McCormack et al., 1993; Dai and MacBeth, 1995; Gentili and Michelini, 2006), digital image segmentation (Mousa et al., 2011), and higher order statistics such as skewness and kurtosis (Yung and Ikelle, 1997; Saragiotis et al., 2002, 2004; Küperkoch et al., 2010; Tselentis et al., 2012; Lois et al., 2013, 2014). These algorithms have been extensively used in earthquake and exploration seismology to pick first arrivals and other seismic phases; with some modifications, they can also be useful for picking P- and S-wave arrivals on microseismic data. For example, Diehl et al. (2009) use the AIC algorithm in combination with STA/LTA and polarization detector to pick S-wave on local earthquakes, whereas Tan et al. (2014) pick P- and S-wave arrival times on microseismic data using an STA/LTA-polarization-AIC hybrid approach. Küperkoch et al. (2010) use higher order statistics (skewness and kurtosis) to pick P-wave arrival times on local and regional earthquakes, whereas Lois et al. (2014) use kurtosis to estimate S-wave arrival time on microseismic data. Similarly, VanDecar and Crosson (1990) use a crosscorrelation-based method to pick relative phase arrival times for teleseismic events, and De Meersman et al. (2009) use an iterative algorithm based on crosscorrelation to refine initially picked arrival times for microseismic data.

Despite the plethora of algorithms from which to choose, accurate arrival-time picking remains a challenge. Microseismic data are often characterized by low signal-to-noise ratios (S/Ns) and complex waveforms. We define S/N as the ratio of root-mean-square (rms) amplitude in an inferred signal window to rms amplitude in a noise window. Typically, higher signal frequencies and weaker amplitudes are associated with smaller events, which makes automated arrival-time picking difficult in low-S/N environments. For microseismic data, the presence of a strong coda after the P-wave arrival as well as mode-converted arrivals can make the picking of direct S-wave arrivals a challenging task (Chen et al., 2005; Lois et al., 2013). Picking errors will affect precision and accuracy of computed hypocenters, particularly if a systematic bias exists.

Sharma et al. (2010) point out that no time-picking algorithm is optimal under all conditions and that algorithms tend to become unstable under noisy conditions. Thus, understanding the parameters and the limitations of these algorithms can help to improve data processing outcomes. Similarly, knowledge of any algorithm’s speed is important, especially if the objective is to perform real-time data analysis to provide timely feedback during well completions.

In this paper, we provide a review of existing arrival-time picking algorithms: STA/LTA, MER, MCM, AIC, phase arrival identification — kurtosis (PAI-K), hybrid techniques, and crosscorrelation-based methods. In addition, we evaluate their performance using pseudo-synthetic and field data examples. The term “pseudo-synthetic” is applied because we add 100 realizations of Gaussian noise to a high-S/N microseismic event recorded during hydraulic fracture monitoring. We also use downhole microseismic data (112 events) from the Hoadley Flowback Microseismic Experiment (HFME) in western Canada (Eaton et al., 2014). We discuss key parameters for each algorithm and provide recommendations for optimal parameter selection based on our analysis and experience. Examples are presented to explain the preconditioning of input data for arrival-time picking and also to discuss frameworks for picking P- and S-arrivals using unrotated ($x$, $y$, and $z$) and ray-centered coordinate rotated (p, s1, and s2) waveforms. Finally, we evaluate the algorithms in terms of speed, pick error, and provide quantitative and qualitative comparisons. Analysis of algorithm performance in the presence of complex waveforms containing confused arrivals or coda due to reflection and refraction are not considered, as these topics are beyond the scope of the present study.

## FIELD EXPERIMENT

Downhole microseismic data for the HFME project were acquired during and after an open-hole multistage hydraulic fracture treatment in two horizontal wells. The wells were completed in a lower Cretaceous Glauconitic tight sand reservoir in the Hoadley gas field, Alberta, a giant gas-condensate field discovered in 1977. Vertical wellbores were initially used to produce from the most permeable sand bodies, but the introduction of multiple horizontal well technology has shifted the focus to the immense unconventional resource potential in this play (Eaton et al., 2014).

Real-time monitoring of the microseismicity during the hydraulic-fracture treatment of two horizontal wells was conducted on 18–19 September 2012. A 12-level receiver array of 15-Hz triaxial geophones was used for downhole recording with a sampling rate of 0.25 ms in a vertical well that was located between the two horizontal treatment wells. The total array length was 229 m with variable receiver spacing: 15.25 m for the bottom eight receivers and 30.5 m for the top four receivers (Eaton et al., 2014).

Initial processing of the recorded data was performed by Engineering Seismology Group (ESG), and a total of 1660 events including 259 postpumping events were detected using the STA/LTA algorithm (Eaton et al., 2014). In this paper, we select 112 representative microseismic events from the event database. Figure 1 shows an example of raw (unfiltered) three-component (3C) waveforms for a detected microseismic event. The direct P-wave arrival is more distinct on the $z$-component, whereas the S-wave arrival is stronger on the $x$- and $y$-components.

## DATA PRECONDITIONING

Arrival-time picking algorithms can be applied to individual components of 3C data, or a combination of components. The use of a combination of components offers the capability to reduce data dimensionality by providing a single attribute, which can be useful in obtaining a unique estimate of P- and S-arrival times from the 3C data and also decreases the computational cost, thus making it more suitable for real-time monitoring applications. To increase the S/N by damping random noise, Saari (1991) uses the absolute value of the product of the amplitudes of 3C data as input to STA/LTA algorithms. Similarly, Oye and Roth (2003) use the stack of absolute values of 3C data as input to STA/LTA and AIC algorithms. Figure 2 shows the results from both approaches for the 3C example data. The quality of the product and stack of absolute values of 3C data is affected by the presence of strong noise in any of the components. The product suppresses the background noise efficiently but also significantly reduces the signal if it is present only on one of the components. For most of our analyses, we use the absolute amplitude stack because it is more effective for data with a low S/N.

The P- and S-wave signals can also be enhanced by transforming observed data into ray-centered coordinates. Transformation into ray-centered coordinates requires time windows that define the start and end of P- and S-arrivals. In the absence of picked initial arrival times, the start and end of these windows can be approximated using a polarization attribute-based criterion or the STA/LTA algorithm. Figure 3 shows unrotated and ray-centered-coordinate rotated waveforms for the example data (receiver 5 from the data shown in Figure 1). The accompanying spectrograms indicate an approximately 50 Hz dominant frequency for the P- and S-wave arrivals. The transformation into ray-centered coordinates maximizes the P- and S-wave amplitudes on the corresponding rotated waveforms (p and s1), respectively, and also improves the S/N and waveform quality (Figure 3).

Prior to the application of any time-picking method, we recommend S/N enhancement using a noise-filtering technique such as polarization filtering (Vidale, 1986; Reading et al., 2001), $f-k$ analysis (Maxwell et al., 2005), or wavelet-transform based denoising (Gaci, 2014). Here, we apply a minimum-phase bandpass filter (10–150 Hz) and a polarization filter (Vidale, 1986; Reading et al., 2001). Although band-pass filtering is useful for suppressing unwanted frequencies, it is less effective when signal and noise bandwidths overlap. On the other hand, polarization filtering focuses on temporal changes in the degree of polarization and suppresses the background noise, while preserving the character of linearly polarized signals.

## ARRIVAL-TIME PICKING METHODS

### Algorithms

Arrival-time picking algorithms can be classified into two main categories: (1) single-level and (2) multilevel (or array processing)-based algorithms. Single-level algorithms operate on single-component or multicomponent recordings from an individual receiver level, and thus they do not make use of data from other locations within the recorded receiver array. In contrast, multilevel algorithms, such as crosscorrelation-based algorithms, make simultaneous use of information on multiple receiver levels within the array. The single-level algorithms considered in this study can be further classified into window-based and nonwindow-based methods. The window-based algorithms require the specification of window size and location to compute microseismic data attributes for defining criteria for arrival-time picking. Many hybrid algorithms also exist in the literature, which combine information from different individual algorithms to achieve more accurate and precise arrival-time picks because none of the individual algorithms are optimal in all conditions (Sharma et al., 2010).

In this section, we review the theoretical background of some algorithms, discuss their parameters, and explain the workflow used to pick P- and S-wave arrival times for each algorithm. A summary of the methods reviewed and key references are given in Table 1.

#### Window-based single-level algorithms

##### STA/LTA ratio

Figure 4b shows the STA/LTA response for the unrotated components for the example data (Figure 4a). The absolute amplitude stack was computed from the unrotated ($x$, $y$, and $z$) components and used as input to generate STA/LTA response curves. The manually picked P- and S-arrivals (best estimates) are also shown with blue and red vertical lines, respectively. In this case, the onsets of STA/LTA for the local maxima are closer to the arrival times as compared with the time sample associated with the local maxima. We therefore pick on the maximum value of the derivative function of STA/LTA curve around the local maxima.

Window size plays a key role in the performance of STA/LTA algorithms. An STA window that is too short will result in improper averaging of the microseismic signal and meaningless noise fluctuations on the STA/LTA curve. On the other hand, if the LTA window is too long, it will obscure events that are closely spaced in time. The STA and LTA window sizes depend on the frequency characteristics of the microseismic waveform. Generally, the choice of STA window should be longer than a few periods of a typical microseismic signal, whereas an optimal LTA window should be longer than a few periods of the irregular noise fluctuations. A longer LTA window also makes the STA/LTA more sensitive to P-waves and is useful in events with a weaker P-wave compared to the S-wave signal (Earle and Shearer, 1994; Trnkoczy, 2002).

Figure 5a shows STA/LTA curves obtained using different window sizes. The apparent dominant period of the signal ($\tau dom=1/50Hz=0.02\u2009\u2009s$, which is equivalent to 80 time samples with 0.25 ms sampling interval) estimated from the spectrograms (Figure 3) is used to compute STA and LTA windows. The STA/LTA responses are shown for STA window sizes ({1, 2, 3}$\tau dom$) and for LTA window sizes ({3, 5, 10, 14, 15, 21}$\tau dom$). The STA/LTA shows spurious fluctuations in the case of STA and LTA windows that are too short ($\tau dom$ and $3\tau dom$, respectively). This occurs because the LTA window is not sufficiently long to give an average value of the local noise and is sensitive to noise fluctuations. A significant improvement is observed when a slightly longer LTA ($5\tau dom$) is used. This may nevertheless produce false picks in the case of data with low S/N. Increasing the STA window size to $(2\u20133)\tau dom$ reduces the sensitivity to noise, whereas increasing the LTA window size improves the averaging of local noise. Based on these considerations, we recommend an STA window size that is equivalent to $(2\u20133)\tau dom$ and an LTA window size that is 5–10 times the STA size. Similar values for the STA and LTA windows were suggested by Han (2010).

##### MER algorithm

Because the ER function is computed using post- and presample windows, the time index associated with the maximum of the MER represents the arrival-time pick. Figure 4c shows the MER curve for the example data. The P- and S-wave arrivals occur near the local maxima in the corresponding intervals.

Careful selection of window size is also important to achieve better results using the MER algorithm. Like STA, window size should be longer than a few periods of the microseismic signal to avoid false picks from noise fluctuations and to pick the signal changes properly. Figure 5b shows the MER curves for different window sizes ({1, 1.5, 2, 2.5, 3, 5}$\tau dom$). These curves exhibit clear local maxima for P- and S-arrivals for window sizes ($\tau dom-3\tau dom$), but the response deteriorates for a longer window size ($5\tau dom$). In the case of a low S/N, we recommend the use of longer windows ($2\tau dom\u22123\tau dom$) for greater stability because smaller windows will be more sensitive to noise fluctuations.

##### MCM

Figure 4d shows the MCM curve for the example data. Like STA/LTA and MER algorithms, the purpose of the leading window is to identify the signal changes. We therefore recommend a similar window size ($\tau dom-3\tau dom$) for accurate and stable arrival-time picks.

##### PAI-K

Küperkoch et al. (2010) recommend adaptation of the window length based on the frequency characteristics of the data. A window that is too short will yield a biased kurtosis estimate, resulting in false and early picks, especially for a low S/N. A longer window may provide picks that are beyond the real S-wave arrival, and a window that is too long will result in missing the significant amplitude variations (Lois et al., 2013). Figure 6a shows the PAI-K curves for different window sizes ({2, 4, 6, 8, 10, 12}$\tau dom$). The P-wave arrival is clearly visible in all cases, but the S-wave arrival is picked accurately only for the window size ($2\tau dom$). A delayed S-wave arrival pick results for other cases. The PAI-K curve for a window size of $2\tau dom$ tends to be unstable and may provide false picks for data with a low S/N because the background noise becomes non-Gaussian in many windows. For this reason, we recommend use of a window length $\u226510\tau dom$ to ensure higher values at the signal interval while suppressing the background noise.

##### S/L-Kurt

#### Nonwindow-based single-level algorithms

##### AIC algorithm

#### Hybrid approaches

Numerous hybrid approaches exist that merge single-level algorithms in different combinations. Anant and Dowla (1997) combine a wavelet-transform method with polarization attributes to pick P- and S-wave arrivals. Zhang et al. (2003) use the wavelet transform and AIC to confirm the occurrence of an arrival in the data interval and to pick the arrival time. Galiana-Merino et al. (2008) pick P-wave arrivals using a higher order statistics (kurtosis)-based criterion in the stationary wavelet domain. Akazawa (2004) uses STA/LTA, STA-LTA, and AIC to pick the P- and S-wave arrivals. Akram et al. (2013) combine STA/LTA with peak eigenvalue ratio from the post- and presample windows to pick P- and S-wave arrivals. Recently, Maity et al. (2014) presented a neural network-based approach with a number of attributes computed from microseismic waveforms to pick the P- and S-wave arrivals. In this paper, we discuss the wavelet-transform-based hybrid approaches (Anant and Dowla, 1997; Zhang et al., 2003), joint energy ratio (JER; Akram et al., 2013), joint STA/LTA-polarization-AIC method (Tan et al., 2014), and Akazawa’s hybrid picking workflow (Akazawa, 2004).

##### Wavelet-transform-based approaches

The discrete wavelet transform (DWT) is used in practice, and it can be implemented using Mallat’s (1989) algorithm. Two quadrature mirror filters, a low-frequency filter and a high-frequency filter, are applied to compute the DWT of level one of the input signal $x(t)$, with a subsequent reduction in the number of samples by two (Mallat, 1989; Zhang et al., 2003; Baranov, 2007). The wavelet coefficients for the high-frequency filter characterize the details of the signal, at different scales, whereas the approximations of the signal at different scales are represented by the wavelet coefficients for the low-frequency filter.

The Daubechies-5 (db5) wavelet is used for the wavelet transform. Our choice is based on the expected shape of a microseismic signal (Figure 7) because the wavelet basis functions that are similar to P- or S-wave arrivals will result in strong correlations in the wavelet scale, thus leading to quality enhancement of the picked arrivals. The P- (on $x$- and $z$-components) and S-wave arrivals ($y$-component) are compared with various wavelets (db2–db7) in the Daubechies family. The db5 wavelet shows a good match for the P- and S-wave arrivals.

AD’s method: We denote the arrival-time picking methodology presented in Anant and Dowla (1997) as AD’s method. For P-wave arrival picking, the workflow is described as follows:

Compute the wavelet decompositions for each of three data components.

- Compute the eigenvalues from the covariance matrix generated from the three decomposed components on a sliding window, for each scale. Using the largest and intermediate eigenvalues ($\lambda 1$ and $\lambda 2$, respectively), a rectilinearity function can be estimated:$F=1\u2212\lambda 2\lambda 1.$(15)
- Compute the composite rectilinearity function by combining the rectilinearity functions for each scale:where $j$ represents the number of wavelet decomposition levels. The position at which the composite rectilinearity function is maximum represents the P-wave arrival time. Figure 8b shows the composite rectilinearity function for the example data. Manually picked P-wave arrival time appears closer to the time index associated with the maximum value of composite rectilinearity function. However, there is not enough contrast between composite rectilinearity function responses for P- and S-waves, which can result in erroneous picking for data with low S/N.$CF=\u220fjFj,$(16)

For S-waves, the arrival-time picking workflow is described as follows:

Using the P-wave arrival information, rotate the microseismic data into radial and transverse components.

Compute the wavelet decomposition for each of the rotated components ($dt$ and $dr$ denote the decomposed transverse and radial components, respectively).

- Compute the ratio for each scale:where env is the envelope function, which represents the positive outline of input data, in this case, the decomposed radial and transverse components.$R=env(dt)env(dr)+env(dt),$(17)
- Compute the composite envelope function:where $j$ represents the number of wavelet decomposition levels. The S-wave arrival time is determined from the position of the first point after the P arrival time that has a value that is at least one-half of the maximum of $Ct$. Figure 8c shows the composite envelope function for the example data. The manually picked S-wave arrival time appears closer to the time index associated with the point that has a value that is at least one-half of the maximum of $Ct$.$Ct=\u220fjRj,$(18)

*ZTR’s method:* We denote the arrival-time picking methodology presented in Zhang et al. (2003) as ZTR’s method. The workflow for arrival picking is described as follows:

Compute the wavelet decomposition for each of the three data components.

Apply AIC on each scale. If the AIC values for all scales are close to each other and are inside a user-specified interval (in this case, we use 100 time samples), an arrival (P- or S-wave) is identified in the interval. The AIC value on scale two provides the preliminary arrival time.

Reapply AIC on the data components in a window surrounding the preliminary pick. The minimum of AIC determines the arrival time.

Zhang et al. (2003) apply this workflow to pick P-wave arrivals only, but the method can easily be extended to pick P- and S-wave arrivals if initial estimates of the corresponding windows containing the arrivals are known using other approaches. We propose the following steps to pick P- and S-wave arrivals directly from ZTR’s method:

Divide the data segment containing the microseismic event into short overlapping windows (in our case, 800 time samples, with 10% overlap). The size of the window is chosen considering the expected maximum S-P distance as well as the minimum expected distance between two closely spaced microseismic events.

Perform steps 1–2 from ZTR’s method on all windows and compute the preliminary arrival times.

Compute the energy in a small window ($\tau dom\u20132\tau dom$) surrounding the preliminary arrival times.

Pick two windows with the highest energy. The preliminary time of the earlier window is considered as the P-wave time and the later window as the S-wave time.

Reapply the AIC on the data components in a window surrounding the preliminary P- and S-wave picks. The minimum AIC determines the arrival time in the corresponding windows.

Selecting a single wavelet that performs well for the entire data set is a significant challenge for wavelet-transform-based algorithms, especially in the case of low-S/N data. It is recommended to perform a visual inspection and to compare the wavelet shape from the input data with a class of standard wavelets and then choose the best matching wavelet for wavelet-transform analysis. The length of the sliding windows for the eigenvalue computations in AD’s method is recommended to be $(1-1.5)\tau dom$.

##### JER

##### Joint STA/LTA-polarization-AIC (TYFH’s method)

A similar approach to the JER method was presented recently (Tan et al., 2014). We denote this method as TYFH’s method. This workflow can be summarized as follows:

Compute STA/LTA for the input data.

- Compute the degree of polarization for the input data in a sliding time window ($\tau dom\u20131.5\tau dom$):where $\lambda 1$, $\lambda 2$, and $\lambda 3$ are the three eigenvalues of the 3C data. Another function ($F$) is computed using $P$ to estimate the polarization state correctly (Moriya, 2008):$P=(\lambda 1\u2212\lambda 2)2+(\lambda 1\u2212\lambda 3)2+(\lambda 2\u2212\lambda 3)22(\lambda 1+\lambda 2+\lambda 3)2,$(21)where $P\u203e1=2\u2211i=n+1n+N/2Pi/N$, $P\u203e2=2\u2211i=n\u2212N/2n\u22121Pi/N$ and $N$ denotes the length of an averaging time window. The arrival time is indicated by the local maxima of $F$ in the corresponding time interval.$F=N2\u22123[12log(1+P\xaf11\u2212P\xaf1)\u221212log(1+P\xaf21\u2212P\xaf2)],$(22)
The STA/LTA and the polarization function are normalized and then multiplied to obtain a picking function $Q$. Initial estimates of P- and S-arrival-time windows are obtained from the local maxima of $Q$ in the corresponding intervals.

- Likelihood functions are computed usingwhere $am$ are the autoregressive model coefficients and $N$, $M$, and $k$ have the same meaning as in equation 12.$L=\u2212k\u2212M2log[1k\u2212M\u2211i=M+1M(X(i)\u2212\u2211M=1Mam1X(i\u2212m))2]\u2212N\u2212M\u2212k2log[1N\u2212M\u2212k\u2211i=k+1N\u2212M(X(i)\u2212\u2211M=1Mam2X(i\u2212m))2],$(23)
Finally, the normalized likelihood function is multiplied by $Q$, and the maximum of this curve indicates the respective arrival time.

Figure 8e shows the P- and S-wave arrival picking using TYFH’s method. The P- and S-wave arrivals are clearly represented on the response curve. The window size for the degree of polarization is recommended to be $(1\u20131.5)\tau dom$ because a longer window will include other arrivals and affect the degree of polarization at the P-wave interval. On the other hand, a smaller window may result in meaningless fluctuations.

##### Akazawa’s method

P-wave arrivals are picked using the following workflow:

First, STA/LTA is computed on the CE function shown in equation 25. The sample index ($m1$) associated with the maximum value is determined.

AIC is applied on the envelope function on the interval [1, $m1$]. The sample index ($m2$) associated with the minimum value is determined.

Another iteration of AIC is applied on the shorter interval [$m2$, $m1$].

The time index associated with the minimum value of AIC (step 3) is considered as the P-arrival time.

S-wave arrivals are then computed using the following workflow:

Forward and reverse STA-LTA differences are computed for the post P-wave arrival onset data. The sample index ($m3$) associated with the minimum value of the reverse STA-LTA, and the sample index ($m4$) associated with the maximum value of the forward STA-LTA is determined.

AIC is calculated for the interval [$m3$, $m4$].

The time index associated with the minimum value of AIC (step 2) is considered as the S-wave arrival time.

We use the same window-length parameters for STA/LTA as discussed earlier. The CE is effective for a data set with high S/N, such as strong-motion earthquake records, but it is less effective for low S/N data, such as microseismic data. To improve the performance of this algorithm on microseismic data, we replace the CE function with the absolute stack of waveforms from 3C data. Figure 8f and 8g shows the arrival-time picking using Akazawa’s algorithm. The P- and S-wave arrivals are represented clearly on the response curves and can be picked using the specified workflow.

#### Multilevel- or array processing-based algorithms

Multilevel- or array processing-based algorithms take advantage of similar characteristics of waveforms across an array of receivers or by analysis of multiple events using a single receiver. Common examples of multilevel algorithms for arrival-time picking include image-processing techniques (Criss et al., 2003; Mousa et al., 2011), global-optimization-based techniques (Chevrot, 2002), beamforming (delay and stack) of waveforms (Schweitzer et al., 2002; Rawlinson and Kennett, 2004), and crosscorrelation-based techniques (VanDecar and Crosson, 1990; De Meersman et al., 2009; Liu et al., 2009; Kapetanidis and Papadimitriou, 2011; Lou et al., 2013). In this paper, we discuss several waveform crosscorrelation-based techniques.

##### Waveform crosscorrelation methods

In the arrival-time picking process, a reference (pilot) waveform is obtained from the data recorded within the array, and then it is crosscorrelated with waveforms from all receiver levels in the array to determine the time delay. The selection of an appropriate pilot waveform is important because it defines the type of crosscorrelation algorithm (iterative or noniterative). The pilot waveform can be estimated and used in the workflow in the following ways (Bagaini, 2005):

A high-S/N waveform from a receiver level within the array is chosen as the pilot waveform.

A pilot waveform is obtained from the stacking of waveforms from all receiver levels within the array. These waveforms are aligned using initial estimates of arrival times prior to stacking.

The pilot waveform obtained using stacking is updated iteratively after time delay estimation and lag adjustments of waveforms within the receiver array.

The first two techniques are noniterative, and, as suggested in Bagaini (2005), these are less efficient than the iterative approach. Figure 9 explains the iterative workflow based on crosscorrelation (De Meersman et al., 2009), and we denote this workflow as DKV’s method. In this workflow, initial arrival times (either manually picked or using any of the previously described automatic algorithms) are used to align the microseismic waveforms. Before the computation of pilot waveform, all waveforms are rescaled to equalize to the pre-event noise level. A pilot waveform is then computed and correlated with all waveforms within the receiver array to update the time-shift. This process is repeated until the time delay converges to a value that is less than a user-defined threshold value ($\zeta $), which represents the optimal re-alignment of the input data.

An important parameter in this algorithm is the size of correlation window. A long window may affect the accuracy of crosscorrelations and may result in cycle skipping. If the initial estimates have large deviations from the true time, then it is reasonable to pick large windows covering the arrivals. Based on our experience, a window size $\u226410$ times the dominant period of the signal is recommended unless initial estimates have small deviations from true times, in which case a window $size\u22643$ times the dominant period of the signal is recommended to minimize pick errors. Crosscorrelation-based algorithms also perform better if all arrivals have similar polarities.

Instead of refining the initially picked arrival-time picks, crosscorrelation can also be used to pick the arrival times on the microseismic waveforms. Irving et al. (2007) present a crosscorrelation algorithm (here denoted as IKK’s method) that is based on the crosscorrelation of waveforms from all receiver levels in an array with a reference waveform with a known arrival time (manually picked). First, the reference waveform is crosscorrelated with other waveforms, and the waveforms are aligned using the lag value. A new reference waveform is then formed from the stacking of aligned waveforms and is again crosscorrelated with all waveforms. The process is repeated until the waveform alignment matches user’s specifications (Giroux et al., 2009). The method is semiautomatic because it requires picking on the reference waveform. This process can be cumbersome for large data sets.

A template-based approach (Plenkers et al., 2013), in which a high-S/N master event template is crosscorrelated with the continuous data stream to detect weaker similar events, can also be used to pick arrival times on similar events. However, the effectiveness of the arrival-time picks depends on the separation between the master event and the target event (Arrowsmith and Eisner, 2006; Song et al., 2010).

In this paper, we describe the results using DKV’s method for the refinement of initially picked arrival times and IKK’s method for picking arrival times by crosscorrelating with a reference waveform for which the arrival time is known (manually picked).

### P- and S-wave arrival picking framework

Depending upon the input data format (unrotated or ray-centered coordinate rotated data), numerous strategies can be formulated for the identification of P- and S-wave arrival times. Typically arrival-time picking is conducted on the extracted data around a provisionally detected microseismic event containing one or both of P- and S-wave arrivals. Therefore, simple assumptions that P- and S-wave arrivals occur only once in the extracted event data, and that the P-wave is the first arrival, can be quite effective in the case of raw input data. The orthogonal particle motions of P- and S-waves can also be used in validating the arrival picks. Figure 10 shows an example of arrival-time picking on unrotated input data using JER algorithm. However, a similar approach can be adopted for other algorithms. The main difference is the pick location (maxima, minima, or the onset before the extreme value). In a general sense, our picking strategy can be summarized as follows:

Compute the response curve from the 3C microseismic data using the methods outlined previously. If necessary, apply an EPS (Luo et al., 2002) filter on the response curve to enhance the transition between noise and noise plus signal. Find the time index ($ti$) associated with the global maximum of the response curve. In the case of EPS filtered response curve, the global maxima will be a string of values, and therefore the first value is picked as the global maximum.

Compute the polarization angles on sliding windows for the entire length of response curve using the 3C microseismic data. Find the time indices characterized by polarization angles orthogonal to the polarization angle ($\xb120\xb0$) observed at $ti$, in two equal presample and postsample windows (each smaller than the minimum expected event separation).

Select the index ($tj$) from the previous step representing the local maximum of response curve.

If the characteristic curve requires arrival picking on the onset value prior to the local maximum, compute the derivate of the function in a small window surrounding the local maximum. The positive maximum of the derivative function in the corresponding window then indicates the arrival time ($tj$). In the case of $tj>ti$, assign $tp=ti$ and $ts=tj$. Alternatively, if $tj<ti$, assign $ts=ti$ and $tp=tj$.

This framework can provide accurate and precise arrival-time picks on data with high S/N. However, the precision of picking results using this framework may be deteriorated in low-S/N data because the polarization angles and the attribute-response curve obtained from the unrotated input data are affected by the level of noise. Because we assume that each data segment for arrival picking contains only one event (a P-wave arrival and an S-wave arrival), our current picking framework will provide erroneous picks in the case of missing arrivals or multiple arrivals in the data segment.

Another approach described by Oye and Roth (2003) uses data rotated into ray-centered coordinates. The fact that the P- and S-wave amplitudes are maximized on p, s1, and s2 components allows easier identification of P- and S-wave arrivals on the respective component. In this framework, any of the previously described algorithms can be used to pick a P-wave arrival on p-component, and then an S-wave can be picked on a post P-wave arrival window on the s1-component. We provide a comparison of both approaches for the field microseismic data examples in the next section.

## RESULTS

To demonstrate the performance of arrival-time picking algorithms, we use pseudosynthetic and field microseismic data examples. The pseudosynthetic data (1200 waveforms) are generated by adding 100 Monte Carlo realizations of white Gaussian noise to a high S/N microseismic event (12 waveforms from all levels for the p-component) from the HFME experiment. Figure 11 shows the waveform data examples after the addition of synthetic noise. The best estimates of arrival times for P- and S-waves were obtained through manual picking on the actual high S/N microseismic event used for a pseudo-synthetic data set and on the field microseismic data (112 microseismic events). Arrival-time picking algorithms were then used, and their results are compared with manual picks for benchmarking and performance evaluation. The pick errors were obtained by subtracting the manual picks (best estimates) from the automatic picks. In the following sections, we present performance statistics (pie charts, mean, standard deviation, and skewness of pick errors) of arrival-time picking algorithms for the entire data as well as in specific intervals to provide more detailed analysis in terms of precision and accuracy.

### Pseudo-synthetic data

Table 2 shows the performance statistics (mean, standard deviation, and skewness of pick errors) of arrival-time picking algorithms for the entire pseudo-synthetic data and for the intervals of [$\u22122\u2009\u2009ms$, 2 ms] and [$\u221210\u2009\u2009ms$, 10 ms] pick error. For the entire data, higher skewness ($\gamma $) values associated with STA/LTA (3.56), MCM ($\u22128.01$), JER (4.81), Akazawa’s method ($\u22123.85$), and DKV’s method (4.10) suggest that majority of arrivals are picked either earlier or after the manually picked best estimates. Higher standard deviations ($\sigma $) associated with STA/LTA (0.15), MER (0.22), S/L-Kurt (0.12), AIC (0.10), AD’s method (0.12), ZTR’s method (0.12), and Akazawa’s method (0.13) show the imprecision of these algorithms, whereas higher mean ($\mu $) values of pick errors for MER ($\u22120.12$), ZTR’s method (0.04), STA/LTA (0.03), Akazawa’s method ($\u22120.03$), AD’s method (0.02), and AIC (0.02) show their poor accuracy. IKK’s method ($\mu =\u22120.0004$, $\sigma =0.0052$, and $\gamma =\u22120.328$) is found to be the most accurate, and precise relative to the other algorithms that were tested. Figure 12a, 12b, and 12c shows pick error versus S/N plots for each single-level, hybrid, and multilevel algorithms for the all of the pseudo-synthetic data. The majority of pick errors from each algorithm lies in the range of $\xb110\u2009\u2009ms$. As expected, the performance of these algorithms deteriorated with decreasing S/N. This is especially true for picks obtained using S/L-Kurt, AIC, AD’s method, and ZTR’s method, which show large deviations from the general trend. In comparison, pick errors are more concentrated at either earlier or later times for STA/LTA, MER, MCM, and Akazawa’s methods. This occurs because the addition of synthetic noise produced high amplitudes in some traces near the edges of the pseudo-synthetic data file (Figure 11); because the ER algorithms are sensitive to such noise fluctuations, picks are made near the edges of the data file, resulting in 0.4–0.6 s delayed picks. Hybrid algorithms, such as JER, JER-AIC, and TYFH’s method provided better accuracy and precision than the single-level algorithms, whereas other hybrid algorithms could not yield similar results. The reasons for the relatively poor performance of these algorithms are the use of degree of polarization and/or the wavelet transform. The degree of polarization is very sensitive in the presence of noise and may provide unstable results. On the other hand, it is difficult to choose a single wavelet function that is suitable for the entire data set, especially in the presence of complex waveforms, low S/N, and polarity fluctuations. Choosing a single number for decomposition levels is also difficult for the entire data set. In Figure 12c, DKV’s method uses JER picks as the initial guess and seeks to improve relative pick times, whereas IKK’s method uses a manual pick for a reference trace and computes the absolute arrival times. IKK’s method yields picks that are mainly concentrated within $\xb12\u2009\u2009ms$; large deviations from main trends are due to cycle skipping ($\xb110\u2009\u2009ms$, which is half the dominant period of signal in this data set). Despite that IKK’s method provides better results than DKV’s method, this may change with a different choice of initial guess (more accurate than the JER algorithm).

The pick errors were grouped into intervals ([$\u22122\u2009\u2009ms$, 2 ms], [$\u22125\u2009\u2009ms$, 5 ms], [$\u221210\u2009\u2009ms$, 10 ms], and ($\u2212\u221e$, $\u221210\u2009\u2009ms$] U [10 ms, $\u221e$))for assessing the precision of these algorithms. Figure 13 shows the pick error pie charts for all algorithms for these intervals. Table 2 shows the $\mu $, $\sigma $, and $\gamma $ values of pick errors in the intervals [$\u22122\u2009\u2009ms$, 2 ms] and [$\u221210\u2009\u2009ms$, 10 ms]. In the [$\u22122\u2009\u2009ms$, 2 ms] pick error interval, PAI-K, AIC, JER-AIC, Akazawa’s method, and IKK’s method outperform other algorithms, which means that these algorithms were able to provide more picks within $\xb12\u2009\u2009ms$ error. Other algorithms, however, provide pick errors with slightly higher standard deviations in the [$\u22122\u2009\u2009ms$, 2 ms] pick error interval. PAI-K, MCM, AIC, JER, JER-AIC, Akazawa’s method, and IKK’s method also perform better than other algorithms in the [$\u22125\u2009\u2009ms$, 5 ms] pick error interval. PAI-K, TYFH’s method, and IKK’s method are found to be most accurate, whereas PAI-K and AIC yielded the highest precision in the [$\u221210\u2009\u2009ms$, 10 ms] pick error interval. In total, 68.83% (11,563/16,800) of the arrivals were picked within $\xb110\u2009\u2009ms$ error using all algorithms. Because the dominant period of the signal for this data was 0.02 s, $\xb110\u2009\u2009ms$ error means that these algorithms were able to pick 68.83% of the arrivals within half of a dominant period of the true time.

For illustrative purposes, Figure 14a shows the computational cost for each single-level, hybrid, and multilevel algorithm. These values are based on a machine with i7-4720HQ CPU @ 2.60 GHz and 16 GB RAM size. Naturally, these values are expected to vary with different CPU architectures and with different implementations of these algorithms. The ER-based algorithms (STA/LTA and MER) are more computationally efficient than other algorithms. The large computation time associated with MCM arises from the application of EPS filter, which is a computationally expensive method. The computational cost of MCM without EPS filter, however, reduces to a level that is similar to STA/LTA. The algorithms based on wavelet decompositions are also computationally expensive because the computation time depends on the number of decomposition levels, as well as the window size, in the case of ZTR’s method. IKK’s method shows lower computation time, but this is a semiautomatic algorithm that requires picking on a reference trace, in which case, the total time can be high depending on the number of traces to be picked manually. The comparison of pick errors for all algorithms for this data set suggests that IKK’s method, Akazawa’s method, PAI-K, AIC, and JER-AIC provide more picks as compared with other algorithms in the [–2 ms, 2 ms] and [–5 ms, 5 ms] intervals, whereas other algorithms perform with a lower but similar precision (Figure 14b and 14c).

### Field data

Figure 15 shows the pick-error pie charts for all algorithms for field microseismic data examples (112 events) for [$\u22122\u2009\u2009ms$, 2 ms] and [$\u22125\u2009\u2009ms$, 5 ms] intervals. The pick errors are shown for cases with 3C data and rotated data as inputs. The use of rotated data provides significant refinement of the picks in these intervals, especially for S-waves. For the [$\u22122\u2009\u2009ms$, 2 ms] interval, the total number of picks for S-waves improves from 2963 to 4976, whereas for the case of the [$\u22125\u2009\u2009ms$, 5 ms] pick error interval, the number of picks improves from 4612 to 7015. In terms of precision, JER-AIC and IKK’s methods are more precise and show consistency with the synthetic data results. JER-AIC, as expected, outperforms AIC for picking P- and S-wave arrivals. Akazawa’s method picks the P-wave arrival with high precision as seen in the pseudo-synthetic data examples, but it provides poor-quality picks for S-wave arrivals in the two error intervals. S/L-Kurt outperforms the PAI-K algorithm, which is opposite of the results from pseudo-synthetic data. This may be due to very weak and complex waveform for P-wave on the field microseismic data and complex S-waveforms, which causes cycle skipping for the P-wave and misidentification of the S-wave as a P-wave arrival for PAI-K. The MER algorithm outperforms other ER-based algorithm (STA/LTA and MCM), especially in the case of rotated data, but performs similarly to STA/LTA and MCM when 3C data are used as input. ZTR’s method also performs well for the field data examples.

These results support a suggestion by Sharma et al. (2010) that none of these algorithms perform optimally on all data sets, but there are some consistent algorithms, which can provide more accurate and precise picks, for example, crosscorrelation-based approaches such as IKK’s and DKV’s methods; hybrid approaches such as JER-AIC and ZTR’s and Akazawa’s methods; and single-level algorithms such as PAI-K, S/L-Kurt, and AIC. The ER algorithms are very efficient algorithms in terms of speed and can thus provide reasonable pick errors. This characteristic of ER-based algorithms makes them more desirable for real-time microseismic data processing.

## DISCUSSION

### Applications to real-time monitoring

Real-time microseismic monitoring requires algorithms that are fast and are accurate and precise enough to obtain meaningful event locations. The ER-based algorithms, such as STA/LTA and MER are, therefore, more popular for use in real-time monitoring. The simple formulation and parameter settings of these algorithms give a useful advantage. Another advantage of an STA/LTA algorithm is that it can be simultaneously used for event detection and arrival picking. The MER algorithm can also be used for event detection if a threshold-based criterion is used in place of local maxima. The MCM algorithm without an EPS filter has the same computational cost as observed for the STA/LTA and MER algorithms, but it cannot be used as an event-detection algorithm because it requires a continuous increasing window. This algorithm also becomes computationally expensive with the inclusion of an EPS filter. Kurtosis-based algorithms (PAI-K and S/L-Kurt) are not as fast as STA/LTA but can improve the pick accuracy. Choosing the model order in an AIC algorithm is difficult and may not be optimal for a real-time monitoring scenario, but an AIC algorithm without an autoregressive model order (Maeda, 1985) could be considered. This algorithm, however, requires picking on rotated data or requires initial estimates of the arrival windows to perform optimally. The algorithms that are based on wavelet decomposition are computationally expensive because the computation time depends on the number of decomposition levels, as well as the window size in the case of ZTR’s method. These algorithms require selection of a single wavelet function and a number of decomposition levels that is suitable for the entire data set. These complexities in the parameter selection render these algorithms suboptimal for real-time monitoring scenarios. In comparison, other hybrid approaches, such as JER, JER-AIC, Akazawa’s method, and TYFH’s method, require relatively less computational effort and also reduce the errors in pick results. IKK’s method shows lower computation time, but this is a semiautomatic algorithm that requires picking on a reference waveform, in which case, the total time can be high depending on the number of traces to be picked manually.

Considerations of computational speed may be mitigated with the increasing availability of powerful computers. The STA/LTA is an obvious choice if only limited computational resources are available, but if powerful machines are available, other techniques, such as S/L-Kurt, PAI-K, JER, JER-AIC, Akazawa’s method, or TYFH’s method can be used. IKK’s and DKV’s methods can also be used on picks obtained from an STA/LTA algorithm.

### Applications to postacquisition processing

The focus of postacquisition processing is to achieve optimal results. We therefore recommend to the use of algorithms that provide the most accurate and precise arrival-time picks, such as hybrid approaches and crosscorrelation-based approaches. An important aspect of postacquisition processing is manual quality control. For example, IKK’s method can be used for cases in which a reference waveform can be picked using a single-level or hybrid algorithm, and the picked arrivals are later checked and corrected manually. Quality control can be performed on the picks obtained on other waveforms after crosscorrelation with the reference waveform. The quality of picked arrival times from an algorithm can also be improved by using data that are S/N enhanced through a noise-filtering approach or data rotation into ray-centered coordinate reference frame. Picking of P- and S-wave arrivals becomes much easier when rotated data components are used.

### Best practices

In this paper, we have presented guidelines for selection of parameters for numerous arrival-time picking algorithms (Table 1), together with best practices for data preconditioning. In the case of real-time monitoring, it is important to achieve reasonably accurate and precise picks in an efficient manner. We therefore recommend reducing the data dimension by using the stack or product of absolute amplitudes of the three data components. The STA/LTA algorithm is recommended for real-time monitoring because it is simple, fast, can provide pick results of reasonable quality, and can work simultaneously for event detection and arrival-time picking. However, if more computational resources are available, hybrid approaches or combining crosscorrelation-based approaches with STA/LTA may be applicable to real time monitoring. In the case of postacquisition processing, we recommend S/N enhancement of the data through various noise reduction techniques, such as polarization filtering, $f-k$ analysis, or wavelet-based denoising approaches. We also recommend rotating the data into ray-centered coordinates to further improve the P- and S-wave arrivals. The parameters for the selected algorithm should be carefully tested on a representative subset of recorded data before being applied to the entire data. A multilevel algorithm can be used to achieve better accuracy and precision in picked arrival times. We strongly recommend that each process be followed by a manual quality control step. A key consideration is that data quality plays a significant role on the effectiveness of any picking algorithm.

## CONCLUSIONS

Using pseudo-synthetic and field microseismic data examples, this paper provides a review of various single-level based, hybrid, and multilevel-based picking algorithms. The key parameters for each algorithm are discussed, including our recommendations for optimal parameter selections based on our analysis and experience (Table 1). The window size is a key parameter for the majority of algorithms and should be chosen based on the dominant period of the signal. We found that a window size equal to $(2\u20133)\tau dom$ is optimal for STA, MER, and STK computations, whereas $(1\u20133)\tau dom$ performs better for MCM. The LTA window should be between 5 and 10 times the STA window sizes, whereas the LTK window length should also be 3–7 times the STK window, and the PAI-K window length should be $\u226510\tau dom$ for efficient arrival picking. A window length of $(1\u20131.5)\tau dom$ is recommended for estimating the degree of polarization. Wavelet-transform-based approaches require careful selection of wavelet and the number of decomposition levels for the entire data set. It is often challenging to find a single wavelet that works well for an entire data set, especially in the case of low-S/N and complex waveforms. Autoregressive modeling methods, such as AIC, require the estimation of model order, which is obtained by trial and error. Because AIC picks the arrival time based on the global minimum for a data window, it should be applied on small time intervals containing the arrivals. Other algorithms, such as MER, JER, or PAI-K, can be used to find an initial estimate of P- and S-arrival windows.

We show that ER-based algorithms are more efficient in terms of computational speed when compared with other algorithms and can provide reasonably accurate and precise arrival picks. Therefore, these algorithms should work well in real-time microseismic data processing scenarios. On the other hand, wavelet-transform-based approaches are computationally expensive. Similarly, the use of an EPS filter in the case of the MCM algorithm significantly increases the computational time of the algorithm. Among single-level algorithms, PAI-K and MCM were found to be more precise for the entire pseudo-synthetic data considered here, whereas the PAI-K and AIC were more precise in the [$\u221210\u2009\u2009ms$, 10 ms] pick-error interval (Table 2). Other single-level algorithms perform with slightly lower, but similar, precision. TYFH’s method, JER, and JER-AIC provided more accurate picks among hybrid algorithms on pseudo-synthetic data. Among the wavelet-transform based approaches, ZTR’s method performed better than did AD’s method in terms of precision in the [$\u221210\u2009\u2009ms$, 10 ms] pick interval. Crosscorrelation-based approaches (DMV’s and IKK’s methods) were also found to be highly accurate and precise. For the field microseismic data set considered here, IKK’s method, ZTR’s method, JER-AIC, Akazawa’s method, and S/L-Kurt were found to be more precise. Akazawa’s method picks the P-wave arrival with high precision; however, S-wave picking was poor for the field data examples. We also found that the pick error for the S-wave reduces significantly when rotated data are used as input. We therefore recommend rotation of microseismic data into ray-centered coordinates as a preconditioning procedure.

Finally, our results support the view that none of these algorithms are optimal for all conditions. However, some of these algorithms, such as IKK’s method, JER-AIC, Akazawa’s method, PAI-K, and S/L-Kurt, are more accurate and precise in the majority of cases. Regardless of the accuracy and precision of an algorithm, an interactive quality-control process should always follow the automatic picking workflow to ensure the quality of arrival-time picks.

## ACKNOWLEDGMENTS

We would like to thank the sponsors of the Microseismic Industry Consortium (www.microseismic-research.com) for their support. E. Caffagni and ESG are thanked for providing the processed (detected) events for use in this paper. We also thank the anonymous reviewers for their valuable comments.