Prediction-error filters (PEFs) are essential for seismic deconvolution and other geophysical estimation problems. We find that nonstationary multidimensional PEFs can be computed in a “streaming” manner, wherein the filter is updated incrementally by accepting one new data point at a time. The computational cost of estimating a streaming PEF is reduced to the cost of a single convolution. In other words, the cost of the PEF design while filtering is equivalent to the cost of applying the filter. Moreover, the nonlinear operation of finding and applying a streaming PEF is invertible at a similar cost, which enables a fast approach for missing data interpolation.

Prediction-error filters (PEFs) play an essential role in geophysical data analysis. They can be used directly in different forms of seismic deconvolution (Webster, 1978; Robinson and Osman, 1996). More importantly, PEFs are a practical approximation for the covariance operators used in geophysical estimation problems (Claerbout, 2014). As such, they provide a way to absorb and characterize statistical patterns (Claerbout and Brown, 1999).

Because most natural patterns are nonstationary, extending the classic stationary formulation to nonstationarity is vital. This extension can be done either by patching, that is, breaking data into overlapping windows, or by a smoothly nonstationary estimation with regularization (Crawley et al., 1999; Curry, 2003). Shaping the regularization provides a particularly effective method for constraining the filter variability (Fomel, 2009; Liu and Fomel, 2011; Liu et al., 2012). Both approaches have an additional computational expense, particularly in storing variable filter coefficients (Ruan et al., 2015).

This paper presents an efficient approach for computing and applying nonstationary PEFs without excessive storage requirements. Using an adaptive-filtering approach (Widrow and Stearns, 1985; Haykin, 2002), we show that a nonstationary PEF can be updated on the fly by accepting one data point at a time. The cost of computing and applying such a PEF is reduced to the cost of a single convolution. In other words, the cost of the PEF design while filtering is equivalent to the cost of applying the filter. Moreover, the nonlinear operation of finding and applying a PEF has an exact inverse, which finds an immediate application in missing data interpolation problems. We extend the filter from one dimension to multiple dimensions using the helix transform (Claerbout, 1998) and show its application to simple benchmark problems.

We have implemented the proposed algorithm in the Julia programming language (Bezanson et al., 2017). For clarity, the essential code is included in the paper. All presented examples are reproducible using the Jupyter Notebook (Kluyver et al., 2016; Granger and Pérez, 2021) provided with the paper.

Theory

The basic equation defining a PEF is autoregression. For a given data stream d and a 1D PEF a, the autoregression equation takes the form
(1)
where it is assumed that nk. Equation 1 represents the convolution of the data with the PEF.

Suppose the filter is updated with each new data point dn+1. The additional constraint we can impose to control the variability of the filter is that the new filter a stays close to the previous filter a¯.

The two conditions can be combined into one overdetermined linear system:
(2)
where λ is the parameter that controls how much we allow a to deviate from a¯. In a shortened block-matrix notation, we can rewrite the linear system of equations as
(3)
where
(4)
and I is the identity matrix.
The least-squares solution of the overdetermined system is
(5)
Next, we can use the Sherman-Morrison formula from linear algebra (Hager, 1989) to transform the inverse matrix as follows:
(6)
Substituting the Sherman-Morrison equation and doing algebraic simplifications lead to the final result:
(7)
The Sherman-Morrison technique is a well-known approach in adaptive filtering by recursive least squares (Haykin, 2002).
This equation shows that the filter is updated by subtracting a scaled version of the data. The scale is proportional to the convolution residual computed using the previous version of the filter. Updating the filter requires only elementary algebraic operations (vector dot products) and no iterations. Moreover, computing the dot product dTd in a streaming fashion requires at most two multiplications:
(8)
where k is the number of points in a and d.

Inversion

The streaming residual rn+1 from the left side of equation 1 can be expressed as
(9)
or substituting the equation for a,
(10)
Equation (10) can be directly inverted to reconstruct dn+1 as follows:
(11)
We see that streaming time-variable deconvolution is invertible: the input data and the time-varying filter can be reconstructed from the streaming residual. Note that because of the division by λ2, the inverse operation may become numerically unstable for small λ. The forward and inverse filtering is implemented in Program 1.

Multiple dimensions

To extend the filter from one dimension to multiple dimensions, we follow the helix transform of Claerbout (1998). The change is minimal: the multidimensional data are arranged in a 1D stream on a helix and the definition of the data vector d changes to
(12)
where l1,l2,,lk represent the lags of the helical filter. The computational cost and other benefits of streaming PEFs remain the same. The implementation is provided in Programs 2 and 3.
One caveat is that, in multiple dimensions, we may want the nonstationary filter to change smoothly along the helix and other dimensions. As shown in the following, this goal can be accomplished with only a minor change in the algorithm. Suppose that a1 is the previous filter in the first (helical) dimension and a2 is the previous filter in the second dimension. We can keep the newly estimated streaming PEF close to both filters by changing the matrix equation to
(13)
where λ1 and λ2 control the filter variability in two directions. The least-squares solution of this system is
(14)
where λ2=λ12+λ22. The new equation is equivalent to the previous equation with the simple substitution:
(15)
Thus, the only change in the algorithm is an increase in the storage to keep track of a1 and a2. The extension to more dimensions is analogous.

Cost comparison

Table 1 compares the computational cost of different approaches for computing PEFs. The cost of a streaming PEF is minimal and reduces to the cost of a single convolution. Streaming PEF can capture the input data’s nonstationary character without storing multiple copies of the filter.

Moreover, all streaming computations are local, which makes them particularly suitable for modern hardware accelerators such as general-purpose graphics processing units (Sanders and Kandrot, 2010) or Intel Xeon Phi (Jeffers and Reinders, 2013).

Missing data interpolation

The existence of the exact inversion for the application of a streaming PEF allows for a straightforward approach to missing data interpolation. When reading data in a streaming fashion, every time we meet a missing data point dn+1, we can replace its value with a value computed from the residual. The residual value rn+1, in this case, can be set to zero or a random number (white noise) with the expected variance of the residual. In the latter case, it is possible to generate multiple equiprobable distributions for the interpolation result (Clapp, 2000). The missing data interpolation is implemented in Program 4.

A simple 1D interpolation example using data from Figure 1 is shown in Figure 2. As evident from Figure 2, such interpolation, although capable of picking the nonstationary data pattern, remains one sided and may create discontinuities on the other side of the data gap. To help with this issue, we can rerun the streaming PEF using the opposite direction and stack the results.

Figure 3 shows a 2D missing data interpolation test using a synthetic pattern from Claerbout and Brown (1999). The input data for this test are shown in Figure 4b. The interpolation proceeds by running the streaming PEF twice in forward and backward directions (Figure 3a and 3b) and averaging the interpolation results (Figure 3c). The cost of such a procedure is the cost of two convolutions, as opposed to multiple iterations required in the conventional approach to missing data interpolation with PEFs (Naghizadeh and Sacchi, 2010; Liu and Fomel, 2011; Claerbout, 2014). A better interpolation result, shown in Figure 4c, is achieved by filling the residual inside the gap with small-variance white noise instead of zeros while reconstructing the data from the residual.

The random residual approach can generate multiple equiprobable realizations for the missing data interpolation problem in the spirit of geostatistical stochastic simulations (Clapp, 2000). Figure 5 shows three realizations created using different seeds for the pseudorandom number generator.

Figure 6 shows a similar interpolation test applied to 2D data with a nonstationary pattern. Although streaming PEF fails to achieve a perfect reconstruction in this case, it manages to capture most of the pattern and hide the location of the gap.

Figure 7 shows a similar test applied to 2D seismic data. The pattern made by reflection events with variable slopes is well captured.

Removing the dominant reflection events by PEF filtering leaves weaker hyperbolic diffraction events (Figure 8).

A streaming application, wherein the adaptive filter is updated one data point at a time, is appropriate in a continuous stream of a large amount of data, such as in passive monitoring of carbon storage (Alumbaugh et al., 2024). In more typical scenarios, wherein the data are immediately accessible, more accurate results can be achieved with other forms of regularization, such as regularizing adaptive filter coefficients with smoothing shaping operators (Fomel, 2009; Liu and Fomel, 2011; Liu et al., 2012). For greater efficiency, a hybrid regularization strategy can be developed. Such strategies are discussed by Geng et al. (2024), who also extend the streaming approach to other applications of seismic attributes.

In applications such as missing data reconstruction, the streaming approach represents an extreme point of the accuracy-efficiency trade-off. To achieve better accuracy, some efficiency can be sacrificed at the expense of performing extra iterations. The presented approach is also limited to the situations of missing values in regularly sampled data and will need to be modified for situations of irregular data.

In multidimensional applications, the helical boundary conditions allow for easy invertibility (Claerbout, 1998) but are not always appropriate. In this case, extra accuracy can also be bought at the cost of sacrificing some of the efficiency.

We hope that providing reproducible benchmarks with this paper will invite the reader to make direct comparisons with more advanced methods. Such comparisons are beyond the scope of this paper because they would require a different codebase.

We have presented an efficient approach for computing and applying nonstationary PEFs. Instead of storing multiple copies of varying filters, the streaming approach stores only one copy and updates the filter on the fly with every new data point. The cost of this procedure is equivalent to the cost of a single convolution and does not require multiple iterations. Moreover, the nonlinear operation of estimating and applying a streaming PEF has an exact inverse, which becomes helpful in missing data interpolation problems. A streaming approach for missing data interpolation does not require iterations and can be accomplished effectively at the cost of two convolutions. We anticipate many possible applications of the proposed technique in geophysical estimation problems.

All examples in this paper are reproducible using the attached Jupyter Notebook.

We thank R. Clapp, S. Levin, Y. Liu, and R. Sarkar for inspiring discussions about prediction-error filters. We thank the sponsors of the Texas Consortium for Computational Seismology (TCCS) and the Stanford Exploration Project (SEP) for their financial support.

The data associated with this research are available and can be accessed via the following URL: https://doi.org/10.5281/zenodo.11099632.

Biographies and photographs of the authors are not available.

Supplementary data