A source of ground-motion recordings in urban Los Angeles that has seen limited prior application is the Community Seismic Network (CSN), which uses low-cost, micro–electro–mechanical system (MEMS) sensors that are deployed at much higher densities than stations for other networks. We processed CSN data for the 29 earthquakes with M > 4 between July 2012 and January 2023 that produced CSN recordings, including selection of high- and low-pass corner frequencies (fcHP and fcLP, respectively). Each record was classified as follows: (1) Broadband Record (BBR)—relatively broad usable frequency range from fcHP < 0.5 to fcLP > 10 Hz; (2) Narrowband Record (NBR)—limited usable frequency range relative to those for BBR; and (3) Rejected Record (REJ)—noise-dominated. We compare recordings from proximate (within 3 km) CSN and non-CSN stations (screened to only include cases of similar surface geology and favorable CSN instrument housing). We find similar high- to medium-frequency ground motions (i.e. peak ground acceleration (PGA) and Sa for T < 5 s) from CSN BBR and non-CSN stations, whereas NBRs have lower amplitudes. We examine PGA distributions for BBR and REJ records and find them to be distinguished, on average across the network, at 0.005 g, whereas 0.0015 g was found to be the threshold between usable records (BBR and NBR) and pre-event noise. Recordings with amplitudes near or below these thresholds are generally noise-dominated, reflecting environmental and anthropogenic ground vibrations and instrument noise. We find nominally higher noise levels in areas of high-population density and lower noise levels by a factor of about 1.5 in low-population density areas. By applying the 0.0015 g threshold, limiting distances for the network-average site condition, based on the expected fifth-percentile ground-motion levels, are 89, 210, 280, and 370 km for M 5, 6, 7, and 8 events, respectively.

The Community Seismic Network (CSN) is a ground-motion network that currently has more than 1200 three-component stations, mainly in Southern California (Clayton et al., 2011, 2020; http://csn.caltech.edu/), and is operated as a collaborative research effort between Caltech and the University of California, Los Angeles. Approximately 800 of those stations were considered in this work (the additional 400 have only recently become operational). CSN uses low-cost, three-component, micro–electro–mechanical system (MEMS) accelerometers (Phidget 1056-1) capable of recording accelerations up to twice the acceleration of gravity (2 g). This amplitude level was chosen to ensure that large earthquake amplitudes would be recorded without clipping; as with any sensor, there is a trade-off between maximum amplitude recording capability and sensitivity (e.g. a maximum level of 0.5 g would be four times as sensitive). The primary products of the network are measurements of shaking of the ground or in buildings from a major earthquake.

In terms of its layout and configuration, CSN differs from other seismic networks in two principal respects. First, the sensors are spatially concentrated in parts of Southern California, including the San Fernando Valley, San Gabriel Valley, the Los Angeles basin, and Downtown Los Angeles. These areas have high population densities with substantial human and industrial activity and hence are culturally noisy. This noise takes the form of background ambient vibrations that have been found to depend on time (i.e. time of day, day of week, and season; Clayton et al., 2020). CSN is effective at capturing ground-motion characteristics over relatively short length scales due to its small station spacing (∼0.5 km). Second, the instruments have relatively high noise levels compared to broadband seismometers and modern digital accelerometers. The instrument-related noise, coupled with the environmental (both anthropogenic and geological) noise levels of the current CSN station locations, limits the distance and magnitude range for which the data can be used for traditional applications, such as locating event hypocenters.

As with any other network, the effective noise levels of CSN stations located in the greater Los Angeles area are important because recorded earthquake ground motions are subject to amplitude sampling errors for low amplitudes. In the case of triggered instruments, amplitude sampling errors occur when the ground shaking level at a site falls below the trigger threshold. In the case of continuously recording instruments such as those in the CSN, amplitude sampling errors occur when signal amplitudes do not exceed environmental or instrument noise thresholds. This is typically the case at large distances and is more limiting for small magnitude events than for large magnitude events. For a magnitude-distance condition where the mean ground-motion amplitude is near the threshold, unusually strong motions that exceed trigger thresholds or that fall above the noise floor are recorded. However, weaker motions that do not exceed trigger thresholds or that fall near or below the noise floor are not provided. Accordingly, the ground-motion sampling problem associated with earthquake scenarios that are likely to generate motions below the trigger threshold is not that no records are obtained, but that the recorded ground motions become biased toward larger values.

The objectives of this study were to evaluate effective noise thresholds of CSN data based on the currently available recordings, to validate the recordings against those from higher-resolution sensors, and to make the data that are judged to be reliable available in a Ground-Motion Database used for ground-motion modeling projects (Buckreis et al., 2023a). Two types of noise thresholds are considered: (1) the threshold between clear earthquake motions and signals recorded during earthquakes but for which no typical earthquake characteristics are visually apparent and (2) the threshold between earthquake signals and 1–2 min of pre-event ambient vibrations. The ambient ground vibrations considered in these analyses represent the combined effects of environmental and cultural sources and of hardware-related sources (electronic, sensor and digitizer resolution, power). The thresholds presented here are averaged across the network and may differ from the effective thresholds that apply in specific geographical regions or for an earthquake occurring at a particular time.

Following this introduction, we provide background information on the CSN, the data produced by the network, and the events considered in this study. We then describe data processing and assignment of classes that indicate record quality, compare CSN data to data from other networks, analyze noise recordings from CSN sensors to evaluate spatial variations and amplitude thresholds separating usable from noise-dominated records, and identify usable distance ranges for CSN data for application in ground-motion modeling projects, such as Next-Generation Attenuation (NGA)-West3 (https://www.risksciences.ucla.edu/nhr3-nga-west3-home). Results of this study were previously presented in a project report (Stewart et al., 2023).

Over the duration of this project, the data development portion of which effectively concluded in September 2023, the CSN comprised 769 seismic station locations, most of which are in southern California (Clayton et al., 2020). In addition, there are 339 previously active but now decommissioned station locations, some of which produced data that are evaluated. Figure 1 shows the locations of CSN stations overlaid on a regional map that also shows stations from other regional networks (California Strong Motion Instrumentation Program, CSMIP; US Geological Survey, USGS; Southern California Seismic Network, SCSN).

CSN uses low-cost, three-component, MEMS accelerometers. The primary products of the network are measurements of shaking of the ground and upper floors in buildings. Each sensor records time series data in real time at 250 samples per second (sps), which are then downsampled to 50 sps. Prior to ∼2014, most CSN stations consisted of plug-in sensors that were attached to community hosts’ laptops and desktop computers. This deployment type no longer exists. After 2014, all CSN sensors are stand-alone devices deployed by a CSN field engineer who determines location and physical coupling with the floor.

For this study, we focused on all ground level and basement stations in southern California, each of which has been assigned an instrument housing code using guidelines provided in Table 6 of Consortium of Organizations for Strong-Motion Observation Systems (COSMOS, 2001). That table provides categories for classifying stations using two main categories (free-field and structural/array stations); within each category, a series of specific codes are provided. This information is provided as metadata accompanying the CSN sites in the ground-motion database (GMDB; Buckreis et al., 2023a). The applicable codes that were applied to CSN stations, including those on upper floors of buildings, are as follows:

  1. “04”—ground-floor in a one- to two-story building without a basement (1250 CSN stations)

  2. “05”—ground-floor in a larger structure (118 CSN stations)

  3. “09”—basement or underground in a large vault (27 CSN stations)

  4. “10”—upper levels of a structure (463 CSN stations)

While none of these conditions can be considered as “free-field,” experience has shown that instruments in small (in plan dimension) structures without embedment can reasonably approximate free-field conditions (Stewart, 2000). Such conditions correspond to stations in Group 04. Stations in 05 and 09 might be approximated as free-field depending on the depth of embedment (for 09) and plan size of the structure (for 05). Stations from groups 04, 05, and 09 are the ones used here. The difference between the 769 figure mentioned at the start of this section and the sum of 04, 05, and 09 is caused by the occurrence of multiple stations at a given site at the ground level or basement level.

Clayton et al. (2020) and Stewart et al. (2023) provide information on the deployment of CSN stations over time in different parts of the greater Los Angeles area, details of the MEMS sensors and their real-time communication with central computers, and locations where unprocessed CSN data are archived for public use.

Figure 2 shows the locations of 13 of 29 events considered in this study (the others are outside the limits of the map). We include events recorded by the network with M > 4. The stations that recorded the events are color-coded based on their date of deployment, which is important mainly because of the more robust instrument installations since 2014. Table 1 lists the events and their key attributes for engineering studies. Per NGA protocols (e.g. Contreras et al., 2022), seismic moment is taken from the global centroid moment tensor catalog (Ekström et al., 2012; https://www.globalcmt.org/) as are other moment tensor attributes with the exception of hypocenter location, which is taken from USGS (https://www.usgs.gov/programs/earthquake-hazards/earthquakes). The number of CSN records listed in Table 1 is the total number of records considered in this study, even if the data were ultimately deemed unusable. The number of non-CSN records is the number of processed records in the GMDB (Buckreis et al., 2023a).

Processing

The Next-Generation Attenuation (NGA) program has developed standard procedures for processing earthquake ground motions. The aim of these procedures is to minimize the effects of noise on recorded ground motions while optimizing the dynamic range for which a given recording can be considered to accurately represent the ground shaking at the site. The most recent procedures are described in the work by Goulet et al. (2021) and Kishida et al. (2020), although the main elements of the procedure were presented earlier in the work by Boore (2005), Boore and Bommer (2005), and Douglas and Boore (2011). The principal steps are as follows:

  1. Visual screening of records to remove signals that are noise-dominated. This is a critical step with CSN data, particularly for recordings from small M events located outside of the main instrumentation region in Figure 2.

  2. Identify noise and signal windows in the time domain.

  3. Compute normalized Fourier Amplitude Spectra (FAS) of the noise and signal windows. FAS are computed after zero-padding the end of the record to increase the number of data points to a power of 2. The normalization involves division of Fourier coefficients by the square-root of the signal duration to ensure consistent (duration-insensitive) Fourier amplitudes.

  4. Application of high- and low-pass acausal filters. High-pass corner frequencies (fcHP) were selected to ensure adequate signal-to-noise ratio and to remove numerical artifacts (wobble) from double integration to displacement. This involved iterations of fcHP selection followed by inspection of displacement time series and FAS. Final values of fcHP were those that minimized numerical artifacts while maximizing bandwidth (i.e. minimizing fcHP). Low-pass corner frequencies (fcLP) were selected as the smaller of 0.75 × fNyq (where fNyq is the Nyquist frequency) or the frequency where signal-to-noise ratio falls below a threshold.

  5. Remove zero-padding in the filtered signal and baseline correct it, as needed, to remove drift.
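Step 3 and the upper bound on the low-pass corner in step 4 can be illustrated with a minimal NumPy sketch. The function names below are hypothetical and are not part of gmprocess. The scaling of the Fourier coefficients by the sample interval is an assumption about the transform convention.

```python
import numpy as np

def next_pow2(n):
    """Smallest power of 2 that is greater than or equal to n."""
    return 1 << (n - 1).bit_length()

def normalized_fas(acc, dt):
    """Fourier amplitude spectrum divided by sqrt(window duration).

    acc : acceleration samples for one window (signal or noise)
    dt  : sample interval in seconds
    """
    acc = np.asarray(acc, dtype=float)
    n = acc.size
    duration = n * dt                        # duration before zero-padding
    nfft = next_pow2(n)                      # rfft appends zeros at the end
    coeffs = np.fft.rfft(acc, n=nfft) * dt   # assumed scaling convention
    freqs = np.fft.rfftfreq(nfft, d=dt)
    return freqs, np.abs(coeffs) / np.sqrt(duration)

def lowpass_upper_bound(dt):
    """Upper bound on the low-pass corner: 0.75 times the Nyquist frequency."""
    f_nyq = 0.5 / dt
    return 0.75 * f_nyq

# Example with a synthetic 2 Hz sinusoid sampled at 50 sps (dt = 0.02 s).
dt = 0.02
t = np.arange(3000) * dt
freqs, fas = normalized_fas(np.sin(2.0 * np.pi * 2.0 * t), dt)
```

For data at 50 sps the Nyquist frequency is 25 Hz, so the low-pass corner cannot exceed 0.75 × 25 = 18.75 Hz under this rule. The signal-to-noise criterion that can lower the corner further is not shown, because the threshold value is not stated here.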

The above steps were applied using a version of the processing code gmprocess (Hearne et al., 2019) modified to facilitate the above workflow (Ramos-Sepulveda et al., 2023). The workflow involves automated processing in gmprocess that produces initial estimates of fcHP that are then checked, and as needed adjusted, in a graphical user interface. Stewart et al. (2023) provide additional details and illustrations of the processing steps for CSN data. The procedures were applied to all CSN recordings from the events listed in Table 1. We also queried the GMDB for those events. Some of our events were not already in the GMDB, so we added those and processed the non-CSN data in a similar manner. The processing of non-CSN data was performed to enable between-network signal comparisons as described subsequently.

CSN data classification

In our evaluations of the CSN data, we observed three general categories of records. The “best” records (Broadband records, BBR) clearly reflect earthquake shaking, have waveforms where the different wave arrivals are evident, and exhibit only modest effects of noise. Records deemed unusable (Rejected, REJ) appear to be noise-dominated, generally based on visual inspection of time series, but sometimes from similar levels of signal and noise FAS. The intermediate case (Narrowband records, NBR) has the visual appearance of earthquakes, but the signal is of modest strength in comparison to noise, and the record bandwidths are relatively limited.

We developed criteria to distinguish different record categories for threshold analyses and data comparisons. After some trial and error, the category definitions are as follows:

  1. BBR: relatively broad usable frequency range from fcHP < 0.5 to fcLP > 10 Hz

  2. NBR: limited usable frequency range because corner frequencies do not meet the criteria for BBR (i.e. fcHP > 0.5 or fcLP < 10 Hz)

  3. REJ: visual evidence suggests seismic waves cannot be distinguished from noise
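The three category definitions above can be written as a short decision rule. The sketch below is illustrative only. The function name is hypothetical, the visual screening outcome is treated as an input because it is an analyst judgment, and assigning records with corner frequencies exactly at 0.5 or 10 Hz to NBR is an assumption.

```python
def classify_record(fc_hp, fc_lp, passed_visual_screening):
    """Return 'BBR', 'NBR', or 'REJ' for one component record.

    fc_hp, fc_lp : selected high- and low-pass corner frequencies (Hz)
    passed_visual_screening : False when seismic waves cannot be
        distinguished from noise (judged by an analyst, not computed here)
    """
    if not passed_visual_screening:
        return "REJ"
    # Broadband: usable range spans from below 0.5 Hz to above 10 Hz.
    if fc_hp < 0.5 and fc_lp > 10.0:
        return "BBR"
    # Otherwise the record looks like an earthquake but has limited bandwidth.
    # Assumption: values exactly at 0.5 or 10 Hz fall in this category.
    return "NBR"
```

For example, a record with fcHP = 0.8 Hz and fcLP = 18.75 Hz that passes visual screening is assigned NBR because of its high-pass corner alone.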

Figure 3(a), (b), and (c) show examples of records assigned as BBR, NBR, and REJ, respectively. In each case, the figures were generated by gmprocess. At the top of each column, an indication is given of whether the record “passed” or “failed” depending on the visual screening criteria described in the previous section. Corner frequencies applied in the filtering are those from automated algorithms, and as a result, there are cases where low-frequency numerical artifacts in displacement occur that were removed in subsequent processing. In Figure 3(a), the signal FAS exceeds that of noise over a wide frequency range, resulting in a BBR assignment. In Figure 3(b), the signal FAS exceeds that of noise over a narrower frequency range; in particular, the values of fcHP are > 0.5 Hz, which causes the NBR assignment. In Figure 3(c), the signal FAS is generally similar to, or in some cases below, the noise FAS. Those relative amplitudes, along with the obvious effects of noise in the time series, are the reason for the REJ assignment.

Table 2 indicates the number of CSN individual-component recordings in each category for each of the 29 considered events. The number of usable ground motions for modeling purposes, which combine horizontal components typically as the median component (RotD50; Boore, 2010), is approximately 1/2 to 1/3 of the numbers shown in Table 1.

For some events (e.g. 2019 Searles Valley, 2019 Ridgecrest, and 2020 El Monte), large percentages of ground motions have usable bandwidth (BBR and NBR), whereas for others (2013 Weldon and Joshua Tree), all records are rejected based on the criteria presented in the previous section. Figure 4 shows that the correlation of ground motion with magnitude and distance is driving the classification of CSN records. In the upper portion of the plot (large magnitudes) and the close distance portion of the plot for M < 5 events, most records are BBR, whereas in the lower-right portions of the plot (M < 5 events and distances > 50–100 km) most records are REJ. This is also reflected in summary statistics for the dataset. Among events since 2018, large magnitude events and events generally closer than 70–80 km to the network (Malibu, Carson, Lennox, El Monte, Pacoima, Searles Valley, Ridgecrest, La Verne) have the following aggregate component record classifications:

  • Usable records (BBR and NBR): 5784 (1122 BBR, 4662 NBR)

  • REJ: 4024

The database as a whole, which includes many events with small magnitude and large distances, breaks down as

  • Usable records (BBR and NBR): 9950 (1446 BBR, 8504 NBR)

  • REJ: 11987

The effects of noise on CSN data due to the station location environments (cultural, geological) and instrument can be significant (e.g. Figure 3(c)). As such, it is important to establish a threshold level of ground motion that separates CSN earthquake signals (BBR or NBR) from noise-dominated data (REJ signals or pre-event noise). In this section, we address this question on a network-wide level using the ground-motion parameter of individual-component peak ground acceleration (PGA). Individual components are used in lieu of combinations of components (RotD50) because for some stations, individual components can have different classifications (e.g. one component may be BBR and the other NBR). Various intensity measures (IMs) were considered for the derivation of this threshold, and PGA was found to be the most effective. Only data from ground-level instruments (COSMOS codes 4 and 5) were considered for threshold analyses.

Threshold between BBR and REJ

We first evaluate the threshold ground-motion level that distinguishes BBR from REJ components, which is denoted PGAth. Figure 5 shows histograms of PGA for both groups (BBR and REJ) and the vertical red line shows the PGAth derived from the data. This threshold was identified iteratively by computing cumulative relative likelihoods from the BBR and REJ histograms for various trial values of PGAth. Denoting the cumulative relative likelihoods under the BBR curve with PGA > PGAth as A, and cumulative relative likelihood under the REJ curve with PGA < PGAth as R, a confidence index (Iconf) was computed as:

(1)

The value of PGAth that maximizes Iconf was selected as the preferred threshold value, which is 0.005 g. The corresponding value of Iconf is 0.94, which indicates that only 6% of records are misclassified as either usable (BBR) or not usable (REJ) with this threshold.
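The iterative search over trial thresholds can be sketched as follows. The expression for Iconf is not reproduced in this text, so the sketch assumes, purely for illustration, that the index is the average of A and R. Here A and R are computed as the fractions of BBR values above, and REJ values below, the trial threshold. The function names and the synthetic PGA values are hypothetical.

```python
def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)

def fraction_below(values, threshold):
    """Fraction of values strictly less than the threshold."""
    return sum(v < threshold for v in values) / len(values)

def confidence_index(bbr_pga, rej_pga, pga_th):
    """ASSUMED form of the index: the mean of A and R (not from the source)."""
    a = fraction_above(bbr_pga, pga_th)   # BBR records correctly above
    r = fraction_below(rej_pga, pga_th)   # REJ records correctly below
    return 0.5 * (a + r)

def best_threshold(bbr_pga, rej_pga, trial_values):
    """Trial threshold that maximizes the assumed confidence index."""
    return max(trial_values, key=lambda th: confidence_index(bbr_pga, rej_pga, th))

# Synthetic component PGAs in g (made up for this example).
bbr = [0.0045, 0.008, 0.012, 0.02, 0.05]
rej = [0.0005, 0.001, 0.002, 0.003, 0.0038]
trials = [0.001, 0.002, 0.004, 0.008]
pga_th = best_threshold(bbr, rej, trials)
```

With these separable synthetic values the search selects 0.004 g, where the assumed index equals 1.0. Real BBR and REJ distributions overlap, so the maximum index is below 1.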

We separately evaluated PGAth using a binary logistic regression process, which is a method for developing a statistical model to estimate the probability of discrete binary outcomes from a process (e.g. will the outcome of a test be “pass” or “fail”?). This process, described by Stewart et al. (2023), produced essentially identical outcomes to the threshold analyses presented in Figure 5, and consequently, the details are omitted here for brevity.

It is noteworthy that the PGAth value of 0.005 g is roughly 18 times the noise level of 0.00028 g identified for Phidget sensors (Clayton et al., 2020). There are several possible reasons for the difference. The effective noise levels of the sensors installed in the field are expected to be higher than what is reported by Clayton et al. (2020), who used a relatively quiet site with low-level environmental ambient vibrations. The vast majority of CSN sites are in the geologically and culturally noisy Los Angeles, San Fernando, and San Gabriel basins, and Downtown Los Angeles; thus, noise levels at those sites will almost always be higher due to environmental conditions associated with the station location. PGAth is an average for the entire network and therefore incorporates the effects of noise in these station location environments.

Threshold between pre-event noise and usable records

The REJ signals considered in threshold analyses in the previous section consist of both ambient noise and earthquake shaking. The relative contributions of the two sources are generally unknown. The thresholds identified previously therefore serve to distinguish between signals recorded during earthquakes (and thus containing energy from earthquake shaking) that do or do not contain clear seismic features.

An alternative objective for threshold identification is to distinguish ambient noise lacking a seismic signal from earthquake signals. This is the subject of this section, with the non-earthquake signal taken from 1–2 min of pre-event noise windows from BBR and NBR records that also contain earthquake signal. The noise/seismic signal ratio threshold is identified by applying the Iconf threshold analysis.

Figure 6 shows histograms of PGA for the two groups. The top histogram contains component PGAs for earthquake signals (both BBR and NBR). The bottom histogram contains component PGAs for pre-event noise from BBR and NBR signals. The earthquake signals range from 0.00063 to 0.06 g (approximate 5%–95% interval) and have some overlap at the low end of the distribution with pre-event noise PGAs, which range from 0.0004 to 0.006 g. The index Iconf was computed using Equation 1 for different values of PGAth, and the threshold was identified as 0.0015 g by maximizing Iconf. This threshold is shown in Figure 6 as a vertical red line and the maximized Iconf value is 0.79. The lower value of the threshold PGA from these analyses is expected given the lack of earthquake shaking in the reference motions (bottom histogram) in Figure 6.

An important step in the evaluation of the usability of ground motions recorded by CSN stations for ground-motion modeling applications is to compare them with ground motions recorded by non-CSN/traditional network sensors that have been used in previous studies (i.e. NGA projects). Such comparisons are most robust when sensors from both networks share the same location and both record a given event at the ground level. The number of such sites is small (three) and the results of the comparisons (presented in Stewart et al., 2023) are inconclusive for a variety of reasons including the limited number of recordings per site and different soil–structure interaction effects.

As a result, we focus here on “proximate” CSN and non-CSN sensors, for which we consider stations that meet two criteria: (1) the stations are separated by ≤ 3 km and (2) the stations have the same surface geology, based on the statewide map by Wills et al. (2015). Station pairs that meet these criteria are mapped in Figure 7 (lines are drawn between paired stations).
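The two pairing criteria can be expressed as a simple filter over candidate station pairs. The sketch below assumes a great-circle (haversine) separation and a dictionary layout for station metadata. The field names, station identifiers, and geology labels are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def proximate_pairs(csn_stations, other_stations, max_sep_km=3.0):
    """Pairs separated by <= max_sep_km that share the same surface geology."""
    pairs = []
    for c in csn_stations:
        for o in other_stations:
            sep = haversine_km(c["lat"], c["lon"], o["lat"], o["lon"])
            if sep <= max_sep_km and c["geology"] == o["geology"]:
                pairs.append((c["id"], o["id"], sep))
    return pairs

# Hypothetical stations: only the first non-CSN station meets both criteria.
csn = [{"id": "CSN-A", "lat": 34.05, "lon": -118.25, "geology": "unit1"}]
others = [
    {"id": "NET-1", "lat": 34.06, "lon": -118.25, "geology": "unit1"},
    {"id": "NET-2", "lat": 34.06, "lon": -118.25, "geology": "unit2"},
    {"id": "NET-3", "lat": 34.15, "lon": -118.25, "geology": "unit1"},
]
pairs = proximate_pairs(csn, others)
```

The additional screening for favorable CSN instrument housing would be applied as a further condition on the CSN station metadata.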

For each station pair, a differential ground-motion IM is computed as:

$$\delta(\ln IM) = \ln(IM_{csn}) - \ln(IM_{net}) \qquad (2)$$

where the “csn” subscript indicates the IM is from the CSN station and the “net” subscript indicates the IM is from the non-CSN station. Both IMs are taken from individual as-recorded components of ground motion (generally north–south and east–west). The average value of δ(lnIM) is denoted μδ. Figure 8(a) plots δ(lnIM) versus separation distance for cases in which the CSN records are BBR and the IM is PGA. The mean difference in this case is μδ = −0.12 with a standard error of the mean of 0.07. These results show that the CSN PGAs are on average slightly smaller than the non-CSN PGAs. Figure 8(b) shows the variation of μδ for response spectral accelerations at 5% damping (Sa) over the oscillator period range of 0.01–10 s. Data were only considered in the calculation of the binned means when both the CSN and non-CSN Sa values are within their usable ranges given the data filtering (i.e. the oscillator period T < 0.8/fcHP for both instruments). Also, only CSN stations with COSMOS code 4 were considered to minimize potential soil-structure interaction effects. The results in Figure 8(b) show a negative bias (CSN lower) for T < 1 s, but means of nearly zero at longer periods. The vertical red lines are the 95% confidence intervals on the means, which include zero for nearly all periods, indicating that the offsets from zero of μδ may not be statistically significant. Figure 8(c) shows positive bias (μδ∼ 0.2) for two significant duration parameters (times from 5%–75% and 5%–95% of the normalized Arias intensity). This indicates longer durations from CSN records, which is likely due to the higher signal noise affecting coda portions of records.

Figure 9(a) plots δ(lnIM) versus separation distance for cases in which the CSN records are NBR and the IM is PGA. The mean difference in this case is μδ = −0.28 with a standard error of the mean of 0.06. These results again show that the CSN PGAs are on average smaller than the non-CSN PGAs, but these differences are larger than with the BBR data. Figure 9(b) shows the variation of μδ with period for Sa over the period range of 0.01–10 s. The results in Figure 9(b) show a negative bias (CSN lower) for T < 0.5 s. Within that period range, the offsets from zero of μδ are considered to be statistically significant because the 95% confidence interval does not include zero. Values of μδ for duration parameters (Figure 9(c)) are similar to those for the BBR case.
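The statistics used in these comparisons can be sketched as follows, assuming δ(lnIM) is the log-ratio ln(IM_csn/IM_net), as the subscript definitions suggest, and that the 95% confidence interval is the mean ± 1.96 standard errors (a normal approximation, which is an assumption). The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def delta_ln_im(im_csn, im_net):
    """Log-ratio of a CSN intensity measure to its non-CSN counterpart."""
    return math.log(im_csn) - math.log(im_net)

def within_usable_period(period_s, fc_hp_csn, fc_hp_net):
    """True when T < 0.8 / fcHP holds for both instruments."""
    return period_s < 0.8 / max(fc_hp_csn, fc_hp_net)

def mean_and_sem(deltas):
    """Mean of the differences and the standard error of that mean."""
    mu = mean(deltas)
    sem = stdev(deltas) / math.sqrt(len(deltas))
    return mu, sem

def ci_includes_zero(mu, sem, z=1.96):
    """Whether mu +/- z*sem brackets zero (z = 1.96 is an assumed choice)."""
    return mu - z * sem <= 0.0 <= mu + z * sem

# Applied to the PGA values reported in the text:
bbr_case = ci_includes_zero(-0.12, 0.07)   # interval is about (-0.26, 0.02)
nbr_case = ci_includes_zero(-0.28, 0.06)   # interval is about (-0.40, -0.16)
```

Under these assumptions the interval for the BBR PGA mean brackets zero, whereas the interval for the NBR PGA mean does not, which matches the contrast drawn between the two record classes at short periods.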

In summary, the results of the proximate sensor comparisons show that BBR CSN and non-CSN records are similar within the typical usable period range of PGA to ∼5 s, although values for T < 1 s are slightly and persistently negative. It is possible that the negative means at short periods are influenced by kinematic soil–structure interaction effects in the structures housing CSN instruments, but this would require more investigation to assess with confidence. Relative to BBR, the CSN NBR records have larger negative biases for PGA and for Sa for T < 0.5 s, which is expected because by definition, these records have a relatively limited frequency range and hence are missing portions of the seismic signal at low and high frequencies. As a result, we suggest that the criteria used to define BBR recordings can be used to identify usable CSN data for ground-motion modeling applications.

Our review and processing of CSN data from the 29 earthquakes in Table 1 revealed that 53% of the downloaded records from CSN servers are REJ, meaning that the signals have the visual appearance of noise. As shown in Figure 5, 95% of the component PGAs for these records are between 0.0004 and 0.01 g, which exceed the nominal instrument noise level of 0.00028 g (Clayton et al., 2020). This comprises a wide range of apparent noise levels across the dataset. In this section, following a brief literature review on ground vibration noise sources, we examine whether these different noise levels have systematic location dependencies.

Wilson et al. (2002) evaluated sources of seismic noise signals across different frequency bands, finding that low-frequency noise (< 0.1 Hz) is dominated by thermal or atmosphere-driven local slab-tilt effects, mid-frequency (0.1–0.3 Hz) noise levels are dominated by naturally occurring microseismic noise, and high-frequency (0.3–8 Hz) ambient vibrations are derived primarily from cultural sources. Lecocq et al. (2020) identified trains, airplanes, and industrial processes as contributing sources of cultural noise and documented their relationships to human activities, which decreased during the COVID-19 pandemic. Diaz et al. (2017) describe how impactful cultural events (e.g. rock concerts, fireworks, or football games) intensify noise signals. Whereas few of these prior studies report noise data in acceleration units, Clayton et al. (2020) report such information for CSN data recorded at a Los Angeles area school, which generally range from 0.0015 to 0.003 g. The lowest noise amplitudes occurred in evening hours and higher amplitudes occurred in mid-afternoon. For comparison purposes, an accelerometer high-noise model and low-noise model developed for high-quality accelerometers (typically Kinemetrics EpiSensor) provide noise amplitude estimates in the period range of 0.03–0.1 s of 2E-5 to 2E-7 g (Cauzzi and Clinton, 2013), which are several orders of magnitude smaller.

The noise from CSN instruments may have several contributing factors. One factor is instrument noise, which should nominally match the Phidget noise level of 0.00028 g (Clayton et al., 2020). Another factor is ambient noise (i.e. ground vibrations from cultural sources and microtremors). Observations from CSN technicians suggest that cultural sources from proximal construction, motors or generators, HVAC units, and transportation systems (traffic, trains) are common within the spatial domain of the CSN. Moreover, a relatively high baseline level of noise could be exacerbated within basins in relation to nearly continuous scattering of energy within the upper crustal sediment layers (Ma and Clayton, 2016). Whereas the cultural sources are likely to be spatially variable (more prevalent in industrial areas or areas with high population densities), the prevalence of sedimentary basins across most of the spatial domain of the CSN may make the noise levels relatively consistent.

To investigate spatial patterns of noise, we consider population density as an independent variable against which to assess variations. We realize that population density is an imperfect indicator of noise, as it does not account for a host of location-specific potential sources, but it is considered due to its availability and its potential to correlate broadly with noise sources associated with human activities/machines. Figure 10 shows a map of the portion of the greater Los Angeles region where the CSN sensors are concentrated. The map is shaded by population density (number of people per 100 m²), which ranges from 0 to 2000 (data from Depsky et al., 2022), and is based on 2020 census data.

The pattern of noise level with population density can be visualized by plotting the dependency, which is provided in Figure 11 for PGA using the REJ signal type. The population density trend is flat from 0 to 20 people/100 m² at an average level of 0.0045 g that increases to 0.006–0.007 g at higher densities, which is relatively consistent with the BBR/REJ threshold identified earlier in the article. As shown by Stewart et al. (2023), similar population density trends are identified using an alternative noise metric of averaged Fourier amplitudes (averaging is across a defined frequency range) in place of PGA. That report also shows weaker trends with population density when pre-event noise signals are used in place of REJ records, and lower average PGAs of 0.0016 g (consistent with the threshold shown in Figure 6).

Our interpretation of these results is that noise amplitude trends with population density are too weak to be confidently applied to estimate spatially variable threshold ground motions across the study region. For this reason, the network-wide average levels are used to evaluate usable distance ranges in the next section.

A typical step in ground-motion model (GMM) development is to apply data screening criteria, which are used to identify the data that will be considered in model development. One aspect of data screening is to minimize the effects of amplitude sampling errors from low-amplitude ground motions.

In the case of continuously recording instruments such as those in the CSN, amplitude sampling errors occur when signal amplitudes are not stronger than the instrument noise threshold. For a magnitude–distance condition where the mean ground-motion amplitude is near or below the threshold, unusually strong motions that exceed the threshold are recorded. However, weaker motions that fall near the noise floor are not available. Accordingly, the problem is not that no records are obtained for such conditions, but that the recorded ground motions become biased as a population toward larger values.
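The sampling bias described above can be illustrated with a small simulation. The sketch below assumes that PGA for a given magnitude–distance condition is lognormally distributed and that only motions at or above the threshold are retained; the standard deviation of 0.6 (natural-log units), the three median values, and the function name are illustrative assumptions and are not taken from this study.

```python
import numpy as np

def truncated_sample_bias(ln_mean, ln_sigma, threshold_g, n=200_000, seed=0):
    """Bias (natural-log units) of the mean of recorded motions, and the
    fraction of motions recorded, when only PGA >= threshold_g is retained."""
    rng = np.random.default_rng(seed)
    ln_pga = rng.normal(ln_mean, ln_sigma, size=n)      # full population of motions
    recorded = ln_pga[ln_pga >= np.log(threshold_g)]    # motions above the noise threshold
    return recorded.mean() - ln_mean, recorded.size / n

threshold_g = 0.0015   # seismic / noise threshold from the text (g)
sigma_ln = 0.6         # ASSUMED standard deviation, illustrative only
for median_g in (0.0015, 0.003, 0.01):                  # ASSUMED median PGAs (g)
    bias, frac = truncated_sample_bias(np.log(median_g), sigma_ln, threshold_g)
    print(f"median = {median_g:.4f} g: recorded fraction = {frac:.2f}, "
          f"bias in ln(PGA) = {bias:+.2f}")
```

Under these assumptions, the bias is largest when the median motion is at the threshold and becomes negligible once the median is several standard deviations above it, which is the rationale for restricting GMM development to distances where nearly the full distribution is sampled.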

To overcome this problem, data selection criteria are applied during GMM development so as to screen out data that are potentially subject to sampling bias. In effect, for a given sensor network, these criteria provide the maximum source-to-site distance (Rmax) that can be used with confidence. In the NGA-West2 project (Bozorgnia et al., 2014), conservative criteria were used consisting of magnitude-Rmax relations that depended on instrument type (analogue; older, low-resolution digital; modern, high-resolution digital). For the case of modern digital instruments, one such relation provided Rmax values ranging from 230 to 400 km for M3.0 to 6.25 events.

Here, we establish a predictive model for Rmax for CSN data applicable to the greater Los Angeles urban region. This is done by establishing the distances for which few (< 5%) ground motions from a given event would be expected to fall below a threshold that is applicable to the network. Ground-motion recordings with distances smaller than Rmax could be used with confidence in GMM development because a nearly full statistical distribution of ground motions would be sampled.
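The "few (< 5%)" criterion can be expressed as a probability calculation. The minimal sketch below assumes lognormally distributed PGA and evaluates the fraction of motions expected to fall below a threshold for a given median; the function name and the standard deviation of 0.6 (natural-log units) are hypothetical choices for illustration, not values from this study.

```python
import math

def fraction_below_threshold(median_g, sigma_ln, threshold_g):
    """Probability that a lognormally distributed PGA falls below threshold_g."""
    z = (math.log(threshold_g) - math.log(median_g)) / sigma_ln
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z

threshold_g = 0.0015   # seismic / noise threshold from the text (g)
sigma_ln = 0.6         # ASSUMED standard deviation, illustrative only
for n_sigma in (1.0, 1.645, 2.0):
    # Median PGA that sits n_sigma standard deviations above the threshold
    median_g = threshold_g * math.exp(n_sigma * sigma_ln)
    frac = fraction_below_threshold(median_g, sigma_ln, threshold_g)
    print(f"median {n_sigma:.3f} sigma above threshold: {100 * frac:.1f}% below threshold")
```

Under the lognormal assumption, a threshold located about 1.645 standard deviations below the mean leaves 5% of motions below it, and a threshold two standard deviations below the mean leaves about 2.3% below it.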

To estimate Rmax, we plot in Figures 12 and 13 the distance variation of the fifth percentile PGA (mean minus two within-event standard deviations) for five magnitudes (M 4, 5, 6, 7, and 8) and two site conditions (VS30 = 300 and 760 m/s). The Rmax value is the distance at which the PGA curve intersects the threshold PGA. Two thresholds are considered: (1) 0.005 g, which is based on the BBR / REJ threshold analyses (Figure 5) and (2) 0.0015 g, which was selected based on the seismic / noise threshold analysis (Figure 6) and the mean noise amplitudes (Figure 11). Additional refinements of the limiting distances are possible if the event term for an earthquake is known (used to adjust the GMM prediction across all distances and site conditions up or down) or for specific site conditions.
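The intersection procedure can be sketched numerically as follows. The attenuation function here is a placeholder with made-up coefficients and is not the Boore et al. (2014) model; the standard deviation of 0.6 (natural-log units) and all function names are likewise assumptions. The resulting distances therefore illustrate the method only and should not be compared with the Rmax values reported in this article.

```python
import numpy as np

def ln_pga_median(mag, r_km):
    """PLACEHOLDER median ln(PGA in g) versus distance; the coefficients are
    illustrative and do NOT represent Boore et al. (2014)."""
    r = np.sqrt(np.asarray(r_km, dtype=float) ** 2 + 4.5 ** 2)
    return 1.5 + 1.2 * (mag - 6.0) - 1.3 * np.log(r) - 0.004 * r

def rmax_km(mag, threshold_g, sigma_ln=0.6, n_sigma=2.0):
    """Distance at which the (mean - n_sigma * sigma) PGA curve crosses threshold_g."""
    r_grid = np.logspace(0, 3, 2000)                     # 1 to 1000 km
    ln_low = ln_pga_median(mag, r_grid) - n_sigma * sigma_ln
    ln_thr = np.log(threshold_g)
    if ln_low[0] < ln_thr:
        return 0.0                                       # below threshold at all distances
    if ln_low[-1] >= ln_thr:
        return float(r_grid[-1])                         # no crossing within the grid
    # The curve decreases monotonically with distance, so reverse both arrays
    # to give np.interp the increasing x-values it requires.
    return float(np.interp(ln_thr, ln_low[::-1], r_grid[::-1]))

for threshold_g in (0.005, 0.0015):                      # thresholds from the text (g)
    distances = [round(rmax_km(m, threshold_g)) for m in (4, 5, 6, 7, 8)]
    print(f"threshold {threshold_g} g: Rmax (km) for M 4-8 = {distances}")
```

With any attenuation model that decreases monotonically with distance, the lower threshold yields larger limiting distances, and Rmax increases with magnitude, consistent with the qualitative behavior described in the text.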

In past work, Rmax was estimated using truncated regression of data from a specific region and for a specific network (e.g. Contreras et al., 2022). The truncated regression was performed to develop a regional GMM that was used to develop IM–distance curves like those shown in Figures 12(a) and 13(a). Truncated regression of regional data was not needed in this case because the Boore et al. (2014) path model has been demonstrated to perform well for southern California data from arrays with sensitive instruments for which truncation is not at issue across the distance range considered (Buckreis et al., 2023b; Nweke et al., 2022). Accordingly, the Boore et al. (2014) model was used without modification to produce the distance attenuation curves shown in Figures 12(a) and 13(a). The resulting values of Rmax are plotted in Figures 12(b) and 13(b).

As shown in Figures 12(b) and 13(b), the resulting CSN Rmax values are smaller than the recommendations from the work by Boore et al. (2014) for modern digital instruments and are similar to the prior recommendations for analog and older digital instruments (for M > 5 events).

Fits to the Rmax–M data are provided using the following expression,

(3)

where Mh = 5.5 and c1, c2, and c3 are coefficients estimated using least-squares regression. Coefficients are provided in Table 3 for the VS30 = 300 and 760 m/s site conditions considered above, and also for 400 m/s, which is the median condition across the CSN station network. The fits for the VS30 = 300 and 760 m/s site conditions are shown in Figures 12(b) and 13(b).

Figure 14 shows the variation of PGA with distance for five earthquakes (2014 La Habra and Westwood, 2019 Searles Valley and Ridgecrest, 2020 El Monte) along with the predictions of the Boore et al. (2014) GMM (median ± two standard deviations). CSN data shown on the plot are BBR motions only. Recordings from all other networks for these events from the GMDB are plotted with gray symbols. The two threshold PGA levels defined earlier are shown with horizontal lines. Figure 14 shows that the data generally lie above the thresholds. For the lower threshold, the GMM mean −2 standard deviations curve intersects the threshold at a distance beyond the most distant BBR stations, whereas applying the upper threshold would cause data to be screened out. This suggests that the 0.005 g threshold may be too restrictive and that the 0.0015 g threshold is more effective.

This article described the results of a series of tasks that collectively aimed to provide insight into the performance of CSN ground-level sensors located in the greater Los Angeles urban region during southern California earthquakes, provide processed data for inclusion in the GMDB, and provide recommendations on the range of conditions for which the data can be used with confidence in ground-motion modeling.

CSN data from 29 earthquakes were uniformly processed using NGA-type procedures. For events where data from other networks were already available, the CSN data were added to the GMDB to supplement the previously available data. For events not previously in the database, CSN and non-CSN data were processed and added to the database. Relevant site and event metadata were compiled and added so that these data are available for public use.

Among the events considered, approximately 50% of the recordings were judged to be not usable because they are noise-dominated based on visual inspection or have unusual features. These are referred to as REJ records. However, this rate is potentially misleading as an indicator of network performance because 27 of the 29 events were of small magnitude (M < 5.5) and often occurred at considerable distances from the network. Two large events (2019 Searles Valley and 2019 Ridgecrest) were successfully recorded by more than 95% of sensor horizontal components, despite being located at distances > 150 km. This rate of data recovery is considered more representative of the performance that can be expected in future impactful earthquakes in the greater Los Angeles area.

Among the remaining (non-REJ) recordings, we distinguished records with relatively broad bandwidth (usable Fourier frequency range of at least 0.5–10 Hz; denoted BBR) from those with relatively limited bandwidth (narrower than that for BBR at one or both ends of the frequency range; denoted NBR). Comparisons of BBR and NBR signals with signals from non-CSN proximate sensors (separation distance < 3 km and same geology) showed that PGA levels were similar. Spectral accelerations from BBR CSN data appeared to be unbiased over the oscillator period range of 0.01–5 s based on these comparisons, whereas NBR CSN data had spectral accelerations for T < 0.5 s that were lower by a statistically significant amount. This is expected given the limited bandwidth of NBR signals.

Comparisons of BBR and REJ data indicated that 0.005 g is an average but possibly overly restrictive threshold acceleration across the network, whereas comparisons of pre-event noise signals with usable signals (BBR and NBR) indicated an effective threshold taken as 0.0015 g. These thresholds represent averages over broad areas; we anticipate that specific locations could produce higher or lower thresholds. Using these threshold accelerations with a calibrated GMM for southern California, limiting distances were provided as a function of magnitude and site condition. The limiting distances derived for CSN were notably lower than those for more sensitive instruments as used in prior research, but consistent with those for analog or low-resolution digital sensors.

These results show that CSN data are useful for research and engineering applications, but their range of applicability is more limited than data from more sensitive instruments. Within their application range, the CSN data have advantageous features, including relatively small between-sensor spacings that facilitate site response or ground-motion variability studies at short length scales.

Recorded strong-motion data for CSN and non-CSN stations were accessed from the CSN webpage: http://csn.caltech.edu/data (last accessed 14 April 2023) (Community Seismic Network, 2024) and the Incorporated Research Institutions for Seismology (Trabant et al., 2012) (last accessed August 2023), respectively. The CSN has an International Federation of Digital Seismograph Networks (FDSN) code of CJ (California Institute of Technology (Caltech), 2010). Processed data and associated metadata developed as part of this study are provided by Mohammed et al. (2024).

The work presented here represents the views and opinions of the authors and does not reflect the policy, expressed or implied, of the State of California.

Helpful input was received during the project from Eric Thompson, Scott Brandenberg, and Maria Ramos-Sepulveda. The authors thank three anonymous reviewers for their input, which has improved the article.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this study was provided by the California Strong Motion Instrumentation Program (Contract no. 1021-006). Partial support for the first and second authors was also provided by the UCLA and USC Civil & Environmental Engineering Departments, respectively. This support is gratefully acknowledged.