Rapid assessment of an earthquake’s impact on the affected society is a crucial step in the early phase of disaster management, guiding decisions on the need for further emergency response measures. We demonstrate that felt reports collected via the LastQuake service of the European‐Mediterranean Seismological Centre can be utilized to rapidly estimate the probability of a felt earthquake being high impact rather than low impact on a global scale. Our data‐driven, transparent, and reproducible method utilizing Bayes’ theorem and kernel density estimation provides results within 10 min for 393 felt events in 2021. Although a separation of high‐ and low‐impact events remains challenging, the correct and unambiguous assessment of a large portion of low‐impact events is a key strength of our approach. We consider our method an inexpensive addition to the pool of earthquake impact assessment tools, one that is fully independent of seismic data and can be utilized in many populated areas on the planet. Although practical deployment of our method remains an open task, we demonstrate the potential to improve disaster management in regions that currently lack expensive seismic instrumentation.

An urgent question that decision makers and emergency response operatives are facing in the immediate aftermath of felt earthquakes is whether considerable impact on the affected population is to be expected or not. Although a sophisticated answer to this question is crucial to successful long‐term disaster management, a preliminary characterization of the situation based on rapidly available, though crude, information in the very first minutes after an earthquake is equally important, because it can determine whether emergency response measures will be initiated or relinquished in the first place.

Rapid impact assessment systems such as PAGER (Jaiswal et al., 2010; Wald et al., 2010) are based on the ShakeMap methodology (Wald et al., 2005) and typically provide the first quantitatively reliable estimate of expected impact such as financial losses, destroyed and damaged buildings, and number of casualties. The ShakeMap methodology requires a description of the earthquake source, ground acceleration data from a dense strong‐motion network, and/or macroseismic intensity observations collected, for example, via the “Did You Feel It?” service of the U.S. Geological Survey (Wald et al., 1999). Consequently, it takes on average 30 min until a first impact assessment is available after an earthquake (Wald et al., 2010), although in rare cases this number can be as low as 5 min if a dense, regional real‐time strong‐motion network is operated (e.g., Poggi et al., 2021).

Complementary to this indispensable framework, we suggest an approach that circumvents the intermediate step of a ShakeMap and classifies an earthquake as either high impact or low impact solely based on the reported level of shaking inferred from felt reports collected globally by the LastQuake service of the European‐Mediterranean Seismological Centre (Bossu et al., 2016, 2018). Because LastQuake felt reports are collected rapidly and in large numbers via websites and a smartphone application (50% of felt reports are collected within 10 min after an event), our method is independent of seismic instrumentation and can provide a rapid, preliminary characterization of the situation before more sophisticated and quantitatively reliable estimates from ShakeMap‐based approaches are available. Furthermore, results of our model may still serve as an independent control mechanism, in which large deviations between our results and later estimates could indicate under‐ or overestimation by either of the methods. Although independence of seismic data makes our method applicable in poorly instrumented regions, a technology‐affine and actively participating population is a necessary prerequisite.

We use a dataset comprising over 1.5 million globally collected felt reports from over 10,000 earthquakes of any magnitude between 2014 and 2021. Our easily comprehensible method utilizes Bayes’ theorem and kernel density estimators (KDEs), and is therefore fully data driven, transparent, and reproducible. We discuss our model using well‐reported validation earthquakes from 2022, that is, earthquakes that occurred in 2022, have not been used to calibrate the model, and therefore allow an estimation of the model performance on future earthquakes.

Over 1.5 million felt reports collected globally from over 10,000 felt earthquakes (i.e., earthquakes with at least 10 felt reports) between 2014 and 2021 form our data foundation (Bossu et al., 2023). One felt report comprises a pseudointensity value that quantifies the level of shaking, the timing, and the report location. We use the term pseudointensity, because the reported value is inferred from mapping a single macroseismic observation to the EMS‐98 macroseismic scale (Grünthal et al., 1998), whereas a true, quantitatively reliable macroseismic intensity would be obtained from averaging multiple observations across a region. Reliable information can only be obtained from felt reports when collected in large numbers and if pseudointensity values are averaged spatially. Because the magnitude of an event might be unknown by the time when first felt reports are available, and because we focus on the impact rather than the physics of earthquakes, we include earthquakes of any magnitude. An example collection of felt reports for the Mw 5.7 event in Bosnia and Herzegovina on 22 April 2022, comprising 14,000 felt reports, the first 50 of which were collected within 95 s, demonstrates the capability of the LastQuake collection procedure to meet the preceding requirements (Fig. 1).

The key goal of this study is to develop a probabilistic model that classifies an earthquake as either high impact or low impact based on felt reports rapidly after an event. We briefly outline here the dataset preparation, and additional details are found in the supplemental material S1. In a first step, we transform felt reports into representative features such as the average pseudointensity I or the average distance R of reporting locations to the barycenter (their geometric centroid). A list of all considered features is provided in Table S1, available in the supplemental material to this article. To assure rapid availability and accuracy of the predictive features, we derive them from the first 50 felt reports that are available for an earthquake and remove the event from the database if it has fewer reports. In a second step, we label earthquakes as high impact or low impact based on impact measures that are documented in the NCEI/WDS Global Significant Earthquake Database (GSED), the Emergency Events Database (EM‐DAT), and the Earthquake Impact Database (EID, see Data and Resources). We define an event as high impact if it caused at least one of the following impacts:

  • at least 1 destroyed building;

  • at least 50 damaged buildings;

  • at least 2 fatalities;

  • any documented financial losses.

These rather small threshold values reflect our intention to distinguish the majority of felt earthquakes that have no impact on society from the minority of events that do. Although focusing on the most devastating events might be an equally interesting strategy, the need for a considerable number of examples in both classes to obtain stable modeling results led us to choose the adopted scheme. By choosing a threshold value of two fatalities instead of one, we avoid classifying a considerable number of earthquakes as high impact in which single people died due to incidents that happened during, but cannot directly be related to, an earthquake (e.g., Shoaf et al., 1998; Nievas et al., 2020).
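The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the `ImpactRecord` container and its field names are hypothetical, and treating a missing loss value (`None`) as "no documented financial loss" is an assumption of this sketch, not something specified in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpactRecord:
    """Hypothetical container for impact figures merged from GSED, EM-DAT, and EID."""
    destroyed_buildings: int = 0
    damaged_buildings: int = 0
    fatalities: int = 0
    financial_loss: Optional[float] = None  # None = no loss documented (assumption)


def is_high_impact(rec: ImpactRecord) -> bool:
    """Return True if at least one of the four labeling criteria is met."""
    return (
        rec.destroyed_buildings >= 1      # at least 1 destroyed building
        or rec.damaged_buildings >= 50    # at least 50 damaged buildings
        or rec.fatalities >= 2            # at least 2 fatalities
        or rec.financial_loss is not None # any documented financial loss
    )


# A single fatality and 49 damaged buildings stay below every threshold.
print(is_high_impact(ImpactRecord(fatalities=1, damaged_buildings=49)))
print(is_high_impact(ImpactRecord(fatalities=2)))
```

Keeping the four criteria in a single boolean expression makes it easy to see that they are combined with a logical "or".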

The final database, comprising 254 high‐impact and 1994 low‐impact events, is summarized in Figure 2. The geographic distribution of events and felt reports in Figure 2a,b reveals the global utilization of the LastQuake service, albeit with a substantial bias toward Europe, where ∼75% of felt reports are collected. The distribution of durations to collect 50 felt reports dt50 (Fig. 2c) shows that for over 1000 events the required data are collected within 10 min, emphasizing the efficiency of felt report collection via LastQuake. The distribution of events according to the features I and R is depicted in Figure 3. Despite the overlap of the two types of events, visual inspection indicates differing underlying distributions for high‐ and low‐impact events. We identify the trend that larger impact is expected for strong shaking felt over large areas. This reasonable finding is in agreement with the conclusions of, for example, Atkinson and Wald (2007).

We derive a probabilistic model providing the probability p of an earthquake being high impact (H), rather than low impact (L), given its features X derived from felt reports. In the following, we consider the case in which X comprises two predictive parameters X2D=(lnI,lnR), in which ln denotes the natural logarithm.
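As an illustration of how X2D could be obtained from raw felt reports, the following minimal sketch computes ln I and ln R from the first 50 reports of an event. The tuple layout of a report, the function names, the use of a simple latitude/longitude mean as the barycenter, and the haversine distance are all assumptions made for this example; the actual processing is described in the supplemental material S1.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def features_2d(reports, n_first=50):
    """reports: list of (time_s, lat, lon, pseudointensity) tuples (hypothetical layout).

    Returns (ln I, ln R) from the first n_first reports, or None if the event
    has too few reports or the logarithm is undefined.
    """
    if len(reports) < n_first:
        return None  # event would be removed from the database
    first = sorted(reports, key=lambda r: r[0])[:n_first]
    mean_i = sum(r[3] for r in first) / n_first
    # Barycenter approximated by the plain mean of coordinates (assumption).
    lat_c = sum(r[1] for r in first) / n_first
    lon_c = sum(r[2] for r in first) / n_first
    mean_r = sum(haversine_km(r[1], r[2], lat_c, lon_c) for r in first) / n_first
    if mean_i <= 0 or mean_r <= 0:
        return None
    return math.log(mean_i), math.log(mean_r)


# Toy event: 50 reports of pseudointensity 4, alternating 1 degree east/west of (0, 0).
toy = [(t, 0.0, 1.0 if t % 2 else -1.0, 4.0) for t in range(50)]
ln_i, ln_r = features_2d(toy)
```

The plain coordinate mean is adequate only for compact report clouds away from the poles and the date line; a production implementation would need a more careful centroid.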

The desired posterior probability p(H|X2D) is calculated via Bayes’ theorem:

$$p(H \mid X_{2D}) = \frac{f(X_{2D} \mid H)\, p(H)}{f(X_{2D})}, \tag{1}$$

in which f(X2D|H) and f(X2D) denote the densities of the likelihood and the marginal, respectively. We estimate the prior probability p(H) of occurrence of a high‐impact event from the numbers NH and NL of high‐impact and low‐impact events in our database, respectively:

$$p(H) = \frac{N_H}{N_H + N_L}. \tag{2}$$

We infer the likelihood f(X2D|H) (and also f(X2D|L)) from data. Visual inspection of Figure 3 suggests modeling f(X2D|H) and f(X2D|L) as bivariate Gaussians; however, Kolmogorov–Smirnov tests indicate that normality cannot be assumed for most predictive features (see Fig. S4). We therefore choose kernel density estimation (KDE) with Gaussian kernels to estimate f(X2D|H) and f(X2D|L). When fitting a KDE to data, the choice of the kernel bandwidth h is crucial. To ensure smoothness of the resulting density functions, we first generate the density function m̂_gauss of a bivariate Gaussian from the mean and covariance of the data. We then select the bandwidth hg for which the density function m̂_hg generated from the KDE is most similar to that of the Gaussian in terms of mean‐squared error:

$$h_g = \underset{h}{\arg\min}\ \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left[\hat{m}_{h}(x_{ij}) - \hat{m}_{\mathrm{gauss}}(x_{ij})\right]^2. \tag{3}$$

Nx=Ny=50 denotes the horizontal and vertical dimensions of the regular grid xij on which the densities are compared (the grid spans exactly the value range of the predictive parameters). This way we ensure a smooth shape of the estimated density and simultaneously account for the non‐Gaussian properties of the data.
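The bandwidth selection described above can be illustrated with a small pure-Python sketch. It assumes a single isotropic bandwidth h for both dimensions and a user-supplied list of candidate bandwidths; both choices, as well as all function names, are assumptions of this example rather than details confirmed by the text.

```python
import math
import random


def gauss2d_pdf(x, y, mean, cov):
    """Density of a bivariate Gaussian; cov is given as (sxx, syy, sxy)."""
    (mx, my), (sxx, syy, sxy) = mean, cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def kde2d(x, y, data, h):
    """Gaussian KDE with one isotropic bandwidth h (assumption of this sketch)."""
    norm = 1.0 / (len(data) * 2.0 * math.pi * h * h)
    return norm * sum(
        math.exp(-0.5 * ((x - a) ** 2 + (y - b) ** 2) / (h * h)) for a, b in data
    )


def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def select_bandwidth(data, candidates, nx=50, ny=50):
    """Pick the candidate h whose KDE is closest (MSE on a grid) to the Gaussian fit."""
    n = len(data)
    mx = sum(a for a, _ in data) / n
    my = sum(b for _, b in data) / n
    sxx = sum((a - mx) ** 2 for a, _ in data) / (n - 1)
    syy = sum((b - my) ** 2 for _, b in data) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in data) / (n - 1)
    # Regular grid spanning exactly the value range of the data.
    xs = linspace(min(a for a, _ in data), max(a for a, _ in data), nx)
    ys = linspace(min(b for _, b in data), max(b for _, b in data), ny)
    grid = [(x, y) for x in xs for y in ys]
    ref = [gauss2d_pdf(x, y, (mx, my), (sxx, syy, sxy)) for x, y in grid]
    best_h, best_mse = None, math.inf
    for h in candidates:
        mse = sum((kde2d(x, y, data, h) - g) ** 2 for (x, y), g in zip(grid, ref)) / len(grid)
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse


# Synthetic data only; a coarser 25 x 25 grid keeps the toy run fast.
random.seed(0)
toy = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(60)]
h_g, mse = select_bandwidth(toy, [0.05, 0.3, 1.0, 5.0], nx=25, ny=25)
```

Very small bandwidths produce spiky densities and very large ones flatten the estimate, so both ends of the candidate list are penalized by the mean-squared error against the Gaussian reference.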

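The marginal f(X2D) is modeled as a mixture of f(X2D|H) and f(X2D|L):

$$f(X_{2D}) = C \left[ f(X_{2D} \mid H)\, p(H) + f(X_{2D} \mid L)\, p(L) \right], \tag{4}$$

in which p(L) = 1 − p(H) is the prior probability of a low‐impact event and C is a normalization constant that ensures that f(X2D) integrates to one.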
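To make the chain from prior and likelihoods to the posterior concrete, here is a minimal sketch that evaluates Bayes' theorem at a single point. It assumes isotropic Gaussian KDEs for both likelihoods and takes the normalization constant as C = 1, which holds when both likelihoods are properly normalized densities; the function names and the toy data are hypothetical.

```python
import math

N_H, N_L = 254, 1994  # event counts of the final database


def prior_high(n_h=N_H, n_l=N_L):
    """Prior probability of a high-impact event from the class counts."""
    return n_h / (n_h + n_l)


def kde2d(point, data, h):
    """Gaussian KDE with one isotropic bandwidth h (assumption of this sketch)."""
    x, y = point
    norm = 1.0 / (len(data) * 2.0 * math.pi * h * h)
    return norm * sum(
        math.exp(-0.5 * ((x - a) ** 2 + (y - b) ** 2) / (h * h)) for a, b in data
    )


def posterior_high(point, data_h, data_l, h_h, h_l, p_h):
    """p(H | X) = f(X|H) p(H) / [f(X|H) p(H) + f(X|L) p(L)], with C = 1 assumed."""
    f_h = kde2d(point, data_h, h_h)
    f_l = kde2d(point, data_l, h_l)
    marginal = f_h * p_h + f_l * (1.0 - p_h)
    if marginal == 0.0:
        return float("nan")  # point lies too far from all data to say anything
    return f_h * p_h / marginal


p_h = prior_high()  # about 0.113
# Toy example: one synthetic event per class, mirrored about the origin.
mid = posterior_high((0.0, 0.0), [(1.0, 1.0)], [(-1.0, -1.0)], 0.5, 0.5, p_h)
near_h = posterior_high((1.0, 1.0), [(1.0, 1.0)], [(-1.0, -1.0)], 0.5, 0.5, p_h)
```

At the midpoint both likelihoods are equal, so the posterior falls back to the prior; this shows how strongly a small prior pulls the posterior toward low values unless the high-impact likelihood clearly dominates.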

The posterior p(H|X2D) is presented in Figure 4. We additionally calculated alternative solutions for p(H|X2D), in which we modeled the likelihoods in equation (1) (a) as bivariate Gaussians and (b) with an alternative KDE approach in which we optimize the kernel bandwidth in a leave‐one‐out cross‐validation procedure to maximize the likelihood of the data. Visualizations of resulting likelihoods, posteriors, and uncertainty estimates derived from bootstrapping with 5000 draws are given in Figure S5.
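The alternative leave‐one‐out bandwidth selection can be sketched as follows. This is a generic likelihood cross-validation for a Gaussian KDE, assuming one isotropic bandwidth and a fixed candidate list; it is not claimed to reproduce the exact procedure used for Figure S5, and all names are hypothetical.

```python
import math
import random


def loo_log_likelihood(data, h):
    """Sum over points of the log density predicted by a KDE fitted to all other points."""
    n = len(data)
    norm = 1.0 / ((n - 1) * 2.0 * math.pi * h * h)
    total = 0.0
    for i, (xi, yi) in enumerate(data):
        dens = norm * sum(
            math.exp(-0.5 * ((xi - xj) ** 2 + (yi - yj) ** 2) / (h * h))
            for j, (xj, yj) in enumerate(data)
            if j != i
        )
        total += math.log(max(dens, 1e-300))  # guard against log(0) after underflow
    return total


def select_bandwidth_loo(data, candidates):
    """Return the candidate bandwidth that maximizes the leave-one-out log likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))


# Synthetic data only.
random.seed(1)
toy = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
h_cv = select_bandwidth_loo(toy, [0.01, 0.5, 50.0])
```

Unlike the Gaussian-matching criterion, this approach does not enforce smoothness, so it tends to follow local data structure more closely.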

Confirming the previous interpretations, the posterior presented in Figure 4 suggests that the stronger the shaking and the larger the area over which the shaking is felt, the more likely it is for an earthquake to be of high impact. The posterior p(H|X2D) provides fairly low values even where the density of high‐impact events is high and only occasionally exceeds the value of 0.5. This is caused by the small value of the prior p(H) (equation 2), that is, the vast majority (∼89%) of felt earthquakes in our database are actually low impact (see the supplemental material S1). Although the orientations of isolines of p(H|X2D) in Figure 4 seem reasonable where data density is large, considerable influence of individual data samples is obvious in regions where data density is low, indicating slight overfitting of the KDEs to the data.

For validation purposes, we applied our model to a selection of well‐recorded validation earthquakes from 2022 (see Fig. 4; Table 1). We notice that high‐impact events are generally assigned higher values of p(H|X2D) compared with low‐impact events. Distinct and accurate classification is seen for the low‐impact events from Croatia (E01) and California (E02). The intermediate values of 0.07 ≤ p(H|X2D) ≤ 0.50 assigned to events E03–E10 indicate a nonnegligible chance of impact that would be difficult to interpret in a real‐time operation of the system. Also the comparatively large value of p(H|X2D) = 0.68 assigned to the event from Japan (E11) is ambiguously interpretable. We notice that the high‐impact events from Sumatra (E04) and Nepal (E05) are assigned relatively small values compared to their impact, whereas the event from Chile (E06) with a slightly larger value is actually of low impact.

Modeling decisions

We conducted a fully data‐driven modeling approach via the use of kernel density estimators. As expected, the resulting posterior probability p(H|X2D) is poorly constrained where data density is low (e.g., wiggles in isolines in Fig. 4), which is reflected by increased uncertainty estimates in these regions (compare Fig. S5). Figure S5 shows that even though normality of the predictive features is formally not given, assuming bivariate Gaussian distributions still leads to a useful and even more reasonable model (straight isolines) that might be better suited to a practical implementation.

Interpretation of the posterior

The most interesting property of our model is that 39% of low‐impact events in the calibration dataset and not a single high‐impact event fall in the region in which p(H|X2D)<0.01 (Fig. 4). Consequently, considerable impact can almost certainly be ruled out for future earthquakes with similar appearance. Bearing in mind that these events are still largely felt and may cause considerable public anxiety (e.g., Casey et al., 2018; Becker et al., 2019), the ability to comfort the affected population in these cases is a key strength of our methodology.

For a few earthquakes with 0.5 < p(H|X2D) ≤ 0.93, the analysis suggests that the occurrence of impact is more likely than its absence, although uncertainties are still large in most cases.

The overlap of high‐ and low‐impact events, and the resulting small‐to‐medium values of the posterior p(H|X2D) (Fig. 4), raise the question of how to utilize modeling results for 0.01 < p(H|X2D) < 0.5. One possible solution would be to introduce a traffic light system that advises a decision maker not to take any further action at low p(H|X2D) (green), to conduct further investigations at intermediate levels of p(H|X2D) (yellow), or to raise an alert at large p(H|X2D) (red). Because the exact thresholds that define the boundaries between “green,” “yellow,” and “red” events would largely depend on the intended use case, we will not suggest any particular values.
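A traffic light scheme of this kind could be sketched as below. The default thresholds of 0.01 and 0.5 are purely illustrative placeholders borrowed from the posterior ranges discussed in the text; they are not recommended values, and any real deployment would have to choose them for its own use case.

```python
def traffic_light(p_high, green_max=0.01, red_min=0.5):
    """Map a posterior probability to a color.

    green_max and red_min are hypothetical, use-case-dependent thresholds.
    """
    if not 0.0 <= p_high <= 1.0:
        raise ValueError("p_high must be a probability between 0 and 1")
    if p_high < green_max:
        return "green"   # no further action suggested
    if p_high > red_min:
        return "red"     # raise an alert
    return "yellow"      # further investigation suggested


# Posterior values reported for validation events E10 and E11.
print(traffic_light(0.41), traffic_light(0.68))
```

With these placeholder thresholds, the Afghanistan event (p = 0.41) would be flagged for further investigation and the Japan event (p = 0.68) would raise an alert.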

Performance and applicability

Because of the exclusive selection of events with at least 50 reports, our suggested methodology is applicable to ∼22% of 2746 felt earthquakes in 2021, and for ∼14% (393 events) a result can be obtained within 10 min. We expect these numbers to increase over time according to the increasing usage of the LastQuake service (the number of reported earthquakes with 4 ≤ M ≤ 5 increased on average by 23% per year from 155 in 2014 to 646 in 2021). Because every smartphone user is a potential contributor of felt reports, we are still far from what could possibly be achieved once the value of dense and inexpensive felt report collection is properly acknowledged and encouraged by governments and emergency response operatives. Admittedly, our model will be of minor benefit in regions where dense real‐time strong‐motion networks and automated impact assessment are already in place, as is the case in the Friuli Venezia Giulia region in Italy, for example (Poggi et al., 2021). However, the Mw 5.9 event in Afghanistan on 22 June 2022 (E10 in Table 1) is a striking example of the potential impact that our model could have in remote regions that lack seismic instrumentation. The required number of 50 reports was in this case collected within about 8 min, and even though the corresponding p(H|X2D) = 0.41 does not unambiguously hint at the extreme impact of this event, at least the considerable probability of impact could have been noticed rapidly after the earthquake. In such regions, promoting the low‐cost usage of LastQuake might be a worthwhile option as long as the installation of strong‐motion instruments is infeasible.
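The percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes that the average yearly increase is a compound annual growth rate over the seven year‐to‐year steps from 2014 to 2021; the text does not state how the average was computed.

```python
n_felt_2021 = 2746       # felt earthquakes in 2021
n_within_10min = 393     # events with 50 reports collected within 10 min

# Share of felt earthquakes for which a result is available within 10 min.
share_fast = n_within_10min / n_felt_2021

# Compound annual growth of reported 4 <= M <= 5 earthquakes, 2014 -> 2021 (7 steps).
growth = (646 / 155) ** (1 / 7) - 1

print(f"fast share: {share_fast:.1%}, yearly growth: {growth:.1%}")
```

Both values are consistent with the rounded figures of ∼14% and 23% per year given in the text.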

Geographic and operational prerequisites

The validation events from Sumatra and Nepal (E04 and E05 in Table 1) are assigned relatively small values of p(H|X2D), despite their considerable impact. In the first case, only a few felt reports were issued from Sumatra, whereas the majority were submitted from Kuala Lumpur, Malaysia, some 400 km away, causing distorted distributions of pseudointensities and barycentral distances. In Nepal, our approach suffers from the inaccessibility of LastQuake in China (Bossu et al., 2018), causing a lack of reports beyond the Chinese border. These two cases emphasize the necessity of active participation of LastQuake users, on the one hand, and the transnational collection of felt reports, on the other hand. Furthermore, the Sumatra case indicates limited applicability to coastal regions, where the azimuthal distribution of felt reports is likewise highly nonuniform. The validation events in Afghanistan (E10) and at the French–German border (E03) are two counterexamples in which reports were successfully collected across the borders between Afghanistan and Iran, and Germany and France, respectively. Consequently, the derived p(H|X2D) is equally valid for all affected countries.


For the sake of interpretability, we utilize only two predictive parameters to describe an earthquake. However, the simple formulation of the modeling task in equation (1) allows for a straightforward extension to additional parameters obtained from felt reports or other sources, such as population density data products. Additional crowdsourced datasets, such as the one collected by the Earthquake Network initiative (EQN, Bossu et al., 2022), could contribute more information and improve modeling results. With the database of felt reports continuing to grow in the coming years, calibration of our model to specific continents or regions will also be within reach in the near future.

In this study, we have presented the development of a data‐driven, probabilistic model to rapidly distinguish high‐impact from low‐impact earthquakes based on LastQuake felt reports. For 14% of 2746 felt earthquakes in 2021, our model could have provided a classification estimate within 10 min of the event. The key strength of our model is the ability to correctly classify a large portion (39%) of low‐impact events with high confidence, such that urgent necessity for comprehensive emergency measures can be ruled out reliably and rapidly after such an event. Active participation of LastQuake users is a key prerequisite for the proper functionality of our model. If reports are collected rapidly and in large numbers, our model might be among the first available information sources to independently characterize the situation after a felt earthquake. Our inexpensive and easily implementable approach could be an effective option to improve rapid response in regions where the installation of strong‐motion networks in the near future is unlikely or unaffordable.

Earthquake impact data used in this study were derived from the Global Significant Earthquake Database (GSED) of the National Geophysical Data Center and the World Data Service (NGDC/WDS) provided by the National Centers for Environmental Information (NCEI), available at https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.hazards:G012153; the Emergency Events Database (EM‐DAT) of the Université Catholique de Louvain, Belgium, available at https://public.emdat.be/; and the Earthquake Impact Database (EID), available at https://erdbebennews.de/earthquake-impact-database-2021/. All impact sources were visited last on 26 October 2022. The U.S. Geological Survey’s earthquake catalog, available at https://earthquake.usgs.gov/earthquakes/search/, was used to fill gaps in the EM‐DAT database. The Python code developed within the scope of this study is available at https://git.gfz-potsdam.de/lilienka/lq_impact. All websites were last accessed in January 2023. The supplemental material to this article contains two additional texts, five figures, and two tables providing details concerning the data processing and modeling decisions.

The authors declare no competing interests.

The authors warmly thank David Wald and Danielle Sumy for their careful reviews and well-targeted suggestions that considerably improved the quality of our study. The authors also express their thanks to Cecilia Nievas for fruitful discussions regarding the availability of earthquake impact data. The first author acknowledges the support of the Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS). This article was partially funded by the European Union (EU)’s Horizon 2020 Research and Innovation Program under Grant Agreement RISE Number 821115 and Grant GEO-INQUIRE Number 101058518. Opinions expressed in this article solely reflect the authors’ views; the EU is not responsible for any use that may be made of information it contains. European‐Mediterranean Seismological Centre (EMSC) thanks the SCOR Foundation for Science for its support.

Part of this research is funded by the European Commission, ITN Marie Skłodowska‐Curie New Challenges for Urban Engineering Seismology URBASIS‐EU project, under Grant Agreement 813137.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
