We compared the accuracy of probabilistic predictions of strong ground motions made by ground‐motion models (GMMs), using observed ground motions from 13 Japanese and 14 New Zealand shallow crustal earthquakes of moderate‐to‐large magnitude (5.5–6.6 for Japan and 5.07–7.85 for New Zealand). The data are independent of the GMMs, so only the predictive power of the models, rather than their explanatory power, is evaluated. We examined the performance gains of state‐of‐the‐art GMMs developed under the Next Generation Attenuation‐West2 (NGA‐West2) project over widely adopted regional GMMs for Japan and New Zealand. The large global dataset used by NGA‐West2 GMMs allows sophisticated modeling, whereas the regional datasets used by regional GMMs may more directly represent region‐specific ground‐motion features. We measured model performance with a newly developed method based on the multivariate logarithmic score, an extension of the widely used univariate logarithmic score (LLH) method. Our method measures the relative performance of models, taking into account the effects of data correlation, unbalanced data, and result variability. For the Japan case, we evaluated the model predictions for peak ground velocity (PGV) and found that NGA‐West2 GMMs unambiguously performed better than both the regional GMMs and the superseded NGA GMMs. The proposed regional optimizations implemented in NGA‐West2 GMMs improved the predictions for some models but had adverse effects for others. For the New Zealand case, we evaluated the model predictions for peak ground acceleration (PGA) and spectral accelerations at 0.3, 1, and 3 s and found that a recently developed regional GMM performed well, but that NGA‐West2 GMMs with performance comparable to or better than that of the regional model can also be identified. There appears to be no general answer as to whether a regional or a global model should be preferred, or whether a newer model is always better than the model it supersedes. This finding highlights the importance of evaluating the predictive power of GMMs using independent data.
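
As an illustration of the scoring idea described above, the following minimal sketch contrasts a univariate logarithmic score, which treats records as independent, with a multivariate logarithmic score evaluated on the joint density of correlated residuals. It is not the implementation used in this study: the function names, the zero‐mean Gaussian residual model, and the event covariance built from a hypothetical between‐event standard deviation `tau` and within‐event standard deviation `phi` are all assumptions made for illustration.

```python
import math


def univariate_log_score(residuals, sigma):
    """Negative mean log2-likelihood of residuals under N(0, sigma^2).

    Records are treated as independent (an LLH-style score; lower is better).
    """
    total = 0.0
    for r in residuals:
        log_pdf = -0.5 * math.log(2 * math.pi * sigma ** 2) - r ** 2 / (2 * sigma ** 2)
        total += log_pdf / math.log(2)  # convert natural log to log base 2
    return -total / len(residuals)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def multivariate_log_score(residuals, cov):
    """Negative natural-log density of the residual vector under N(0, cov)."""
    n = len(residuals)
    low = cholesky(cov)
    # Forward substitution: solve low @ z = residuals.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((residuals[i] - s) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return 0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def event_covariance(n_records, tau, phi):
    """Assumed covariance for records of one earthquake.

    Every pair of records shares the between-event variance tau^2;
    the within-event variance phi^2 is added on the diagonal.
    """
    return [
        [tau ** 2 + (phi ** 2 if i == j else 0.0) for j in range(n_records)]
        for i in range(n_records)
    ]


# Hypothetical log residuals from three records of a single event.
residuals = [0.5, 0.5, 0.5]
independent = multivariate_log_score(residuals, event_covariance(3, 0.0, math.sqrt(0.5)))
correlated = multivariate_log_score(residuals, event_covariance(3, 0.5, 0.5))
print(independent, correlated)
```

Both covariance matrices in the usage lines have the same total variance per record (0.5), so the difference between the two printed scores arises only from whether the correlation among records of the same event is modeled. With `tau` set to zero, the multivariate score reduces to the sum of the univariate log scores, expressed in natural‐log units.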