We develop a testing methodology for the New Zealand national probabilistic seismic hazard (PSH) model that builds on the groundwork of previous studies. Our fundamental approach is to test the full model, that is, the final output of the model (ground-motion exceedance for a given return period). Our results show that the PSH model is rejected because it underpredicts the historical number of exceedances of specific peak ground acceleration (PGA) levels obtained directly from instrumental strong-motion data over the last 1–4 decades. However, when aftershock ground motions are removed from the strong-motion data, the model is not inconsistent with the observations. The implications for the PSH model are that the absence of aftershocks in the model led to the initial rejection and that the model may perform better over short (decadal) time periods if aftershocks are included. These results differ from those of earlier Modified Mercalli Intensity (MMI)-based studies, which suggested that the PSH model predicted hazard slightly higher than that of the historical record. Our new test dataset has the advantage of using observed PGA rather than PGA inferred from MMI. Establishing a protocol for formally testing future versions of the New Zealand PSH model within a testing center, such as those using the Collaboratory for the Study of Earthquake Predictability protocol, will require consideration of the fact that the tests are limited by the available datasets of strong earthquake shaking.
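The comparison of observed and predicted exceedance counts can be illustrated with a minimal sketch. The abstract does not state which statistical test is used, so the sketch assumes, purely for illustration, that exceedances at each site follow a Poisson process, that sites are independent, and that the model is rejected at a one-sided significance level of 0.05. The function names, the significance level, and any input values are hypothetical and are not taken from the study.

```python
import math


def expected_exceedances(record_years, return_period_years):
    """Expected number of exceedances summed over all sites.

    record_years: observation length (years) at each strong-motion site.
    return_period_years: model return period of the PGA level under test.
    Assumption: the annual exceedance rate is 1 / return period at every site.
    """
    annual_rate = 1.0 / return_period_years
    return sum(years * annual_rate for years in record_years)


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def model_underpredicts(observed, expected, alpha=0.05):
    """True if the observed count is improbably high under the model.

    Assumption: a one-sided test at level alpha; the study's actual
    rejection criterion may differ.
    """
    return poisson_upper_tail(observed, expected) < alpha
```

Under these assumptions, removing aftershock ground motions corresponds to lowering the observed count before calling `model_underpredicts`, while the expected count from the model stays unchanged.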