This study compares 12 hazard models based on older and more recent ground‐motion prediction equations (GMPEs) to evaluate how much newer equations improve probabilistic seismic‐hazard assessments in Italy. To this end, a statistical procedure is applied to score the outcomes of each hazard model at 56 accelerometric sites that have been operating for at least 25 years. This procedure computes the likelihood of each hazard model's outcomes given the available observations. It therefore evaluates the performance of each model and, indirectly, how well the selected GMPEs support reliable hazard estimates. We find that older GMPEs tend to yield high‐frequency ground‐motion hazard values that are overconservative at shorter mean return periods and underconservative at longer ones. To identify the sources of this difference between older and more recent equations, the bias of each GMPE is evaluated by comparing its median predictions with observations at two accelerometric sites where a relatively large number of ground motions from different earthquakes have been recorded and local soil conditions are well established. Results indicate that two decades of research on GMPEs have significantly reduced bias and improved the accuracy of predictions. The largest improvements occurred between 2008 and 2010. These may be related to more complete regression data sets and to more effective functional forms, which better model the physical processes governing ground‐motion propagation. Since then, GMPE bias has remained almost stable, and no significant improvement in the performance of the corresponding hazard models has been observed. Our results also indicate that worldwide GMPEs applied to Italy are less effective at producing hazard estimates consistent with observations.
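The abstract does not spell out the likelihood formulation or the bias metric, so the following is only a minimal sketch under stated assumptions. It assumes that exceedances of a fixed ground-motion threshold at each site follow a Poisson process, with the annual exceedance rate read from each model's hazard curve, and that sites are independent. Independence is a simplification, because a single earthquake can be recorded at several sites. Bias is taken here as the mean natural-log residual between observed and median predicted amplitudes. The function names and all numbers are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def poisson_log_likelihood(n_obs: int, annual_rate: float, years: float) -> float:
    """Log-probability of observing n_obs exceedances in `years` years,
    assuming (hypothetically) a Poisson process with the given annual rate."""
    mu = annual_rate * years  # expected number of exceedances
    if mu <= 0.0:
        # A zero-rate model can only explain zero observed exceedances.
        return 0.0 if n_obs == 0 else -math.inf
    return n_obs * math.log(mu) - mu - math.lgamma(n_obs + 1)


def score_model(annual_rates: Sequence[float],
                observed_counts: Sequence[int],
                years: Sequence[float]) -> float:
    """Summed log-likelihood over sites (assumes independent sites).
    Higher scores mean the model is more consistent with observations."""
    return sum(
        poisson_log_likelihood(n, rate, t)
        for rate, n, t in zip(annual_rates, observed_counts, years)
    )


def mean_log_residual(observed: Sequence[float],
                      predicted_median: Sequence[float]) -> float:
    """Mean of ln(observed / predicted median).
    Positive: the GMPE underpredicts on average; negative: it overpredicts."""
    residuals = [math.log(o / p) for o, p in zip(observed, predicted_median)]
    return sum(residuals) / len(residuals)


if __name__ == "__main__":
    # Hypothetical data: three sites, each observed for 25 years.
    counts = [1, 0, 2]
    years = [25.0, 25.0, 25.0]
    model_a = [0.04, 0.01, 0.08]  # expected counts close to observed ones
    model_b = [0.2, 0.2, 0.2]     # expected counts far above observed ones
    print("model A:", score_model(model_a, counts, years))
    print("model B:", score_model(model_b, counts, years))
    print("bias:", mean_log_residual([0.12, 0.30], [0.10, 0.25]))
```

In this toy example, model A scores higher than model B because its expected exceedance counts are close to the observed ones. Model B overpredicts exceedances, which is analogous to an overconservative hazard estimate. Comparing summed log-likelihoods in this way ranks competing models on the same observations without requiring any one of them to be "true".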