Data‐driven machine‐learning approaches are increasingly being applied to construct empirical ground‐motion models (GMMs). It is standard practice to divide observational records into learning and test datasets to evaluate the predictive performance of a developed model correctly. However, in this study, we show that division based on records or earthquakes is inappropriate for evaluating how well a model generalizes across recording sites when GMMs include site‐condition proxies as input variables. Complex models exhibit small residuals at sites used in training but large residuals at new sites, owing to overfitting to the training sites. As a simple solution, we propose a neural network model that depends monotonically on some of its input variables. The model generalizes successfully across recording sites, although it cannot represent the oversaturation with respect to input variables that has been suggested for extreme ground‐motion ranges. Therefore, alternative methods should be investigated to develop data‐driven models that are robust under general conditions. Dividing sites between learning and test datasets is expected to play a fundamental role in developing such robust models.
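The site-based division advocated above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each record is a dictionary carrying a site identifier under a `site` key, and the record format and the function names `split_by_site` and `shared_sites` are hypothetical. Whole sites, rather than individual records or earthquakes, are held out, so no site contributes records to both the learning and the test set.

```python
import random
from collections import defaultdict


def split_by_site(records, test_fraction=0.2, seed=0):
    """Split records so that no site appears in both the learning and test sets."""
    # Group records by their (hypothetical) "site" identifier.
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec["site"]].append(rec)

    # Shuffle site IDs (sorted first so the result depends only on `seed`).
    sites = sorted(by_site)
    random.Random(seed).shuffle(sites)

    # Hold out whole sites, not individual records or earthquakes.
    n_test = max(1, round(test_fraction * len(sites)))
    test_sites = set(sites[:n_test])

    learn, test = [], []
    for site in sites:
        (test if site in test_sites else learn).extend(by_site[site])
    return learn, test


def shared_sites(learn, test):
    """Sites present in both sets; empty for a proper site-based split."""
    return {r["site"] for r in learn} & {r["site"] for r in test}


if __name__ == "__main__":
    # Synthetic example: 100 records from 10 sites and 7 earthquakes.
    records = [{"site": f"S{i % 10}", "eq": f"E{i % 7}"} for i in range(100)]
    learn, test = split_by_site(records, test_fraction=0.2, seed=1)
    print(len(learn), len(test), shared_sites(learn, test))  # prints: 80 20 set()
```

A random split over records would typically leave records from the same site on both sides, so test residuals would partly reflect sites the model has already fitted. In scikit-learn, `GroupShuffleSplit` with site identifiers passed as `groups` gives equivalent behavior.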