Almost all forecasts of the magnitude of the next earthquake have assumed that magnitudes follow the same independent probability distribution, such as the Gutenberg–Richter (G‐R) law with a fixed b‐value (b = 0.9 is the standard for the Japan region), throughout an earthquake sequence. Extending the forecasting procedure to more general models of space–time magnitude sequences may increase the information gain of earthquake forecasts. This article explores and evaluates three such models for earthquake magnitude forecasting. The first model forecasts magnitudes using location‐dependent b‐values; the second uses a space–time weighted moving average of the short‐term past and neighboring magnitude sequence; and the third bases its forecasts on the short‐term tightness of clustering among earthquakes. The forecasting performance of each model, with parameters estimated in a learning period, is shown at each time in a testing period. Except for the last example, the forecasts do not outperform the baseline G‐R law with a b‐value of 0.9. We discuss the reasons for this using residual analysis.
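
As a rough illustration of how a magnitude forecast can be scored against the baseline G‐R law, the sketch below computes the mean log‐likelihood ratio (information gain per event) of a forecast that assigns its own b‐value to each event, relative to a fixed b‐value of 0.9. It assumes the usual exponential form of the G‐R magnitude density above a cutoff magnitude; the function names, the cutoff parameter `m_c`, and the scoring rule itself are illustrative assumptions, not the article's actual procedure.

```python
import math

LN10 = math.log(10.0)


def gr_log_density(m, b, m_c):
    """Log density of magnitude m under the G-R law with b-value b.

    Assumes the exponential form f(m) = beta * exp(-beta * (m - m_c))
    for m >= m_c, with beta = b * ln(10). m_c is a hypothetical cutoff
    (completeness) magnitude chosen by the user.
    """
    if m < m_c:
        raise ValueError("magnitude is below the cutoff m_c")
    beta = b * LN10
    return math.log(beta) - beta * (m - m_c)


def mean_information_gain(mags, b_forecast, b_baseline=0.9, m_c=0.0):
    """Mean log-likelihood ratio per event: forecast versus baseline.

    mags       : observed magnitudes, one per event
    b_forecast : b-value the forecast assigns to each event
    b_baseline : fixed b-value of the reference G-R law (0.9 here)
    """
    if len(mags) != len(b_forecast):
        raise ValueError("mags and b_forecast must have the same length")
    if not mags:
        raise ValueError("at least one event is required")
    gains = [
        gr_log_density(m, b, m_c) - gr_log_density(m, b_baseline, m_c)
        for m, b in zip(mags, b_forecast)
    ]
    return sum(gains) / len(gains)
```

A positive value means the forecast assigned, on average, higher probability density to the observed magnitudes than the baseline did; a value at or below zero corresponds to a forecast that does not outperform the baseline.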