Estimates of the probability that an ML 4.8 earthquake, which occurred near the southern end of the San Andreas fault on 24 March 2009, would be followed by an M 7 mainshock over the following three days vary from 0.0009 using a Gutenberg–Richter model of aftershock statistics (Reasenberg and Jones, 1989) to 0.04 using a statistical model of foreshock behavior and long‐term estimates of large earthquake probabilities, including characteristic earthquakes (Agnew and Jones, 1991). I demonstrate that the disparity between the existing approaches depends on whether or not they conform to Gutenberg–Richter behavior. While Gutenberg–Richter behavior is well established over large regions, it could be violated on individual faults if they have characteristic earthquakes or over small areas if the spatial distribution of large‐event nucleations is disproportional to the rate of smaller events. I develop a new form of the aftershock model that includes characteristic behavior and combines the features of both models. This new model and the older foreshock model yield the same results when given the same inputs, but the new model has the advantage of producing probabilities for events of all magnitudes, rather than just for events larger than the initial one. Compared with the aftershock model, the new model has the advantage of taking into account long‐term earthquake probability models. Using consistent parameters, the probability of an M 7 mainshock on the southernmost San Andreas fault is 0.0001 for three days from long‐term models and the clustering probabilities following the ML 4.8 event are 0.00035 for a Gutenberg–Richter distribution and 0.013 for a characteristic‐earthquake magnitude–frequency distribution. Our decisions about the existence of characteristic earthquakes and how large earthquakes nucleate have a first‐order effect on the probabilities obtained from short‐term clustering models for these large events.