Direct stacks of teleseismic waveforms recorded at a station have been used as an alternative to receiver functions for retrieving 1D crustal S‐wave velocity models through inversion. Although such stacks generally have lower signal‐to‐noise ratios, they have recently attracted attention because they do not rely on deconvolution. Avoiding deconvolution in waveform processing is a significant advantage for probabilistic (Bayesian) inversion methods, which require a realistic assumption about the statistical distribution of waveform noise. However, preserving the effective source time function (STF) in the waveform data poses new challenges for data processing. In this short note, we show that the simple technique used to date to stack waveforms directly lacks precision: waveforms with emergent onsets or more complicated STFs are often stacked out of phase, which produces artifacts in the stacked trace. We introduce a new cross‐correlation‐based stacking technique that avoids phase errors by identifying groups (families) of mutually coherent traces and creating a separate stack for each family. This separates the dataset into groups of events with similar STFs, which can be inverted jointly or separately.
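The idea of stacking families of mutually coherent traces can be illustrated with a minimal sketch. It is not the procedure developed in this note, only one plausible reading of it under stated assumptions: traces are equal-length NumPy arrays that are already roughly windowed around the onset, alignment uses circular shifts (`np.roll`), and families are built greedily so that every member correlates above a threshold with every other member. The function names `best_lag_and_cc` and `stack_families` and the parameters `max_lag` and `threshold` are hypothetical.

```python
import numpy as np


def best_lag_and_cc(a, b, max_lag):
    """Find the circular shift of b (in samples) that best matches a.

    Returns (lag, cc), where cc is the normalized correlation coefficient
    between a and np.roll(b, lag). Circular shifting is an assumption made
    for brevity; it is reasonable only for tapered, zero-padded traces.
    """
    a = a - a.mean()
    b = b - b.mean()
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    best_lag, best_cc = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        cc = float(np.dot(a, np.roll(b, lag)))
        if cc > best_cc:
            best_lag, best_cc = lag, cc
    return best_lag, best_cc


def stack_families(traces, max_lag=20, threshold=0.8):
    """Group traces into mutually coherent families and stack each family.

    Returns a list of (member_indices, stacked_trace) tuples. The greedy
    grouping and the threshold value are illustrative assumptions.
    """
    traces = np.asarray(traces, dtype=float)
    n = len(traces)
    lags = np.zeros((n, n), dtype=int)
    ccs = np.eye(n)
    # Pairwise optimal lags and correlation coefficients.
    for i in range(n):
        for j in range(i + 1, n):
            lag, cc = best_lag_and_cc(traces[i], traces[j], max_lag)
            lags[i, j], ccs[i, j] = lag, cc
            lags[j, i], ccs[j, i] = -lag, cc

    unassigned = list(range(n))
    families = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        # Admit a trace only if it is coherent with all current members.
        for k in list(unassigned):
            if all(ccs[k, m] >= threshold for m in members):
                members.append(k)
                unassigned.remove(k)
        # Align every member to the seed trace, then average.
        aligned = [np.roll(traces[m], lags[seed, m]) for m in members]
        families.append((members, np.mean(aligned, axis=0)))
    return families
```

Calling `stack_families(traces)` on a list of equal-length traces returns one stack per family. In this sketch a trace that is coherent with no other trace forms a family of its own, and each family stack could then be inverted jointly with the others or separately.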