Deep learning is a promising approach to velocity model building because it has the potential to process large seismic surveys with minimal resources. By leveraging large quantities of model-gather pairs, neural networks (NNs) can automatically map data to the model space, directly providing a solution to the inverse problem. Such a mapping requires large amounts of training data, which proves prohibitive for 2D and 3D surveys of realistic size. We have developed a transfer learning (TL) strategy. A network is first trained on a smaller subproblem, and the trained network then serves as the starting solution for a larger, more difficult data set, akin to the hierarchical multiscale strategy for full-waveform inversion. We perform TL with subobjectives that escalate in complexity: we first train an NN to estimate horizontally layered velocity models and then train an augmented network to estimate 2D dipping layered models. TL improves convergence and allows training with fewer 2D models. For synthetic tests, the structural similarity index measure of 2D interval velocity models in the time domain is and the root-mean-square (rms) error is m/s. We benchmark our algorithm on the Marmousi2 model and observe that our method can be applied to velocity models with continuous deformed layers with dips up to 35°. We also test our algorithm on 2D marine field data and produce an rms velocity model that leads to coherent stacking and a time interval velocity model that reproduces salient features of the stacked section. TL expedites and regularizes training, and data-driven techniques may be applied to field data with minimal preprocessing even though we lack real target velocity models.
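
The hierarchical idea of training on a smaller subproblem and reusing the result as the starting solution for an augmented model can be illustrated with a minimal sketch. It is a toy under strong assumptions and not the network or training procedure used here: a linear model stands in for the NN, plain gradient descent for the optimizer, and an orthogonal design matrix for the training data. The names `make_design`, `train`, `W_TRUE`, and `N_SMALL` are hypothetical.

```python
# Toy sketch of hierarchical transfer learning. Assumptions (not from the paper):
# a linear model stands in for the NN, plain gradient descent for the optimizer,
# and an orthogonal +/-1 design matrix for the training data.

def make_design(n_rows=8, n_cols=6):
    """Columns 1..n_cols of a Sylvester-Hadamard matrix (n_rows a power of two,
    n_cols < n_rows), so that X^T X / n_rows is the identity."""
    return [[1.0 if bin(i & j).count("1") % 2 == 0 else -1.0
             for j in range(1, n_cols + 1)]
            for i in range(n_rows)]

def predict(X, w):
    """Linear model output for every row of X."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]

def train(X, y, w0, lr=0.1, tol=1e-6, max_steps=1000):
    """Gradient descent on the mean squared error; returns (weights, steps used)."""
    w, n = list(w0), len(X)
    for step in range(max_steps):
        residual = [p - t for p, t in zip(predict(X, w), y)]
        if sum(r * r for r in residual) / n < tol:
            return w, step
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residual, X))
                for j in range(len(w))]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w, max_steps

W_TRUE = [2.0, -1.5, 1.0, 0.5, -0.5, 0.25]  # hypothetical target weights
N_SMALL = 3                                 # size of the simpler subproblem

X_full = make_design()
X_small = [row[:N_SMALL] for row in X_full]

# Stage 1: train the small model on the simpler subproblem.
y_small = predict(X_small, W_TRUE[:N_SMALL])
w_small, _ = train(X_small, y_small, [0.0] * N_SMALL)

# Stage 2: augment the model; copy the trained weights, zero-initialize the new ones.
y_full = predict(X_full, W_TRUE)
w_init = w_small + [0.0] * (len(W_TRUE) - N_SMALL)
w_warm, warm_steps = train(X_full, y_full, w_init)

# Baseline: train the augmented model from scratch.
w_cold, cold_steps = train(X_full, y_full, [0.0] * len(W_TRUE))

print("steps with transfer:", warm_steps, "| from scratch:", cold_steps)
```

In this toy problem the transferred weights start closer to the solution, so the augmented model reaches the tolerance in fewer steps than training from scratch. The sketch says nothing about the size of the gain for the NNs and seismic data considered here.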