Full-waveform inversion (FWI) is a popular technique for obtaining high-resolution estimates of earth model parameters using all of the information present in seismic data, and it can therefore provide detailed information about the subsurface. FWI is formulated as a data-fitting minimization problem: starting from an initial velocity model, it iteratively updates the model using the gradient of the misfit until the real and synthetic data match to within a tolerance set by the noise level in the data. The inversion is computationally expensive and can converge to a local minimum if the starting model is not close enough to an optimal model. Here, we propose an alternative approach that combines machine learning with the physics of the forward model. Unlike conventional supervised machine learning, our approach does not require known answers (true velocity models) to train the network. The shot gathers are input to a convolutional neural network (CNN)-based autoencoder, whose output is taken as the velocity model used to compute synthetic seismograms. The synthetic data are compared with the observed input data, and the misfit is computed. The gradient of the misfit with respect to the velocity model parameters is calculated using the adjoint-state method, and this gradient is then propagated back through the network by automatic differentiation to update the network weights. Once the misfit converges, the neural network generates velocity models consistent with the observed data. We observe that the neural network can capture spatial correlations at different scales and can thus introduce regularization into our inverse problem. Experiments with the Marmousi model and the SEG Advanced Modeling Corporation (SEAM) Phase 1 salt model suggest that the proposed method can overcome local minima, requires no starting model, and produces robust results in the presence of noise and complex salt body structures.
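The training loop described above can be illustrated with a minimal NumPy sketch under strong simplifying assumptions. A fixed random linear operator `G` stands in for wave-equation modeling, so the adjoint-state gradient of the L2 misfit reduces to `G.T @ residual`. A two-layer `tanh` network stands in for the CNN autoencoder, and every name here (`network`, `model_gradient`, `weight_gradients`, `train`) is hypothetical. The sketch computes the model gradient dJ/dm first and then uses the chain rule to carry it into the network weights, which mirrors the role automatic differentiation plays in the full method. Only the observed data enter the loss; `m_true` is used solely to generate them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data, n_hidden = 8, 12, 16

# Assumption: a fixed linear operator G replaces the (nonlinear) wave solver,
# so synthetic data are d_syn = G @ m.
G = rng.standard_normal((n_data, n_model))
m_true = 2.0 + 0.5 * np.sin(np.linspace(0.0, np.pi, n_model))  # toy "velocity" profile
d_obs = G @ m_true                                               # "observed" data
x = d_obs / np.linalg.norm(d_obs)                                # normalized network input


def network(W1, W2, x):
    """Hypothetical two-layer net (stand-in for the CNN autoencoder): data -> model."""
    h = np.tanh(W1 @ x)
    return W2 @ h, h


def misfit(m):
    """L2 data misfit J(m) = 0.5 * ||G m - d_obs||^2 (no true model involved)."""
    r = G @ m - d_obs
    return 0.5 * float(r @ r)


def model_gradient(m):
    """dJ/dm. For a linear G, the adjoint-state gradient reduces to G^T r."""
    return G.T @ (G @ m - d_obs)


def weight_gradients(W1, W2, x):
    """Chain rule: push the model gradient dJ/dm back into the network weights."""
    m, h = network(W1, W2, x)
    g_m = model_gradient(m)                 # gradient w.r.t. the model
    g_W2 = np.outer(g_m, h)                 # dJ/dW2
    g_pre = (W2.T @ g_m) * (1.0 - h**2)     # back through tanh
    g_W1 = np.outer(g_pre, x)               # dJ/dW1
    return g_W1, g_W2


def train(n_iter=2000, lr=1e-4):
    """Plain gradient descent on the network weights, driven only by d_obs."""
    W1 = 0.1 * rng.standard_normal((n_hidden, n_data))
    W2 = 0.1 * rng.standard_normal((n_model, n_hidden))
    history = []
    for _ in range(n_iter):
        m, _ = network(W1, W2, x)
        history.append(misfit(m))
        g_W1, g_W2 = weight_gradients(W1, W2, x)
        W1 -= lr * g_W1
        W2 -= lr * g_W2
    return W1, W2, history


W1, W2, history = train()
print(f"misfit: {history[0]:.3f} -> {history[-1]:.3f}")
```

In the method itself, the forward operator is a nonlinear wave-equation solver and the adjoint-state computation takes the place of `G.T`; what this toy keeps is the ordering of the update, with the physics supplying dJ/dm and backpropagation supplying dm/dθ for the network weights θ.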