Earthquake focal mechanisms provide information on fault-plane orientation and stress direction, which is crucial for understanding tectonics and seismicity. Focal mechanisms of small earthquakes are often difficult to determine from waveform modeling but can be inferred from first-motion polarities. Here, we employ a state-of-the-art neural network with an attention mechanism to simultaneously pick arrivals and determine first-motion polarities. The model is trained and tested with data from southern California. Compared with the polarities derived from manual picks in the catalog, the predicted polarities yield more focal mechanism solutions in southern California. We test the model with data from different regions and observe high generalizability. The predicted arrival and polarity data are consistent with the labeled arrival and polarity data in Japan. The average picking error is 0.04 s, and the accuracy of polarity classification is 99%. We also infer focal mechanisms from the predicted polarities in Oklahoma. The derived focal mechanisms are consistent with the reference focal mechanisms. This method enables routine acquisition of arrival and polarity data and routine derivation of focal mechanism solutions for events.
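
The two evaluation metrics quoted above can be illustrated with a minimal Python sketch. It assumes that the picking error is the mean absolute difference between predicted and labeled arrival times and that the accuracy is the fraction of matching polarity labels; the abstract does not state the exact definitions. The function names and the toy values are hypothetical and are not taken from the study.

```python
def mean_picking_error(predicted_times, labeled_times):
    """Mean absolute difference (in seconds) between predicted and labeled picks.

    Assumption: "average picking error" is treated here as a mean absolute error.
    """
    if len(predicted_times) != len(labeled_times):
        raise ValueError("predicted and labeled picks must have the same length")
    if not predicted_times:
        raise ValueError("at least one pick is required")
    diffs = [abs(p - t) for p, t in zip(predicted_times, labeled_times)]
    return sum(diffs) / len(diffs)


def polarity_accuracy(predicted_polarities, labeled_polarities):
    """Fraction of traces whose predicted polarity matches the label."""
    if len(predicted_polarities) != len(labeled_polarities):
        raise ValueError("predicted and labeled polarities must have the same length")
    if not predicted_polarities:
        raise ValueError("at least one polarity is required")
    matches = sum(p == t for p, t in zip(predicted_polarities, labeled_polarities))
    return matches / len(predicted_polarities)


# Hypothetical toy data for illustration only (not results from the paper).
predicted_times = [10.02, 15.47, 21.10, 30.05]  # predicted P arrivals, s
labeled_times = [10.00, 15.50, 21.10, 30.00]    # labeled P arrivals, s
predicted_polarities = ["U", "D", "U", "D"]     # U = up, D = down
labeled_polarities = ["U", "D", "D", "D"]

error_s = mean_picking_error(predicted_times, labeled_times)
accuracy = polarity_accuracy(predicted_polarities, labeled_polarities)
print(f"mean picking error: {error_s:.3f} s, polarity accuracy: {accuracy:.0%}")
```

In practice the same two functions would be applied to all traces in a held-out test set, with the predicted values coming from the network output and the labels from the catalog.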