Three data analytics party tricks
In: Data analytics and machine learning, Mike Davidson (editor)
Leading Edge (Tulsa, OK) (March 2017) 36 (3): 262-266
Making little software tools might seem trivial, but "party tricks" are a good way to explore a new field, find useful code libraries, and build skills. In this spirit, it is fun and instructive to apply machine learning methods to text-mining tasks. Bibliographic data from the journal Geophysics are especially interesting to work with because the results might actually be useful to those conducting geophysical research. For example, by vectorizing abstracts with free and open-source natural-language processing tools in Python, it is possible to find abstracts that lie close together in the vector space and to interpret them as being similar in content. This forms the basis of a recommendation engine for geophysical papers. Even when a party trick is not outright useful, it can still be interesting. For example, the collaboration network from the journal reveals the most prolific collaborators to be George McMechan, Alan Green, and Jerry Harris, and it lets us calculate the collaboration distance between Brian Russell and Sergey Fomel (it is 4). Other party tricks are strictly silly, such as a recurrent neural network that generates random articles and authors from a parallel universe (e.g., Like-wave beam by D. J. Laniert; one imagines Like waves are a sort of attenuated Love wave).
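As a rough illustration of two of these tricks, the sketch below vectorizes a few abstracts with TF-IDF weights, picks the nearest abstract by cosine similarity, and measures collaboration distance with a breadth-first search over a coauthorship graph. It is a minimal sketch under stated assumptions, not the article's implementation: it uses only the Python standard library rather than any particular natural-language processing package, and the abstracts, author names, and function names (`tfidf_vectors`, `most_similar`, `collaboration_distance`) are all hypothetical.

```python
import math
from collections import Counter, deque

# Hypothetical toy abstracts; these are NOT taken from Geophysics.
ABSTRACTS = {
    "A": "seismic migration of reflection data using wave equation",
    "B": "wave equation migration for seismic imaging",
    "C": "gravity inversion for basin depth estimation",
}

# Hypothetical author lists, one list per paper.
PAPERS = [["Ann", "Bob"], ["Bob", "Cy"], ["Cy", "Dee"], ["Eve"]]


def tfidf_vectors(docs):
    """Map each document id to a sparse {term: TF-IDF weight} vector."""
    tokens = {key: text.lower().split() for key, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for words in tokens.values() for t in set(words))
    vectors = {}
    for key, words in tokens.items():
        term_freq = Counter(words)
        vectors[key] = {
            # Smoothed inverse document frequency (one common variant).
            t: count * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(doc_id, vectors):
    """Return the id of the other document closest to doc_id."""
    others = (key for key in vectors if key != doc_id)
    return max(others, key=lambda key: cosine(vectors[doc_id], vectors[key]))


def collaboration_distance(papers, start, goal):
    """Shortest coauthorship path length between two authors, or None."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set()).update(x for x in authors if x != a)
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:  # breadth-first search visits authors in order of distance
        author, dist = queue.popleft()
        if author == goal:
            return dist
        for coauthor in graph[author]:
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None  # the two authors are not connected


vectors = tfidf_vectors(ABSTRACTS)
print(most_similar("A", vectors))
print(collaboration_distance(PAPERS, "Ann", "Dee"))
```

With real data, the dictionary of abstracts and the list of author lists would come from the journal's bibliographic records. A library vectorizer and a graph package would likely replace the hand-rolled functions here.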