Understanding language can be difficult from human to human, much less for machines attempting to untangle the intricacies of human speech and text. Google knows this as well as anyone, considering the countless queries made every hour that take users where they need to go, even in the face of mangled sentence structure and unfortunate spelling mistakes. Today, Google is open-sourcing SyntaxNet, along with a component for it called Parsey McParseface. In an endearing nod to Boaty McBoatface, the winning entry in this year's public poll to name NERC's research vessel, Google is releasing the tools it uses to understand natural language, whether typed into a box or interpreted from spoken word.
SyntaxNet is the overall framework for parsing sentences, known as a "syntactic parser." Parsey McParseface is the English-language model for SyntaxNet. Google claims the model can identify the subjects, objects, verbs, and other grammatical building blocks of a sentence as well as, or even better than, trained human linguists.
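To picture what a syntactic parser produces, here is a toy sketch in Python. This is not the SyntaxNet API; it is a hand-written, hypothetical representation of the kind of output such parsers commonly emit, where each word is tagged with a part of speech and points at the word it modifies:

```python
# Toy illustration (not the actual SyntaxNet API): a dependency parse is
# commonly represented as one row per token, each row recording the token's
# part of speech, its head token, and its grammatical relation to that head.
from collections import namedtuple

Token = namedtuple("Token", ["id", "form", "pos", "head", "deprel"])

# Hand-written parse of "Alice ate the apple" for illustration.
parse = [
    Token(1, "Alice", "NOUN", 2, "nsubj"),  # subject of "ate"
    Token(2, "ate",   "VERB", 0, "ROOT"),   # root of the sentence
    Token(3, "the",   "DET",  4, "det"),    # determiner of "apple"
    Token(4, "apple", "NOUN", 2, "dobj"),   # direct object of "ate"
]

def word_with_role(deprel, parse):
    """Return the word filling a given grammatical role, if present."""
    return next((t.form for t in parse if t.deprel == deprel), None)

print(word_with_role("nsubj", parse))  # prints "Alice" (the subject)
print(word_with_role("dobj", parse))   # prints "apple" (the object)
```

Once a sentence is in this form, answering "who did what to whom" becomes a simple lookup, which is what makes such parses useful downstream.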
Understanding language is key to helping machines provide the answers a user is actually seeking. For example, a query for "food in Mexico" could be asking either about Mexican cuisine or specifically about restaurants located within Mexico. Neither reading is necessarily incorrect, but resolving such subtle differences in meaning is what makes Google's tool so powerful.
Google has been open-sourcing more and more of its machine learning tools. After opening up its learning platform TensorFlow last year, the company is now releasing all the code needed to train new SyntaxNet models on your own data, as well as to analyze English text with Parsey McParseface.