Understanding language can be difficult between two humans, let alone for machines trying to grasp the intricacies of human speech and text. Google knows this as well as anyone: it handles countless queries every hour and takes users where they need to go, even in the face of poor sentence structure and unfortunate spelling mistakes.
Today, Google is open sourcing a framework called SyntaxNet and, with it, a component named Parsey McParseface. The name is an endearing nod to Boaty McBoatface, the name the public suggested for NERC’s research vessel earlier this year. With this release, Google is sharing the tools it uses to understand natural language, whether typed into a search box or spoken aloud.
SyntaxNet is the overall framework for parsing sentences, a type of tool known as a “syntactic parser.” Parsey McParseface is the English-language parser built on SyntaxNet. Google says it can identify subjects, objects, verbs, and other grammatical building blocks of sentences nearly as well as trained human linguists.
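To make the idea of a syntactic parse concrete, here is a minimal sketch in Python. It does not call SyntaxNet or Parsey McParseface. The `Token` class, the `PARSE` list, and the `grammatical_roles` function are hypothetical names invented for illustration, and the parse of "Alice saw Bob" is annotated by hand using common dependency labels (`nsubj` for subject, `dobj` for direct object). It only shows the kind of structure a parser might produce and how a program could read the subject, verb, and object out of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str  # the word as it appears in the sentence
    pos: str   # coarse part-of-speech tag
    head: int  # index of the governing token, or -1 for the root
    rel: str   # dependency relation linking this token to its head


# Hand-annotated parse of "Alice saw Bob" (an assumption for illustration,
# not output produced by SyntaxNet).
PARSE = [
    Token("Alice", "NOUN", 1, "nsubj"),
    Token("saw", "VERB", -1, "ROOT"),
    Token("Bob", "NOUN", 1, "dobj"),
]


def grammatical_roles(parse):
    """Return the root verb with its subject and direct object, if present."""
    root_index = next(i for i, tok in enumerate(parse) if tok.head == -1)
    roles = {"verb": parse[root_index].text, "subject": None, "object": None}
    for tok in parse:
        if tok.head != root_index:
            continue  # only look at words attached directly to the root verb
        if tok.rel == "nsubj":
            roles["subject"] = tok.text
        elif tok.rel == "dobj":
            roles["object"] = tok.text
    return roles


print(grammatical_roles(PARSE))
# prints {'verb': 'saw', 'subject': 'Alice', 'object': 'Bob'}
```

A real parser has to infer the `head` and `rel` values from raw text, which is the hard part that SyntaxNet is designed to handle.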
Understanding language is key to helping machines provide the answers users are seeking. For example, a query about “food in Mexico” could refer either to Mexican cuisine or to restaurants located in Mexico. Neither interpretation is necessarily incorrect, and handling subtle ambiguities like this is what makes a tool like Google’s so valuable.
Google has been open sourcing more and more of its machine learning tools. After opening up its machine learning platform TensorFlow last year, the company is now releasing all the code needed to train new SyntaxNet models on users’ own data, as well as to analyze English text.