Apple’s popular Siri virtual assistant may soon better understand local speech patterns, thanks to new algorithms that incorporate geolocation data.
The method is outlined in Apple’s patent filing published by the U.S. Patent and Trademark Office on Sept. 12, titled "Automatic Input Signal Recognition Using Location Based Language Modeling."
"As the number and type of possible input signals has broadened, providing accurate results has remained a challenge," the filing’s authors wrote. "This is particularly true for recognition systems that rely on a global language model for all input signals. In such cases, input signals that are unique to a particular geographic region are often improperly recognized."
Users can already choose among languages and regional dialects, and global models include general language properties and high-probability word strings. However, as Apple’s iOS devices grow in worldwide popularity, issues with local speech patterns have become more pronounced.
The patent authors use the words "goat hill" as an example of a voice input with a low probability of being spoken globally. Currently, the system may determine the speaker is saying "good will." With geolocation integrated, however, the technology could recognize that a nearby location or store is called Goat Hill, leading the system to identify that input as the more likely word string.
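As a rough illustration of that disambiguation (this sketch is not Apple’s implementation; the scores, the boost value and the place names are assumptions made for the example), a recognizer could boost any candidate transcription that matches a nearby point of interest:

```swift
// Hypothetical sketch: re-rank recognizer hypotheses using nearby place
// names. Scores, the boost value and the place list are illustrative.
struct Hypothesis {
    let text: String
    var score: Double   // combined acoustic/language-model score; higher is better
}

func rerank(_ hypotheses: [Hypothesis], nearbyPlaces: Set<String>,
            boost: Double = 0.3) -> [Hypothesis] {
    hypotheses
        .map { h -> Hypothesis in
            var h = h
            // Boost any hypothesis that names a nearby point of interest.
            if nearbyPlaces.contains(h.text.lowercased()) { h.score += boost }
            return h
        }
        .sorted { $0.score > $1.score }
}

// Globally, "good will" outscores "goat hill"; a nearby store named
// Goat Hill flips the ranking.
let ranked = rerank(
    [Hypothesis(text: "good will", score: 0.8),
     Hypothesis(text: "goat hill", score: 0.6)],
    nearbyPlaces: ["goat hill"])
print(ranked.first!.text) // "goat hill"
```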
The patent’s authors acknowledge that situations could arise in which the software incorrectly assumes a person is using a local dialect or referring to a local business or attraction.
"Such a solution only considers one geographic region, which can still produce inaccurate results if the location is close to the border of the geographic region and the input signal corresponds to a word sequence that is unique in the neighboring geographic region," the patent authors wrote.
To alleviate this issue, the authors describe a way to assign weights to global and local word sequences to determine whether the user's voice input should be interpreted in light of their geographic location.
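In effect, the described weighting behaves like an interpolation between the two models. The brief Swift sketch below illustrates that idea with a single convex-combination weight; the function name, the probabilities and the weight are illustrative assumptions, not details from the filing:

```swift
// Illustrative sketch only: blend a global language-model probability with
// a local one using a single interpolation weight. The weight and the
// probabilities below are assumed for the example, not taken from the patent.
func hybridProbability(globalP: Double, localP: Double,
                       localWeight: Double) -> Double {
    // Convex combination: a weight near 1.0 trusts the local model,
    // a weight near 0.0 falls back to the global model.
    localWeight * localP + (1.0 - localWeight) * globalP
}

// "goat hill" is rare globally but likely near a store named Goat Hill.
let blended = hybridProbability(globalP: 0.001, localP: 0.2, localWeight: 0.5)
print(blended) // 0.1005
```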
The technology would gather location data via GPS, cell tower triangulation or manual entry. Combining that location data with local language models relies on a "centroid," a predefined focal point for a given region, such as an address, a building, a town hall or the geographic center of a city. If the thresholds surrounding two centroids overlap, the technology would weight one local language model more heavily than the other, creating a hybrid language model.
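One plausible reading of that scheme is sketched below in Swift; the distance approximation, the thresholds and the inverse-distance weighting are assumptions made for illustration rather than details from the filing:

```swift
import Foundation

// Hypothetical sketch: weight overlapping local models by the user's
// distance to each region's centroid. All values here are illustrative.
struct Region {
    let name: String
    let centroid: (lat: Double, lon: Double)
    let thresholdKm: Double   // radius within which the local model applies
}

func distanceKm(_ a: (lat: Double, lon: Double),
                _ b: (lat: Double, lon: Double)) -> Double {
    // Equirectangular approximation; adequate over short distances.
    let kmPerDegree = 111.0
    let dLat = (a.lat - b.lat) * kmPerDegree
    let dLon = (a.lon - b.lon) * kmPerDegree * cos(a.lat * .pi / 180)
    return (dLat * dLat + dLon * dLon).squareRoot()
}

// Weight each in-range region inversely by distance, then normalize so
// the weights of overlapping regions sum to 1 in the hybrid model.
func modelWeights(user: (lat: Double, lon: Double),
                  regions: [Region]) -> [String: Double] {
    let inRange = regions.filter { distanceKm(user, $0.centroid) <= $0.thresholdKm }
    let raw = inRange.map { 1.0 / max(distanceKm(user, $0.centroid), 0.1) }
    let total = raw.reduce(0, +)
    var weights: [String: Double] = [:]
    for (region, w) in zip(inRange, raw) { weights[region.name] = w / total }
    return weights
}

// A user between two overlapping regions gets a blend of both local models.
let weights = modelWeights(
    user: (lat: 37.33, lon: -122.01),
    regions: [Region(name: "Cupertino", centroid: (lat: 37.32, lon: -122.03), thresholdKm: 10),
              Region(name: "Sunnyvale", centroid: (lat: 37.37, lon: -122.04), thresholdKm: 10)])
print(weights)
```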
It is unclear if Apple is already using some form of the language interpretation technology described in the recent filing.
Edited by Rory J. Thompson