Thomson Reuters Open Calais™ offers an interesting and potentially easy way to tag the people, places, companies, facts, and events in your content to increase its value, accessibility and interoperability.
How does it work? In essence, it uses Natural Language Processing (NLP) and machine learning algorithms, trained over several years by hundreds of Thomson Reuters editorial teams, to combine company extraction with relevance.
For the user, the process is pretty simple. You feed unstructured text (news articles, blog postings, etc.) into the extraction engine, which examines it and locates:
Relationships: (John Smith works for Widget Corp.)
Facts: (John Smith is a 42-year-old male CFO)
Events: (Jane Brown was appointed a board member of Widget Corp.)
Topics: (Story is about M&As in the media industry)
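To make the four categories above more concrete, here is a minimal sketch of how such statements could be held as subject-predicate-object triples, the data model that RDF is built on. This is not the Open Calais output schema: the `Triple` class, the `about` helper and every predicate name (`worksFor`, `hasTopic`, and so on) are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement (hypothetical model, not the Calais schema)."""
    subject: str
    predicate: str
    obj: str


def extracted_examples() -> List[Triple]:
    """Return the example statements from the list above as triples.

    All predicate names here are illustrative assumptions.
    """
    return [
        # Relationship
        Triple("John Smith", "worksFor", "Widget Corp."),
        # Facts
        Triple("John Smith", "age", "42"),
        Triple("John Smith", "gender", "male"),
        Triple("John Smith", "position", "CFO"),
        # Event
        Triple("Jane Brown", "appointedBoardMemberOf", "Widget Corp."),
        # Topic
        Triple("Story", "hasTopic", "M&As in the media industry"),
    ]


def about(triples: List[Triple], subject: str) -> List[Triple]:
    """Select every triple whose subject matches the given name."""
    return [t for t in triples if t.subject == subject]


if __name__ == "__main__":
    for t in about(extracted_examples(), "John Smith"):
        print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

Keeping each statement as a separate triple means a relationship, a fact and an event about the same person can all be retrieved with one simple filter on the subject.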
Open Calais then processes the information extracted from the text and returns semantic metadata in RDF format. Here are some of the benefits:
Contextual navigation: This pinpoints the most relevant companies, people and industries
More focused news: This acquires highly relevant, targeted news for companies and industries of interest
Fast processing: It takes, on average, well under a second to process a sizable news article
Greater intelligence: The system goes far beyond classic entity identification and returns the relevant facts and events hidden within the text
The quality of Open Calais output relies on the curated authorities maintained by thousands of Thomson Reuters data team members, and it also draws on the identity management provided by Thomson Reuters experts.