Wellington developer refines word search tool

01.09.2011
Wellington company SYL Enterprise Search has introduced a semantic search engine that understands the context in which words are used within an enterprise's content. The patented technology processes content as language, not just as words occurring in documents.

The engine, known as SYLSemantics, automatically recognises synonyms and the relationships between words, which lets it consider a much broader portion of content when evaluating a search request.

The three-and-a-half-year development programme behind SYLSemantics has been co-funded by the Ministry of Science and Innovation.

The initial development was done by Codec, a Wellington company that specialised in bespoke software development and integration. In July, Codec merged with RHE Infrastructure, once part of the former RHE Group, to form SYL Enterprise Search.

"This has allowed us to bring the product to market," says SYL Enterprise Search chief executive Sean Wilson. "We found we had similar clients, which led to the merger. We rebranded the Codec development as SYLSemantics. It was launched as an enterprise search engine, but it is more like an information access tool. SYL reaches into any text document and creates its own semantic index, which is used for aggregating information."

SYLSemantics is based on Linux and Java.

"Content in the enterprise is quite different to web content," Wilson says. "An enterprise search is about getting into content repositories like databases and document management, which tend to exist as silos.

"Knowledge workers spend up to 25 percent of their time just trying to find information," he claims.

The engine contains an English dictionary of 2.25 million words and 2.5 million relationships linking them, both as synonyms and in other ways. Auckland, for example, is identified as a city but also as part of New Zealand.
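To illustrate how a dictionary of words and relationships can broaden a search, here is a minimal sketch of query expansion over a small relationship graph. The article does not describe SYL's internal data structures, so the class `SemanticIndex`, the relation names `synonym`, `is_a` and `part_of`, and the `search` helper are all hypothetical. Matching is naive substring search, chosen only to keep the example short.

```python
from collections import defaultdict


class SemanticIndex:
    """Hypothetical store of term relationships (not SYL's real design)."""

    def __init__(self):
        # term -> set of (relation, related_term) pairs
        self.relations = defaultdict(set)

    def add(self, term, relation, other):
        self.relations[term.lower()].add((relation, other.lower()))
        if relation == "synonym":
            # Synonymy is symmetric, so record it in both directions.
            self.relations[other.lower()].add((relation, term.lower()))

    def expand(self, term):
        """Return the query term plus every directly related term."""
        term = term.lower()
        return {term} | {other for _, other in self.relations[term]}


def search(index, documents, query):
    """Return IDs of documents containing the query or any related term."""
    terms = index.expand(query)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]


index = SemanticIndex()
index.add("Auckland", "is_a", "city")
index.add("Auckland", "part_of", "New Zealand")
index.add("city", "synonym", "metropolis")

docs = {"d1": "Flights to New Zealand", "d2": "Wellington weather"}
print(sorted(index.expand("Auckland")))  # ['auckland', 'city', 'new zealand']
print(search(index, docs, "Auckland"))   # ['d1']
```

A search for "Auckland" finds the document about New Zealand even though it never mentions Auckland. That is the broadening effect the article describes, although a production engine would weight related terms rather than treat them as exact matches.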

"We've also harvested Wikipedia," Wilson says. "For example, Harrison Ford is linked to the word 'actor'."

Industry jargon is also an important component, and any dictionary can be imported from flat files.
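The article does not specify the flat-file format. The sketch below assumes a hypothetical tab-separated layout of `term`, `relation`, `related_term`, with `#` comment lines, and shows how such a jargon file could be loaded into a relationship table. The function name `load_dictionary` and the relation labels are illustrative only.

```python
import csv
import io
from collections import defaultdict


def load_dictionary(lines):
    """Parse hypothetical tab-separated rows: term, relation, related_term."""
    relations = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Raises ValueError if a row does not have exactly three fields.
        term, relation, other = (cell.strip().lower() for cell in row)
        relations[term].add((relation, other))
    return relations


# A tiny medical-jargon file, held in memory for the example.
jargon = io.StringIO(
    "# medical jargon\n"
    "myocardial infarction\tsynonym\theart attack\n"
    "myocardial infarction\tis_a\tcardiovascular disease\n"
)
d = load_dictionary(jargon)
print(sorted(d["myocardial infarction"]))
```

Using the standard `csv` module rather than splitting lines by hand makes it easier to switch to another delimiter if a vendor's export format differs.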

Wilson says that in linguistics words are categorised into 16 broad categories. SYL has a drill-down function to define the intended use of a word. "If you were searching for the word 'bank', you would tell SYL whether you meant a financial institution or a slope. The search results then automatically reflect this disambiguation, so they are more relevant and let the user narrow the results."
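The article says only that the user picks the intended sense of an ambiguous word. It does not explain how SYL decides which documents use that sense. As a stand-in, this sketch uses a simple cue-word overlap heuristic, loosely in the spirit of the Lesk algorithm. Each sense of "bank" has a hypothetical list of cue words, and a document is assigned the sense whose cues it shares most. The `SENSES` table and both functions are illustrative, not SYL's method.

```python
import re

# Hypothetical sense inventory: each sense of "bank" lists cue words.
SENSES = {
    "bank": {
        "financial_institution": {"loan", "account", "deposit", "interest"},
        "slope": {"river", "erosion", "grass", "steep"},
    }
}


def tokens(text):
    """Lower-case word tokens, with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_sense(word, text):
    """Pick the sense whose cue words overlap the text most, or None."""
    words = tokens(text)
    scores = {sense: len(cues & words) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def search_with_sense(documents, word, chosen_sense):
    """Keep documents that mention `word` in the sense the user chose."""
    return [doc_id for doc_id, text in documents.items()
            if word in tokens(text) and guess_sense(word, text) == chosen_sense]


docs = {
    "d1": "The bank approved the loan on my account.",
    "d2": "We walked along the river bank after the rain.",
    "d3": "The steep bank was covered in grass.",
}
print(search_with_sense(docs, "bank", "slope"))                  # ['d2', 'd3']
print(search_with_sense(docs, "bank", "financial_institution"))  # ['d1']
```

A keyword search for "bank" would return all three documents. Filtering by the chosen sense shows how disambiguation narrows the results.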

Security is also important: the organisation's security model defines what each user has access to. "You'd be surprised how many organisations have implemented enterprise search, then yanked it two hours later when the CEO's emails became available to everyone," Wilson says.
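The idea of honouring an existing security model is often called security trimming: search hits are filtered against each user's permissions before anyone sees them. The minimal sketch below assumes that every document carries a hypothetical set of allowed groups copied from its source repository. The `Document` and `User` classes and the `trim_results` function are illustrative, not SYL's API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL taken from the source repository


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


def trim_results(user, hits):
    """Drop any hit the user is not entitled to see."""
    return [d for d in hits if d.allowed_groups & user.groups]


ceo_mail = Document("m1", "Board strategy", frozenset({"executives"}))
memo = Document("m2", "Office closure notice", frozenset({"all_staff"}))
analyst = User("analyst", {"all_staff"})

print([d.doc_id for d in trim_results(analyst, [ceo_mail, memo])])  # ['m2']
```

Filtering at query time lets one shared index serve every user, so the CEO's email can be indexed without becoming visible to everyone.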

Where an organisation has multiple branches, the software uses agents to harvest data and feed it back to the core system.

SYLSemantics is deployed as a virtual machine and can be installed and configured within hours.