EMC plans to offer data classification

22.06.2006
EMC Corp. plans to provide users with data classification functionality, a move that could presage another wave of mergers and deals with storage industry start-ups.

While EMC has not yet formally announced a product, CEO Joe Tucci discussed the company's intentions at its analyst day on June 7 in New York, and executives have been talking up the technology, which the company calls Intelligent Information Management.

Data classification software lets storage administrators set up policies so that data is automatically categorized by its importance and then stored using a hierarchical storage management scheme. Critical data is assigned to high-end storage, and less important and infrequently accessed data is relegated to slower, cheaper storage.
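As a rough illustration of that kind of policy, the sketch below assigns a file to one of two storage tiers. It is not based on any vendor's product: the `FileInfo` record, the `critical` flag, the tier names and the 90-day threshold are all assumptions made for the example, as is the choice to keep recently accessed noncritical files on the faster tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Hypothetical metadata a classification tool might hold for one file."""
    path: str
    last_accessed: datetime
    critical: bool  # assumed to be set by a business policy, not inferred here


def assign_tier(info: FileInfo, now: datetime,
                stale_after: timedelta = timedelta(days=90)) -> str:
    """Return the storage tier for a file under a simple two-tier policy."""
    # Critical data always goes to high-end storage.
    if info.critical:
        return "high-end"
    # Assumption: noncritical data that is still in active use stays put.
    if now - info.last_accessed <= stale_after:
        return "high-end"
    # Less important, infrequently accessed data moves to cheaper storage.
    return "low-cost"
```

A real product would evaluate many more attributes, such as file type, owner and content, but the shape of the decision is the same: a policy maps file attributes to a tier.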

EMC users said they were looking forward to the new functionality.

John Halamka, CIO at Harvard Medical School and CareGroup Healthcare System in Boston, said he believes that EMC's new product will be the type of functionality his organizations need to maximize the utility of their storage investment. Halamka said he looks forward to being an early adopter.

Kenneth J. Kucera, senior vice president and CIO of First National Bank of Omaha, said his organization would probably take a look at it as a proof-of-concept. He also said that the technology is a logical corollary to the tiered storage architecture his organization already has.

EMC did not release product details but indicated that the product would ship in the fourth quarter, according to George Symons, chief technology officer for information management at the Hopkinton, Mass.-based company.

Initially, the technology will focus on unstructured files such as text files, spreadsheets and PowerPoint presentations, and on semistructured files such as e-mail. It will eventually support databases as well, Symons said. It will enable administrators to set up four to 10 classes to which data could be assigned, each with its own retention requirements, access permissions and compliance requirements, he said.

Such a system could also be set up to automatically delete information based on criteria such as how much time has passed since it was accessed, Symons said. Setting up policies and processes for such deletion will help organizations follow compliance and e-discovery requirements by demonstrating that they have a policy and a process and that files are not being deleted randomly, he said.
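To make the idea concrete, here is a minimal sketch of a deletion policy driven by time since last access. EMC has not published product details, so nothing here reflects its design: the `DataClass` and `Record` types, the `max_idle` field and the `select_for_deletion` function are hypothetical names chosen for illustration. Each decision carries a written reason, which is the sort of record that could help show deletions follow a policy rather than happening at random.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataClass:
    """Hypothetical data class with a single deletion criterion."""
    name: str
    max_idle: timedelta  # delete once a file has gone unaccessed this long


@dataclass
class Record:
    """Hypothetical entry for one classified file."""
    path: str
    data_class: DataClass
    last_accessed: datetime


def select_for_deletion(records, now):
    """Return (record, reason) pairs for files idle longer than their class allows."""
    decisions = []
    for rec in records:
        idle = now - rec.last_accessed
        if idle > rec.data_class.max_idle:
            reason = (f"{rec.path}: class={rec.data_class.name}, "
                      f"idle {idle.days} days, "
                      f"limit {rec.data_class.max_idle.days} days")
            decisions.append((rec, reason))
    return decisions
```

The function only selects candidates and explains why; actually removing the files, and storing the reasons in an audit log, is left to the caller.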

Brian Babineau, an analyst at Enterprise Strategy Group in Milford, Mass., said EMC has all the pieces to create a data-classification product but must now knit them together.

Though EMC is using technology from prior acquisitions, such as Legato Systems Inc., Documentum Inc. and Smarts Inc., to develop the new data-classification technology, the company is not acquiring any of the data-classification start-ups to jump-start development, said Symons. This is actually unusual for EMC because the company has a history of producing innovation through acquisition and is typically a bellwether in the storage industry, said Simon Robinson, an analyst at The 451 Group Inc. in New York.

EMC is nearly always first to market with new technologies, and users can typically expect that when EMC makes an acquisition in a particular area, other acquisitions tend to follow, Robinson said.

EMC has had an agreement for some time with Arkivio Inc., which makes the Auto-stor product in this area. Network Appliance Inc. has signed an agreement with Kazeon Systems Inc., as has Hitachi Data Systems Corp., which also has a reseller agreement with Scentric Inc.

Other storage vendors, such as Hewlett-Packard Co., IBM and Sun Microsystems Inc., have not yet announced such agreements with data-classification start-ups such as StoredIQ Corp.

Mountain View, Calif.-based Kazeon produces the Kazeon Information Server, which it first shipped in October 2005. In addition to Kazeon selling the product directly, Network Appliance licenses it and sells it to customers, said Michael Marchi, vice president of solution marketing. Kazeon focuses on unstructured data and recently announced an alliance with Google Inc. to provide file searching to Google's search appliance, he said.

Austin-based StoredIQ produces the Information Classification and Management 5000 information server, which works with both unstructured data and e-mail.

Alpharetta, Ga.-based Scentric makes the Scentric Destiny software.

George Rodriguez, lead systems programmer at ABC Distributing LLC in Miami, has been testing the software and is considering it for use at his company, which does catalog sales and has up to 5,000 employees during busy times of the year.

Rodriguez's company has a two-tiered storage infrastructure, and he said Scentric Destiny not only moves data to secondary storage but also does so in a way that the user does not know it has been moved.

Scentric, which shipped its product in April, was the first company in the area to support both unstructured files and e-mail, according to Larry Cormier, senior vice president of marketing. The software typically costs US$100,000, he said.