How ComScore Is Using Hadoop to Tame Its Big Data Flow

08.06.2012

In ComScore's case, Brown found that Syncsort's software made the Hadoop migration a piece of cake. "You don't have to change any code, except the push code," he says. "We use DMExpress in [more than] 30 different apps. It's our tool for any situation [where] we have to adjust the data."

"We can store twice as much data on the cluster," he continues, "and we also use it to improve performance. One big problem it solved was the ability to chunk and split the large files we have into files that fit perfectly into the chunks on Hadoop. This enables us to have a higher rate of parallelism on compressed files while reducing our costs for disk on the cluster."
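To make the idea Brown describes concrete, here is a minimal sketch of splitting a large record file into separately compressed parts, each small enough to fit in one Hadoop block. It is not Syncsort's DMExpress and not ComScore's code; the function name `split_into_compressed_parts`, the `block_size` parameter, and the choice of gzip are all assumptions made for illustration. Because a gzip file cannot be split mid-stream, keeping each compressed part within a single block lets every part be decompressed by its own task.

```python
import gzip


def split_into_compressed_parts(lines, block_size):
    """Group newline-terminated records into gzip parts of at most block_size bytes.

    Hypothetical helper for illustration only. It recompresses the growing
    part after every record, which is simple but slow; a production tool
    would estimate sizes instead. A single record that compresses to more
    than block_size still becomes its own (oversized) part.
    """
    parts = []
    current = []
    for line in lines:
        candidate = current + [line]
        # Close the current part if adding this record would overflow the block.
        if current and len(gzip.compress(b"".join(candidate))) > block_size:
            parts.append(gzip.compress(b"".join(current)))
            current = [line]
        else:
            current = candidate
    if current:
        parts.append(gzip.compress(b"".join(current)))
    return parts


if __name__ == "__main__":
    records = [b"record %d\n" % i for i in range(2000)]
    parts = split_into_compressed_parts(records, block_size=512)
    print(len(parts), "parts, largest is", max(len(p) for p in parts), "bytes")
```

Each part decompresses on its own, so the number of parts sets an upper bound on how many tasks can read the data in parallel.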

That, Brown says, translates into a savings of 75 terabytes of storage a month. That, too, is big data.
