IBM: Puzzles provide clues to better analysis

22.03.2012

The key to analysis is a mixture of streaming and batch processing. The Apache Hadoop data framework is designed for batch processing, in which a lot of data in a static file is analyzed. This is different from stream processing, in which a continually updated string of data is observed. "Until this project, I didn't know the importance of the little batch jobs," he said.

Batch processing is a bit like "deep reflection," Jonas said. "This is no different than staying at home on the couch mulling what you already know," he said. Instead of just staring at each puzzle piece, participants would try to understand what the puzzle depicted, or how larger chunks of assembled pieces could possibly fit together.

For organizations, the lesson should be clear, Jonas explained. They should analyze data as it comes across the wire, but such analysis should be informed by the results generated by deeper batch processes, he said.

Jonas' talk, while seemingly irreverent, actually illustrated many important lessons of data analysis, said Seth Grimes, an industry analyst focusing on text and content analytics who attended the talk. Among the lessons: Data is important. Context accumulates and real-time streams of data should be augmented with deeper analysis.

"These are great lessons, communicated really effectively," Grimes said.