NoSQL offers users scalability, flexibility, speed

26.08.2011
Users of NoSQL databases and data processing frameworks such as CouchDB and Hadoop are deploying these new technologies for their speed, scalability and flexibility, judging from a number of sessions at the NoSQL Now conference being held this week in San Jose, California.

EMC is using a mixture of traditional databases and newfangled NoSQL data stores to analyze public perception of the company and its products, explained Subramanian Kartik, distinguished EMC engineer, during one talk.

The process, called sentiment analysis, involves scanning hundreds of technology blogs, finding mentions of EMC and its products, and assessing whether each reference is positive or negative based on the words in the surrounding text.
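As a rough illustration of word-based sentiment scoring, the following minimal Python sketch counts positive and negative words in a passage. The word lists and the function names (`sentiment_score`, `label`) are hypothetical and chosen only for this example; the article does not describe the lexicon or scoring method EMC actually uses.

```python
import re

# Hypothetical word lists for illustration only; a real system would
# rely on a much larger lexicon or a trained model.
POSITIVE = {"fast", "reliable", "great", "scalable"}
NEGATIVE = {"slow", "buggy", "expensive", "poor"}


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits - negative_hits


def label(text):
    """Map the numeric score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A simple count like this ignores negation and context ("not fast" would still score as positive), which is one reason production systems add natural language processing on top.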

To execute the analysis, EMC gathers the full text of all the blog and Web pages mentioning EMC and feeds them into a version of MapReduce running on its Greenplum data analysis platform. It uses Hadoop to weed out the Web markup code and non-essential words, which slims the data set considerably. The resulting word lists are then passed into SQL-based databases, where a more thorough quantitative analysis is done.
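The shape of that pipeline can be sketched on a single machine in Python: a map step strips markup and non-essential words from each page, a reduce step merges the per-page word counts, and the slimmed-down result is loaded into a SQL table for querying. Everything here is an assumption made for illustration, including the stop-word list, the sample pages, the `word_counts` table and the use of SQLite in place of Hadoop and Greenplum; it is not EMC's implementation.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it", "was", "but"}


class TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def map_page(html):
    """Map step: strip markup and stop words, return per-page word counts."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS)


def reduce_counts(counters):
    """Reduce step: merge the per-page counts into one total."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


def load_into_sql(counts):
    """Load the reduced word counts into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word_counts (word TEXT PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT INTO word_counts VALUES (?, ?)", counts.items())
    conn.commit()
    return conn


# Made-up sample pages standing in for crawled blog posts.
pages = [
    "<html><body><p>The EMC array is fast.</p></body></html>",
    "<div>EMC support was slow, but the array is fast</div>",
]
conn = load_into_sql(reduce_counts(map_page(page) for page in pages))
top = conn.execute(
    "SELECT word, n FROM word_counts ORDER BY n DESC, word LIMIT 3"
).fetchall()
```

Once the counts sit in a SQL table, further quantitative questions become ordinary queries, which is the division of labor the hybrid approach relies on.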

The NoSQL technologies are useful in summarizing a huge data set, while SQL can then be used for a more detailed analysis, Kartik said, adding that this hybrid approach can be applied to many other areas of analysis as well.

"There is all sorts of information out there, and at some point you will have to go through tokenizing, parsing and natural language processing. The way to get to any meaningful quantitative measures of this data is to put it in an environment you know can manipulate it well, in a SQL environment," Kartik said.