Google's BigQuery Offers Infrastructure to Crunch Big Data

01.05.2012

Customers upload their data to Google as CSV files using a data ingestion API. Kwek says the API uses concurrent compressed streams that allow customers to upload several hundred gigabytes in about 15 or 20 minutes.
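To illustrate what such an ingestion step looks like in practice, here is a minimal sketch of a CSV load using the current google-cloud-bigquery Python client library, which wraps the same underlying service; the project, dataset, table and file names are hypothetical.

    from google.cloud import bigquery

    # Hypothetical project, dataset, table and file names, for illustration only.
    client = bigquery.Client(project="my-project")

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # data is ingested as CSV
        skip_leading_rows=1,                      # skip the header row
        autodetect=True,                          # infer the schema from the file
    )

    with open("sales.csv", "rb") as source_file:
        load_job = client.load_table_from_file(
            source_file, "my_dataset.sales", job_config=job_config
        )

    load_job.result()  # block until the load job completes
    print("Loaded {} rows.".format(load_job.output_rows))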

"Because you're using Google's data centers, the amount of data you can put into the system is for all practical purposes unlimited," Kwek says.

Kwek notes that the data is protected by multiple layers of security, is replicated across multiple data centers and can easily be exported. Access to the data is managed via group- and user-based permissions tied to Google accounts.
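As a rough sketch of how such group- and user-based permissions can be managed programmatically, the snippet below uses the same Python client library to grant access on a dataset; the dataset name and email addresses are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project
    dataset = client.get_dataset("my_dataset")      # hypothetical dataset

    # Copy the existing access entries and append group- and user-based grants.
    entries = list(dataset.access_entries)
    entries.append(bigquery.AccessEntry(
        role="READER", entity_type="groupByEmail", entity_id="analysts@example.com"))
    entries.append(bigquery.AccessEntry(
        role="WRITER", entity_type="userByEmail", entity_id="alice@example.com"))

    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])  # push the updated permissions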

Unlike many Big Data systems, the service does not rely on Apache Hadoop (which was derived from Google's published work on the Google File System (GFS) and MapReduce), but Kwek says it does use a distributed query and storage architecture. The service hides the guts of the analytics operation from the user, automatically sharding the data, distributing it across machines and managing it.

Once the data has been uploaded to Google's storage, customers can interrogate it with a SQL-like query language. Customers can use BigQuery through a Web UI called the BigQuery browser tool, through the bq command-line tool, or by making calls to the REST API with client libraries in multiple languages, including Java and Python. Google's infrastructure can analyze billions of rows in seconds, Kwek says, adding that the service is well suited to ad-hoc analysis, standardized reporting, data exploration and Web applications.
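As an example of the SQL-like query language, the following sketch runs a query from Python with the current client library against one of Google's public sample tables; the project name is hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    # Aggregate word counts in Google's public Shakespeare sample table.
    query = """
        SELECT word, SUM(word_count) AS total
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY word
        ORDER BY total DESC
        LIMIT 10
    """

    for row in client.query(query).result():  # result() waits for the query job
        print(row["word"], row["total"])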