Recently released Aster In-Database MapReduce from Aster Data is said to bring the Google MapReduce paradigm to the world of relational databases and structured data. The startup has already worked with such Web 2.0 luminaries as MySpace to bring MapReduce to the broader markets.
Tasso Argyros, CTO and founder of Aster Data, is not one to connect MapReduce to Cloud Computing. "'Cloud' is overloaded at this point. It means ten different things. It's too overloaded," he said. Put simply, the objective of the recently released Aster In-Database MapReduce is to bring the Google MapReduce paradigm to the world of relational databases and structured data.
In Argyros' view, MapReduce is an API that can be used by programmers to build parallel applications on clusters of commodity servers. "What we are doing is taking that API and integrating it closely with SQL. MapReduce is an enabler," he said.
"Now we allow customers to extend SQL by writing MapReduce procedures. Now someone writing in Java can write a parallel application in MapReduce. Then that can be exposed to a business analyst's [program] or reporting tools."
Aster Data sees MapReduce as a programming paradigm that allows programmers to create programs that process data in parallel across a distributed cluster. In this take, MapReduce is significant because it allows ordinary developers to create various parallel programs without having to worry about programming for intra-cluster communication, task monitoring or failure handling.
MapReduce redux drill-down
What is at the core of this approach to data? The technology was a major topic of interest at TheServerSide Java Symposium in Las Vegas earlier this year, where Eugene Ciurana, director of Systems Infrastructure at LeapFrog Enterprises and contributing editor for TheServerSide.com, described MapReduce as a set of implementation patterns for indexing large amounts of data according to criteria selected by the architect.
The function called 'map' takes two arguments. "It is a generic interface for a method. It takes 'list' and 'function,'" he said. "The power of that is you can define your functions and load them only when you need them."
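The "list and function" shape Ciurana describes can be sketched in a few lines of Python. This is only an illustration: the names `apply_map` and `word_length` are hypothetical and do not come from any Aster Data or Google API.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_map(function: Callable[[T], R], items: Iterable[T]) -> List[R]:
    """Apply `function` to every element of `items` and collect the results."""
    return [function(item) for item in items]


def word_length(word: str) -> int:
    """A hypothetical per-element function; any callable would do."""
    return len(word)


lengths = apply_map(word_length, ["map", "reduce", "cluster"])
print(lengths)  # [3, 6, 7]
```

Because `apply_map` accepts any callable, the per-element logic can be defined separately and supplied only when it is needed, which is the flexibility the quote points to.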
"You have a huge amount of data you divide into buckets," said Ciurana. "Each bucket of data is fed into a function that does some pre-processing for you. The results for each bucket [or, intermediate results] are sent to another program that does some post-processing. You can keep doing these reductions until you get to an answer you are looking for."
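The bucket-then-reduce flow described above might look roughly like the following single-process Python sketch. It assumes a word-count task purely for illustration; the function names (`split_into_buckets`, `map_bucket`, `merge_counts`) are hypothetical, and a real deployment would run each bucket on a separate machine rather than in a loop.

```python
from collections import Counter
from functools import reduce
from typing import List


def split_into_buckets(records: List[str], bucket_count: int) -> List[List[str]]:
    """Deal records round-robin into `bucket_count` buckets."""
    return [records[i::bucket_count] for i in range(bucket_count)]


def map_bucket(bucket: List[str]) -> Counter:
    """Pre-process one bucket: count the words it contains."""
    counts: Counter = Counter()
    for record in bucket:
        counts.update(record.split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Post-process: combine two intermediate results into one."""
    return left + right


records = ["a b a", "b c", "a c c", "b"]
buckets = split_into_buckets(records, bucket_count=2)

# Each bucket yields an intermediate result (here, a partial word count).
intermediate = [map_bucket(bucket) for bucket in buckets]

# Keep reducing the intermediate results until a single answer remains.
totals = reduce(merge_counts, intermediate, Counter())
print(dict(totals))
```

The merge step is associative, so the intermediate results can be combined in any grouping, which is what makes it safe to spread the work across many machines.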
Ciurana described some of the 'low hanging fruit' for this approach. "This is a very good strategy for protein folding, for analyzing HTTP logs - some financial data patterns would be good [for MapReduce] to analyze."
The approach is for the data hungry. "When you think about it, you are not restricted by your database or by a single processor that would get bogged down by sheer amounts of data," said Ciurana.
And the scalability promised by Cloud Computing is a key attribute of MapReduce. "You can have 100 machines doing the first mapping, and 90 machines doing the follow-up mappings, and 10 machines to do the final reduction," he said.
"That's where a lot of the power comes," said Ciurana.
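The staged fan-in Ciurana describes (many workers for the first mapping, fewer for follow-up work, fewer still for the final reduction) can be imitated on one machine. In this hedged sketch, threads stand in for machines, the worker counts are scaled down from the 100/90/10 in the quote, and `run_stage` and `chunk` are hypothetical helper names; the sum-of-squares task is an assumption chosen only to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_stage(function: Callable, inputs: Sequence, workers: int) -> List:
    """Run `function` over `inputs` using a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(function, inputs))


def chunk(items: Sequence, size: int) -> List[Sequence]:
    """Split `items` into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


numbers = list(range(1, 1001))

# Stage 1: the widest stage maps every input value (10 workers).
squared = run_stage(lambda n: n * n, numbers, workers=10)

# Stage 2: a narrower follow-up stage condenses each chunk to a partial sum (9 workers).
partials = run_stage(sum, chunk(squared, 100), workers=9)

# Stage 3: a single worker performs the final reduction.
total = sum(partials)
print(total)
```

Each stage produces less data than it consumes, which is why later stages can get by with fewer workers.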
[With reporting by Jack Vaughan.]