posted on October 15, 2013 21:06
On November 14 at IBM’s Analytics Solution Center, there was an event highlighting Big Data. Four speakers presented their points of view on Big Data as it relates to their respective areas of study. Below, I present highlights from the speakers.
First up was one of my colleagues from IBM who, like me, focuses much of his time these days on Big Data: Tim Paydos. Tim presented “Demystifying Big Data: Decoding the Big Data Commission Report.” Tim’s presentation highlighted the following:
· Intensifying business challenges coupled with an explosion in data have pushed agencies to a tipping point.
· Agency leaders are embracing this shift, defining the new requirements, and demonstrating success.
· The path to success lies in starting with business challenges and imperatives, versus the technology.
· The time is now – the experience and assets exist to help you define a strategy and a roadmap to guide your transformation.
The second speaker was Andras Szakal, IBM’s Federal CTO. Andras focused on best practices for Big Data architecture:
· Old Compute-centric Model: Data lives on disk and tape. Move the data to the CPU as needed, and maintain a deep storage hierarchy.
· New Data-centric Model: Data lives in persistent memory. Many CPUs surround the data and use a shallow, flat storage hierarchy.
I have written a lot about the business value take-aways of Big Data, but I haven’t focused much on the architecture. So let me stop here for a second. What Andras is pointing out is profound, and it is a major architectural difference in a real Big Data architecture. Frankly, most organizations I talk to or read about today that think they have started in the Big Data direction really just have a lot of data sitting on their old architecture. To increase performance, they simply add more CPU power to that old-school architecture, and thereby never realize the real power of a Big Data architecture. The figure below illustrates Andras’ point.
Andras goes on to discuss when the MapReduce/Hadoop framework is the right solution:
· Data volumes cannot be cost effectively managed using existing technologies
· Analyzing larger volumes of data can provide better results
· Mining insights from non-relational, unstructured data types
· Exploring data to understand its potential value to the business
· When diverse data must be cost effectively stored and analyzed on the same platform
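To make the MapReduce pattern behind Hadoop concrete, here is a minimal sketch in plain Python (not Hadoop itself): a word-count job expressed as a map phase that emits key-value pairs, a shuffle that groups them by key, and a reduce phase that aggregates each group. The function names and the toy corpus are my own illustration, not anything presented at the event.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit (word, 1) for every word; in Hadoop this
    # runs in parallel across splits of the input data.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: aggregate all counts emitted for one key.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle: group mapper output by key, then reduce each group.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(l) for l in lines):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

corpus = ["big data needs big platforms", "data at rest"]
print(map_reduce(corpus))  # e.g. {'big': 2, 'data': 2, ...}
```

The point of the model is that the map and reduce functions are shipped to wherever the data lives, rather than the data being shipped to a central CPU, which is exactly the data-centric shift Andras describes.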
He expands by discussing when stream computing is the right solution:
· It would be too expensive to store the data before analyzing it
· Data fusion across multiple, disparate, streams brings advantages
· True real-time data analysis can provide better business outcomes
· Ability to run multiple analytic models or applications against the same data
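The last point above, running multiple analytic models against the same in-flight data, can be sketched in a few lines of Python. This is my own illustrative toy, not IBM’s streaming product: a bounded sliding window stands in for data that is analyzed as it arrives rather than stored first, and several analytics observe each new reading.

```python
from collections import deque

class StreamAnalyzer:
    """Run several analytic functions over the same stream of readings
    without persisting the stream anywhere."""

    def __init__(self, window_size, analytics):
        # Bounded window: old readings fall off, nothing is stored long-term.
        self.window = deque(maxlen=window_size)
        self.analytics = analytics  # name -> function over the current window

    def push(self, value):
        self.window.append(value)
        # Every registered analytic sees the same in-flight data.
        return {name: fn(self.window) for name, fn in self.analytics.items()}

analyzer = StreamAnalyzer(
    window_size=3,
    analytics={
        "mean": lambda w: sum(w) / len(w),
        "peak": lambda w: max(w),
    },
)

for reading in [10, 14, 12, 40]:
    latest = analyzer.push(reading)
print(latest)  # analytics over the last 3 readings only
```

Real stream-computing platforms add distribution, fault tolerance, and fusion of multiple disparate streams, but the core idea is the same: analysis happens on data in motion, before (or instead of) landing it on disk.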
The morning was closed out by Ros Doktor from IBM who focused on the policy recommendations surrounding Big Data. These highlights include:
· The culture of information sharing and decision making needs to grow to include Big Data analysis. John Kamensky presents more detail on this topic from the Partnership for Public Service’s recent study.
· The ability to glean new insights from the massive amounts of data demands new skills and out-of-the-box thought processes.
· The $200 million Big Data R&D funding demonstrates the government’s commitment to the transformative nature of Big Data. But sustained, aggressive funding across a variety of domains is needed.
· Realizing the promise of Big Data does not require the sacrifice of personal privacy.
· The government should avoid Big Data contract vehicle duplication.