Tuesday, October 25, 2011

Too Big (Data) to Fail

Will we look back at 2011 and think of it as “the year of Big Data”? This does feel like the year when organizations can genuinely take advantage of the opportunity presented by big data – harnessing its volume, variety and velocity – in both concept and implementation.


The venture capital and investment community has been betting with its wallet. During the first three quarters of 2011, several high-profile acquisitions occurred and at least a dozen new early-stage investments were made. The key theme is big data analytics and the goal is big insights that drive new business decisions (presumably, decisions that couldn’t have been made without leveraging that big data).


The problem is that big data analytics currently involves far too much data and far too little analytics. If this continues without a more intelligent way to connect to and actually use all this data, the result will often be project failure.


The next generation of big data connectors must be more intelligent, providing views into these vast swaths of data, so the opportunity for big insights can be more commonly realized. Jaspersoft’s recent work and announcement this week with IBM and its InfoSphere BigInsights product take a major step in this direction.


IBM InfoSphere BigInsights

Building on the Apache Hadoop open source framework, IBM InfoSphere BigInsights adds administrative, workflow, provisioning, and security features, along with best-in-class analytical capabilities. The IBM software package comes in a Basic Edition (freely downloadable) and an Enterprise Edition. The Basic Edition includes the complete Apache Hadoop install, a web-based management console, and pre-built integration with IBM InfoSphere Warehouse, IBM Smart Analytics System, and DB2. The Enterprise Edition adds text analytics capabilities with a rules engine; BigSheets, a spreadsheet-like, browser-based tool for data exploration and job creation; a metric-driven scheduler; large-scale indexing; a JDBC connector; LDAP support; and Jaql, a query language for analyzing both structured and non-traditional data types.


Finding insight from within all the data can be challenging. The BigInsights toolset is made far more useful with a modern, powerful BI server out in front of it. So, IBM’s partnership with Jaspersoft provides this critical component of a complete Big Data analytics solution.


2nd Generation “Intelligent” Connectors

Connecting to a Hadoop-class data source is only useful if done intelligently. Running a query that returns millions of rows (and columns) of data probably won’t answer the business question being posed; the connector needs to interrogate the data structure intelligently as part of the query. To accomplish this, Jaspersoft has delivered a 2nd generation connector for the IBM InfoSphere BigInsights platform. This connector builds incrementally on existing data access via Hive and goes much further in allowing direct, intelligent access to HBase. The Jaspersoft connector supports filters, delivers greater performance and usability, and enables as-yet-unseen flexibility for interacting with Big Data.

1. Filters: Because HBase has no native query language, there's no automatic filtering capability. But there are filtering APIs. The new Jaspersoft connector supports not only simple filters (e.g., StartRow and EndRow) but also a wide array of complex filters (like RowFilter, FamilyFilter, ValueFilter, SkipFilter, and so on). In fact, the universe of supported Apache HBase filters is listed here.

2. Performance & Usability: In addition to the systems monitoring and management niceties provided by IBM, a Jaspersoft HBase query can specify exactly the ColumnFamilies and/or Qualifiers that are to be returned. This is particularly helpful for query performance tuning and usability: some HBase users have very wide tables, so accessing just the necessary fields offers a much faster and more usable solution.

3. Flexibility: To unpack data from HBase and make sense of it within a reporting tool, Jaspersoft’s connector supports a deserialization engine framework. The connector automatically understands HBase's shell and Java default serializations. Then, a customer can plug in existing or customized Java deserializers so the connector will automatically convert from HBase's raw bytes into meaningful data types. This delivers flexible support for the widest array of data within Hadoop’s HBase environment.
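
To make the three ideas above concrete, here is a minimal sketch in plain Java. It is not the HBase client API and not Jaspersoft's connector: it uses an in-memory sorted map as a stand-in for an HBase table, and every name in it (ScanSketch, scan, AS_LONG, the sample rows and columns) is hypothetical. It assumes string values are stored as UTF-8 and numbers as 8-byte big-endian longs. The point is only to show how a row range, a column selection, and pluggable deserializers combine so that a query returns a small, typed result instead of everything in raw bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative only: an in-memory stand-in for an HBase table. This is NOT the
// HBase client API and NOT Jaspersoft's connector; every name is hypothetical.
final class ScanSketch {

    // Pluggable deserializers: each turns a cell's raw bytes into a typed value.
    static final Function<byte[], Object> AS_STRING =
            raw -> new String(raw, StandardCharsets.UTF_8);
    static final Function<byte[], Object> AS_LONG =
            raw -> ByteBuffer.wrap(raw).getLong(); // assumes an 8-byte big-endian long

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Adds one row with three cells, keyed as "family:qualifier".
    static void addRow(TreeMap<String, Map<String, byte[]>> table,
                       String rowKey, String name, long clicks) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("profile:name", utf8(name));
        cells.put("profile:bio", utf8("a long free-text field we do not need"));
        cells.put("stats:clicks", longBytes(clicks));
        table.put(rowKey, cells);
    }

    // Row key -> cells, kept sorted by row key (ASCII keys only in this sketch).
    static TreeMap<String, Map<String, byte[]>> sampleTable() {
        TreeMap<String, Map<String, byte[]>> table = new TreeMap<>();
        addRow(table, "user050", "Bob", 3L);
        addRow(table, "user100", "Ada", 42L);
        addRow(table, "user150", "Lin", 7L);
        addRow(table, "user200", "Eve", 99L);
        return table;
    }

    // Reads rows in [startRow, stopRow), keeps only the requested columns,
    // and decodes each kept cell with the deserializer registered for it.
    static List<Map<String, Object>> scan(
            TreeMap<String, Map<String, byte[]>> table,
            String startRow, String stopRow,
            Map<String, Function<byte[], Object>> columns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, byte[]>> row
                : table.subMap(startRow, stopRow).entrySet()) {
            Map<String, Object> decoded = new LinkedHashMap<>();
            decoded.put("rowKey", row.getKey());
            for (Map.Entry<String, Function<byte[], Object>> col : columns.entrySet()) {
                byte[] raw = row.getValue().get(col.getKey());
                if (raw != null) {
                    decoded.put(col.getKey(), col.getValue().apply(raw));
                }
            }
            out.add(decoded);
        }
        return out;
    }

    // Example query: two columns, one row range, one deserializer per column.
    static List<Map<String, Object>> demo() {
        Map<String, Function<byte[], Object>> columns = new LinkedHashMap<>();
        columns.put("profile:name", AS_STRING);
        columns.put("stats:clicks", AS_LONG);
        return scan(sampleTable(), "user100", "user200", columns);
    }

    public static void main(String[] args) {
        for (Map<String, Object> row : demo()) {
            System.out.println(row);
        }
    }
}
```

With the sample data, the scan touches only the rows user100 and user150 and never decodes the unrequested profile:bio cell. Swapping in a custom deserializer is just a matter of registering a different function for a column, which is the same plug-in idea described in item 3.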


We’ve truly come a long way from the earliest days of Apache Hadoop, moving beyond the technical elite, on to the IT team (thanks to IBM) and now on to the business user (thanks to Jaspersoft). The result of Jaspersoft’s integration with IBM InfoSphere BigInsights is a complete Big Data solution: it combines the ability to manage and process large volumes of data with the ability to extract key information through flexible, easy-to-use reporting, dashboard and analytic views. There’s plenty more to learn about Jaspersoft’s integration with IBM InfoSphere BigInsights.


The fastest path toward uncovering real analytic insight from Hadoop comes through a combination of proven, best-in-class software. Just in time, because the untapped potential for bold new insight from within the growing volumes of data is too big to fail.


Brian Gentile

Chief Executive Officer

Jaspersoft