Understanding Big Data: The Ecosystem

Source: Dataconomy, by Eileen McNulty, Date: 3rd of June 2014

I like this article because it covers many new technologies alongside some familiar names, such as IBM and its machine learning system, Watson. What makes it stand out is how it breaks the ecosystem down into key categories. That makes it easier to put a business problem you may be dealing with into context and match it to use cases. Key categories include:

  • Infrastructure
  • Analytics
  • Applications
  • Open Source
  • Data Sources

For the uninitiated, the Big Data landscape can be daunting. The vast proliferation of technologies in this competitive market means there’s no single go-to solution when you begin to build your Big Data architecture. In this series of articles, we will examine the Big Data ecosystem and the many technologies that exist to help enterprises harness their data. This first article aims to serve as a basic map: a brief overview of the main options available for those taking their first steps into the vastly profitable realm of Big Data and analytics.

Ultimately, a Big Data environment should allow you to store, process, analyse and visualise data. It starts with the infrastructure: selecting the right tools for storing, processing and often analysing data. Specialised analytics tools then help you find the insights within the data. Beyond these, there are applications that run on the processed, analysed data. All of these are valuable components of the Big Data ecosystem.

Infrastructure

Infrastructural technologies are the core of the Big Data ecosystem. They process, store and often also analyse data. For decades, enterprises relied on relational databases (collections of tables made up of rows and columns) to process structured data. However, the volume, velocity and variety of today’s data mean that relational databases often cannot deliver the performance and latency required to handle large, complex datasets. The rise of unstructured data in particular meant that data capture had to move beyond rows and tables. New infrastructural technologies therefore emerged. They can wrangle a vast variety of data and make it possible to run applications on systems with thousands of nodes, potentially involving thousands of terabytes of data.

Some of the key infrastructural technologies include:

  • Hadoop- A whole ecosystem of technologies designed for storing, processing and analysing data. The core Hadoop technologies (HDFS and MapReduce) break data up into parts, distribute those parts across a cluster and analyse them concurrently, rather than tackling one monolithic block of data all in one go.
  • NoSQL- Stands for Not Only SQL; these databases are also used to process large volumes of multi-structured data. Most NoSQL databases are best at handling discrete items, such as individual records or documents, stored within large multi-structured datasets. Some NoSQL databases can work in conjunction with Hadoop. HBase, for example, runs on top of Hadoop’s distributed file system (HDFS).
  • Massively Parallel Processing (MPP) Databases- MPP databases segment data across multiple nodes and process these segments in parallel, using SQL as the query language. Whereas Hadoop usually runs on clusters of cheaper commodity servers, most MPP databases run on expensive specialised hardware.
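The split-process-combine principle behind Hadoop can be sketched on a single machine. MPP databases use the same idea when they segment data across nodes. What follows is a minimal illustration, not Hadoop’s API. It assumes that threads from Python’s `concurrent.futures` stand in for cluster nodes, and the helper names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical. Each chunk of text is counted independently (the "map" step), and the partial counts are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(records, n_chunks):
    """Hypothetical helper: partition records into at most n_chunks slices."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_chunk(chunk):
    """'Map' step: count words within one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """'Reduce' step: merge the per-chunk counts into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

def word_count(records, workers=4):
    chunks = split_into_chunks(records, workers)
    # Assumption: threads stand in for cluster nodes; each chunk runs concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)

if __name__ == "__main__":
    lines = ["big data big ideas", "data lakes and data warehouses", "big clusters"]
    print(word_count(lines, workers=2))
```

On a real cluster, the framework would also move computation to the nodes that store each chunk and recover from node failures. This sketch shows only why independent chunks can be processed concurrently and combined afterwards.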

Many enterprises use combinations of these three (and other) kinds of infrastructure technology in their Big Data environment.

Analytics

Although infrastructural technologies incorporate data analysis, some technologies are designed specifically with analytical capabilities in mind. Sub-categories of analytics on the Big Data map include:

  • Analytics Platforms- Integrate and analyse data to uncover new insights and help companies make better-informed decisions. This field places particular emphasis on low latency, so that insights reach end users as quickly as possible.
  • Visualisation Platforms- As the name suggests, these are designed specifically for visualising data. They take raw data and present it in complex, multi-dimensional visual formats that illuminate the information.
  • Business Intelligence (BI) Platforms- Used for integrating and analysing data specifically for businesses. BI platforms analyse data from multiple sources to deliver services such as business intelligence reports, dashboards and visualisations.
  • Machine Learning- Also falls under this category, but differs from the others. Analytics platforms take processed data as input and output analytics, dashboards or visualisations for end users. In machine learning, by contrast, the input is data the algorithm ‘learns from’, and the output depends on the use case. One of the most famous examples is IBM’s supercomputer Watson. It has ‘learned’ to scan vast amounts of information to find specific answers and can comb through 200 million pages of structured and unstructured data in minutes. Watson recently combed through recipes and taste combinations to create its own sauce.
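Here is a deliberately tiny sketch to make the ‘learns from data’ idea concrete. It is not how Watson works. It assumes the simplest possible learning task: fitting a straight line to example points by ordinary least squares. The function names `fit_line` and `predict` are hypothetical. The training points are the input the algorithm learns from, and the output (here, a predicted value for a new input) is whatever the use case needs.

```python
def fit_line(xs, ys):
    """'Learn' a slope and intercept from training data by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) training points of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be identical")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept

if __name__ == "__main__":
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 5))
```

The same pattern (learn parameters from examples, then apply them to new inputs) underlies far larger systems. Those systems differ mainly in the model and in how much data they learn from.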

Applications

Applications are Big Data businesses and startups that take analysed Big Data and use it to offer end users optimised insights. Fields in which applications are used include:

  • Health- Mintlabs is a compendium of 3D brain scans and neurological information. Neurosurgeons all over the world can access it to help in the diagnosis, prognosis and treatment of patients with brain diseases.
  • Retail- Avansera runs mobile shopping apps that give food production companies insights into variables that affect food purchases, such as brand loyalty and price flexibility.
  • Energy- AutoGrid uses data from smart meters, building management systems, voltage regulators and thermostats to help consumers track and curb power use, reduce waste, balance the grid, improve system operations and even predict future consumption.
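The sketch below gives a rough illustration of the kind of processing an energy application might do: it totals consumption per meter and makes a naive forecast. It is an assumption for illustration only, not AutoGrid’s method. The readings are hypothetical `(meter_id, kWh)` tuples. The "prediction" is simply the moving average of the most recent readings.

```python
from collections import defaultdict
from statistics import mean

def total_by_meter(readings):
    """Track usage: sum kWh per meter from hypothetical (meter_id, kwh) tuples."""
    totals = defaultdict(float)
    for meter_id, kwh in readings:
        totals[meter_id] += kwh
    return dict(totals)

def forecast_next(series, window=3):
    """Naive forecast: the mean of the last `window` readings in a series."""
    if not series:
        raise ValueError("need at least one reading")
    return mean(series[-window:])

if __name__ == "__main__":
    readings = [("meter-1", 1.5), ("meter-2", 2.0), ("meter-1", 0.5)]
    print(total_by_meter(readings))
    print(forecast_next([1.2, 1.4, 1.3, 1.6]))
```

Real consumption forecasting would normally also account for factors such as weather, time of day and season. A moving average is only the simplest baseline.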

This is just a brief glimpse of the multi-faceted and ever-expanding cartography of Big Data. In the coming weeks, the ‘Understanding Big Data’ series will examine different areas of the Big Data landscape in more detail: infrastructure, analytics, open source, data sources and cross-infrastructure/analytics. It will cover what each area does, how its technologies work and how competing technologies differ.

There are many different types of technologies out there, offering a wide range of opportunities to their users. The key is identifying the right components to meet your specific needs.

(Image credit: Matt Turck)
