A primary consideration when undertaking a big data project is the projected amount of real-time and non-real-time data required to carry out your initiative. Big data is often about doing things that weren't possible before because the technology was not advanced enough or the cost was prohibitive. The big change happening with big data is the capability to leverage massive amounts of data without all the complex programming that was required in the past.
Many organizations are at a tipping point in terms of managing large volumes of complex data. Big data approaches help keep things in balance so that businesses don't go over the edge as the volume, variety, and velocity of data change. Companies have had a difficult time handling ever-increasing amounts of data that must be processed at high speed.
Organizations have had to settle for analyzing small subsets of data, which often lacked the critical information needed to form the full picture the data could reveal. As big data technologies evolve and are deployed, companies will be able to analyze the data more easily and use it to make decisions or take actions.
The real-time aspects of big data can be revolutionary when companies need to solve significant problems. What is the impact when an organization can handle data that is streaming in real time? In general, the real-time approach is most relevant when the answer to a problem is time sensitive and business critical. This may be related to a threat to something important, such as detecting a degradation in the performance of hospital equipment or anticipating a potential intrusion risk.
The following list shows examples of when a company wants to leverage this real-time data to gain a quick advantage:
- Monitoring a new piece of information for an exception, such as a possible fraud or intelligence alert (a minimal monitoring sketch follows this list)
- Monitoring news feeds and social media to detect events that may affect financial markets, such as a customer reaction to a new product announcement
- Changing your ad placement during a big sporting event based on real-time Twitter streams
- Providing a coupon to a customer based on what he or she just bought at the point of sale
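Exception monitoring of the kind in the first example can be sketched with a simple online anomaly check. The following is a minimal illustration, assuming a hypothetical stream of transaction dictionaries with an `amount` field (the field names and threshold are placeholders, not a real API); it flags any transaction that deviates sharply from the running average, the sort of signal a fraud pipeline would hand to a reviewer.

```python
# Minimal sketch of real-time exception monitoring; all names and
# thresholds are illustrative assumptions, not a real fraud API.

def monitor(transactions, threshold=3.0, warmup=30):
    """Yield transactions whose amount deviates sharply from the running mean."""
    count, mean, m2 = 0, 0.0, 0.0            # Welford's online mean/variance
    for tx in transactions:
        amount = tx["amount"]
        count += 1
        delta = amount - mean
        mean += delta / count
        m2 += delta * (amount - mean)
        if count > warmup:                   # wait for a stable baseline
            std = (m2 / (count - 1)) ** 0.5
            if std > 0 and abs(amount - mean) / std > threshold:
                yield tx                     # exception: flag for review

# events = stream_of_transactions()         # hypothetical real-time source
# for alert in monitor(events):
#     print("review transaction", alert["id"])
```

Because the check keeps only a running mean and variance rather than the full history, it can process each event as it arrives, which is what makes it suitable for a real-time stream.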
Sometimes streaming data arrives very fast from a narrow set of sources, sometimes it arrives from a wide variety of sources, and sometimes it is a combination of the two.
The question you need to ask yourself if you're moving to real time is this: Could this problem be solved with traditional information management capabilities, or does it require newer capabilities? Is the sheer volume or velocity of the data going to overwhelm your systems? Often the answer is a combination of the two.
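A rough back-of-envelope calculation can make the volume and velocity question concrete. The numbers below are purely illustrative assumptions, not benchmarks; the point is the shape of the check, comparing the inbound event rate against what a single traditional system might sustain.

```python
# Illustrative capacity check; every number here is an assumption.
events_per_second = 500_000      # assumed peak rate of the incoming stream
bytes_per_event = 1_200          # assumed average event size
rdbms_rows_per_second = 50_000   # assumed sustained ingest of a traditional RDBMS

inbound_mb_per_s = events_per_second * bytes_per_event / 1e6
print(f"inbound volume: {inbound_mb_per_s:.0f} MB/s")                         # 600 MB/s
print(f"overload factor: {events_per_second / rdbms_rows_per_second:.0f}x")   # 10x
```

If the overload factor is well above 1, as in this hypothetical case, traditional capabilities alone won't keep up.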
So, if you need real-time capabilities, what are the requirements of the infrastructure to support this? The following list highlights a few things you need to consider regarding a system’s capability to ingest data, process it, and analyze it in real time:
- Low latency: Latency is the time lag between when data arrives (or a request is made) and when the service responds. Applications that must respond in real time require low latency, so you need to think about compute power as well as network constraints.
- Scalability: Scalability is the capability to sustain a certain level of performance even under increasing load.
- Versatility: The system must support both structured and unstructured data streams.
- Native format: Use the data in its native form. Transformation takes time and money. The capability to process complex interactions in the data that trigger events can be transformational. (A minimal ingest sketch illustrating these requirements follows this list.)
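To make these requirements concrete, here is a minimal ingest-loop sketch. The `handle_structured` and `handle_unstructured` routines and the 50 ms latency budget are hypothetical; the loop keeps each record in its native form, inspecting only enough of it to route it, and measures per-record latency.

```python
import json
import time

def handle_structured(record: dict):
    pass  # hypothetical: update a metric, trigger an event, etc.

def handle_unstructured(text: str):
    pass  # hypothetical: keyword-match a news or social feed item

def ingest(stream, latency_budget_ms=50):
    """Route each record in its native form and watch per-record latency."""
    for raw in stream:                           # raw records, assumed text
        start = time.perf_counter()
        try:
            handle_structured(json.loads(raw))   # structured payload (JSON)
        except (ValueError, TypeError):
            handle_unstructured(raw)             # unstructured free text
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > latency_budget_ms:       # low-latency requirement
            print(f"warning: record took {elapsed_ms:.1f} ms")
```

Scalability isn't shown here; in practice a loop like this would be partitioned across many workers so that throughput grows with the number of instances.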
The need to process continually increasing amounts of disparate data is one of the key factors driving the adoption of cloud services. The cloud model is large-scale and distributed.