Big data analytics is all about real-time data visualization, transformation, analysis and reporting, enabling a business in motion to make decisions that improve operational efficiency, detect and act on security threats in real time, and more. Big Data analytics is not new, is it? For years, scientists have been launching satellites, and defence forces have fired guided missiles in war-operations simulations. The entire path of an airline flight is monitored and manoeuvred in real time, and decisions to change direction, fire a booster engine or, in the worst case, abort a failed mission are all taken in real time. The volume of data monitored in a flight simulation or a missile launch is enormous.
Years ago, I used to support systems that monitored satellite launch vehicles. I witnessed huge volumes of data received from the launch vehicles (often in simulation mode and periodically on mission-critical launches), and the scientists routinely analyzed the data and made sense of it. I could not, but I held these scientists in awe for their amazing engineering research and analysis.
Datawatch is one of the pioneering companies with a product that enables data visualization, parsing, selection of contextual data, and transformation of data from unstructured to structured formats. It supports real-time analysis against business rules and can, while the data is still in motion, perform contextual searches on historical data (data at rest) in data warehouses or in-memory databases to deliver trends, spot anomalies or patterns, and report what the business needs to know in real time to make decisions. Amazing, is it not?
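To make that kind of pipeline concrete, here is a minimal Python sketch of the general idea: parsing semi-structured report lines into structured records and applying a business rule while the data is still in motion. This is not Datawatch's product or API; the line format, field names and the qty_limit threshold are all hypothetical, purely for illustration.

```python
import re
from typing import Iterator

# Hypothetical report line, e.g. from a print spool or text report:
#   "2024-03-01 09:15:02  ORD-4412  EQUITY  BUY  12000  REJECTED"
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<order_id>ORD-\d+)\s+(?P<asset>\w+)\s+"
    r"(?P<side>BUY|SELL)\s+(?P<qty>\d+)\s+(?P<status>\w+)"
)

def to_records(lines: Iterator[str]) -> Iterator[dict]:
    """Transform unstructured text lines into structured records."""
    for line in lines:
        match = LINE.search(line)
        if match:
            record = match.groupdict()
            record["qty"] = int(record["qty"])
            yield record

def flag(records: Iterator[dict], qty_limit: int = 50_000) -> Iterator[dict]:
    """Apply a simple business rule to the stream while it is in motion."""
    for record in records:
        if record["status"] == "REJECTED" or record["qty"] > qty_limit:
            yield record  # surface for real-time reporting

if __name__ == "__main__":
    stream = iter([
        "2024-03-01 09:15:02  ORD-4412  EQUITY  BUY  12000  REJECTED",
        "2024-03-01 09:15:07  ORD-4413  EQUITY  SELL  90000  ACCEPTED",
    ])
    for anomaly in flag(to_records(stream)):
        print(anomaly)
```

Because both stages are generators, records flow through one at a time; nothing has to be landed in a database before the first rule fires.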
Some of the major commercial industries that consume Big Data products include:
- Capital markets, to perform pre- and post-trade analytics, options trading, market making, risk management, etc.
- Oil and gas companies, to perform data analysis that can anticipate interruptions to drilling, maximise asset utilization through real-time analysis of data returned from SCADA systems on well-heads, perform trace analysis to identify seismic trace signatures that may previously have been overlooked, and more.
- The telecommunications industry, to maximise network performance and utilization, perform deep packet monitoring for malicious activity, analyse capacity, and allocate or de-allocate capacity on demand in real time.
Then there is the Internet of Things (IoT), which promises humongous volumes of data through machine-to-machine communications, including all kinds of trigger-based and schedule-based communications that operate, monitor, manage and, to a large extent, even self-heal in an automated fashion.
What used to be significantly challenging to build and maintain (an enterprise data warehouse and business intelligence system that consumed multiple source systems, predominantly structured data at rest in database tables or delimited file formats) now seems dwarfed in complexity and size by the fast-emerging need to handle real-time data from a variety of sources far exceeding those of traditional BI and analytics systems. The requirement now is to consume data from reports and text, PDF files, HTML files, EDI streams and print spools, besides deep packet data from the networks, and several more.
Here I would like to reflect on one of the initiatives I took up several years ago while working for a global leader in telecommunications. We were handling numerous complaints from users of a web-based interactive system. The users were unhappy with its responsiveness, frequent transaction time-outs, and the inability to book orders and view their statuses due to frequent and unpredictable transaction failures. This led to a deeper analysis, searching for patterns and anomalies, which took the initiative in an entirely new direction. I went on to build a real-time, host-based security analytics system that continuously queried the real-time stream of data, analysed, computed, aggregated, reported and remediated, while keeping the consumption of compute resources minimal. This proved to be a game changer and helped restore customer confidence in the organization's B2B application systems.
The success of the solution delivered three things: (1) data discovery and visualization in real time, with filters applied to results and anomalies identified in in-motion data; (2) the ability to handle data streams and compare observations with signatures learnt and stored earlier (data at rest); and (3) effective management of both, which helped improve customer satisfaction and revenue and reduce support costs. The most interesting fact about this initiative was that no high-performance server or storage infrastructure was needed. Is that not what the promise of Big Data analytics is all about?
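As a rough illustration of points (1) and (2), the sketch below shows one way such a system can stay frugal with resources: events are aggregated over tumbling windows so that only the current window's counters live in memory, and each completed window is compared against signatures learnt earlier and held at rest. This is not the system I built; the event shape, host names and baseline thresholds are hypothetical.

```python
import collections
from typing import Dict, Iterator, Tuple

# Hypothetical signatures learnt earlier and stored at rest:
# expected failures per host per window before an alert is raised.
KNOWN_SIGNATURES: Dict[str, int] = {"b2b-gw-01": 5, "b2b-gw-02": 8}

Event = Tuple[float, str, bool]  # (timestamp, host, failed)

def failure_windows(events: Iterator[Event], window: int = 60) -> Iterator[Dict[str, int]]:
    """Count failures per host over tumbling windows.

    Events are assumed to arrive in time order; only the current
    window's counters are held in memory, which keeps compute and
    storage consumption minimal.
    """
    counts: collections.Counter = collections.Counter()
    window_end = None
    for ts, host, failed in events:
        if window_end is None:
            window_end = ts + window
        if ts >= window_end:
            yield dict(counts)
            counts.clear()
            window_end = ts + window
        if failed:
            counts[host] += 1
    if counts:
        yield dict(counts)  # flush the final, partial window

def alerts(events: Iterator[Event], window: int = 60) -> Iterator[str]:
    """Compare in-motion observations against stored signatures."""
    for snapshot in failure_windows(events, window):
        for host, failures in snapshot.items():
            baseline = KNOWN_SIGNATURES.get(host, 0)
            if failures > baseline:
                yield f"{host}: {failures} failures (baseline {baseline})"

if __name__ == "__main__":
    demo = [(float(t), "b2b-gw-01", True) for t in range(10)]
    for line in alerts(iter(demo)):
        print(line)
```

The aggregate-then-compare shape is the point here: the raw stream is never stored, only small per-window summaries ever meet the data at rest, which is why no high-performance server or storage was needed.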
Raju is a former Happiest Mind and this content was created and published during his tenure.