Did you ever wonder how "Google Search" offers suggestions while you are still typing your query, or how "Cheapest Airlines" starts to appear everywhere after you search for a country?
The power of real-time streaming data analytics is astonishing indeed. Now that serverless technology is gaining momentum, maybe you won’t have to worry about taking risky decisions on your own at all. This post covers the basics of "Serverless Streaming Data Processing" and how it will become an influential component of our decision making in the future.
Life is Data
Life is an endless series of events. The technology around us has made it a stream of digital actions emitting streams of data. If you turn back and investigate your life very carefully, you'll see the never-ending string of data you have generated with your every digital action. It could be a lot to digest at first, but let’s explore some scenarios and try to find what applies to you and me.
- Online banking and convenient e-commerce purchasing capabilities
- Ride-sharing, modern-day travelling and transportation
- Industrial equipment and agricultural use cases like monitored machinery, autonomous tractors and precision farming
- Automated power generation and smart grids, net-zero buildings, smart metering
- Real-estate property recommendations based on geo-location, predictive maintenance
- Financial trading according to the real-time changes in the stock market, analytical risk management
- Movies, songs and other digital media with a better experience depending on the demographics, preference, and emotions
- Improved web and mobile application experience based on usage
- Dynamic and personalised experiences in online gaming
- Enhanced social media experiences with personalised marketing and predictive analytics
- Telemetry from connected devices or remote data centres, and from geospatial services such as weather and resource assessment
- Sports analytics to enhance players’ performance and reduce health risks
All these events produce data—lots of it. Due to the frequency of this data emission, it has become an increasing burden to the digital space.
"85% of today’s data is not used, the scope of the challenge becomes clear."
This quote from IBM’s whitepaper about how they drive advanced analytics with HortonWorks—published back in May 2018—highlights the pressing issue of under-utilisation of data.
Data poured out continuously by a gazillion sources every second has become a fact we can’t just ignore. The Big Data discipline was an eye-opener for the tech world, showing how this once irritating data could be put to good use. This same irksome data is now collected and analysed by a new species, namely data scientists. Because these data flows are continuous and usually made up of small records (in the order of kilobytes), they are commonly referred to as streaming data. The records arrive from many sources simultaneously and are sent on for further processing.
From Streaming Data to Decisions
A streaming data processing architecture usually comprises two layers: a storage layer and a processing layer. The storage layer is responsible for ordering large streams of records and providing persistence and high-speed access. The processing layer takes care of consuming the data, executing computations, and notifying the storage layer to discard records that have already been processed. Data is processed either incrementally, record by record, or over sliding time windows. The processed data is then subjected to streaming analytics operations, and the derived information is used to make context-based decisions. For instance, companies can track changes in public sentiment about their products by continuously analysing social media streams; the world's most influential nations can intervene in decisive events such as presidential elections in other powerful countries; and mobile apps can offer personalised product recommendations based on device geo-location and user emotions.
Most applications start by collecting a portion of their data to produce simple summary reports and take simple decisions, such as triggering alarms or calculating a moving average. As time goes by, these applications become more and more sophisticated, and companies may want deeper insights to perform intricate activities with the aid of Machine Learning algorithms and data analysis techniques. The continual growth of data has made data scientists work around the clock to come up with trailblazing solutions to utilise as much data as possible to fabricate alternate futures with better decisions.
Choosing the ideal cloud provider to fit organisational requirements can be overwhelming. However, because stream processing is now so widespread, all the major cloud service providers offer competitive options for it. Here's a list of commonly used serverless services that support enterprise-grade applications relying heavily on streaming data.
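The two-layer split described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular cloud service is implemented: the names `StorageLayer`, `process_batch` and `discard_through` are assumptions made up for this example, and the "computation" is just a sum.

```python
from collections import deque
from typing import Deque, List, Tuple

# (sequence number, payload) pairs; the shape is an assumption for illustration.
Record = Tuple[int, float]


class StorageLayer:
    """Holds records in arrival order until the processing layer is done with them."""

    def __init__(self) -> None:
        self._records: Deque[Record] = deque()
        self._next_sequence = 0

    def append(self, payload: float) -> None:
        # Assign a monotonically increasing sequence number to preserve order.
        self._records.append((self._next_sequence, payload))
        self._next_sequence += 1

    def read(self, limit: int) -> List[Record]:
        # Return up to `limit` of the oldest records without removing them.
        return list(self._records)[:limit]

    def discard_through(self, sequence: int) -> None:
        # Drop every record up to and including the given sequence number.
        while self._records and self._records[0][0] <= sequence:
            self._records.popleft()

    def pending(self) -> int:
        return len(self._records)


def process_batch(storage: StorageLayer, limit: int) -> float:
    """Processing layer: consume records, compute a sum, then notify storage."""
    batch = storage.read(limit)
    if not batch:
        return 0.0
    total = sum(payload for _, payload in batch)
    # Tell the storage layer which records are safe to get rid of.
    storage.discard_through(batch[-1][0])
    return total
```

Records stay in the storage layer until the processing layer explicitly reports them as processed, so a failure between `read` and `discard_through` would leave them available for another attempt.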
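As a rough illustration of the simple decisions mentioned above, the following Python sketch keeps a moving average over the most recent readings and raises an alarm flag when that average crosses a threshold. The class name `MovingAverageAlarm`, the window size and the threshold are all hypothetical values chosen for the example.

```python
from collections import deque
from typing import Deque, Tuple


class MovingAverageAlarm:
    """Tracks the average of the last `window_size` values and flags high averages."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A bounded deque silently drops the oldest value once it is full.
        self._window: Deque[float] = deque(maxlen=window_size)
        self._threshold = threshold

    def observe(self, value: float) -> Tuple[float, bool]:
        # Add the new reading, then recompute the average over the window.
        self._window.append(value)
        average = sum(self._window) / len(self._window)
        # The alarm fires only when the average is strictly above the threshold.
        return average, average > self._threshold


alarm = MovingAverageAlarm(window_size=3, threshold=50.0)
for reading in [40, 45, 50, 70, 80]:
    average, triggered = alarm.observe(reading)
    print(f"reading={reading} average={average:.1f} alarm={triggered}")
```

Each record is handled as it arrives, which is the incremental style of processing; nothing older than the last three readings needs to be kept.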
Amazon Web Services:
- Amazon Kinesis: Platform offering services to load and analyze streaming data
- Amazon Athena: Fully-managed interactive query service to analyze data using standard SQL
- Amazon DynamoDB Streams: Feature of the fast and flexible DynamoDB NoSQL database service that captures a time-ordered sequence of item-level modifications
- Amazon Machine Learning: Managed service to build ML models and generate smart predictions
- AWS Glue: Fully-managed extract, transform, and load service that enables easy preparation of data for analytics
Google Cloud Platform:
- BigQuery: Fast, fully-managed enterprise data warehouse for analytics at any scale
- Cloud Dataflow: Stream and batch data processing service with reliability and expressiveness
- Cloud Pub/Sub: Scalable foundation for stream analytics that ingests event streams in real time
- Cloud ML Engine: Managed service for building ML models and taking them to production
Microsoft Azure:
- Azure Stream Analytics: Fully-managed complex event processing engine for real-time streaming data
- Event Hubs: Fully-managed streaming data platform that enables event ingestion
IBM Cloud:
- IBM Event Streams: Event-streaming platform based on Apache Kafka
- IBM Cloud SQL Query: Interactive querying service for analyzing data
World As We See It
Many companies use insights from stream analytics to enhance the visibility of their businesses, which allows them to deliver a personalised experience to customers. Additionally, near real-time transparency gives these firms the flexibility to address emergencies promptly. The emerging serverless architecture has driven all the leading cloud service platforms to present complementary solutions. Stream processing has been made available for serverless application development through fully-managed, cloud-based services for real-time data processing over large distributed data streams.
Netflix Television Network
Netflix, the leading online television network in the world, developed a solution that centralises its flow logs using Amazon Kinesis Streams. For a system processing billions of traffic flows every day, this eliminates plenty of complexity because there is no database in the architecture. Thanks to the high scalability and lightning speed, they can discover and address issues as they arise and monitor the application on a massive scale. Together with the upgraded recommendation algorithm, video transcoding, and licensing of popular media, this gives subscribers a seamless experience. With the exponential growth in subscribers, the company’s responsibilities increase by the day. However, Netflix appears well placed to handle this for many years to come, since it is considered to have a sound decision-making model.
Thomson Reuters Professional Services
As a leading source of integrated and intelligent information for businesses and professionals, Thomson Reuters provides its services to decision makers in a wide range of domains such as financing and risk, science, legal, and technology. The company built an in-house analytics engine to take full control of its data and moved to AWS because it was familiar with the platform's capabilities and scale. The new real-time pipeline attached to an Amazon Kinesis stream produces better results in perceptive customer experience, with accurate economic forecasts and financial trends for beneficiaries, including a range of government activities.
GO-JEK Ride-hailing and Logistics
Jakarta has become a heavily congested city where the motorcycle is deemed the most efficient mode of transport. To exploit this business opportunity, GO-JEK, one of the few unicorn businesses in Southeast Asia, started as a call centre for motorcycle taxi bookings. However, to meet demand that exceeded expectations, the company had to consider expansion. Now, with the support of Google Cloud Professional Services, the business architecture built on Cloud Dataflow for stream inference enables them to predict changes in demand effectively.
Limitations of Serverless Stream Processing
Serverless stream processing is increasingly becoming a vital part of decision-making engines. However, with the current set of features, it’s not the ideal solution for some scenarios. Implementing real-time analytics for sliding windows and temporal event patterns is not a course for the faint-hearted.
The best way to assimilate never-ending data of this magnitude is through real-time dashboards, which require additional data organisation and persistence. These manoeuvres introduce undesirable latency and data management issues. However, technology is evolving to close the gap, integrating advanced cloud data management techniques to produce materialised views.
Unlike batch processing, stream processing usually operates on a time-based or record-based window of data, which can lead to challenges in use cases that require query re-execution.
Nowadays, application requirements grow beyond aggregated analytics. Increasing the window size seems to be an appropriate temporary solution, but it creates another intractable problem: memory management. Modern solutions usually provide advanced memory management and scheduling techniques to overcome this, and further improvements can be expected.
All in all, it’s apparent that serverless stream processing has been playing a prominent role around us without us even knowing. With the power of serverless data stream processing, applications can evolve from traditional batch processing to real-time analytics. The revelation of profound insights will result in effective decision making without having to manage infrastructure. Even today, many organizations practise orthodox decision-making strategies based on analytics derived from big data clusters that belong to the past. The new horizons of serverless and real-time data processing now have the power to support effective decisions and create a more productive, relevant and, most importantly, secure world around you.
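To make the difference between the two window types concrete, here is a small Python sketch that groups the same events first by record count and then by time. It assumes non-overlapping (tumbling) windows for simplicity, and the function names and the `(timestamp, payload)` event shape are assumptions for this example only.

```python
from typing import Dict, Iterable, List, Tuple

# (timestamp in seconds, payload); the shape is an assumption for illustration.
Event = Tuple[float, str]


def record_based_windows(events: Iterable[Event], size: int) -> List[List[Event]]:
    """Close a window after every `size` records, regardless of their timestamps."""
    ordered = list(events)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def time_based_windows(events: Iterable[Event], width_seconds: float) -> List[List[Event]]:
    """Close a window every `width_seconds`, however many records arrived."""
    windows: Dict[int, List[Event]] = {}
    for timestamp, payload in events:
        # Events whose timestamps fall in the same interval share a window.
        key = int(timestamp // width_seconds)
        windows.setdefault(key, []).append((timestamp, payload))
    return [windows[key] for key in sorted(windows)]


events = [(0.5, "a"), (1.2, "b"), (4.9, "c"), (5.1, "d"), (11.0, "e")]
print(record_based_windows(events, 2))
print(time_based_windows(events, 5.0))
```

In both cases a result is produced once per window. If the records of a closed window are no longer retained, re-running a query over them means replaying the stream, which is one way to read the re-execution challenge mentioned above.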
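A back-of-the-envelope calculation shows why a larger window puts pressure on memory. The Python sketch below assumes the simplest possible case, where every record in the window is buffered in full; the arrival rate and record size are hypothetical figures picked only to show the scaling, and real engines may hold far less by aggregating incrementally.

```python
def window_buffer_bytes(records_per_second: float,
                        record_bytes: int,
                        window_seconds: float) -> float:
    """Estimate the bytes needed to buffer every record in one window."""
    # Memory grows linearly with both the arrival rate and the window length.
    return records_per_second * record_bytes * window_seconds


# Hypothetical stream: 1,000 records per second, 1 KiB each.
one_minute = window_buffer_bytes(1000, 1024, 60)
one_hour = window_buffer_bytes(1000, 1024, 3600)
print(f"1-minute window: {one_minute / 1e6:.1f} MB")
print(f"1-hour window:   {one_hour / 1e9:.2f} GB")
```

Under these assumed figures, stretching the window from one minute to one hour multiplies the buffer by sixty, from tens of megabytes to several gigabytes.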
Chamath is an Associate Technical Lead at SLAppForge, a company that provides a powerful next-generation serverless application developer toolkit. He’s a strong supporter of Serverless and loves data analytics. When he’s not making the world a better place, you might catch him playing table tennis or badminton, or having a swim.