Monday, May 23, 2016

Embrace the Change with Event Streams

Information systems thrive on change. We track and record changes in the state of the world, we generate new knowledge by observing changes and, perhaps most importantly, we predict future changes by accumulating knowledge about changes in the past. Tracking and observing all significant changes in a large information system is not a trivial challenge, however.

How do we observe change? As one would expect, there are several different methods for doing just that. You could be an active observer, remembering the state of the world and comparing your knowledge to the current state of the world once in a while. This kind of “polling” approach can be useful in situations where changes do not happen often and the state of the world is relatively easy to observe and small enough to keep your own copy of. Another option would be to have the world “tell” you directly when it changes. This requires very little of you as an observer, but puts expectations on the one changing the state, increasing overall complexity. Plus, if a new observer arrives, the world must adapt to inform both of you of the change. Clearly, we need something in between these two: something that allows significant changes in the world state (what we sometimes call events) to be observed without too many restrictive requirements on the world and its observers. This is where event-driven architectures come into play.
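
To make the middle ground a little more concrete, here is a minimal sketch in Python. It is not how Event Stream itself is implemented; the `EventLog` and `Observer` classes and the event names are made up purely for illustration. The producer only appends events to a shared log, and each observer remembers nothing but its own position in that log.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventLog:
    """Append-only list of events; the producer knows nothing about observers."""
    _events: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, event: Dict[str, Any]) -> int:
        """Record an event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[Dict[str, Any]]:
        """Return every event at or after the given offset."""
        return self._events[offset:]


@dataclass
class Observer:
    """Keeps only its own position in the log, not a copy of the world."""
    name: str
    offset: int = 0

    def catch_up(self, log: EventLog) -> List[Dict[str, Any]]:
        """Fetch the events this observer has not seen yet and advance its offset."""
        new_events = log.read_from(self.offset)
        self.offset += len(new_events)
        return new_events


# Hypothetical event names, for illustration only.
log = EventLog()
log.append({"type": "player_registered", "player": "alice"})
log.append({"type": "game_round_finished", "player": "alice"})

bonuses = Observer("bonuses")
first_batch = bonuses.catch_up(log)      # sees both events so far

# A late observer joins without the producer changing anything.
log.append({"type": "game_round_finished", "player": "bob"})
support = Observer("support")
support_batch = support.catch_up(log)    # replays the log from the start
second_batch = bonuses.catch_up(log)     # sees only the newest event
```

The point of the sketch is that adding the `support` observer required no change on the producing side, and neither observer had to keep its own copy of the world to notice what changed.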

Event Stream is a new communication platform in Playtech's Information Management Solution (IMS) that is intended to fill this middle ground and to put forth a common software platform that provides a continuous flow of events from internal event producers to all interested event consumers. The project kicked off at the end of 2015 and is currently being developed by the IMS team located in Tallinn, Estonia, with help and input from architects and developers in our offices around the globe. As a result of this project, a new method of communication between the technical services of IMS will be delivered. This allows for further decoupling and future expansion of IMS as a set of small services. For more information on what the IMS means to us and how it is being developed, have a look at this article by one of my colleagues, Ivo.

From the perspective of a software engineer, the Event Stream platform is built on top of Apache Kafka, which is already taking root in the IMS ecosystem. We are developing a custom-made API for both consumers and producers, but we also provide message wire format information for implementing your own. Most message serialization is done using the open Apache Avro binary or JSON formats. In the middle of the streams stands an Event Stream Manager service, built on top of Apache Spark, that keeps watch on the consistency and general health of the stream.
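
As a rough illustration of what a schema-described event could look like, the sketch below defines an Avro-style record schema as a Python dictionary and encodes a matching event as JSON. The schema name, namespace and fields are invented for this example and are not the actual Event Stream wire format. The check is also a hand-written stand-in: a real producer would use an Avro library, which is what makes the compact binary encoding possible.

```python
import json
from typing import Any, Dict

# Hypothetical Avro record schema; the name and fields are illustrative only.
EVENT_SCHEMA = {
    "type": "record",
    "name": "PlayerEvent",
    "namespace": "example.events",
    "fields": [
        {"name": "event_type", "type": "string"},
        {"name": "player_id", "type": "long"},
        {"name": "timestamp_ms", "type": "long"},
    ],
}

# Only the two primitive Avro types used above are mapped in this sketch.
PYTHON_TYPES = {"string": str, "long": int}


def check_against_schema(event: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise ValueError unless the event has exactly the schema's fields and types."""
    expected = {f["name"]: PYTHON_TYPES[f["type"]] for f in schema["fields"]}
    if set(event) != set(expected):
        raise ValueError(f"fields {sorted(event)} do not match {sorted(expected)}")
    for name, python_type in expected.items():
        if type(event[name]) is not python_type:
            raise ValueError(f"field {name!r} must be {python_type.__name__}")


def encode_json(event: Dict[str, Any]) -> bytes:
    """Validate the event, then serialize it as UTF-8 encoded JSON."""
    check_against_schema(event, EVENT_SCHEMA)
    return json.dumps(event, sort_keys=True).encode("utf-8")


payload = encode_json(
    {"event_type": "game_round_finished", "player_id": 42, "timestamp_ms": 1463961600000}
)
```

Agreeing on a schema like this is what lets producers and consumers evolve independently: a consumer written against the schema does not need to know which service produced the event.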

Does this mean that the way we have been doing inter-service communication is wrong? Not at all. The Event Stream is not a “silver bullet” to replace current communication protocols. It does, however, add a new dimension to communication and broadens the range of future opportunities. Imagine running a bus company. The direct service-to-service APIs act like bus drivers – they know how to drive the system, how to get from one stop to another. You most definitely need drivers to run a successful bus company. But from the business perspective, it would not hurt to also have passengers or observers. They prefer to sit on the bus and watch the scenery – events – go by, enjoying the ride and coming back for more. This “observer experience” is what the Event Stream brings to applications.

Being an observer opens up a whole new set of opportunities. First of all, it gives us motivation to sort out exactly which (business) events in IMS are significant, bringing us closer to a model of the gaming business that is unified over all the services of IMS. Secondly, new knowledge – and what might be even more important, additional business value – can be created on the observer side (i.e. without the involvement of the real sources of information). For example, think about our core business activity – gaming. Based on gaming, we can assign bonuses and awards, provide player support, involve players in different campaigns and perform a number of other activities. Yet, gaming by itself is a relatively localized process with a restricted set of actions involved. Involving every other potentially interested party in the gaming process directly would (and in some cases, already has) rapidly increase the complexity of our information systems to the point where they become unmaintainable. To keep this evolution under control, we introduce a new view of the shared timeline of all IMS services, a view where all events that have happened flow through separate “streams” which every other service is able to follow and base its decisions on.

This unified medium of delivery between services also leads us down the path of distributed data. No longer do we need – or want – to have all the data centralized in the place where most of the action happens. Instead, it will become more natural to distribute data between different tiers of accessibility, availability, structure and cost. While real-time systems can work on the usual online transaction processing databases, reporting and business intelligence can gather their data from bigger, less normalized data warehouses. Data aggregation can happen immediately from streamed events, but also from historical data stored in an offline distributed data store. A lot of the infrastructure can start using commodity hardware to save on costs.

As one might expect, there are substantial expectations on the quality of such an event delivery solution. Delivering over a million events per minute in an eventually consistent manner and in a predictable order is a feat that requires specialized software components and design patterns. We aim to keep the overhead on the event producer side to a minimum, while still keeping consistency guarantees and providing near-real-time delivery to consumers. Load balancing between consumers must happen naturally, and the internal technical API that both consumers and producers are bound to use must be simple, yet technology-agnostic.
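
One common way to combine predictable ordering with load balancing, and the general idea behind partitioned logs such as Kafka topics, is to route every event with the same key to the same partition and then divide the partitions among consumers. The sketch below shows that idea in plain Python under some assumptions: the partition count, the consumer names and the use of CRC32 as the hash are all illustrative choices (Kafka's default partitioner uses a different hash), and nothing here reflects the actual Event Stream configuration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(
    consumers: List[str], num_partitions: int = NUM_PARTITIONS
) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin; each partition has one owner."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Events are (key, action) pairs; the key here is a hypothetical player name.
events: List[Tuple[str, str]] = [
    ("alice", "bet"), ("bob", "bet"), ("alice", "win"), ("bob", "loss"), ("alice", "bet"),
]

# Each partition is its own append-only list, so per-key order is preserved.
partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
for key, action in events:
    partitions[partition_for(key)].append((key, action))

assignment = assign_partitions(["consumer-1", "consumer-2"])
alice_actions = [a for k, a in partitions[partition_for("alice")] if k == "alice"]
```

With this scheme, order is only guaranteed among events that share a key, which is usually the order that matters (all events of one player, for instance). Adding a consumer changes the assignment of partitions, not the producers.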

Most important of all, we hope to gain a lot of experience from introducing event-driven architecture to IMS and to see how the open-source technologies behind it can shape our understanding of how modern scalable online gaming systems could be built. We aim to do our best to share this experience with you and perhaps spark an interest in Spark and other interesting technologies out there. If our wary adventures into the wonderful world of event streaming could be interesting for you to read about, feel free to leave a comment. No one should be left behind.