Driving Data Loading to find the facts

Published: Tuesday, 24 February 2015 15:24 by Ant Phillips, Senior Developer
Big Data Analytics

It is just a few weeks since we completed the acquisition of Celebrus by IS Solutions. Since then we’ve moved offices from Newbury to Sunbury-on-Thames. Not surprisingly, that involved clearing out a whole heap of stuff that had been lying around for far too long. Along the way we unearthed several dusty product release CDs going back over a decade.

Looking back at those earlier releases, it’s interesting to contrast the focus back then with our latest release (v8 update 11). Ten years ago, data collection was focused primarily on reporting.

That meant lots of totals, averages and aggregations of one kind or another, and the technology matched those requirements. In Celebrus terms, this was, and still is, implemented by our Analytics Server, part of our v8 Big Data Engine. Every so often the Analytics Server fires up and calculates summary information from activity in the last five minutes, hour, day and so on. The results of that processing are written to a set of database tables. There’s nothing inherently wrong with the Analytics Server approach; it is simply that the world has moved on.
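
To make the idea of periodic summaries concrete, here is a minimal sketch in Python of rolling raw activity up into fixed time windows. It is an illustration only, not how the Analytics Server is implemented: the `summarise` function, the `(timestamp, value)` event shape and the sample figures are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def summarise(events, window):
    """Group (timestamp, value) pairs into fixed windows.

    Returns a dict mapping each window start to its count, total and average.
    Hypothetical helper for illustration only.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Floor the timestamp to the start of the window it falls in.
        index = (timestamp - EPOCH) // window
        buckets[EPOCH + index * window].append(value)
    return {
        start: {
            "count": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Made-up sample activity: three events, summarised into five-minute windows.
events = [
    (datetime(2015, 2, 24, 15, 1), 10.0),
    (datetime(2015, 2, 24, 15, 3), 20.0),
    (datetime(2015, 2, 24, 15, 7), 5.0),
]
summary = summarise(events, timedelta(minutes=5))
```

The first two events land in the window starting at 15:00 and the third in the window starting at 15:05. Once the summary is written out, the individual events are no longer visible in it, which is exactly the limitation the rest of this post is about.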

The focus today is almost exclusively on highly detailed data about individuals, not just summary information. The data also needs to be available in near real-time. This information is crucial to understanding each and every journey a customer has had with your brand. Armed with this insight into customer behaviour, a whole slew of possibilities opens up for understanding and optimising your business, whether that is offering a discount to a valuable customer or understanding why someone chose a competitor’s product. All these use cases and many more start with data.

As Sherlock Holmes once said:

It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.

With all this in mind, our latest release introduces the new Data Loader. The Data Loader is our go-forward architecture for loading data at lightning-fast speeds. It scales to support huge amounts of traffic on some of the busiest web sites in the world, and can process tens of thousands of events per second (sustained). Not just that, it delivers the data into your systems in less than a minute, making new use cases around streaming analytics possible. We support MySQL, Microsoft SQL Server, Oracle and Teradata out of the box. Better yet, the Data Loader includes a pre-defined database schema covering more than 75 tables and models: everything you might want to understand about your digital customers.

In addition, we’ve been working hard with the folks over at MongoDB. This release has been fully certified with MongoDB Enterprise Edition, which makes MongoDB the perfect data store for Celebrus customer journey data. This customer journey data is geared towards operational applications. For example, contact centre staff use this information to help them understand a customer’s interactions with your brand.

This is the first release where we have worked with a document database, and it has been a really good experience. The flexibility, simplicity and productivity of MongoDB are tremendous. For example, in MongoDB we simply store all business events in a single collection (rather than lots of normalised relational tables). Each type of business event contains some common attributes (timestamp, session number, customer identifier, event type and so on). An event also contains some data specific to that event, for example the purchase price, quantity and SKU code for a purchase transaction event. All of this just works with MongoDB: no friction, no joins, no complexity. Job done!
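
As a rough illustration of that document shape, the Python sketch below builds two events that share the common attributes and each carry their own event-specific fields. The field names, the `make_event` helper and the sample values are hypothetical, chosen for this example; they are not the actual Celebrus schema.

```python
def make_event(event_type, timestamp, session_number, customer_id, **details):
    """Build one event document: common attributes plus event-specific fields.

    Hypothetical helper and field names, for illustration only.
    """
    event = {
        "event_type": event_type,
        "timestamp": timestamp,
        "session_number": session_number,
        "customer_id": customer_id,
    }
    # Event-specific data sits alongside the common attributes in the same
    # document, so no separate table per event type is needed.
    event.update(details)
    return event


# Made-up sample events of two different types, destined for one collection.
events = [
    make_event("page_view", "2015-02-24T15:20:00Z", 1, "C123",
               url="/products/widget"),
    make_event("purchase", "2015-02-24T15:24:00Z", 1, "C123",
               price=19.99, quantity=2, sku="SKU-001"),
]

# Picking out one event type is a simple filter on a common attribute.
purchases = [e for e in events if e["event_type"] == "purchase"]
```

With the PyMongo driver, a list like `events` could be written in one call with `collection.insert_many(events)`, and the same filter expressed as `collection.find({"event_type": "purchase"})`. Either way, both event types live in the same collection and no join is needed to read a purchase back with its details.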