Chattermill Insights


by Sam Frampton on 15 Aug 2018

The Customer Experience Tech Stack - Aggregation Tools

Network of Nodes

Before you can do anything with all that data you’re collecting, it needs to be aggregated. And having the right tools to store, manipulate, query and retrieve that data means it will be ready for analysis, putting you one step closer to using it to create your CX.

So what aggregation tool should you go with? There are enough options out there to make this an overwhelming decision for most business owners and managers.

All of the solutions we considered have an MPP (massively parallel processing) architecture, which means your data and queries are distributed across several nodes for the most efficient storage and processing. For a deeper discussion of MPP, you can find out more here.

We also limited our options to fully managed solutions. Some warehousing solutions require a license to be purchased. Rather than go through the trouble of buying a license and setting the warehouse up on your cloud or on-premise servers, these fully managed solutions let you simply create an account and get started.

Amazon S3


When it comes to data storage and aggregation, there’s one name you’re going to come across over and over: Amazon. In fact, according to data from Synergy Research, in the decade since its launch, AWS has grown into the most successful cloud infrastructure company on the planet, garnering more than 30 percent of the market. Amazon Web Services (AWS) provides storage and aggregation options that will suit just about any business’ needs.

One of those is S3, a low-cost cloud-based object storage service which currently stores trillions of objects. Yes, trillions. S3 was one of the earliest drivers of AWS, and it shows in just how ubiquitous the service has become.

Enterprise users will likely appreciate that S3 offers pricing tiers depending on storage usage, numbers of requests and data transfers, offering the flexibility businesses need to create an aggregation plan that suits their needs — and their budget. Prices also vary based on the choice of location for your data storage. As of 2018, prices range from $0.023 to $0.0405 per GB per month for the first 50 terabytes of data.
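To get a feel for those numbers, here is a quick back-of-the-envelope estimate using the 2018 per-GB rates quoted above. This is a simplified sketch: real S3 bills add cheaper tiers beyond 50 TB plus request and transfer charges, so treat it as illustrative only.

```python
def monthly_s3_cost(gb_stored: float, price_per_gb: float) -> float:
    """Estimate a monthly S3 storage bill within the first 50 TB tier.

    Assumes a single flat per-GB rate; real S3 pricing adds cheaper
    tiers beyond 50 TB plus request and data-transfer charges.
    """
    if gb_stored > 50 * 1024:
        raise ValueError("flat-rate estimate only valid within the first 50 TB")
    return gb_stored * price_per_gb

# 5 TB at the cheapest quoted 2018 rate vs the most expensive one
low = monthly_s3_cost(5 * 1024, 0.023)    # ≈ $117.76/month
high = monthly_s3_cost(5 * 1024, 0.0405)  # ≈ $207.36/month
```

Even at the top rate, 5 TB of object storage comes in at a few hundred dollars a month, which is the kind of flexibility the tiered pricing is designed to offer.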

To query data stored in S3, you can use Amazon Athena, an interactive query service that makes it easy to analyze data in S3 using standard SQL. Given SQL’s wide usage, the vast majority of teams within your organisation will already be comfortable using SQL to query data. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately.
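As a sketch of what that looks like from code, here is a minimal example using boto3 (the AWS SDK for Python). The table, column names, database and bucket are hypothetical placeholders, not part of any real schema:

```python
def build_query(table: str, score_threshold: int) -> str:
    """Compose the standard SQL that Athena will run against data in S3.

    The table and column names here are made up for illustration.
    """
    return (
        f"SELECT channel, COUNT(*) AS responses "
        f"FROM {table} WHERE nps_score >= {score_threshold} "
        f"GROUP BY channel"
    )

def run_athena_query(sql: str, database: str, output_bucket: str) -> str:
    """Submit the query to Athena and return its execution id.

    Athena runs asynchronously and writes results to the S3 location given.
    """
    import boto3  # imported here so the pure SQL helper works without the SDK
    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": f"s3://{output_bucket}/results/"},
    )
    return response["QueryExecutionId"]
```

In use, you would call `run_athena_query(build_query("feedback", 9), database="cx", output_bucket="my-athena-results")` and then poll for the results, which land as a CSV in the output bucket.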

Notable S3 customers include Netflix, Microsoft, Dropbox, Bitcasa, Tumblr and more. Amazon has created a storage and aggregation service that has the flexibility and reliability to be one of the most-used on the market, making it a solid first choice.

Amazon Redshift


Amazon Redshift is a fully managed, high-performance MPP data warehouse solution in the cloud that can scale up to a petabyte or more, while costing an order-of-magnitude less than legacy data warehousing solutions.

It’s one of the more accessible data warehouse solutions, allowing anybody to deploy a data warehouse without extensive setup costs or deep technical expertise. A June 2017 report by Forrester mentions that Redshift has over 5,000 deployments with plenty of success stories. That’s strong uptake, considering it has only been available since 2012.

Amazon Redshift uses industry-standard SQL and is based on PostgreSQL 8.0.2. It’s worth noting that there are some differences between PostgreSQL and Amazon Redshift SQL, most of which exist to optimise Redshift for running online analytical processing (“OLAP”) queries efficiently over petabytes of data. Another bonus of the PostgreSQL heritage is that it’s easy to get data into Redshift: Redshift implements PostgreSQL’s COPY statement, which works well, and there is a good selection of client-side libraries, such as Python’s psycopg2, for connecting to Redshift.
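For example, loading a CSV file from S3 into a Redshift table typically combines a COPY statement with an ordinary PostgreSQL connection. The table name, S3 path, IAM role and DSN below are all illustrative placeholders:

```python
def build_copy(table: str, s3_path: str, iam_role: str) -> str:
    """Compose a Redshift COPY statement that loads CSV data from S3.

    Table name, S3 path and IAM role are illustrative placeholders.
    """
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS CSV IGNOREHEADER 1;"
    )

def load_into_redshift(dsn: str, copy_sql: str) -> None:
    """Run the COPY over a standard PostgreSQL connection via psycopg2."""
    import psycopg2  # imported here so build_copy stays usable without the driver
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(copy_sql)
```

Because the wire protocol is PostgreSQL’s, the same driver your team already uses for Postgres connects to Redshift unchanged.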

Redshift comes with Amazon’s reliability, and partners with other businesses to provide cross-platform business intelligence, data integration and consulting. Users of Actian, Looker, MicroStrategy, Qlik, Tableau, Stitch and more will find seamless integration with Redshift, as those popular tools all belong to the “Amazon Partner Network.” That makes Redshift an easy tool to plug into your CX process if you’re also using technology to collect, analyse and visualise your data.

Redshift is the price leader. Businesses can try Redshift with a free trial and then choose from a number of pricing options, including on-demand usage or “Spectrum,” which charges by the number of bytes of data scanned. There’s also the option to save up to 75 per cent over on-demand rates by committing to Redshift for one to three years. There are no upfront costs — you pay based only on your usage, making Redshift inexpensive compared to its peers. If you want to store and analyse a large volume of data as cheaply as possible, Redshift is the way to do it.
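As a rough illustration of what that reserved commitment means, here is the arithmetic. The $1,000/month on-demand figure is a made-up example; the only number taken from the text above is the “up to 75 per cent” maximum saving:

```python
def reserved_monthly_cost(on_demand_monthly: float, discount: float = 0.75) -> float:
    """Monthly cost after applying a reserved-commitment discount.

    The default 0.75 reflects the maximum advertised saving; actual
    discounts depend on term length and payment option.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction between 0 and 1")
    return on_demand_monthly * (1 - discount)

# A hypothetical $1,000/month on-demand cluster at the maximum 75% saving
print(reserved_monthly_cost(1000))  # → 250.0
```

The trade-off is flexibility: you lock in one to three years of usage in exchange for the lower rate.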

For more information on Redshift, it’s worth viewing this Quora thread.

Google BigQuery


BigQuery is an impressive offering from Google and should be on your shortlist of analytics warehouses. Google also offers a free tier making it a low-risk proposition to try. Every month Google gives you a free terabyte of queries, and you’ll be able to load your data at no cost — up to 10GB.

Like Amazon Redshift and S3, Google BigQuery provides a managed cloud-based data warehouse. For most companies, hosting a physical data warehouse on-premise is out of the question, and a managed cloud-based service takes away the stress of running your own servers. You can just set up an account and get started.

At its core, BigQuery is an externalisation of Dremel, a query service used at Google to handle data for products like Google Search, YouTube, Google Docs and Gmail on a daily basis. That’s a colossal amount of data to be processing at any one time. Dremel allows Google employees to blaze through the data at impressive speeds and draw insight. Since 2006 it has been Google’s semi-secret weapon.

BigQuery is the public implementation of Dremel, giving third-party developers access to Dremel’s core feature set. Of particular note is BigQuery’s speed: it can execute a complex regular-expression text match on a large logging table of about 35 billion rows and 20 TB in merely tens of seconds. To access the platform you can use the BigQuery REST API, the command-line interface, the web UI, or the Google Cloud SDK and Docker images.
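A minimal sketch of that regular-expression use case from Python, using the official google-cloud-bigquery client. The table, column and pattern are hypothetical, and the query helper uses BigQuery’s `REGEXP_CONTAINS` function:

```python
def build_regex_query(table: str, pattern: str) -> str:
    """Compose a BigQuery query counting rows whose message matches a regex.

    Table and column names are made up for illustration; REGEXP_CONTAINS
    is a BigQuery standard-SQL function.
    """
    return (
        f"SELECT COUNT(*) AS matches FROM `{table}` "
        f"WHERE REGEXP_CONTAINS(message, r'{pattern}')"
    )

def count_matches(sql: str) -> int:
    """Run the query and return the single count value from the result row."""
    from google.cloud import bigquery  # imported here; requires GCP credentials
    client = bigquery.Client()
    rows = list(client.query(sql).result())
    return rows[0]["matches"]
```

In use, `count_matches(build_regex_query("logs.events", "refund|chargeback"))` would scan the table server-side; you pay per byte scanned rather than for any cluster you manage.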

Data is kept in columnar storage and can be loaded from an array of data sources, including Google’s ecosystem of marketing and storage products. To visualise the data, Google offers an extensive variety of integrations, including some of our favourites such as Looker, Tableau and QlikView.

Unlike Amazon’s Redshift and S3, BigQuery does not use the industry-standard SQL language by default but a similar dialect of its own. Team members from a wide-ranging set of teams need to query databases regularly, and for non-technical teams, having to learn the nuances of a new dialect could make data harder to access.
