by Erika Bakse, Head of BI
Ask Media Group manages over 15 web properties, and coordinates performance marketing programs across most of these sites. Result? LOTS of data.
The Business Intelligence team at Ask is tasked with making sense of all this data. That means wrangling over 200 million web log events daily (over 1 TB of raw data), cleansing, structuring, and combining data sources to give users a holistic view of the business. Here’s how we do it.
Workflow of Data
The bulk of our data is generated by users interacting with our websites. Our homegrown logging system tracks everything on the page, as well as how users interact with the site, as a stream of JSON objects. These objects are enriched in-flight with extra goodies like geographic and device information, then streamed into Amazon S3, where they are copied every minute into our Snowflake data warehouse. From there, we cleanse the data and transform it into a traditional dimensional model for analysis. It takes 45 minutes for an event occurring on one of our websites to be available for our business team to analyze.
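To make the in-flight enrichment step concrete, here is a minimal sketch of the idea in Python. It is not our actual logging code: the field names (`ip`, `user_agent`, `geo_country`, `device_type`), the lookup table, and the device rules are all assumptions made up for illustration, and a real system would use a proper geo-IP database and user-agent parser.

```python
import json

# Hypothetical lookup table standing in for a real geo-IP database.
GEO_BY_IP_PREFIX = {"203.0.113.": "AU", "198.51.100.": "US"}


def lookup_country(ip):
    """Return a country code for an IP address, or None if unknown."""
    for prefix, country in GEO_BY_IP_PREFIX.items():
        if ip.startswith(prefix):
            return country
    return None


def classify_device(user_agent):
    """Very rough device classification based on the user-agent string."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobi" in ua or "iphone" in ua or "android" in ua:
        return "mobile"
    return "desktop"


def enrich_event(raw_line):
    """Parse one JSON log line and attach geo and device fields."""
    event = json.loads(raw_line)
    event["geo_country"] = lookup_country(event.get("ip", ""))
    event["device_type"] = classify_device(event.get("user_agent", ""))
    return event


# One illustrative page-view event, as it might arrive from the logger.
raw = (
    '{"event": "page_view", "ip": "198.51.100.7", '
    '"user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 12_0)"}'
)
enriched = enrich_event(raw)
print(json.dumps(enriched, sort_keys=True))
```

Enriching each event before it lands in storage means the warehouse never has to re-derive geography or device type at query time.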
But that’s only the beginning of the story. Ask boasts a world-class SEM program managing a portfolio of hundreds of millions of keywords. The data for those campaigns is imported daily into the data warehouse as well. We process revenue data from the various types of ads we run on our own properties. We import information from our content management system to track our content lifecycle and performance. We also manage internal metadata for our revenue reporting and A/B testing.
Once all this data comes into our data warehouse, we create different data marts to serve various parts of the business. For example, we merge our SEM data with our web log data and revenue data to get a complete view of a user’s experience on a property: how they entered our site based on our marketing efforts, how they interacted with the site, and how we were able to monetize the session.
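The kind of merge described above can be sketched in a few lines of Python. This is an illustration only, not our production logic (which runs inside the data warehouse): the record layouts, the `click_id` join key, and the `build_session_mart` helper are all hypothetical, and amounts are kept in whole cents to avoid floating-point rounding.

```python
from collections import defaultdict

# Hypothetical sample rows; real tables would have many more columns.
sessions = [
    {"session_id": "s1", "click_id": "c1", "page_views": 4},
    {"session_id": "s2", "click_id": None, "page_views": 1},
]
sem_clicks = [
    {"click_id": "c1", "keyword": "best hiking boots", "cost_cents": 12},
]
ad_revenue = [
    {"session_id": "s1", "revenue_cents": 20},
    {"session_id": "s1", "revenue_cents": 15},
]


def build_session_mart(sessions, sem_clicks, ad_revenue):
    """Join each session to its SEM click (if any) and its total ad revenue."""
    clicks_by_id = {click["click_id"]: click for click in sem_clicks}

    # A session can show several ads, so revenue is summed per session.
    revenue_by_session = defaultdict(int)
    for row in ad_revenue:
        revenue_by_session[row["session_id"]] += row["revenue_cents"]

    mart = []
    for session in sessions:
        click = clicks_by_id.get(session["click_id"], {})
        cost = click.get("cost_cents", 0)
        revenue = revenue_by_session.get(session["session_id"], 0)
        mart.append({
            "session_id": session["session_id"],
            "keyword": click.get("keyword"),   # how the user arrived
            "page_views": session["page_views"],  # how they interacted
            "revenue_cents": revenue,          # how the session monetized
            "cost_cents": cost,
            "margin_cents": revenue - cost,
        })
    return mart


mart = build_session_mart(sessions, sem_clicks, ad_revenue)
for row in mart:
    print(row)
```

Sessions without a matching SEM click are kept with a `keyword` of `None`, the equivalent of a left join, so organic traffic still shows up in the mart.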
Using the Data
Our internal users have a variety of options to interact with the data warehouse. Snowflake is accessible via web browser, ODBC driver, JDBC driver, Python connector, Spark connector, R connector: you name it, we’ve got it. This lets our users work with whichever front-end tool best fits their needs.
The BI team also specifically supports Looker and Alteryx for our users. Looker is a great browser-based data visualization tool that provides fantastic semantic modeling capabilities, so users can focus more on answering questions than on writing SQL queries. Alteryx functions as our reporting workhorse: large reports containing thousands of data points go to business teams every hour. It also helps us handle the trickier database pulls and provides a front end for our metadata management workflows.
Ultimately, our job is to remove any and all barriers between the business and its data, and we are constantly evaluating the best ways to do just that.