Case Study: E-Commerce Personalization
Working with a Fortune 500 company, we built an AutoML pipeline to assess the potential for personalization and publish the best-performing approach.
Outcome: Effective Personalization at Reduced Cost
Problem Description
A Fortune 500 company wanted a systematic way to assess the potential for personalization. The client is a large organization with a B2C e-commerce website, and the opportunity to build personalized customer experiences across that website is large. A/B tests are typically used to compare two or more ways of serving a customer experience, and a new personalized experience is usually compared against the existing one in an A/B test, measured by business KPIs. While A/B tests are a great tool for data-driven decision making, they have limitations:
- When an A/B test is conducted, web traffic is rerouted and randomly assigned to one of the experiences being tested. If any of those experiences loses revenue, the test itself can hurt the bottom line.
- The more personalization approaches are compared against each other, the more data must be gathered. Testing many options at the same time is therefore potentially risky and expensive.
- An A/B test only tells us which of the compared experiences performed better. It does not tell us what drives the customer's decision.
- Within a large organization, customer experiences are tweaked all the time. As a result, the demand for testing typically far exceeds what can reasonably be accommodated without making most of the web traffic part of some randomized test.
The goal of this project was to enable internal clients to evaluate personalization strategies while minimizing the number of live tests that are conducted.
Reliancy Solution
Over the course of three years, we worked with an internal team tasked with delivering an experimentation platform. Our contributions included:
Research and Education
Researched state-of-the-art approaches in offline policy evaluation. These methods allow personalization approaches to be compared offline, without running a dedicated A/B test. We also educated the team and its leadership on what these methods make possible.
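Inverse propensity scoring (IPS) is one standard estimator from the offline policy evaluation literature and illustrates the core idea well. The source does not say which estimators the pipeline used, so treat this as an illustration only. The sketch assumes that the logs from a randomized A/B test record four things for each visit: the customer context, the experience served, the probability with which that experience was served, and the KPI outcome. The `LoggedEvent` schema and the `ips_estimate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedEvent:
    context: dict      # customer features at serving time (hypothetical schema)
    action: str        # experience that was actually served
    propensity: float  # probability that the logging policy served this action
    reward: float      # observed KPI, e.g. 1.0 for a purchase, 0.0 otherwise


def ips_estimate(events: Sequence[LoggedEvent], policy: Callable[[dict], str]) -> float:
    """Inverse propensity scoring estimate of a deterministic policy's mean reward."""
    if not events:
        raise ValueError("need at least one logged event")
    total = 0.0
    for e in events:
        # Only events where the logged action matches the policy's choice contribute;
        # dividing by the propensity corrects for how often that action was logged.
        if policy(e.context) == e.action:
            total += e.reward / e.propensity
    return total / len(events)


def personalized(ctx: dict) -> str:
    # Hypothetical candidate policy: show "B" on mobile, "A" elsewhere.
    return "B" if ctx["device"] == "mobile" else "A"


# Toy log from a 50/50 test between experiences "A" and "B".
events = [
    LoggedEvent({"device": "mobile"}, "B", 0.5, 1.0),
    LoggedEvent({"device": "mobile"}, "A", 0.5, 0.0),
    LoggedEvent({"device": "desktop"}, "A", 0.5, 1.0),
    LoggedEvent({"device": "desktop"}, "B", 0.5, 0.0),
]
print(ips_estimate(events, personalized))      # 1.0
print(ips_estimate(events, lambda ctx: "A"))   # 0.5
```

The estimate is only trustworthy when every experience the candidate policy would choose had a nonzero chance of being served in the original test. This is why offline evaluation still depends on some randomized data.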
Development of a Python-Based Offline Policy Evaluation Pipeline
Given the data from an initial A/B test, we led the development of a Python pipeline that extracts relevant features and evaluates the potential for personalization. The pipeline produces one of two outputs: either there is no potential for personalization, or a list of specific methods expected to perform better than the approach currently in use. The team we worked with developed an entire framework for running Bayesian A/B tests and for serving personalization methods.
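The sketch below shows one way those two outputs could be produced, under assumptions of our own rather than the pipeline's actual design. Each candidate policy and the current approach are scored with IPS on the same logged events. A candidate is recommended only if a paired bootstrap lower bound on its improvement is positive, and an empty result corresponds to "no potential for personalization". The tuple log format, the names `ips_terms` and `recommend`, and the one-sided 5% bound are all hypothetical choices.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical log format: (context, action served, propensity, reward).
Event = Tuple[dict, str, float, float]
Policy = Callable[[dict], str]


def ips_terms(events: Sequence[Event], policy: Policy) -> List[float]:
    """Per-event IPS contributions; their mean is the policy's estimated value."""
    return [r / p if policy(x) == a else 0.0 for x, a, p, r in events]


def recommend(events: Sequence[Event], current: Policy, candidates: Dict[str, Policy],
              n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> List[str]:
    """Names of candidates whose bootstrap lower bound on improvement over `current` is > 0."""
    if not events:
        raise ValueError("need at least one logged event")
    rng = random.Random(seed)
    n = len(events)
    base = ips_terms(events, current)
    winners = []
    for name, policy in candidates.items():
        # Paired differences: both policies are scored on the same logged events.
        diffs = [c - b for c, b in zip(ips_terms(events, policy), base)]
        boot_means = sorted(
            sum(diffs[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_boot)
        )
        lower = boot_means[int(alpha * n_boot)]  # one-sided (1 - alpha) lower bound
        if lower > 0:
            winners.append(name)
    return winners  # an empty list means no candidate is expected to beat `current`


# Toy 50/50 test: mobile users respond to "B", desktop users respond to "A".
log = []
for i in range(400):
    device = "mobile" if i % 2 == 0 else "desktop"
    action = "A" if (i // 2) % 2 == 0 else "B"
    reward = 1.0 if (device, action) in {("mobile", "B"), ("desktop", "A")} else 0.0
    log.append(({"device": device}, action, 0.5, reward))

candidates = {
    "personalized": lambda ctx: "B" if ctx["device"] == "mobile" else "A",
    "always_B": lambda ctx: "B",
}
print(recommend(log, current=lambda ctx: "A", candidates=candidates))  # ['personalized']
```

The bootstrap is paired because both policies are evaluated on the same events. This removes variance that is shared between them and makes the comparison tighter than bootstrapping each estimate separately.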
Productionizing the Pipeline and Continued Support
We worked with the engineers on the team to integrate the pipeline into their experimentation framework.
Impact
Our contributions enabled the experimentation team to deliver for their internal clients:
Helped Deliver an Enterprise-Wide Experimentation Capability
The experimentation team developed a Java framework for conducting experiments. The pipeline we worked on ran offline and helped internal clients decide which personalization approaches to pick, while reducing the number of A/B tests conducted.
Helped Several Internal Clients Deliver Personalized Experiences
Over the course of three years, several internal clients relied on the experimentation framework to test hypotheses. In multiple instances, approaches were identified that outperformed those used previously. KPIs were usually tied to engagement or sales, and our contributions helped improve these outcomes.
Helped Internal Clients Identify Ineffective Existing Approaches
In some instances, teams were deploying approaches that turned out to be ineffective. The pipeline revealed deployed approaches that performed no better than serving experiences at random.
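Here is one hypothetical way to run such a check, not necessarily the pipeline's method. It compares the observed KPI of a deployed approach with an IPS estimate of what uniformly random serving would have achieved on the same traffic. This only works if the deployed approach logs its serving probabilities and explores every experience with some nonzero probability (for example, epsilon-greedy). The log format and the `deployed_vs_random` name are assumptions. In practice, you would also attach a confidence interval (e.g. a bootstrap) rather than compare point estimates.

```python
from statistics import mean
from typing import Sequence, Tuple

# Hypothetical log format: (action served, propensity, reward), recorded while the
# deployed approach was live. Assumes the deployed approach explores (e.g. epsilon-greedy),
# so every action has a nonzero propensity.
Event = Tuple[str, float, float]


def deployed_vs_random(events: Sequence[Event], n_actions: int) -> Tuple[float, float]:
    """Return (observed value of the deployed approach, IPS value of uniform random serving)."""
    if not events:
        raise ValueError("need at least one logged event")
    # On-policy: the deployed approach's value is the plain average of the KPI.
    deployed_value = mean(r for _, _, r in events)
    # Uniform random serving picks each action with probability 1 / n_actions, so each
    # logged event is reweighted by (1 / n_actions) / propensity.
    random_value = mean(r * (1.0 / n_actions) / p for _, p, r in events)
    return deployed_value, random_value


# Toy log: the deployed approach serves "A" with probability 0.9 and "B" with 0.1,
# and both experiences convert at the same 50% rate.
log = [("A", 0.9, 1.0)] * 9 + [("A", 0.9, 0.0)] * 9 + [("B", 0.1, 1.0), ("B", 0.1, 0.0)]
deployed, random_serving = deployed_vs_random(log, n_actions=2)
print(round(deployed, 3), round(random_serving, 3))  # 0.5 0.5
```

In this toy log, the deployed approach's value matches the estimate for random serving. This is the pattern the case study describes as an approach that is "not any better than serving experiences at random."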