A Primer on Customer Lifetime Value

A challenge faced by many companies is properly budgeting for marketing campaigns. Recently I've been digging into quite a bit of literature on customer lifetime value in preparation for a project I will be taking on with a client. In this post I will provide a quick overview of methods for modeling customer lifetime value for a business. I will also show a simple calculation of customer lifetime value in Python. Many of the methods highlighted in this post will be explored in more detail in future posts.

What is customer lifetime value?

Customer lifetime value, or CLTV, is a metric that measures the total value a customer will bring to a company over his/her lifetime. Customer lifetime value is an important metric that helps companies gauge the health of their business.

For starters, acquiring new customers is typically expensive for many organizations, so retaining existing customers is of great interest to a company. CLTV helps companies allocate resources effectively in order to prevent churn.

Much like the RFM technique discussed in my last post, customer lifetime value helps a company identify the customers that are more valuable to the business and optimize its marketing spend. By analyzing historical purchases made by customers, CLTV allows companies to predict what kind of purchases they can expect their customers to make in the future.

How is customer lifetime value calculated?

As I mentioned at the beginning of this post, there are several methods for calculating and predicting customer lifetime value. I'll now go through each method, from simplest to most complex.

Method 1: Simple Average

This method involves calculating a single value that represents the average customer lifetime value for the entire customer base. The calculation is pretty straightforward. You'll need to know three things:

  • The average monetary value for each transaction (AOV)
  • The average customer lifespan (ACL)
  • The average purchase frequency rate (APF)

The formula for customer lifetime value is the product of these three averages:

CLTV = AOV × APF × ACL

Calculating CLTV in Python

As a demonstration I will walk you through the computation of customer lifetime value using the CDNow dataset.

The CDNow dataset contains the purchase history up until the end of June 1998 of a cohort of individuals who made their first purchase at CDNow in the first quarter of 1997. The text file containing the data provides the unique identifier for the customer, the date of the transaction, the number of CDs purchased, and the dollar value of the transaction.

We'll start by reading the text file and creating a dataframe:

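from datetime import datetime

import pandas as pd

# Read the raw file; each line holds one whitespace-separated transaction.
with open('CDNOW_master.txt') as f:
    dataset = f.read().split("\n")

records = []
for line in dataset:
    # Skip blank lines (including the trailing one after the last newline).
    if not line.strip():
        continue
    # split() with no argument collapses runs of spaces between fields.
    row = line.split()
    records.append({
        'customerID': row[0],
        'purchaseDate': datetime.strptime(row[1], '%Y%m%d'),
        'quantity': int(row[2]),
        'price': float(row[3]),
    })
transactions_df = pd.DataFrame(records)

[Figure: CDNow dataframe]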
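
As an alternative to the manual loop, pandas can parse whitespace-delimited text directly. The following is a minimal sketch, assuming the same four-column layout described above. The `load_transactions` helper and the three sample rows are made up for illustration and are not taken from the CDNow file.

```python
import io

import pandas as pd

COLUMNS = ['customerID', 'purchaseDate', 'quantity', 'price']


def load_transactions(source):
    """Parse whitespace-delimited transactions (hypothetical helper)."""
    df = pd.read_csv(
        source,
        sep=r'\s+',        # one or more whitespace characters between fields
        header=None,       # the file has no header row
        names=COLUMNS,
        # Read IDs and dates as text so leading zeros survive.
        dtype={'customerID': str, 'purchaseDate': str},
    )
    df['purchaseDate'] = pd.to_datetime(df['purchaseDate'], format='%Y%m%d')
    return df


# Made-up rows in the same layout as the file (not real CDNow data).
sample = io.StringIO(
    " 00001 19970101  1  11.77\n"
    " 00002 19970112  1  12.00\n"
    " 00002 19970112  5  77.00\n"
)
sample_df = load_transactions(sample)
print(sample_df.dtypes)
```

Passing the path `'CDNOW_master.txt'` instead of the `StringIO` object should produce a dataframe with the same columns as the loop above.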

Next we will calculate the average purchase frequency:

transactions_per_customer = transactions_df.groupby('customerID')['purchaseDate'].count()
avg_frequency = transactions_per_customer.mean()

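Now for the average customer lifespan, measured in days between each customer's first and last purchase:

minmax_purchase_dates_by_customer = transactions_df.groupby('customerID')['purchaseDate'].agg(['min','max'])
# Subtracting the two date columns gives timedeltas; .dt.days converts them to whole days.
customer_lifetimes = (minmax_purchase_dates_by_customer['max'] - minmax_purchase_dates_by_customer['min']).dt.days
avg_lifetime = customer_lifetimes.mean()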

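Last but not least, the average dollar value per transaction:

# Note: the dataset description gives the last field as the dollar value of the
# whole transaction. If so, transactions_df['price'] is already the total and
# multiplying by quantity overstates it.
transactions_df['total'] = transactions_df['price'] * transactions_df['quantity']
avg_order_value = transactions_df['total'].mean()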

Finally, we will calculate the customer lifetime value using all the pieces:

[Figure: customer lifetime value calculation]
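
A minimal sketch of this last step, assuming CLTV is simply the product of the three averages. The `simple_cltv` helper and the input numbers below are illustrative only and are not the CDNow results.

```python
def simple_cltv(avg_order_value, avg_frequency, avg_lifetime):
    """Average CLTV as the product of the three averages (assumed formula)."""
    return avg_order_value * avg_frequency * avg_lifetime


# Illustrative inputs only, not the CDNow results.
example_cltv = simple_cltv(avg_order_value=50.0, avg_frequency=3.0, avg_lifetime=2.0)
print(f"${example_cltv:,.2f}")  # prints $300.00
```

With the variables computed earlier, the call would be `simple_cltv(avg_order_value, avg_frequency, avg_lifetime)`. Keep in mind that the scale of the result depends on the unit used for lifespan, which is days in this post.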

As you can see, the final number amounts to $64,906. This method, while simple, can severely overestimate CLTV for many customers, because a single average is sensitive to a handful of very large values. Using pandas, we can examine the distribution of the average order value for each customer.

avg_order_value_per_customer = transactions_df.groupby('customerID')['total'].mean()
avg_order_value_per_customer.describe()

[Figure: Average order value summary statistics]

As you can see, we have some big-spending customers who pull the average CLTV well above what is typical for most of the customer base. This is even more apparent when we look at a boxplot of the same data.

[Figure: Boxplot]
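
To put a number on the skew without a plot, one option is to compare the mean with the median and count the points beyond the usual boxplot whisker. This is a sketch on made-up order values, assuming the common 1.5 × IQR whisker rule. The `order_values` series is illustrative, not CDNow data.

```python
import pandas as pd

# Made-up average order values: mostly small, with two big spenders.
order_values = pd.Series(
    [20.0, 25.0, 30.0, 30.0, 35.0, 40.0, 45.0, 50.0, 900.0, 1500.0]
)

# Quartiles and the upper whisker used by a standard boxplot.
q1, q3 = order_values.quantile([0.25, 0.75])
upper_whisker = q3 + 1.5 * (q3 - q1)
outliers = order_values[order_values > upper_whisker]

print("mean:    ", order_values.mean())
print("median:  ", order_values.median())
print("outliers:", outliers.tolist())
```

Here the two large values drag the mean far above the median, which is the same effect the boxplot makes visible.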

Method 2: Cohort Average

This method involves dividing customers into cohorts and computing customer lifetime value for each. A cohort is defined as the customers who made their first purchase during the same time period. Not only does this method provide a better estimate of CLTV than a simple average, but it also allows us to study the behavior patterns of customers as a result of different marketing campaigns. I'll cover this method in more detail in a future post.
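
As a rough sketch of the idea, the snippet below assigns each customer to a cohort by the month of their first purchase and then averages total spend per customer within each cohort. Both the monthly cohort definition and the use of total spend as a stand-in for lifetime value are assumptions made for illustration, and the transactions are made up.

```python
import pandas as pd

# Made-up transactions (not CDNow data).
tx = pd.DataFrame({
    'customerID': ['a', 'a', 'b', 'c', 'c', 'c'],
    'purchaseDate': pd.to_datetime([
        '1997-01-05', '1997-02-10', '1997-01-20',
        '1997-02-03', '1997-02-17', '1997-03-01',
    ]),
    'total': [10.0, 20.0, 30.0, 5.0, 5.0, 30.0],
})

# A customer's cohort is the month of their first purchase.
first_purchase = tx.groupby('customerID')['purchaseDate'].transform('min')
tx['cohort'] = first_purchase.dt.to_period('M')

# Total spend per customer, then the average of those totals per cohort.
spend_per_customer = tx.groupby(['cohort', 'customerID'])['total'].sum()
avg_spend_by_cohort = spend_per_customer.groupby(level='cohort').mean()
print(avg_spend_by_cohort)
```

Customers a and b fall into the January cohort and customer c into the February cohort, so each cohort gets its own average instead of sharing one number for the whole customer base.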

Method 3: Predictive Probabilistic Models

The more sophisticated modeling techniques cited in the literature use probability distributions to estimate the frequency, lifetime, and monetary components of the customer lifetime value equation. There are a number of methods that can be used, each suited to specific business contexts. The Beta Geometric/Negative Binomial Distribution (BG/NBD), Pareto/Negative Binomial Distribution (Pareto/NBD), and Gamma-Gamma models are a few of the most popular probabilistic models used by companies today. Look out for future posts on this blog about all of these models.
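
These models are typically fitted to a per-customer summary instead of raw transactions. Below is a sketch of building such a summary with pandas, assuming the convention commonly used for BG/NBD-style models: frequency counts repeat purchases, recency is the time between the first and last purchase, and T is the customer's age at the end of the observation window. The transactions and `observation_end` are made up, and the model fitting itself is left to a dedicated library.

```python
import pandas as pd

# Made-up transactions (not CDNow data).
tx = pd.DataFrame({
    'customerID': ['a', 'a', 'a', 'b', 'c', 'c'],
    'purchaseDate': pd.to_datetime([
        '1997-01-01', '1997-01-11', '1997-01-31',
        '1997-01-15',
        '1997-01-10', '1997-01-20',
    ]),
})
observation_end = pd.Timestamp('1997-02-10')  # assumed end of the observation window

dates = tx.groupby('customerID')['purchaseDate']
summary = pd.DataFrame({
    # Repeat purchases: distinct purchase days minus the first one.
    'frequency': dates.nunique() - 1,
    # Days between first and last purchase.
    'recency': (dates.max() - dates.min()).dt.days,
    # Days between first purchase and the end of the window.
    'T': (observation_end - dates.min()).dt.days,
})
print(summary)
```

A customer with a single purchase, like b, ends up with a frequency and recency of zero, which is how one-time buyers are usually represented in this kind of summary.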

Method 4: Machine Learning

All the approaches highlighted so far use recency, purchase frequency, and order values for each customer to predict customer lifetime value. What if you'd like to incorporate other variables beyond these in your prediction? This is where machine learning comes in handy. Machine learning algorithms can pick up patterns across many variables at once and use them to predict future customer behavior.

That's all folks!

The code shown in this post can be found here. Until next time!