Calculating Lifetime Value with Cohorts
Customer lifetime value is a metric that can be used to gauge the health of a business. In my last blog post on the topic, I showed python code that estimates customer lifetime value by computing a simple average. Today I will show you another approach that is an improvement over the one described previously.
Introducing Cohort Aggregates
This method for estimating customer lifetime value computes multiple metrics, one for each cohort of customers in the dataset. Not only does this provides a much better estimate over a simple average, but it also allows us to study the behavior of customers in response to different marketing campaigns. We will carry out the calculation using the same <link to dataset> dataset used in the last post.
Let's begin the same way as before by loading the dataset
with open('CDNOW_master.txt') as f:
dataset = f.read().split("\n")
records = []
for line in dataset:
if line == '':
continue
row = list(filter(lambda token: token != '', line.split(' ')))
rec = {}
rec['customerID'] = row[0]
rec['purchaseDate'] = datetime.strptime(row[1], '%Y%m%d')
rec['quantity'] = int(row[2])
rec['price'] = float(row[3])
records.append(rec)
transactions_df = pd.DataFrame(records)
Next we'll create a new column in our dataframe that contains total dollar value for each transaction. This column will be used later when we calculate customer lifetime value for each cohort.
transactions_df['total'] = transactions_df['price'] * transactions_df['quantity']
Speaking of cohorts, allow me to explain what I mean by this. For the same of this analysis I'll define a cohort as a group of customers who made their first transaction in the same month. In order to perform this grouping, we first need to figure out what month each person became a customer. The following code snippet will create a dataframe containing the information we need.
first_transactions_df = transactions_df.groupby('customerID')['purchaseDate'].min().reset_index()
first_transactions_df['firstTransactionMonth'] = first_transactions_df['purchaseDate'].dt.month
first_transactions_df['firstTransactionMonth'].value_counts()

Amazing! Now we can merge this dataframe into our transactions dataset so everything is in one place.
transactions_df = pd.merge(transactions_df, first_transactions_df[['customerID','firstTransactionMonth']], on='customerID')

We're now ready to calculate customer lifetime value for each cohort. To do this we will define a function that computes the customer lifetime value for a subset of customers.
def calculate_cltv(df):
transactions_per_customer = df.groupby('customerID')['purchaseDate'].count()
avg_frequency = transactions_per_customer.mean()
minmax_purchase_dates_by_customer = df.groupby('customerID')['purchaseDate'].agg(['min','max'])
customer_lifetimes = minmax_purchase_dates_by_customer.apply(lambda row: (row['max'] - row['min']).days, axis=1)
avg_lifetime = customer_lifetimes.mean()
avg_order_value = transactions_df['total'].mean()
return avg_frequency * avg_lifetime * avg_order_value
All that's left to do now is to aggregate our transactions by the firstTransactionMonth and apply the function we defined earlier to each group.
transactions_df.groupby('firstTransactionMonth').apply(calculate_cltv)

Lo and behold we are done!
That's all folks!
You can find the complete code for this post here. Until next time!