Buy Till You Die Modeling with Pareto/NBD

This post is all about the Pareto/Negative Binomial Distribution model, commonly referred to as Pareto/NBD. In my previous post I described four business cases where customer lifetime value can be calculated using probability models. Pareto/NBD can be used to model transactions in non-contractual, continuous business settings. These are cases where customers can make transactions at any time and can also terminate their relationship with the business at any time.

First, we'll take a look at the nuts and bolts of the Pareto/NBD model and the assumptions that drive its use. Afterwards we'll look at some code that shows the model in action.

Pareto/NBD Modeling Assumptions

There are six key assumptions underlying the use of the Pareto/NBD model. Let's go through each and cover what they mean.

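Assumption 1: A customer can be "alive" for an unobserved period of time, and then die.

By "alive" we mean that the customer is actively purchasing products and services from the business. We consider the customer "dead" once they have permanently stopped purchasing. Because customers in a non-contractual setting never announce that they are leaving, this moment is not observed directly; it has to be inferred from a lack of purchasing activity over some period of time.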

Assumption 2: While "alive", the number of transactions made by a customer can be described using a Poisson distribution.

A Poisson distribution is a probability distribution that describes the chance of a given number of events occurring during a fixed time period. Let's suppose, for example, that we have 18 months of transaction data for a grocery store customer. We know that on average the customer makes 2.5 transactions per week, and we'd like to find the probability that the customer will make 5 transactions next week. We can find this out using the formula below:

$$P(X = k) = \frac{\mu^{k} e^{-\mu}}{k!}$$

The µ symbol represents the event rate per time period, which in our case is 2.5. The variable k represents the number of events. For our pet problem this will be 5. Punching in the numbers, we get approximately 0.0668.
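As a quick check of the arithmetic, here is a minimal Python sketch of the Poisson calculation for the grocery store example. The function name poisson_pmf is my own choice for illustration and does not come from any library.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the average rate is mu per period."""
    return (mu ** k) * math.exp(-mu) / math.factorial(k)

# Grocery store example: 2.5 transactions per week on average, 5 next week
print(round(poisson_pmf(5, 2.5), 4))  # 0.0668
```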

If we create a plot of the Poisson distribution for our little scenario, we'll get this:

Poisson distribution plot

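From the plot, we can see that the distribution peaks near the average of 2.5 transactions. Since the number of transactions has to be a whole number, the single most likely count is 2. We can also see that the probability becomes smaller the further we move away from 2.5, which is consistent with the calculation we made earlier.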

As an aside before moving on, if we want to calculate the probability of a number of events occurring over multiple time periods (as in 12 transactions in 3 weeks), we can use this modified version of the Poisson formula:

$$P(X = k) = \frac{(\mu t)^{k} e^{-\mu t}}{k!}$$

where t represents the number of periods we're interested in.
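The multi-period version only changes the expected number of events from µ to µ times t. Here is a small sketch under that assumption; the function name poisson_pmf_over_periods is hypothetical.

```python
import math

def poisson_pmf_over_periods(k, mu, t):
    """Probability of exactly k events across t periods, at mu events per period."""
    expected = mu * t  # expected number of events over the whole window
    return (expected ** k) * math.exp(-expected) / math.factorial(k)

# 12 transactions in 3 weeks for a customer averaging 2.5 per week
p = poisson_pmf_over_periods(12, 2.5, 3)
print(f"{p:.4f}")
```

Setting t to 1 gives back the single-period formula.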

Assumption 3: Heterogeneity in the transaction rate across customers follows a gamma distribution.

Customers have different shopping habits. If we're looking at a massage business, for instance, some customers may schedule massage sessions every week, while other customers request massages once in a blue moon. With this in mind, we need a way to account for the variation in transaction rates across customers. That way is the gamma distribution.

Whereas a Poisson distribution gives the chance of a certain number of transactions happening in a time period, a gamma distribution can describe the amount of time we have to wait for a desired number of transactions to occur. In the Pareto/NBD model it plays a second role: it describes how the transaction rate itself is spread across the customer base.

Using the grocery store example from the previous section, if we wanted to know how likely it is that the customer takes 3 weeks to make 5 transactions, this is the formula we would use:

$$f(t) = \frac{\beta^{\alpha} t^{\alpha - 1} e^{-\beta t}}{\Gamma(\alpha)}$$

The variables α and β represent the number of events and the event rate. The variable t represents the wait time. The expression in the denominator of the formula is known as the gamma function. You can find out more information about it here.
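To make the gamma formula concrete, here is a minimal sketch that evaluates it for the grocery store example, assuming α = 5 events and β = 2.5 events per week. Keep in mind that the result is a density rather than a probability, since the wait time is continuous. The function name gamma_pdf is my own.

```python
import math

def gamma_pdf(t, alpha, beta):
    """Gamma density at wait time t, with shape alpha (number of events) and rate beta."""
    return (beta ** alpha) * (t ** (alpha - 1)) * math.exp(-beta * t) / math.gamma(alpha)

# Density of waiting t = 3 weeks for the 5th transaction
# when the customer averages 2.5 transactions per week
print(f"{gamma_pdf(3, 5, 2.5):.4f}")
```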

Assumption 4: Each customer's unobserved lifetime is distributed exponentially.

There's always a degree of uncertainty as to whether or not a customer has taken their business elsewhere.

Let's say that you have two customers. Customer A has been getting massages consistently, roughly once a month, over the past 15 months. We could safely assume that this customer will be returning in the next month. However, customer B used to get massages once a week but hasn't been present for a few months. In that case we may assume that the customer won't be around in the next month. We can use an exponential distribution to model this behavior.

The exponential distribution describes the time between events in a Poisson process. In our case we can use it to model how long a customer stays "alive" before terminating their relationship with our business.
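Here is a small sketch of what an exponential lifetime implies for a single customer. The dropout rate of 0.1 per month is a made-up value for illustration, and both function names are hypothetical.

```python
import math

def survival_probability(t, dropout_rate):
    """P(customer is still alive after t periods) for an exponential lifetime."""
    return math.exp(-dropout_rate * t)

def expected_lifetime(dropout_rate):
    """The mean of an exponential lifetime is the inverse of the dropout rate."""
    return 1.0 / dropout_rate

# Hypothetical customer with a dropout rate of 0.1 per month
print(expected_lifetime(0.1))                   # 10.0
print(round(survival_probability(12, 0.1), 3))  # 0.301
```

In practice we never observe a customer's dropout rate directly, which is where the next assumption comes in.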

Assumption 5: Heterogeneity in dropout rates across customers follows a gamma distribution.

Not every customer has an equal likelihood of leaving your business. Much like the transaction rates discussed earlier, we need to model the variation in dropout rates across customers using a probability distribution. Once again, the gamma distribution is what is used for this.

Assumption 6: Both the transaction rates and the dropout rates vary independently across customers.

This means that a customer's transaction rate tells us nothing about that customer's dropout rate, and vice versa: a frequent shopper is no more or less likely to leave than an occasional one. Customers are also assumed to behave independently of one another, so the transactions of one customer do not influence the transactions of another.

The Model

So… let's now talk about how these assumptions create the Pareto/NBD model. If we combine the second and third assumptions from above (Poisson transactions with gamma-distributed transaction rates), we get the negative binomial distribution (NBD) model. This model is used to predict the number of transactions k a customer will make while alive in a time period of length t.

When we combine assumptions four and five (exponential lifetimes with gamma-distributed dropout rates), we get the Pareto distribution, which is used to model the lifetime of the customer.
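Both mixtures have well-known closed forms: mixing a Poisson count over gamma-distributed rates gives the NBD, and mixing an exponential lifetime over gamma-distributed dropout rates gives a Pareto distribution of the second kind. The sketch below writes them out using the conventional symbols r and α for the transaction-rate gamma and s and β for the dropout-rate gamma. The function names and parameter values are my own and chosen only for illustration.

```python
import math

def nbd_pmf(k, r, alpha, t):
    """P(k transactions in a period of length t) when rates are gamma(r, alpha) distributed."""
    log_coef = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef
                    + r * math.log(alpha / (alpha + t))
                    + k * math.log(t / (alpha + t)))

def pareto_survival(t, s, beta):
    """P(lifetime > t) when dropout rates are gamma(s, beta) distributed."""
    return (beta / (beta + t)) ** s

# Hypothetical parameter values
print(sum(nbd_pmf(k, r=2.0, alpha=4.0, t=10.0) for k in range(200)))  # close to 1
print(pareto_survival(t=10.0, s=1.0, beta=10.0))                      # 0.5
```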

If you'd like to see the derivation of the entire model, you can find an excellent resource that breaks down the process here.

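The Pareto/NBD model has four parameters: r and α describe the gamma distribution of transaction rates, while s and β describe the gamma distribution of dropout rates. They are estimated using the recency, frequency, and age of each customer.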
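One way to see how the six assumptions fit together is to simulate a customer from them. The sketch below is only an illustration with made-up parameter values: r and α are the shape and rate of the transaction-rate gamma, s and β are the shape and rate of the dropout-rate gamma, and T is how long we observe the customer after their first purchase. The function simulate_customer is hypothetical and not part of any library.

```python
import random

def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer observed for T periods under the Pareto/NBD assumptions."""
    # random.gammavariate takes a shape and a scale, so the rate is inverted
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)  # assumption 3
    dropout_rate = rng.gammavariate(s, 1.0 / beta)    # assumption 5
    lifetime = rng.expovariate(dropout_rate)          # assumption 4
    active_until = min(lifetime, T)

    # Assumption 2: a Poisson process has exponential gaps between purchases
    purchase_times = []
    t = rng.expovariate(purchase_rate)
    while t < active_until:
        purchase_times.append(t)
        t += rng.expovariate(purchase_rate)

    frequency = len(purchase_times)                    # repeat purchases
    recency = purchase_times[-1] if purchase_times else 0.0
    return frequency, recency, T

rng = random.Random(42)
for _ in range(3):
    print(simulate_customer(r=1.0, alpha=5.0, s=1.0, beta=10.0, T=52.0, rng=rng))
```

Each tuple holds the number of repeat purchases, the time of the last purchase, and the length of the observation window, which is the same kind of summary the model is fitted on.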

Now that we've discussed the components of the Pareto/NBD model, let's see it in action.

Pareto/NBD in action

To demonstrate Pareto/NBD we will use the online retail dataset.

We'll begin by loading the dataset:

import pandas as pd
transactions_df = pd.read_csv('data.csv', encoding='ISO-8859-1', parse_dates=['InvoiceDate'])

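The Pareto/NBD model that we will be using comes from the lifetimes Python module. In order to fit the model to our dataset, three inputs are required for each customer:

  • The number of repeat purchases the customer has made, a.k.a. frequency
  • The amount of time between the customer's first and last purchase, a.k.a. recency. Note that this differs from the RFM definition of recency (time since the last purchase).
  • The amount of time between the customer's first purchase and the end of the observation period, a.k.a. age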
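To make the three quantities concrete, here is a minimal pure-Python sketch that computes them for a single customer. As I understand the lifetimes documentation, the library counts only repeat purchases in the frequency and measures recency from the first purchase to the last, so that is what the sketch assumes. The function summarize_customer, the purchase dates, and the observation end date are all hypothetical.

```python
from datetime import date

def summarize_customer(purchase_dates, observation_end):
    """Return (frequency, recency, age) in days for one customer's purchase dates."""
    days = sorted(set(purchase_dates))       # several purchases on one day count once
    first, last = days[0], days[-1]
    frequency = len(days) - 1                # repeat purchases only
    recency = (last - first).days            # first purchase to last purchase
    age = (observation_end - first).days     # first purchase to end of observation
    return frequency, recency, age

# Hypothetical customer and observation end date
purchases = [date(2011, 1, 5), date(2011, 2, 9), date(2011, 2, 9), date(2011, 6, 1)]
print(summarize_customer(purchases, date(2011, 12, 9)))  # (2, 147, 338)
```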

Lifetimes also provides a utility function that can be used to calculate the above information for most datasets. Our dataset, unfortunately, is not formatted in a way that allows the function to properly calculate the frequency. We will therefore need to calculate the information manually.

For the sake of brevity I will only be showing the calculation of the customer age. Read my RFM blog post to find the calculation for recency and frequency.

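# End of the observation period (assumed here to be the latest invoice date in the data)
most_recent_transaction = transactions_df['InvoiceDate'].max()
first_transactions = transactions_df.groupby('CustomerID')['InvoiceDate'].min().reset_index()
first_transactions['age'] = (most_recent_transaction - first_transactions['InvoiceDate']).dt.days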

We'll merge the recency, frequency, and age into one pandas dataframe before fitting the Pareto/NBD model:

recency_frequency_df = pd.merge(pd.merge(recency_df, frequency_df, on='CustomerID').drop('InvoiceDate',axis=1), 
                                first_transactions, on='CustomerID').drop('InvoiceDate', axis=1)

Now for the model fitting:

import lifetimes
# Fit on frequency, recency, and age, in that order
mdl = lifetimes.ParetoNBDFitter()
mdl.fit(recency_frequency_df['frequency'], recency_frequency_df['recency'], recency_frequency_df['age'])

The Pareto/NBD model can be used to generate probabilities that a customer is still alive. Let's generate probabilities for each customer in the dataset using the conditional_probability_alive method:

recency_frequency_df['probability_alive'] = mdl.conditional_probability_alive(
    recency_frequency_df['frequency'],
    recency_frequency_df['recency'],
    recency_frequency_df['age'])

Conditional Probabilities of Being Alive

We can also visually inspect a heatmap of the probabilities that our customers are alive using the plot_probability_alive_matrix function.

from lifetimes.plotting import plot_probability_alive_matrix
plot_probability_alive_matrix(mdl)

Probability Alive Plot

The heatmap shows the probability that our customers are still alive based on historical frequency and recency values. Recency in the plot is defined as the time between the first and last transaction. From the plot we can see that:

  • Customers who have made multiple purchases over a long period of time have a high chance of being alive.
  • New customers with a few transactions are also likely to still be alive.

The model can also estimate the number of transactions customers will make in the future. We will use the conditional_expected_number_of_purchases_up_to_time method to predict purchases for each customer 20 days into the future:

recency_frequency_df['predicted_transactions'] = mdl.conditional_expected_number_of_purchases_up_to_time(
    20,
    recency_frequency_df['frequency'],
    recency_frequency_df['recency'],
    recency_frequency_df['age'])

Expected transactions

Both the predicted probabilities and the expected number of transactions give us more insight into the high-value customers in the dataset.

That's all folks!

In my next post I will cover the BG/NBD model, a probability model that is very similar to the Pareto/NBD model but makes slightly different assumptions about customer behavior. You can find the entire code discussed in the post here.