BG/BB CLTV Modeling for Charities

Do you run a professional conference that occurs periodically? Do you own a cruise line or organize a blood drive? Perhaps you run a church. If any of these apply to you, then you're operating in a noncontractual, discrete-time context. Noncontractual, discrete-time contexts are business settings where customer transactions can only occur at fixed intervals and where a customer can end their relationship with the business at any time. If you wanted to model customer lifetime value for these types of businesses, the Pareto/NBD and BG/NBD would not cut it; both models assume a noncontractual, continuous-time setting, where a transaction can happen at any moment. What you can use instead is the Beta Geometric/Beta Bernoulli model, or BG/BB for short. In this post I'll dive into the nuts and bolts of the model and show you how a charity can use it to predict donations.

BG/BB Modeling Assumptions

There are several assumptions that underlie the use of the BG/BB model. As done before, I'll go through each assumption and explain what it means.

Assumption #1: A customer can be "alive" for some unobserved period of time, and become permanently inactive, aka "die".

"Alive" in this context means that the customer is still making transactions with the business. Given that this is a noncontractual business setting, we are unable to observe whether a customer has ended their relationship with the business or is just taking a long-term hiatus. It's therefore assumed that the customer becomes permanently inactive after some unknown period of time.

Assumption #2: The number of transactions that a customer will make while alive follows a binomial distribution.

Let's look at this assumption in the context of a charity that requests donations every month from individuals who have made a contribution in the past. Each time the charity reaches out, some individuals may choose to make a donation and some may choose not to. Given an individual and a month, we represent the chance that he or she will make a donation using a probability. We'll call this probability p.

We can use the binomial distribution to model the probability that an individual will make a number of donations over the course of several months.

Binomial distribution: $P(X = x) = \binom{n}{x} p^{x} q^{n - x}$

The number of months and the number of donations are represented by n and x respectively. The probability of no donation during a single month is represented by q. This is the complement of p, the probability that the individual will make a donation, so q = 1 - p.
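To make the binomial assumption concrete, here is a minimal sketch that computes the probability of each possible number of donations. The donor's probability of 0.3 and the six-month window are made-up values for illustration only.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x donations in n monthly requests."""
    q = 1.0 - p  # chance of no donation in a single month
    return comb(n, x) * p**x * q ** (n - x)


# Hypothetical donor who gives in any one month with probability 0.3
n_months = 6
p_donate = 0.3
for x in range(n_months + 1):
    print(x, round(binomial_pmf(x, n_months, p_donate), 4))
```

The probabilities over all values of x from 0 to n add up to 1, which is a quick way to check the function.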

Assumption #3: The unobserved lifetime of a customer can be described using a geometric distribution.

Going back to the charity scenario I created in the previous section, we assume that each month a donor can "drop out" with some probability. We'll represent this probability using ϴ. If we were curious about the probability that a donor will drop out in month x, we can use the geometric distribution to find out:

Geometric distribution: $P(X = x) = \theta (1 - \theta)^{x - 1}$, for $x = 1, 2, 3, \dots$

If we were to plot the geometric distribution using some value for ϴ and a range of values for x, we would see that the probability decreases as x increases. Dropping out in exactly month x requires the donor to have stayed active through every earlier month, so later dropout months are less and less likely.
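Instead of plotting, we can print the geometric probabilities for the first twelve months. This is only a sketch; the dropout probability of 0.2 is a made-up value.

```python
def geometric_pmf(x, theta):
    """Probability that a donor drops out in month x (x = 1, 2, ...)."""
    return theta * (1.0 - theta) ** (x - 1)


theta = 0.2  # hypothetical per-month dropout probability
probs = [geometric_pmf(x, theta) for x in range(1, 13)]
for month, prob in enumerate(probs, start=1):
    print(month, round(prob, 4))
```

Each value is smaller than the one before it by a factor of 1 - theta.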

Assumption #4: Heterogeneity in the transaction probability for each customer follows a beta distribution.

Each donor will have a different likelihood of making a contribution each month. Some donors will contribute at every opportunity while others will only donate occasionally. The beta distribution can be used to model how these donation probabilities are spread across donors.

Assumption #5: Heterogeneity in the dropout probability for each customer follows a beta distribution.

It's also reasonable to assume that the dropout probability will vary from donor to donor. Again, the beta distribution can be used to model how it is spread across donors.
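The beta distribution is a convenient choice because its two shape parameters control both the average probability and how much donors differ from each other. The sketch below compares two made-up parameter pairs that share the same mean but have very different spread, using the closed-form mean and variance of the beta distribution and a few random draws from Python's standard library.

```python
import random


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


# Hypothetical shape parameters: same mean, very different spread
for a, b in [(0.5, 0.5), (20.0, 20.0)]:
    rng = random.Random(1)  # fixed seed so the draws are repeatable
    sample = [round(rng.betavariate(a, b), 2) for _ in range(5)]
    print(a, b, beta_mean(a, b), round(beta_variance(a, b), 4), sample)
```

With the first pair, donors tend to sit near 0 or 1 (never give or nearly always give). With the second pair, most donors sit close to 0.5.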

Assumption #6: The transaction probability and the dropout probability vary independently across customers.

What this means is that there's no relationship between a customer's transaction probability and their dropout probability. Knowing that a donor gives often tells us nothing about how likely they are to drop out.
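Putting the six assumptions together, we can simulate a single donor's history. This is a minimal sketch, not the lifetimes implementation: the function name, the shape parameters, and the choice to check for dropout at the start of each period are my own assumptions for illustration.

```python
import random


def simulate_donor(n_periods, alpha, beta, gamma, delta, rng):
    """Simulate one donor's 0/1 donation history over n_periods requests."""
    p = rng.betavariate(alpha, beta)       # donation probability (assumption 4)
    theta = rng.betavariate(gamma, delta)  # dropout probability (assumption 5)
    # p and theta come from separate draws, so they are independent (assumption 6)
    history = []
    alive = True
    for _ in range(n_periods):
        if alive and rng.random() < theta:
            alive = False  # dropout is permanent (assumptions 1 and 3)
        donated = alive and rng.random() < p  # Bernoulli trial while alive (assumption 2)
        history.append(1 if donated else 0)
    return history


rng = random.Random(0)
# Hypothetical shape parameters, chosen only for illustration
history = simulate_donor(6, alpha=1.2, beta=0.75, gamma=0.65, delta=2.8, rng=rng)
print(history)
```

Notice that a run of zeros at the end of a history is ambiguous: the donor may have dropped out, or may be alive and simply not giving. That ambiguity is exactly what the model has to reason about.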

The Model

The second and fourth assumptions outlined in the previous section result in the beta-Bernoulli model. Similarly, the third and fifth assumptions lead to the beta-geometric model. Together, they form the Beta Geometric/Beta Bernoulli model. For insights on the derivation of the model as well as the functions used to perform various predictions, you can read the paper here.
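For a feel of what the combined model looks like, below is a sketch of the individual-level likelihood as it is usually stated for BG/BB, written with beta functions. It assumes a donor with x donations, the last one in period t_x, out of n opportunities, and population parameters alpha, beta (donation probability) and gamma, delta (dropout probability). The function names are my own, and this is not the lifetimes source code.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta):
    """Likelihood of one history with x donations, the last in period t_x, over n periods."""
    denom = log_beta(alpha, beta) + log_beta(gamma, delta)
    # Case 1: the donor is still alive after all n periods
    total = exp(log_beta(alpha + x, beta + n - x) + log_beta(gamma, delta + n) - denom)
    # Case 2: the donor was alive for exactly t_x + i periods, then dropped out
    for i in range(n - t_x):
        total += exp(
            log_beta(alpha + x, beta + t_x - x + i)
            + log_beta(gamma + 1, delta + t_x + i)
            - denom
        )
    return total


def total_probability(n, alpha, beta, gamma, delta):
    """Sum the likelihood over every possible 0/1 history of length n."""
    total = bgbb_likelihood(0, 0, n, alpha, beta, gamma, delta)
    for t_x in range(1, n + 1):
        for x in range(1, t_x + 1):
            # Number of distinct histories sharing this (x, t_x) pair
            n_histories = comb(t_x - 1, x - 1)
            total += n_histories * bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta)
    return total


# Hypothetical parameter values; the result should be 1 up to rounding error
print(total_probability(6, 1.2, 0.75, 0.65, 2.8))
```

The second function is only a sanity check: because every donor must have exactly one history, the likelihoods of all possible histories have to add up to 1.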

BG/BB In Action

As done with the other models discussed thus far, I will show you a demonstration of BG/BB. I will be using a dataset included in the lifetimes Python package that contains recency and frequency data of individuals making donations to a major nonprofit organization located in the United States.

We'll begin by loading the donations dataset.

import pandas as pd
import lifetimes
import lifetimes.datasets as ld

donations_df = ld.load_donations()
Donations Dataset

The dataset that comes with lifetimes has all the information we need to fit the BG/BB model.

  • The frequency column records the number of observed periods in which a transaction was made.
  • The recency column shows the period in which the most recent transaction was made.
  • The periods column indicates the number of transaction opportunities that were provided.
  • The weights column indicates the number of customers who share a given frequency, recency, and periods combination.
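If your own data starts out as one donation history per donor, the four columns can be derived along these lines. This is a sketch under my own assumptions: the histories are invented, the summarise helper is hypothetical, and recency is taken to be the 1-based index of the last period with a donation (0 if the donor never gave).

```python
from collections import Counter


def summarise(history):
    """Return (frequency, recency, periods) for a 0/1 donation history."""
    frequency = sum(history)
    # Recency: 1-based index of the last period with a donation, 0 if none
    recency = max((i for i, gave in enumerate(history, start=1) if gave), default=0)
    return frequency, recency, len(history)


# Hypothetical histories for four donors over six opportunities
histories = [
    [1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# Donors with identical summaries collapse into one row with a weight
weights = Counter(summarise(h) for h in histories)
for (frequency, recency, periods), weight in sorted(weights.items()):
    print(frequency, recency, periods, weight)
```

Collapsing identical rows this way keeps the table small, since the model only needs the summary and not the full history.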

We will fit a BG/BB model using the above information.

bgbb_mdl = lifetimes.BetaGeoBetaBinomFitter()
bgbb_mdl.fit(frequency=donations_df['frequency'],
             recency=donations_df['recency'],
             n_periods=donations_df['periods'],
             weights=donations_df['weights'])

Let's say we were interested in identifying customers who will still be active 3 months into the future. We can use the conditional_probability_alive method to do this:

donations_df['is_alive_3_months'] = bgbb_mdl.conditional_probability_alive(m_periods_in_future=3,
                                       frequency=donations_df['frequency'],
                                       recency=donations_df['recency'],
                                       n_periods=donations_df['periods'])
BG/BB 3 Month Customer Alive Probability

Assuming we are only interested in customers with an alive probability of 0.7 or greater, we can see that roughly one third of the customer base will still be active after 3 months.
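One thing to keep in mind when computing a share like this is that each row of the dataset stands for many customers, so rows should be counted by their weights. Here is a minimal sketch of that calculation; the probabilities and counts below are invented and are not taken from the donations dataset.

```python
# Hypothetical rows: (probability alive, number of customers sharing that row)
rows = [(0.92, 120), (0.75, 40), (0.41, 200), (0.08, 640)]

threshold = 0.7
active = sum(weight for prob, weight in rows if prob >= threshold)
total = sum(weight for _, weight in rows)
share_active = active / total
print(f"{share_active:.1%} of customers meet the threshold")
```

Counting rows instead of customers would give a different and misleading answer whenever the weights are uneven.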

We can also estimate the number of transactions each customer will make over the next 3 months using the conditional_expected_number_of_purchases_up_to_time method:

donations_df['num_tx_3_months'] = bgbb_mdl.conditional_expected_number_of_purchases_up_to_time(m_periods_in_future=3,
                                                                                               frequency=donations_df['frequency'],
                                                                                               recency=donations_df['recency'],
                                                                                               n_periods=donations_df['periods'])
BG/BB 3 Month Predicted Transactions

Looking at the same group of active customers, we can expect to receive at least 1 donation from these customers within the next 3 months.

That's all folks!

You can find the complete code discussed in this post here.