Marketing Campaign Analysis - Part 1

Background

About the Business

Company A is a debt negotiation business. Clients have substantial debt and are unable to make their minimum monthly payments on the debt accounts. To help manage their debts, Company A creates a dedicated bank account where clients deposit regular monthly payments. Those accounts are then used to negotiate with creditors on behalf of the client to settle the outstanding debt.

Company A's profits come from client-collected fees. When Company A successfully negotiates a settlement agreement these fees are charged to the client's account. The number of settlement agreements Company A can negotiate is proportional to the monthly deposited amount.

Analysis Purpose

Company A want to analyze the efficacy of a recent marketing campaign promoting the financial freedom clients can attain using their debt relief program. The campaign was aimed at both new and current clients with the intent of invigorating account deposits.

The goal of this analysis is to show marketing, sales and operations the success of the campaign. Specifically, the analysis includes:

  • A quantitative assessment of whether the marketing campaign was successful.
  • Recommended adjustments to the campaign strategy to improve performance.

Defining Success

Given our available data, these are the possible success factors that will be considered:

  • Increase in number of deposits during the campaign month
  • Increase in total amount of deposits during the campaign month

Data Overview

There are three datasets provided for the analysis; each is already cleaned and prepared:


client_data.csv: Fictional clients

Column Name Description
client_id Unique client ID
client_geographical_region Client geo-location (U.S. Census definitions)
client_residence_status Client residence status
client_age Client age


deposit_data.csv: Client deposit behavior

Column Name Description
client_id Unique client ID
deposit_type Scheduled vs. Actual deposit
deposit_amount Deposit amount
deposit_cadence Frequency of deposit (scheduled)
deposit_date Deposit date


calendar_data.csv: Calendar reference table

Column Name Description
gregorian_date Gregorian calendar date
month_name Month relative to data set

Notes:

  • Month 1 and 2 are pre-campaign
  • Month 3 is the campaign
  • Month 4 and 5 are post-campaign

Assumptions:

  • There is no seasonality in the results
  • The campaign spend was distributed evenly across Month 3 (i.e., spend on the first day is the same as spend on the last day)
# import packages
import numpy as np
import pandas as pd
import datetime
import plotly.express as px
import matplotlib.pyplot as plt

Data Wrangling

For this first section, we're focusing on combining the relevant sample information into a single dataframe.

The first thing to note is that there are generally two rows for each deposit: a scheduled deposit then the actual deposit. Each of the two rows has the same client_id, date, deposit amount, and cadence.

As such, we're going to create dummy variables for the two deposit types then group the deposits together so there is only one sample per deposit. This will create the following options for each deposit:

Actual Scheduled Description
True True A deposit is scheduled and completed
True False A deposit is scheduled but not completed
False True An unscheduled deposit is completed

At the end of this section, each row will represent a single deposit. The final columns in the dataframe will be as follows:

Column Name Description Value Type/Values
client_id Unique client ID Integer
deposit_amount Amount of Deposit Decimal
deposit_cadence Frequency of Deposit Categorical: Bi-Weekly, Extra, Monthly
deposit_date Date of the deposit datetime
actual_deposit If the deposit was completed Boolean: True/False
scheduled_deposit If the deposit was scheduled Boolean: True/False
month_name Month relative to data set Categorical: 1, 2, 3, 4, 5
client_geographical_region Client geo-location (U.S. Census definitions) Categorical: Midwest, West, South, Northeast
client_residence_status Client residence status Categorical: Rent, Own
client_age Client's age Integer
age_group Client's age in decades Str: 30s, 40s, etc
# read in data
df_clients = pd.read_csv('data/client_data.csv')
df_deposits = pd.read_csv('data/deposit_data.csv')
df_calendar = pd.read_csv('data/calendar_data.csv')

# return basic details on dataframes
var_init_df_names = [df_clients, df_deposits, df_calendar]
for i in var_init_df_names:
    print(i.head())
    print('Total rows: ',i.shape[0])
    print('')
         client_id  ... client_age
0  538839486596724  ...         91
1  321708286091707  ...         83
2  848531901757235  ...         84
3  854405182328779  ...         83
4  769102176031316  ...         85

[5 rows x 4 columns]
Total rows:  46347

         client_id       deposit_type  ...  deposit_cadence deposit_date
0  446495122764671     Actual Deposit  ...          Monthly   2019-10-23
1  446495122764671     Actual Deposit  ...          Monthly   2019-09-23
2  446495122764671  Scheduled Deposit  ...          Monthly   2019-09-23
3  446495122764671  Scheduled Deposit  ...          Monthly   2019-10-23
4  446495122764671  Scheduled Deposit  ...          Monthly   2019-06-23

[5 rows x 5 columns]
Total rows:  480394

  gregorian_date month_name
0     2019-06-01    Month 1
1     2019-06-02    Month 1
2     2019-06-03    Month 1
3     2019-06-04    Month 1
4     2019-06-05    Month 1
Total rows:  153

         client_id  ... client_age
0  538839486596724  ...         91
1  321708286091707  ...         83
2  848531901757235  ...         84
3  854405182328779  ...         83
4  769102176031316  ...         85

[5 rows x 4 columns]
Total rows:  46347

         client_id       deposit_type  ...  deposit_cadence deposit_date
0  446495122764671     Actual Deposit  ...          Monthly   2019-10-23
1  446495122764671     Actual Deposit  ...          Monthly   2019-09-23
2  446495122764671  Scheduled Deposit  ...          Monthly   2019-09-23
3  446495122764671  Scheduled Deposit  ...          Monthly   2019-10-23
4  446495122764671  Scheduled Deposit  ...          Monthly   2019-06-23

[5 rows x 5 columns]
Total rows:  480394

  gregorian_date month_name
0     2019-06-01    Month 1
1     2019-06-02    Month 1
2     2019-06-03    Month 1
3     2019-06-04    Month 1
4     2019-06-05    Month 1
Total rows:  153
# verify types of deposits
df_deposits.deposit_type.unique()
array(['Actual Deposit', 'Scheduled Deposit'], dtype=object)
# get dummies for deposit types
df = pd.get_dummies(data=df_deposits, prefix='', prefix_sep='', columns=['deposit_type'])
print(df.head())
print('Total rows: ',df.shape[0])
         client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  446495122764671           303.0  ...           True             False
1  446495122764671           303.0  ...           True             False
2  446495122764671           303.0  ...          False              True
3  446495122764671           303.0  ...          False              True
4  446495122764671           303.0  ...          False              True

[5 rows x 6 columns]
Total rows:  480394
         client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  446495122764671           303.0  ...           True             False
1  446495122764671           303.0  ...           True             False
2  446495122764671           303.0  ...          False              True
3  446495122764671           303.0  ...          False              True
4  446495122764671           303.0  ...          False              True

[5 rows x 6 columns]
Total rows:  480394
# group scheduled/actual deposits for matching
df = df.groupby(['client_id', 'deposit_amount', 'deposit_cadence', 'deposit_date'], 
    as_index=False).agg({'Actual Deposit': 'max', 'Scheduled Deposit': 'max'})
print(df.head())
print('Total Rows: ',df.shape[0])
      client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  146046305811           247.0  ...          False              True
1  146046305811           247.0  ...           True              True
2  146046305811           247.0  ...           True              True
3  146046305811           247.0  ...           True              True
4  146046305811           247.0  ...           True              True

[5 rows x 6 columns]
Total Rows:  256143
      client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  146046305811           247.0  ...          False              True
1  146046305811           247.0  ...           True              True
2  146046305811           247.0  ...           True              True
3  146046305811           247.0  ...           True              True
4  146046305811           247.0  ...           True              True

[5 rows x 6 columns]
Total Rows:  256143
# merge dataframes
df = pd.merge(df, df_calendar, left_on='deposit_date', right_on='gregorian_date', 
    how='inner', suffixes=('',''))
df = pd.merge(df, df_clients, on='client_id', how='inner', suffixes=('',''))

# clean columns
df['month_name'] = df['month_name'].str.extract(r'(\d+)').astype(int)
df = df.rename(columns={'month_name': 'month', 'Actual Deposit': 'actual_deposit', 
    'Scheduled Deposit': 'scheduled_deposit'})
df = df.drop(['gregorian_date'], axis=1)

df.head()
client_id deposit_amount deposit_cadence deposit_date actual_deposit scheduled_deposit month client_geographical_region client_residence_status client_age
0 146046305811 247.0 Biweekly 2019-06-05 False True 1 Midwest Rent 42
1 146046305811 247.0 Biweekly 2019-06-19 True True 1 Midwest Rent 42
2 146046305811 247.0 Biweekly 2019-07-03 True True 2 Midwest Rent 42
3 146046305811 247.0 Biweekly 2019-07-17 True True 2 Midwest Rent 42
4 146046305811 247.0 Biweekly 2019-07-31 True True 2 Midwest Rent 42
# establish function for grouping ages
def group_by_age(age):
    if age <30:
        return '20s'
    elif age <40:
        return '30s'
    elif age <50:
        return '40s'
    elif age <60:
        return '50s'
    elif age <70:
        return '60s'
    elif age <80:
        return '70s'
    elif age <90:
        return '80s'
    else:
        return '90s and above'

# apply function
df['age_group'] = df['client_age'].apply(group_by_age)
# update categorical columns
df = df.astype({'deposit_cadence':'category', 'month':'category', 
    'client_geographical_region':'category', 'client_residence_status':'category'})

# update datetime column
df['deposit_date'] = pd.to_datetime(df['deposit_date'])

# download final csv for other use
df.to_csv('data/merged_data.csv')

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256143 entries, 0 to 256142
Data columns (total 11 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   client_id                   256143 non-null  int64         
 1   deposit_amount              256143 non-null  float64       
 2   deposit_cadence             256143 non-null  category      
 3   deposit_date                256143 non-null  datetime64[ns]
 4   actual_deposit              256143 non-null  bool          
 5   scheduled_deposit           256143 non-null  bool          
 6   month                       256143 non-null  category      
 7   client_geographical_region  256143 non-null  category      
 8   client_residence_status     256143 non-null  category      
 9   client_age                  256143 non-null  int64         
 10  age_group                   256143 non-null  object        
dtypes: bool(2), category(4), datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 11.2+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256143 entries, 0 to 256142
Data columns (total 11 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   client_id                   256143 non-null  int64         
 1   deposit_amount              256143 non-null  float64       
 2   deposit_cadence             256143 non-null  category      
 3   deposit_date                256143 non-null  datetime64[ns]
 4   actual_deposit              256143 non-null  bool          
 5   scheduled_deposit           256143 non-null  bool          
 6   month                       256143 non-null  category      
 7   client_geographical_region  256143 non-null  category      
 8   client_residence_status     256143 non-null  category      
 9   client_age                  256143 non-null  int64         
 10  age_group                   256143 non-null  object        
dtypes: bool(2), category(4), datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 11.2+ MB

Data Exploration

In this section we'll be reviewing the distributions for different columns to get an understanding of the provided data.

# overview of deposit amounts
df['deposit_amount'].describe()
count    256143.000000
mean        370.823427
std         401.602436
min           0.010000
25%         200.000000
50%         290.000000
75%         454.000000
max       30000.000000
Name: deposit_amount, dtype: float64
# remove outliers for histogram
df_sub1k_deposits = df[df['deposit_amount'] < 1000]

# histogram for deposit amounts < 1000
fig1 = px.histogram(df_sub1k_deposits, x='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold,
    text_auto='.2s', nbins=15)
fig1.update_layout(
    title={
        'text': 'Distribution of Deposits under 1k',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Deposit Amount',
    yaxis_title='Deposit Count',
    legend_title='Deposit Amount',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig1.show()

# overview of client age
df['client_age'].describe()
count    256143.000000
mean         52.125684
std          13.794174
min          21.000000
25%          41.000000
50%          52.000000
75%          62.000000
max         105.000000
Name: client_age, dtype: float64
# boxplot for client ages
fig2 = px.box(df, x='client_age',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig2.update_layout(
    title={
        'text': 'Distribution of Client Ages',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Ages',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig2.show()

# histogram for client age groups < 1000
fig3 = px.histogram(df.sort_values(by=['age_group']), x='age_group',
color='age_group',
color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
text_auto='.2s')
fig3.update_layout(
    title={
        'text': 'Distribution Client Age Groups',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Age Groups',
    yaxis_title='Client Count',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig3.show()

# bar graph of deposit cadences
fig4 = px.histogram(df.sort_values(by=['deposit_cadence']), x='deposit_cadence',
    color='deposit_cadence',
    color_discrete_map={
                'Biweekly': px.colors.qualitative.Bold[0],
                'Extra': px.colors.qualitative.Bold[1],
                'Monthly': px.colors.qualitative.Bold[2]},
    text_auto='.2s')
fig4.update_layout(
    title={
        'text': 'Distribution of Deposit Frequencies',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Deposit Cadence',
    yaxis_title='Deposit Count',
    legend_title='Deposit Cadence',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig4.show()

# bar graph of geo regions
fig5 = px.histogram(df.sort_values(by=['client_geographical_region']), x='client_geographical_region',
    color='client_geographical_region',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    text_auto='.2s')
fig5.update_layout(
    title={
        'text': 'Distribution of Client Geographical Regions',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Geographical Regions',
    yaxis_title='Client Count',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig5.show()

# bar graph of residence types
fig6 = px.histogram(df.sort_values(by=['client_residence_status']), x='client_residence_status',
    color='client_residence_status',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    text_auto='.2s')
fig6.update_layout(
    title={
        'text': 'Distribution of Client Residence Statuses',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Residence Status',
    yaxis_title='Client Count',
    legend_title='Residence Status',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig6.show()

Initial Analysis

The analyses are primarily temporal line graphs broken out by month. As such, in each line there is a single data point for each month and there is a shaded purple zone that indicates the time period of the campaign (the third month).

Aggregated Number of Deposits per Month

# get count of deposits each month
df_deposit_counts = df.groupby(['month'], as_index=False, observed=True).size()
df_deposit_counts
month size
0 1 42257
1 2 43503
2 3 58674
3 4 56397
4 5 55312
# line graph of deposit counts
fig7 = px.line(df_deposit_counts, x='month', y='size',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig7.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig7.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig7.show()

# percent change in count, months 2-3, all deposits
var_mo2_counts = df_deposit_counts.loc[df_deposit_counts['month'] == 2, 'size'].values[0]
var_mo3_counts = df_deposit_counts.loc[df_deposit_counts['month'] == 3, 'size'].values[0]
var_pct_change_all_counts = round(var_mo2_counts/var_mo3_counts*100,2)
var_pct_change_all_counts
74.14

Total Amount of Deposits per Month

# get sum deposits per month
df_deposit_amt = df.groupby(['month'], as_index=False, observed=True).agg({'deposit_amount': 'sum'})
df_deposit_amt
month deposit_amount
0 1 16113037.42
1 2 16465435.33
2 3 21889292.74
3 4 20388307.58
4 5 20127752.00
# line graph of summed deposit amounts
fig8 = px.line(df_deposit_amt, x='month', y='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig8.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig8.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig8.show()

# percent change in amount, months 2-3, all deposits
var_mo2_amt = df_deposit_amt.loc[df_deposit_amt['month'] == 2, 'deposit_amount'].values[0]
var_mo3_amt = df_deposit_amt.loc[df_deposit_amt['month'] == 3, 'deposit_amount'].values[0]
var_pct_change_all_amts = round(var_mo2_amt/var_mo3_amt*100,2)
var_pct_change_all_amts
75.22
# sum deposits <1k per month
df_sub1k_deposit_amt = df_sub1k_deposits.groupby(['month'], as_index=False, 
    observed=True).agg({'deposit_amount': 'sum'})
df_sub1k_deposit_amt
month deposit_amount
0 1 13864331.54
1 2 14129522.15
2 3 18405323.07
3 4 17554143.58
4 5 17354664.81
# line graph of summed deposit amounts < 1k
fig9 = px.line(df_sub1k_deposit_amt, x='month', y='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig9.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Limited to Deposits under $1k</sup>',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig9.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig9.show()

# percent change in amount, months 2-3, all deposits
var_mo2_sub1k_amt = df_sub1k_deposit_amt.loc[df_sub1k_deposit_amt['month'] == 2, 
    'deposit_amount'].values[0]
var_mo3_sub1k_amt = df_sub1k_deposit_amt.loc[df_sub1k_deposit_amt['month'] == 3, 
    'deposit_amount'].values[0]
var_pct_change_sub1k_amts = round(var_mo2_sub1k_amt/var_mo3_sub1k_amt*100,2)
var_pct_change_sub1k_amts
76.77

Impact of Client Geographical Region

# monthly deposit count per geographical region
df_region_count = df.groupby(['month', 'client_geographical_region'], 
    as_index=False, observed=True).size()

# monthly deposit sum per geographical region
df_region_sum = df.groupby(['month', 'client_geographical_region'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_region_count.head())
print('')
print(df_region_sum.head())
  month client_geographical_region   size
0     1                    Midwest   7669
1     1                  Northeast   6789
2     1                      South   9822
3     1                       West  17977
4     2                    Midwest   7933

  month client_geographical_region  deposit_amount
0     1                    Midwest      2985379.00
1     1                  Northeast      2426698.10
2     1                      South      3813617.93
3     1                       West      6887342.39
4     2                    Midwest      3015617.64
  month client_geographical_region   size
0     1                    Midwest   7669
1     1                  Northeast   6789
2     1                      South   9822
3     1                       West  17977
4     2                    Midwest   7933

  month client_geographical_region  deposit_amount
0     1                    Midwest      2985379.00
1     1                  Northeast      2426698.10
2     1                      South      3813617.93
3     1                       West      6887342.39
4     2                    Midwest      3015617.64
# line graph for deposit count by region
fig10 = px.line(df_region_count, x='month', y='size',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    color='client_geographical_region',
    line_dash='client_geographical_region')
fig10.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Geographic Region</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig10.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig10.show()

# pivot df
df_region_count_pct = df_region_count.pivot(index='client_geographical_region', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_region_count_pct['pct_change'] = round((df_region_count_pct['month_3'] - \
    df_region_count_pct['month_2']) / df_region_count_pct['month_2'] * 100,2)

# select relevent columns
df_region_count_pct = df_region_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_region_count_pct
client_geographical_region pct_change
1 Northeast 37.56
3 West 36.26
0 Midwest 33.87
2 South 31.23
# line graph for deposit sum by region
fig11 = px.line(df_region_sum, x='month', y='deposit_amount',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    color='client_geographical_region',
    line_dash='client_geographical_region')
fig11.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Geographic Region</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig11.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig11.show()

# pivot df
df_region_sum_pct = df_region_sum.pivot(index='client_geographical_region', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_region_sum_pct['pct_change'] = round((df_region_sum_pct['month_3'] - \
    df_region_sum_pct['month_2']) / df_region_sum_pct['month_2'] * 100,2)

# select relevent columns
df_region_sum_pct = df_region_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_region_sum_pct
client_geographical_region pct_change
1 Northeast 35.73
3 West 35.15
0 Midwest 30.31
2 South 29.09

Impact of Client Residence Status

# monthly deposit count per residence status
df_residence_count = df.groupby(['month', 'client_residence_status'], 
    as_index=False, observed=True).size()

# monthly deposit sum per residence status
df_residence_sum = df.groupby(['month', 'client_residence_status'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_residence_count)
print('')
print(df_residence_sum)
  month client_residence_status   size
0     1                     Own  30224
1     1                    Rent  12033
2     2                     Own  30953
3     2                    Rent  12550
4     3                     Own  42469
5     3                    Rent  16205
6     4                     Own  40517
7     4                    Rent  15880
8     5                     Own  39769
9     5                    Rent  15543

  month client_residence_status  deposit_amount
0     1                     Own     12292979.87
1     1                    Rent      3820057.55
2     2                     Own     12468493.20
3     2                    Rent      3996942.13
4     3                     Own     16789509.21
5     3                    Rent      5099783.53
6     4                     Own     15547196.13
7     4                    Rent      4841111.45
8     5                     Own     15372575.14
9     5                    Rent      4755176.86
  month client_residence_status   size
0     1                     Own  30224
1     1                    Rent  12033
2     2                     Own  30953
3     2                    Rent  12550
4     3                     Own  42469
5     3                    Rent  16205
6     4                     Own  40517
7     4                    Rent  15880
8     5                     Own  39769
9     5                    Rent  15543

  month client_residence_status  deposit_amount
0     1                     Own     12292979.87
1     1                    Rent      3820057.55
2     2                     Own     12468493.20
3     2                    Rent      3996942.13
4     3                     Own     16789509.21
5     3                    Rent      5099783.53
6     4                     Own     15547196.13
7     4                    Rent      4841111.45
8     5                     Own     15372575.14
9     5                    Rent      4755176.86
# line graph for deposit count by residence type
fig12 = px.line(df_residence_count, x='month', y='size',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    color='client_residence_status',
    line_dash='client_residence_status')
fig12.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Residence Status</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Residence Statuses',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig12.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig12.show()

# pivot df
df_residence_count_pct = df_residence_count.pivot(index='client_residence_status', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_residence_count_pct['pct_change'] = round((df_residence_count_pct['month_3'] - \
    df_residence_count_pct['month_2']) / df_residence_count_pct['month_2'] * 100,2)

# select relevent columns
df_residence_count_pct = df_residence_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_residence_count_pct
client_residence_status pct_change
0 Own 37.20
1 Rent 29.12
# line graph for deposit sum by residence type
fig13 = px.line(df_residence_sum, x='month', y='deposit_amount',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    color='client_residence_status',
    line_dash='client_residence_status')
fig13.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Residence Status</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Residence Statuses',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig13.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig13.show()

# pivot df
df_residence_sum_pct = df_residence_sum.pivot(index='client_residence_status', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_residence_sum_pct['pct_change'] = round((df_residence_sum_pct['month_3'] - \
    df_residence_sum_pct['month_2']) / df_residence_sum_pct['month_2'] * 100,2)

# select relevent columns
df_residence_sum_pct = df_residence_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_residence_sum_pct
client_residence_status pct_change
0 Own 34.66
1 Rent 27.59

Impact of Client Age

# monthly deposit count per age group
df_age_count = df.groupby(['month', 'age_group'], 
    as_index=False, observed=True).size()

# monthly deposit sum per age group
df_age_sum = df.groupby(['month', 'age_group'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_age_count.head())
print('')
print(df_age_sum.head())
  month age_group   size
0     1       20s   1180
1     1       30s   6826
2     1       40s   9852
3     1       50s  10914
4     1       60s   8216

  month age_group  deposit_amount
0     1       20s       294596.98
1     1       30s      2181426.60
2     1       40s      3562326.77
3     1       50s      4224857.92
4     1       60s      3391942.68
  month age_group   size
0     1       20s   1180
1     1       30s   6826
2     1       40s   9852
3     1       50s  10914
4     1       60s   8216

  month age_group  deposit_amount
0     1       20s       294596.98
1     1       30s      2181426.60
2     1       40s      3562326.77
3     1       50s      4224857.92
4     1       60s      3391942.68
# line graph for deposit count by age group
fig14 = px.line(df_age_count, x='month', y='size',
    color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
    color='age_group',
    line_dash='age_group')
fig14.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Age Group</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig14.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig14.show()

# pivot df
df_age_count_pct = df_age_count.pivot(index='age_group', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_age_count_pct['pct_change'] = round((df_age_count_pct['month_3'] - \
    df_age_count_pct['month_2']) / df_age_count_pct['month_2'] * 100,2)

# select relevent columns
df_age_count_pct = df_age_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_age_count_pct
age_group pct_change
0 20s 78.35
1 30s 43.19
2 40s 36.40
3 50s 32.47
5 70s 28.65
4 60s 27.68
6 80s 21.28
7 90s and above 17.12
# line graph for deposit sum by age group
fig15 = px.line(df_age_sum, x='month', y='deposit_amount',
    color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
    color='age_group',
    line_dash='age_group')
fig15.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Age Group</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig15.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig15.show()

# pivot df
df_age_sum_pct = df_age_sum.pivot(index='age_group', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_age_sum_pct['pct_change'] = round((df_age_sum_pct['month_3'] - \
    df_age_sum_pct['month_2']) / df_age_sum_pct['month_2'] * 100,2)

# select relevent columns
df_age_sum_pct = df_age_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_age_sum_pct
age_group pct_change
0 20s 76.05
1 30s 42.81
2 40s 37.88
3 50s 31.84
4 60s 26.19
5 70s 24.29
6 80s 19.30
7 90s and above -12.31

Results

The two success measures we focused on were:

  • Increase in number of deposits during the campaign month
  • Increase in total amount of deposits during the campaign month

In this section we'll consider how those goals measured up in the full client group as well as in client subsets.

Campaign Success on Full Client Set

In the full client set, there was a distinct increase in both the number of deposits and summed total of deposits between months two (pre-campaign) and three (campaign period).

The following graph, Deposits from June 2019 to October 2019, demonstrates a 74.4% increase in the total number of deposits between months two and three, with month three being our campaign month (highlighted in purple).

The increase in the number of deposits wasn't just clients making more deposits of smaller amount, though. The total dollar amount deposited increased by a similar percent between months two and three, as well.

The two following graphs, Summed Deposits from June 2019 to October 2019 and Summed Deposits from June 2019 to October 2019 (Limited to Deposits Under $1k), show 74.14% and 76.77% increases in the dollar amount of deposits between months two and three, respectively.

The fact that each of the three measures had an increase of approximately 75% during the month of the campaign versus the preceeding months is a strong indicator that the campaign was a relative success. Of course, we will want to take into consideration the fact that these values again dropped after the campaign, though not to pre-campaign numbers. This will be discussed in the recommended next steps.

Campaign Success on Client Subsets

This section reviews the campaign efficacy on the following client subsets:

  • Client Geographical Regions (Northeast, Midwest, West, South)
  • Client Residence Status (Rent, Own)
  • Client Age Group (divided by decade)

As with the prior review, we'll consider efficacy in terms of an increase in the number of deposits and an increase in the total dollar amount of deposits between months two and three.

It is important to note, though, that intersectionality is not taken into account within these subsets. As such, it is possible for confounding to occur. This is discussed in recommended next steps.

Success Measured by Deposit Count

When reviewing the increase in deposit counts between months two and three, every sub-group in every category had an increase. However, not all increases are created equal. These are the spreads for each of the subsets:

  • Client Geographical Region: 6.33% (Low: 31.23%; High: 37.56%)
  • Client Residence Status: 8.08% (Low: 29.12%; High: 37.20% )
  • Client Age Group: 61.23% (Low: 17.12%; High: 78.35%)

Region % Change in Count
Northeast 37.56%
West 36.26%
Midwest 33.87%
South 31.23%

Residence Status % Change in Count
Own 37.20%
Rent 29.12%

Age Group % Change in Count
20s 78.35%
30s 43.19%
40s 36.40%
50s 32.47%
60s 27.68%
70s 28.65%
80s 21.28%
90s + 17.12%

From this, we can posit that individual regions and residence statuses did not play a huge role in the campaign success. However, client age had a direct impact on the overall success. Specifically, there's a strong trend that younger clients were more likely to act on the campaign than older clients.

Success Measured by Dollar Amount of Deposits

For increases in the dollar amount deposited between months two and three, though, there was not an increase in every sub-group. Rather, while clients in their 90's and above did increase the number of deposits, there was actually a decrease in the total dollar amount deposited. The rest of the subsets did show an increase, though, with the following ranges represented:

  • Client Geographical Region: 6.64% (Low: 29.09%; High: 35.73%)
  • Client Residence Status: 7.07% (Low: 27.59%; High: 34.66% )
  • Client Age Group*: 56.75% (Low: 19.30%; High: 76.05%)

*Removing clients in their 90's and above, who showed a decrease in dollar amounts deposited

Region % Change in Amount
Northeast 35.73%
West 35.15%
Midwest 30.31%
South 29.09%

Residence Status % Change in Amount
Own 34.66%
Rent 27.59%

Age Group % Change in Amount
20s 76.05%
30s 42.81%
40s 37.88%
50s 31.84%
60s 26.19%
70s 24.29%
80s 19.30%
90s + -12.31%

The trends shown in the number of deposits continued in the total dollar amount deposited per sub-group: geographical region and residence status had minimal imact on the campaign efficacy, while client ages had a clear impact on campaign success; younger age groups had better response to the campaign than older age groups.

While the campaign was generally successfull, there are three things mentioned in this analysis which require further review:

  • Decreases in both deposit counts and dollar amounts deposited after the campaign
  • The impact of intersectionality on campaign success
  • Identifying priority clients to get the best ROI on campaign expenses

Post-Campaign Decreases

While the campaign showed initial success we would want to consider long-term effects on clients. An influx in payments over a single month is beneficial to both Company A and clients, but if the accounts quickly revert to historical payment schedules then the overall benefit of the campaign is reduced. As such, we would want to identify a plan to ensure ongoing success.

Impact of Intersectionality

In general, it seems as if client age groups played the bigged role in the overall success of the campaign. However, some deeper analysis to identify the combined subsets of customers on whom the campaign was most effective would benefit the final point of identifying priority clients.

Priority Clients

Finally, after deeper analysis to hone in on the combined client subsets who had the best camapgin reponse, we can refine the campaign to further focus on those client groups. This would help improve the overall efficacy of the campaign. Alogn with that, we could also get a better understanding of the worst-performing subsets to design a plan to invigorate those groups using their own specialized campaigns.