Marketing Campaign Analysis - Part 1

Background

About the Business

Company A is a debt negotiation business. Clients have substantial debt and are unable to make their minimum monthly payments on the debt accounts. To help manage their debts, Company A creates a dedicated bank account where clients deposit regular monthly payments. Those accounts are then used to negotiate with creditors on behalf of the client to settle the outstanding debt.

Company A's profits come from client-collected fees. When Company A successfully negotiates a settlement agreement these fees are charged to the client's account. The number of settlement agreements Company A can negotiate is proportional to the monthly deposited amount.

Analysis Purpose

Company A want to analyze the efficacy of a recent marketing campaign promoting the financial freedom clients can attain using their debt relief program. The campaign was aimed at both new and current clients with the intent of invigorating account deposits.

The goal of this analysis is to show marketing, sales and operations the success of the campaign. Specifically, the analysis includes:

A quantitative assessment of whether the marketing campaign was successful.
Recommended adjustments to the campaign strategy to improve performance.

Defining Success

Given our available data, these are the possible success factors that will be considered:

Increase in number of deposits during the campaign month
Increase in total amount of deposits during the campaign month

Data Overview

There are three datasets provided for the analysis; each is already cleaned and prepared:

client_data.csv: Fictional clients

Column Name	Description
client_id	Unique client ID
client_geographical_region	Client geo-location (U.S. Census definitions)
client_residence_status	Client residence status
client_age	Client age

deposit_data.csv: Client deposit behavior

Column Name	Description
client_id	Unique client ID
deposit_type	Scheduled vs. Actual deposit
deposit_amount	Deposit amount
deposit_cadence	Frequency of deposit (scheduled)
deposit_date	Deposit date

calendar_data.csv: Calendar reference table

Column Name	Description
gregorian_date	Gregorian calendar date
month_name	Month relative to data set

Notes:

Month 1 and 2 are pre-campaign
Month 3 is the campaign
Month 4 and 5 are post-campaign

Assumptions:

There is no seasonality in the results
The campaign spend was distributed evenly across Month 3 (i.e., spend on the first day is the same as spend on the last day)

# import packages
import numpy as np
import pandas as pd
import datetime
import plotly.express as px
import matplotlib.pyplot as plt

Data Wrangling

For this first section, we're focusing on combining the relevant sample information into a single dataframe.

The first thing to note is that there are generally two rows for each deposit: a scheduled deposit then the actual deposit. Each of the two rows has the same client_id, date, deposit amount, and cadence.

As such, we're going to create dummy variables for the two deposit types then group the deposits together so there is only one sample per deposit. This will create the following options for each deposit:

Actual	Scheduled	Description
True	True	A deposit is scheduled and completed
True	False	A deposit is scheduled but not completed
False	True	An unscheduled deposit is completed

At the end of this section, each row will represent a single deposit. The final columns in the dataframe will be as follows:

Column Name	Description	Value Type/Values
client_id	Unique client ID	Integer
deposit_amount	Amount of Deposit	Decimal
deposit_cadence	Frequency of Deposit	Categorical: Bi-Weekly, Extra, Monthly
deposit_date	Date of the deposit	datetime
actual_deposit	If the deposit was completed	Boolean: True/False
scheduled_deposit	If the deposit was scheduled	Boolean: True/False
month_name	Month relative to data set	Categorical: 1, 2, 3, 4, 5
client_geographical_region	Client geo-location (U.S. Census definitions)	Categorical: Midwest, West, South, Northeast
client_residence_status	Client residence status	Categorical: Rent, Own
client_age	Client's age	Integer
age_group	Client's age in decades	Str: 30s, 40s, etc

# read in data
df_clients = pd.read_csv('data/client_data.csv')
df_deposits = pd.read_csv('data/deposit_data.csv')
df_calendar = pd.read_csv('data/calendar_data.csv')

# return basic details on dataframes
var_init_df_names = [df_clients, df_deposits, df_calendar]
for i in var_init_df_names:
    print(i.head())
    print('Total rows: ',i.shape[0])
    print('')

         client_id  ... client_age
0  538839486596724  ...         91
1  321708286091707  ...         83
2  848531901757235  ...         84
3  854405182328779  ...         83
4  769102176031316  ...         85

[5 rows x 4 columns]
Total rows:  46347

         client_id       deposit_type  ...  deposit_cadence deposit_date
0  446495122764671     Actual Deposit  ...          Monthly   2019-10-23
1  446495122764671     Actual Deposit  ...          Monthly   2019-09-23
2  446495122764671  Scheduled Deposit  ...          Monthly   2019-09-23
3  446495122764671  Scheduled Deposit  ...          Monthly   2019-10-23
4  446495122764671  Scheduled Deposit  ...          Monthly   2019-06-23

[5 rows x 5 columns]
Total rows:  480394

  gregorian_date month_name
0     2019-06-01    Month 1
1     2019-06-02    Month 1
2     2019-06-03    Month 1
3     2019-06-04    Month 1
4     2019-06-05    Month 1
Total rows:  153

         client_id  ... client_age
0  538839486596724  ...         91
1  321708286091707  ...         83
2  848531901757235  ...         84
3  854405182328779  ...         83
4  769102176031316  ...         85

[5 rows x 4 columns]
Total rows:  46347

         client_id       deposit_type  ...  deposit_cadence deposit_date
0  446495122764671     Actual Deposit  ...          Monthly   2019-10-23
1  446495122764671     Actual Deposit  ...          Monthly   2019-09-23
2  446495122764671  Scheduled Deposit  ...          Monthly   2019-09-23
3  446495122764671  Scheduled Deposit  ...          Monthly   2019-10-23
4  446495122764671  Scheduled Deposit  ...          Monthly   2019-06-23

[5 rows x 5 columns]
Total rows:  480394

  gregorian_date month_name
0     2019-06-01    Month 1
1     2019-06-02    Month 1
2     2019-06-03    Month 1
3     2019-06-04    Month 1
4     2019-06-05    Month 1
Total rows:  153

# verify types of deposits
df_deposits.deposit_type.unique()

array(['Actual Deposit', 'Scheduled Deposit'], dtype=object)

# get dummies for deposit types
df = pd.get_dummies(data=df_deposits, prefix='', prefix_sep='', columns=['deposit_type'])
print(df.head())
print('Total rows: ',df.shape[0])

         client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  446495122764671           303.0  ...           True             False
1  446495122764671           303.0  ...           True             False
2  446495122764671           303.0  ...          False              True
3  446495122764671           303.0  ...          False              True
4  446495122764671           303.0  ...          False              True

[5 rows x 6 columns]
Total rows:  480394
         client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  446495122764671           303.0  ...           True             False
1  446495122764671           303.0  ...           True             False
2  446495122764671           303.0  ...          False              True
3  446495122764671           303.0  ...          False              True
4  446495122764671           303.0  ...          False              True

[5 rows x 6 columns]
Total rows:  480394

# group scheduled/actual deposits for matching
df = df.groupby(['client_id', 'deposit_amount', 'deposit_cadence', 'deposit_date'], 
    as_index=False).agg({'Actual Deposit': 'max', 'Scheduled Deposit': 'max'})
print(df.head())
print('Total Rows: ',df.shape[0])

      client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  146046305811           247.0  ...          False              True
1  146046305811           247.0  ...           True              True
2  146046305811           247.0  ...           True              True
3  146046305811           247.0  ...           True              True
4  146046305811           247.0  ...           True              True

[5 rows x 6 columns]
Total Rows:  256143
      client_id  deposit_amount  ... Actual Deposit Scheduled Deposit
0  146046305811           247.0  ...          False              True
1  146046305811           247.0  ...           True              True
2  146046305811           247.0  ...           True              True
3  146046305811           247.0  ...           True              True
4  146046305811           247.0  ...           True              True

[5 rows x 6 columns]
Total Rows:  256143

# merge dataframes
df = pd.merge(df, df_calendar, left_on='deposit_date', right_on='gregorian_date', 
    how='inner', suffixes=('',''))
df = pd.merge(df, df_clients, on='client_id', how='inner', suffixes=('',''))

# clean columns
df['month_name'] = df['month_name'].str.extract(r'(\d+)').astype(int)
df = df.rename(columns={'month_name': 'month', 'Actual Deposit': 'actual_deposit', 
    'Scheduled Deposit': 'scheduled_deposit'})
df = df.drop(['gregorian_date'], axis=1)

df.head()

	client_id	deposit_amount	deposit_cadence	deposit_date	actual_deposit	scheduled_deposit	month	client_geographical_region	client_residence_status	client_age
0	146046305811	247.0	Biweekly	2019-06-05	False	True	1	Midwest	Rent	42
1	146046305811	247.0	Biweekly	2019-06-19	True	True	1	Midwest	Rent	42
2	146046305811	247.0	Biweekly	2019-07-03	True	True	2	Midwest	Rent	42
3	146046305811	247.0	Biweekly	2019-07-17	True	True	2	Midwest	Rent	42
4	146046305811	247.0	Biweekly	2019-07-31	True	True	2	Midwest	Rent	42

# establish function for grouping ages
def group_by_age(age):
    if age <30:
        return '20s'
    elif age <40:
        return '30s'
    elif age <50:
        return '40s'
    elif age <60:
        return '50s'
    elif age <70:
        return '60s'
    elif age <80:
        return '70s'
    elif age <90:
        return '80s'
    else:
        return '90s and above'

# apply function
df['age_group'] = df['client_age'].apply(group_by_age)

# update categorical columns
df = df.astype({'deposit_cadence':'category', 'month':'category', 
    'client_geographical_region':'category', 'client_residence_status':'category'})

# update datetime column
df['deposit_date'] = pd.to_datetime(df['deposit_date'])

# download final csv for other use
df.to_csv('data/merged_data.csv')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256143 entries, 0 to 256142
Data columns (total 11 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   client_id                   256143 non-null  int64         
 1   deposit_amount              256143 non-null  float64       
 2   deposit_cadence             256143 non-null  category      
 3   deposit_date                256143 non-null  datetime64[ns]
 4   actual_deposit              256143 non-null  bool          
 5   scheduled_deposit           256143 non-null  bool          
 6   month                       256143 non-null  category      
 7   client_geographical_region  256143 non-null  category      
 8   client_residence_status     256143 non-null  category      
 9   client_age                  256143 non-null  int64         
 10  age_group                   256143 non-null  object        
dtypes: bool(2), category(4), datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 11.2+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256143 entries, 0 to 256142
Data columns (total 11 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   client_id                   256143 non-null  int64         
 1   deposit_amount              256143 non-null  float64       
 2   deposit_cadence             256143 non-null  category      
 3   deposit_date                256143 non-null  datetime64[ns]
 4   actual_deposit              256143 non-null  bool          
 5   scheduled_deposit           256143 non-null  bool          
 6   month                       256143 non-null  category      
 7   client_geographical_region  256143 non-null  category      
 8   client_residence_status     256143 non-null  category      
 9   client_age                  256143 non-null  int64         
 10  age_group                   256143 non-null  object        
dtypes: bool(2), category(4), datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 11.2+ MB

Data Exploration

In this section we'll be reviewing the distributions for different columns to get an understanding of the provided data.

# overview of deposit amounts
df['deposit_amount'].describe()

count    256143.000000
mean        370.823427
std         401.602436
min           0.010000
25%         200.000000
50%         290.000000
75%         454.000000
max       30000.000000
Name: deposit_amount, dtype: float64

# remove outliers for histogram
df_sub1k_deposits = df[df['deposit_amount'] < 1000]

# histogram for deposit amounts < 1000
fig1 = px.histogram(df_sub1k_deposits, x='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold,
    text_auto='.2s', nbins=15)
fig1.update_layout(
    title={
        'text': 'Distribution of Deposits under 1k',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Deposit Amount',
    yaxis_title='Deposit Count',
    legend_title='Deposit Amount',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig1.show()

# overview of client age
df['client_age'].describe()

count    256143.000000
mean         52.125684
std          13.794174
min          21.000000
25%          41.000000
50%          52.000000
75%          62.000000
max         105.000000
Name: client_age, dtype: float64

# boxplot for client ages
fig2 = px.box(df, x='client_age',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig2.update_layout(
    title={
        'text': 'Distribution of Client Ages',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Ages',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig2.show()

# histogram for client age groups < 1000
fig3 = px.histogram(df.sort_values(by=['age_group']), x='age_group',
color='age_group',
color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
text_auto='.2s')
fig3.update_layout(
    title={
        'text': 'Distribution Client Age Groups',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Age Groups',
    yaxis_title='Client Count',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig3.show()

# bar graph of deposit cadences
fig4 = px.histogram(df.sort_values(by=['deposit_cadence']), x='deposit_cadence',
    color='deposit_cadence',
    color_discrete_map={
                'Biweekly': px.colors.qualitative.Bold[0],
                'Extra': px.colors.qualitative.Bold[1],
                'Monthly': px.colors.qualitative.Bold[2]},
    text_auto='.2s')
fig4.update_layout(
    title={
        'text': 'Distribution of Deposit Frequencies',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Deposit Cadence',
    yaxis_title='Deposit Count',
    legend_title='Deposit Cadence',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig4.show()

# bar graph of geo regions
fig5 = px.histogram(df.sort_values(by=['client_geographical_region']), x='client_geographical_region',
    color='client_geographical_region',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    text_auto='.2s')
fig5.update_layout(
    title={
        'text': 'Distribution of Client Geographical Regions',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Geographical Regions',
    yaxis_title='Client Count',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig5.show()

# bar graph of residence types
fig6 = px.histogram(df.sort_values(by=['client_residence_status']), x='client_residence_status',
    color='client_residence_status',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    text_auto='.2s')
fig6.update_layout(
    title={
        'text': 'Distribution of Client Residence Statuses',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Residence Status',
    yaxis_title='Client Count',
    legend_title='Residence Status',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    showlegend=False, 
    plot_bgcolor='#F5F5F5'
)
fig6.show()

Initial Analysis

The analyses are primarily temporal line graphs broken out by month. As such, in each line there is a single data point for each month and there is a shaded purple zone that indicates the time period of the campaign (the third month).

Aggregated Number of Deposits per Month

# get count of deposits each month
df_deposit_counts = df.groupby(['month'], as_index=False, observed=True).size()
df_deposit_counts

	month	size
0	1	42257
1	2	43503
2	3	58674
3	4	56397
4	5	55312

# line graph of deposit counts
fig7 = px.line(df_deposit_counts, x='month', y='size',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig7.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig7.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig7.show()

# percent change in count, months 2-3, all deposits
var_mo2_counts = df_deposit_counts.loc[df_deposit_counts['month'] == 2, 'size'].values[0]
var_mo3_counts = df_deposit_counts.loc[df_deposit_counts['month'] == 3, 'size'].values[0]
var_pct_change_all_counts = round(var_mo2_counts/var_mo3_counts*100,2)
var_pct_change_all_counts

74.14

Total Amount of Deposits per Month

# get sum deposits per month
df_deposit_amt = df.groupby(['month'], as_index=False, observed=True).agg({'deposit_amount': 'sum'})
df_deposit_amt

	month	deposit_amount
0	1	16113037.42
1	2	16465435.33
2	3	21889292.74
3	4	20388307.58
4	5	20127752.00

# line graph of summed deposit amounts
fig8 = px.line(df_deposit_amt, x='month', y='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig8.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig8.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig8.show()

# percent change in amount, months 2-3, all deposits
var_mo2_amt = df_deposit_amt.loc[df_deposit_amt['month'] == 2, 'deposit_amount'].values[0]
var_mo3_amt = df_deposit_amt.loc[df_deposit_amt['month'] == 3, 'deposit_amount'].values[0]
var_pct_change_all_amts = round(var_mo2_amt/var_mo3_amt*100,2)
var_pct_change_all_amts

75.22

# sum deposits <1k per month
df_sub1k_deposit_amt = df_sub1k_deposits.groupby(['month'], as_index=False, 
    observed=True).agg({'deposit_amount': 'sum'})
df_sub1k_deposit_amt

	month	deposit_amount
0	1	13864331.54
1	2	14129522.15
2	3	18405323.07
3	4	17554143.58
4	5	17354664.81

# line graph of summed deposit amounts < 1k
fig9 = px.line(df_sub1k_deposit_amt, x='month', y='deposit_amount',
    color_discrete_sequence=px.colors.qualitative.Bold)
fig9.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Limited to Deposits under $1k</sup>',
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
    ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
    ), 
    plot_bgcolor='#F5F5F5'
)
fig9.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig9.show()

# percent change in amount, months 2-3, all deposits
var_mo2_sub1k_amt = df_sub1k_deposit_amt.loc[df_sub1k_deposit_amt['month'] == 2, 
    'deposit_amount'].values[0]
var_mo3_sub1k_amt = df_sub1k_deposit_amt.loc[df_sub1k_deposit_amt['month'] == 3, 
    'deposit_amount'].values[0]
var_pct_change_sub1k_amts = round(var_mo2_sub1k_amt/var_mo3_sub1k_amt*100,2)
var_pct_change_sub1k_amts

76.77

Impact of Client Geographical Region

# monthly deposit count per geographical region
df_region_count = df.groupby(['month', 'client_geographical_region'], 
    as_index=False, observed=True).size()

# monthly deposit sum per geographical region
df_region_sum = df.groupby(['month', 'client_geographical_region'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_region_count.head())
print('')
print(df_region_sum.head())

  month client_geographical_region   size
0     1                    Midwest   7669
1     1                  Northeast   6789
2     1                      South   9822
3     1                       West  17977
4     2                    Midwest   7933

  month client_geographical_region  deposit_amount
0     1                    Midwest      2985379.00
1     1                  Northeast      2426698.10
2     1                      South      3813617.93
3     1                       West      6887342.39
4     2                    Midwest      3015617.64
  month client_geographical_region   size
0     1                    Midwest   7669
1     1                  Northeast   6789
2     1                      South   9822
3     1                       West  17977
4     2                    Midwest   7933

  month client_geographical_region  deposit_amount
0     1                    Midwest      2985379.00
1     1                  Northeast      2426698.10
2     1                      South      3813617.93
3     1                       West      6887342.39
4     2                    Midwest      3015617.64

# line graph for deposit count by region
fig10 = px.line(df_region_count, x='month', y='size',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    color='client_geographical_region',
    line_dash='client_geographical_region')
fig10.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Geographic Region</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig10.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig10.show()

# pivot df
df_region_count_pct = df_region_count.pivot(index='client_geographical_region', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_region_count_pct['pct_change'] = round((df_region_count_pct['month_3'] - \
    df_region_count_pct['month_2']) / df_region_count_pct['month_2'] * 100,2)

# select relevent columns
df_region_count_pct = df_region_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_region_count_pct

	client_geographical_region	pct_change
1	Northeast	37.56
3	West	36.26
0	Midwest	33.87
2	South	31.23

# line graph for deposit sum by region
fig11 = px.line(df_region_sum, x='month', y='deposit_amount',
    color_discrete_map={
                'Midwest': px.colors.qualitative.Bold[0],
                'West': px.colors.qualitative.Bold[1],
                'South': px.colors.qualitative.Bold[2],
                'Northeast': px.colors.qualitative.Bold[3]},
    color='client_geographical_region',
    line_dash='client_geographical_region')
fig11.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Geographic Region</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Geographical Regions',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig11.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig11.show()

# pivot df
df_region_sum_pct = df_region_sum.pivot(index='client_geographical_region', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_region_sum_pct['pct_change'] = round((df_region_sum_pct['month_3'] - \
    df_region_sum_pct['month_2']) / df_region_sum_pct['month_2'] * 100,2)

# select relevent columns
df_region_sum_pct = df_region_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_region_sum_pct

	client_geographical_region	pct_change
1	Northeast	35.73
3	West	35.15
0	Midwest	30.31
2	South	29.09

Impact of Client Residence Status

# monthly deposit count per residence status
df_residence_count = df.groupby(['month', 'client_residence_status'], 
    as_index=False, observed=True).size()

# monthly deposit sum per residence status
df_residence_sum = df.groupby(['month', 'client_residence_status'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_residence_count)
print('')
print(df_residence_sum)

  month client_residence_status   size
0     1                     Own  30224
1     1                    Rent  12033
2     2                     Own  30953
3     2                    Rent  12550
4     3                     Own  42469
5     3                    Rent  16205
6     4                     Own  40517
7     4                    Rent  15880
8     5                     Own  39769
9     5                    Rent  15543

  month client_residence_status  deposit_amount
0     1                     Own     12292979.87
1     1                    Rent      3820057.55
2     2                     Own     12468493.20
3     2                    Rent      3996942.13
4     3                     Own     16789509.21
5     3                    Rent      5099783.53
6     4                     Own     15547196.13
7     4                    Rent      4841111.45
8     5                     Own     15372575.14
9     5                    Rent      4755176.86
  month client_residence_status   size
0     1                     Own  30224
1     1                    Rent  12033
2     2                     Own  30953
3     2                    Rent  12550
4     3                     Own  42469
5     3                    Rent  16205
6     4                     Own  40517
7     4                    Rent  15880
8     5                     Own  39769
9     5                    Rent  15543

  month client_residence_status  deposit_amount
0     1                     Own     12292979.87
1     1                    Rent      3820057.55
2     2                     Own     12468493.20
3     2                    Rent      3996942.13
4     3                     Own     16789509.21
5     3                    Rent      5099783.53
6     4                     Own     15547196.13
7     4                    Rent      4841111.45
8     5                     Own     15372575.14
9     5                    Rent      4755176.86

# line graph for deposit count by residence type
fig12 = px.line(df_residence_count, x='month', y='size',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    color='client_residence_status',
    line_dash='client_residence_status')
fig12.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Residence Status</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Residence Statuses',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig12.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig12.show()

# pivot df
df_residence_count_pct = df_residence_count.pivot(index='client_residence_status', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_residence_count_pct['pct_change'] = round((df_residence_count_pct['month_3'] - \
    df_residence_count_pct['month_2']) / df_residence_count_pct['month_2'] * 100,2)

# select relevent columns
df_residence_count_pct = df_residence_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_residence_count_pct

	client_residence_status	pct_change
0	Own	37.20
1	Rent	29.12

# line graph for deposit sum by residence type
fig13 = px.line(df_residence_sum, x='month', y='deposit_amount',
    color_discrete_map={
                'Own': px.colors.qualitative.Bold[0],
                'Rent': px.colors.qualitative.Bold[1]},
    color='client_residence_status',
    line_dash='client_residence_status')
fig13.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Residence Status</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Residence Statuses',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig13.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig13.show()

# pivot df
df_residence_sum_pct = df_residence_sum.pivot(index='client_residence_status', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_residence_sum_pct['pct_change'] = round((df_residence_sum_pct['month_3'] - \
    df_residence_sum_pct['month_2']) / df_residence_sum_pct['month_2'] * 100,2)

# select relevent columns
df_residence_sum_pct = df_residence_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_residence_sum_pct

	client_residence_status	pct_change
0	Own	34.66
1	Rent	27.59

Impact of Client Age

# monthly deposit count per age group
df_age_count = df.groupby(['month', 'age_group'], 
    as_index=False, observed=True).size()

# monthly deposit sum per age group
df_age_sum = df.groupby(['month', 'age_group'], 
    as_index=False, observed=True).agg({'deposit_amount': 'sum'})

print(df_age_count.head())
print('')
print(df_age_sum.head())

  month age_group   size
0     1       20s   1180
1     1       30s   6826
2     1       40s   9852
3     1       50s  10914
4     1       60s   8216

  month age_group  deposit_amount
0     1       20s       294596.98
1     1       30s      2181426.60
2     1       40s      3562326.77
3     1       50s      4224857.92
4     1       60s      3391942.68
  month age_group   size
0     1       20s   1180
1     1       30s   6826
2     1       40s   9852
3     1       50s  10914
4     1       60s   8216

  month age_group  deposit_amount
0     1       20s       294596.98
1     1       30s      2181426.60
2     1       40s      3562326.77
3     1       50s      4224857.92
4     1       60s      3391942.68

# line graph for deposit count by age group
fig14 = px.line(df_age_count, x='month', y='size',
    color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
    color='age_group',
    line_dash='age_group')
fig14.update_layout(
    title={
        'text': 'Deposits from June 2019 to Oct 2019<br><sup>Divided by Age Group</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Number of Deposits',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig14.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig14.show()

# pivot df
df_age_count_pct = df_age_count.pivot(index='age_group', 
    columns='month', values='size').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_age_count_pct['pct_change'] = round((df_age_count_pct['month_3'] - \
    df_age_count_pct['month_2']) / df_age_count_pct['month_2'] * 100,2)

# select relevent columns
df_age_count_pct = df_age_count_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_age_count_pct

	age_group	pct_change
0	20s	78.35
1	30s	43.19
2	40s	36.40
3	50s	32.47
5	70s	28.65
4	60s	27.68
6	80s	21.28
7	90s and above	17.12

# line graph for deposit sum by age group
fig15 = px.line(df_age_sum, x='month', y='deposit_amount',
    color_discrete_map={
                '20s': px.colors.qualitative.Bold[0],
                '30s': px.colors.qualitative.Bold[1],
                '40s': px.colors.qualitative.Bold[2],
                '50s': px.colors.qualitative.Bold[3],
                '60s': px.colors.qualitative.Bold[4],
                '70s': px.colors.qualitative.Bold[5],
                '80s': px.colors.qualitative.Bold[6],
                '90s and above': px.colors.qualitative.Bold[7],},
    color='age_group',
    line_dash='age_group')
fig15.update_layout(
    title={
        'text': 'Summed Deposits from June 2019 to Oct 2019<br><sup>Divided by Age Group</sup>',
        'x':0.4,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title='Month',
    yaxis_title='Summed Deposits ($mil)',
    legend_title='Age Groups',
    font=dict(
        family='Arial',
        size=16,
        color='#210D2B'
        ),
    xaxis=dict(
        tickvals=[1,2,3,4,5],
        ticktext=['Jun', 'Jul', 'Aug', 'Sep', 'Oct']
        ), 
    plot_bgcolor='#F5F5F5' 

    )
fig15.add_vrect(x0=2.5, x1=3.5, line_width=0, fillcolor="purple", opacity=0.2)
fig15.show()

# pivot df
df_age_sum_pct = df_age_sum.pivot(index='age_group', 
    columns='month', values='deposit_amount').add_prefix('month_').reset_index().rename_axis(None, axis=1)

# calculate percent change
df_age_sum_pct['pct_change'] = round((df_age_sum_pct['month_3'] - \
    df_age_sum_pct['month_2']) / df_age_sum_pct['month_2'] * 100,2)

# select relevent columns
df_age_sum_pct = df_age_sum_pct.iloc[:,[0,-1]] \
    .sort_values(by='pct_change', ascending=False)

df_age_sum_pct

	age_group	pct_change
0	20s	76.05
1	30s	42.81
2	40s	37.88
3	50s	31.84
4	60s	26.19
5	70s	24.29
6	80s	19.30
7	90s and above	-12.31

Results

The two success measures we focused on were:

Increase in number of deposits during the campaign month
Increase in total amount of deposits during the campaign month

In this section we'll consider how those goals measured up in the full client group as well as in client subsets.

Campaign Success on Full Client Set

In the full client set, there was a distinct increase in both the number of deposits and summed total of deposits between months two (pre-campaign) and three (campaign period).

The following graph, Deposits from June 2019 to October 2019, demonstrates a 74.4% increase in the total number of deposits between months two and three, with month three being our campaign month (highlighted in purple).

The increase in the number of deposits wasn't just clients making more deposits of smaller amount, though. The total dollar amount deposited increased by a similar percent between months two and three, as well.

The two following graphs, Summed Deposits from June 2019 to October 2019 and Summed Deposits from June 2019 to October 2019 (Limited to Deposits Under $1k), show 74.14% and 76.77% increases in the dollar amount of deposits between months two and three, respectively.

The fact that each of the three measures had an increase of approximately 75% during the month of the campaign versus the preceeding months is a strong indicator that the campaign was a relative success. Of course, we will want to take into consideration the fact that these values again dropped after the campaign, though not to pre-campaign numbers. This will be discussed in the recommended next steps.

Campaign Success on Client Subsets

This section reviews the campaign efficacy on the following client subsets:

Client Geographical Regions (Northeast, Midwest, West, South)
Client Residence Status (Rent, Own)
Client Age Group (divided by decade)

As with the prior review, we'll consider efficacy in terms of an increase in the number of deposits and an increase in the total dollar amount of deposits between months two and three.

It is important to note, though, that intersectionality is not taken into account within these subsets. As such, it is possible for confounding to occur. This is discussed in recommended next steps.

Success Measured by Deposit Count

When reviewing the increase in deposit counts between months two and three, every sub-group in every category had an increase. However, not all increases are created equal. These are the spreads for each of the subsets:

Client Geographical Region: 6.33% (Low: 31.23%; High: 37.56%)
Client Residence Status: 8.08% (Low: 29.12%; High: 37.20% )
Client Age Group: 61.23% (Low: 17.12%; High: 78.35%)

Region	% Change in Count
Northeast	37.56%
West	36.26%
Midwest	33.87%
South	31.23%

Residence Status	% Change in Count
Own	37.20%
Rent	29.12%

Age Group	% Change in Count
20s	78.35%
30s	43.19%
40s	36.40%
50s	32.47%
60s	27.68%
70s	28.65%
80s	21.28%
90s +	17.12%

From this, we can posit that individual regions and residence statuses did not play a huge role in the campaign success. However, client age had a direct impact on the overall success. Specifically, there's a strong trend that younger clients were more likely to act on the campaign than older clients.

Success Measured by Dollar Amount of Deposits

For increases in the dollar amount deposited between months two and three, though, there was not an increase in every sub-group. Rather, while clients in their 90's and above did increase the number of deposits, there was actually a decrease in the total dollar amount deposited. The rest of the subsets did show an increase, though, with the following ranges represented:

Client Geographical Region: 6.64% (Low: 29.09%; High: 35.73%)
Client Residence Status: 7.07% (Low: 27.59%; High: 34.66% )
Client Age Group^*: 56.75% (Low: 19.30%; High: 76.05%)

^*Removing clients in their 90's and above, who showed a decrease in dollar amounts deposited

Region	% Change in Amount
Northeast	35.73%
West	35.15%
Midwest	30.31%
South	29.09%

Residence Status	% Change in Amount
Own	34.66%
Rent	27.59%

Age Group	% Change in Amount
20s	76.05%
30s	42.81%
40s	37.88%
50s	31.84%
60s	26.19%
70s	24.29%
80s	19.30%
90s +	-12.31%

The trends shown in the number of deposits continued in the total dollar amount deposited per sub-group: geographical region and residence status had minimal imact on the campaign efficacy, while client ages had a clear impact on campaign success; younger age groups had better response to the campaign than older age groups.

Recommended Next Steps

While the campaign was generally successfull, there are three things mentioned in this analysis which require further review:

Decreases in both deposit counts and dollar amounts deposited after the campaign
The impact of intersectionality on campaign success
Identifying priority clients to get the best ROI on campaign expenses

Post-Campaign Decreases

While the campaign showed initial success we would want to consider long-term effects on clients. An influx in payments over a single month is beneficial to both Company A and clients, but if the accounts quickly revert to historical payment schedules then the overall benefit of the campaign is reduced. As such, we would want to identify a plan to ensure ongoing success.

Impact of Intersectionality

In general, it seems as if client age groups played the bigged role in the overall success of the campaign. However, some deeper analysis to identify the combined subsets of customers on whom the campaign was most effective would benefit the final point of identifying priority clients.

Priority Clients

Finally, after deeper analysis to hone in on the combined client subsets who had the best camapgin reponse, we can refine the campaign to further focus on those client groups. This would help improve the overall efficacy of the campaign. Alogn with that, we could also get a better understanding of the worst-performing subsets to design a plan to invigorate those groups using their own specialized campaigns.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search