functions module
The functions module contains core program routines related to building and working with the data model and associated files.
General definitions:
- "month_form" dataset is length n, where n is the number of months in the model
- "short_form" data has a length equal to the number of employees
- "long_form" data is the length of the cumulative sum of non-retired employees for all months in the data model (could be millions of rows, depending on workgroup size and age)
functions.add_zero_col(arr)
Add a column of zeros as the first column in a 2d array. Output will be a numpy array.
example:
input array:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
output array:
array([[ 0,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 0, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [ 0, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [ 0, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
- inputs
- arr (array)
2-dimensional numpy array
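A minimal numpy sketch of this behavior (an assumed implementation, not necessarily the library's own code):

```python
import numpy as np

def add_zero_col(arr):
    """Insert a column of zeros before the first column of a 2d array."""
    return np.insert(arr, 0, 0, axis=1)

arr = np.arange(50).reshape(5, 10)
out = add_zero_col(arr)
# out[:, 0] is all zeros; out[:, 1:] equals the original array
```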
functions.age_correction(month_nums_array, ages_array, retage)
(Long_Form)
Returns a long_form (all months) array of employee ages by incrementing starting ages according to month number.
Note: Retirement age increases are handled by the build_program_files script by incrementing retirement dates and by the clip_ret_ages function within the make_skeleton script.
- inputs
- month_nums_array (array)
gen_month_skeleton function output (ndarray)
- ages_array (array)
starting_age function output aligned with long_form (ndarray) i.e. s_age is starting age (aligned to empkeys) repeated each month.
- retage (integer or float)
output clip upper limit for retirement age
Output is s_age incremented by a decimal month value according to month_num (this is a candidate for refactoring with np.put).
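The increment-and-clip logic described above can be sketched as follows (a minimal assumed implementation; the seed-free signature matches the documented inputs):

```python
import numpy as np

def age_correction(month_nums_array, ages_array, retage):
    # increment each starting age by 1/12 of a year per month number,
    # then clip at the retirement age upper limit
    return np.clip(ages_array + month_nums_array * (1.0 / 12), 0, retage)

ages = age_correction(np.array([0, 6]), np.array([30.0, 64.9]), 65)
# → array([30., 65.])  (64.9 + 0.5 is clipped to the 65 retage limit)
```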
functions.align_fill_down(l, u, long_indexed_df, long_array)
Data align current values to all future months (short array segment aligned to long array).
This function is used to set the values from the last standalone month as the initial data for integrated dataset computation when a delayed implementation exists.
Uses pandas dataframe auto-alignment, which is relatively slow. TODO (for developer): consider an all-numpy solution.
- inputs
- l, u (integers)
current month slice indexes (from long df)
- long_indexed_df (dataframe)
empty long dataframe with empkey indexes
- long_array (array)
long array of multiple month data (orig_job, fur_codes, etc)
Process: declare the long indexed dataframe outside of the function (input); grab the current month slice for array insertion (copy); chop the long dataframe to begin with the current month (copy); assign the array to the short dataframe; data align the short dataframe to the long dataframe (chopped to the current month and future); copy the chopped dataframe column as an array into long_array; return long_array.
functions.align_next
"Carry forward" data from one month to the next. Compare indexes (empkeys) from one month to the next month. When a matching index is found, assign the corresponding value to a new result array. Effectively finds the remaining (not retired) employees in the next month and copies the target column data values for them from the current month's data into the next month's data.
- inputs
- this_index_arr (array)
current month index of unique employee keys
- next_index_arr (array)
next month index of unique employee keys (a subset of this_index_arr)
- these_vals_arr (array)
the data column segment (attribute) to carry forward
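A pandas sketch of the carry-forward described above (the actual implementation may be a numpy or numba loop; this version uses index alignment via reindex):

```python
import numpy as np
import pandas as pd

def align_next(this_index_arr, next_index_arr, these_vals_arr):
    # map current-month values onto the next month's (smaller) empkey index
    s = pd.Series(these_vals_arr, index=this_index_arr)
    return s.reindex(next_index_arr).to_numpy()

this_idx = np.array([101, 102, 103, 104])
next_idx = np.array([101, 103, 104])      # employee 102 retired
vals = np.array([5, 6, 7, 8])
carried = align_next(this_idx, next_idx, vals)
# → array([5, 7, 8])
```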
functions.anon_dates(df, date_col_list, max_adj=5, positive_only=True, inplace=False)
Add (or optionally, add or subtract) a random number of days to each element of a date attribute column.
- inputs
- df (dataframe)
short-form (master list) pandas dataframe containing a date attribute column
- date_col_list (list)
name(s) of date attribute column(s) to be adjusted (as a list of strings)
Example:
['ldate', 'doh', 'dob']
- max_adj (integer)
the maximum number of days to add (or optionally subtract) from each element within the date column
- positive_only (boolean)
if True limit the range of adjustment days from zero to the max_adj value. If False, limit the range of adjustment from negative max_adj value to positive max_adj value.
- inplace (boolean)
if True, insert the results directly into the date column(s) of the input dataframe. Caution: make a copy first!
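A minimal sketch of the adjustment, assuming one random offset per row applied to every listed column (the seed parameter is an addition for reproducibility and returning a copy stands in for the inplace option):

```python
import numpy as np
import pandas as pd

def anon_dates(df, date_col_list, max_adj=5, positive_only=True, seed=0):
    """Return a copy of df with random day offsets added to date columns."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    low = 0 if positive_only else -max_adj
    # one offset per row, applied to every listed column (synchronized)
    offsets = pd.to_timedelta(rng.integers(low, max_adj + 1, size=len(out)),
                              unit="D")
    for col in date_col_list:
        out[col] = out[col] + offsets
    return out

df = pd.DataFrame({"ldate": pd.to_datetime(["2010-01-15", "2011-06-30"])})
adj = anon_dates(df, ["ldate"], max_adj=5)
# each adjusted date is 0-5 days after the original
```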
functions.anon_empkeys(df, seq_start=10001, frame_num=10000000, inplace=False)
Produce a list of unique, randomized employee numbers, categorized by employee group number code. Output may be used to anonymize a dataset empkey column.
The dataframe input (df) must contain an employee group (eg) column.
- inputs
- df (dataframe)
short-form (master list) pandas dataframe containing an employee group code column
- seq_start (integer)
this number will be added to each employee group cumulative count to “seed” the random employee numbers. These numbers will be shuffled within employee groups by the function for the output
- frame_num (integer)
This number will be multiplied by each employee code and added to the employee group cumulative counts (added to the seq_start number), and should be much larger than the data model population to provide a constant length employee number (empkey) for all employees.
- inplace (boolean)
if True, insert the results directly into the “empkey” column of the input dataframe. Caution: make a copy first!
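The seed/frame arithmetic described above can be sketched like this (an assumed implementation; the seed parameter and return-instead-of-inplace behavior are additions for illustration):

```python
import numpy as np
import pandas as pd

def anon_empkeys(df, seq_start=10001, frame_num=10000000, seed=0):
    """Anonymized empkeys: group code * frame_num + shuffled group sequence."""
    rng = np.random.default_rng(seed)
    # cumulative count within each employee group, seeded with seq_start
    seq = df.groupby("eg").cumcount() + seq_start
    # shuffle the sequence numbers within each employee group
    shuffled = seq.groupby(df["eg"]).transform(
        lambda s: rng.permutation(s.to_numpy()))
    return df["eg"] * frame_num + shuffled

df = pd.DataFrame({"eg": [1, 1, 1, 2, 2]})
keys = anon_empkeys(df)
# keys are unique, constant length, and begin with the group code digit
```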
functions.anon_master(case, empkey=True, name=True, date=False, sample=False, seq_start=10001, frame_num=10000000, min_seg=3, max_seg=3, add_rev=False, date_col_list=['ldate', 'doh'], max_adj=5, positive_only=True, date_col_list_sec=['dob'], max_adj_sec=5, positive_only_sec=True, n=None, frac=None, reset_index=False)
Specialized function to anonymize selected columns from a master.xlsx file and/or select a subset. All operations are inplace. The original master file is copied and saved as master_orig.xlsx.
The default parameters will replace last names and employee keys with substitute values. Date columns (doh, ldate, dob) will also be adjusted if the date input is set to True and the proper column names are set as column list inputs.
The function reads the original excel file, copies and saves it, modifies the original file as directed, and writes the results back to the original file. Subsequent dataset creation runs will use the modified data. The output master list will be sorted according to the original master list order.
- inputs
- case (string)
the case study name
- empkey (boolean)
if True, anonymize the empkey column
- name (boolean)
if True, anonymize the lname column
- date (boolean)
if True, anonymize date columns as designated with the date_col_list and the date_col_list_sec inputs
- sample (boolean)
if True, sample the dataframe if the n or frac inputs is/are not None
- seq_start (integer)
beginning anonymous employee number portion of empkey
- frame_num (integer)
large frame number which will contain all generated employee numbers. This number will be adjusted to begin with the appropriate employee group code
- min_seg (integer)
minimum number of 2-character segments to include in the generated substitute last names.
- max_seg (integer)
maximum number of 2-character segments to include in the generated substitute last names.
- add_rev (boolean)
if True, add reversed, non-duplicated 2-character segments to the pool of strings for name construction. This is normally not necessary and will construct output strings with multiple consecutive consonants/vowels.
- date_col_list (list)
list of date value columns to adjust. All columns in this list will be adjusted in a synchronized fashion, meaning a single random day adjustment for each row will be applied to each row member of all columns.
- max_adj (integer)
maximum random adjustment deviation, in days, from the original date(s)
- positive_only (boolean)
if True, only adjust dates forward in time
- date_col_list_sec (list)
a secondary list of date column(s) which will be adjusted independently from the date columns in the date_col_list
- max_adj_sec (integer)
maximum random adjustment deviation, in days, from the original date(s) in the date_col_list_sec columns
- positive_only_sec (boolean)
if True, only adjust dates forward in time (for secondary cols)
- n (integer or None)
number of rows to sample if the sample input is True. This input will override the frac input
- frac (float (0.0 - 1.0) or None)
decimal fraction (0.0 to 1.0) of the master list to sample, if the sample input is True and the n input is None
- reset_index (boolean)
if True, reset the index of the output master list (zero-based integer index). Do not use this option normally because it will wipe out the empkey index of the master list.
functions.anon_names(length=10, min_seg=3, max_seg=3, add_rev=False, df=None, inplace=False)
Generate a list of random strings.
Output may be used to anonymize a dataset name column
The length of the output strings will be determined by the min_seg and max_seg inputs. The segments (seg) are random 2-letter combinations of a consonant and a vowel. An additional random consonant or vowel will be added to the segment combinations, so the length of the output strings will always be an odd number. The min and max may be the same value to produce a list of strings of uniform length.
Example:
If the min_seg input is 1 and the max_seg input is 3, the output list will contain strings from 3 (2-letter seg + 1 random letter) to 7 characters.
- inputs
- length (integer)
the length of the output list
- min_seg (integer)
the minimum number of 2 letter segments to include in the output list
- max_seg (integer)
the maximum number of 2 letter segments to include in the output list (must be >= the "min_seg" input)
- add_rev (boolean)
add vowel-consonant combinations to the consonant-vowel segments. (this is not normally needed to produce random and readable strings)
- df (dataframe)
optional short-form pandas dataframe input. If not None, use the length of the dataframe as the “length” input value
- inplace (boolean)
if the “df” input is not None, insert the results directly into the input “lname” column. Caution: make a copy first!
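The segment construction described above can be sketched as follows (an assumed implementation; the seed parameter is an addition, and the add_rev and df/inplace options are omitted):

```python
import random

CONSONANTS = "bcdfghjklmnpqrstvwz"
VOWELS = "aeiou"

def anon_names(length=10, min_seg=3, max_seg=3, seed=0):
    """Generate random pronounceable strings of odd length."""
    rng = random.Random(seed)
    names = []
    for _ in range(length):
        n_seg = rng.randint(min_seg, max_seg)
        # each segment is a random consonant-vowel pair...
        name = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                       for _ in range(n_seg))
        # ...plus one extra random letter, so total length is always odd
        name += rng.choice(CONSONANTS + VOWELS)
        names.append(name)
    return names

sample = anon_names(5, min_seg=1, max_seg=3)
# each string is 3 to 7 characters long (always an odd length)
```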
functions.anon_pay(df, proportional=True, mult=1.0, inplace=False)
Substitute pay table baseline rate information with a proportional method or with a non-linear, non-proportional method.
- inputs
- df (dataframe)
pandas dataframe containing pay rate data (dataframe representation of the "rates" worksheet from the pay_tables.xlsx workbook)
- proportional (boolean)
if True, use the mult input to increase or decrease all of the “rates” worksheet pay data proportionally. If False, use a fixed algorithm to disproportionally adjust the pay rates.
- mult (integer or float)
if the proportional input is True, multiply all pay rate values by this input value
- inplace (boolean)
if True, replace the values within the original dataframe with the "anonymized" values.
functions.anon_pay_table(case, proportional=True, mult=1.0)
Anonymize the "rates" worksheet of the "pay_tables.xlsx" input file. The rates may be proportionally adjusted (larger or smaller) or disproportionally adjusted with a fixed algorithm.
The original excel file is copied and saved as "pay_tables_orig.xlsx".
All modifications are inplace.
- inputs
- case (string)
the case name
- proportional (boolean)
if True, use the mult input to increase or decrease all of the “rates” worksheet pay data proportionally. If False, use a fixed algorithm to disproportionally adjust the pay rates.
- mult (integer or float)
if the proportional input is True, multiply all pay rate values by this input value
functions.assign_cond_ratio(job, this_job_count, ratio_dict, orig_range, assign_range, eg_range, fur_range, cap=None)
Apply a job ratio assignment condition.
The main job assignment function calls this function at the appropriate months.
This function applies a ratio for job assignment between ratio groups. Each ratio group may contain one or more employee groups. The number of jobs affected may be limited with the “cap” input.
The ratio for job assignment is set with the inputs on the “ratio_cond” worksheet of the settings.xlsx input spreadsheet, using the “group” columns and the corresponding “weights” columns.
Optionally, the ratio of jobs which exists during the “month_start” spreadsheet input may be captured and used for job assignment during the data model months when the ratio job assignment condition is applicable (“month_start” through “month_end”). The existing ratios are captured and used by setting the “snapshot” input cell to True for the appropriate basic job row. When using the snapshot option, any weightings designated within the “weights” columns will be ignored.
There may be a mix of snapshot ratios and ratios set by the “weight” columns for use within the program. There may also be count-capped ratio assignments and straight ratio assignments within the same data model as long as the effective months and jobs do not overlap, but there may only be one row of ratio data for a job level within the same input worksheet.
No bump, no flush rules apply when assigning jobs by ratio, meaning only job openings due to retirements, increases in job counts, or other openings will be assigned according to the ratio schedule. Employees previously holding a job affected by the ratio condition will not be displaced to allow an employee from a different ratio group to have that job when the ratio assignment period begins. Therefore, it may take some time for the desired ratio of job assignments to be achieved if it differs significantly from the actual ratio(s) when the time period of conditional job assignment begins.
- inputs
- job (integer or float)
job level number
- this_job_count (integer or float)
number of jobs available
- ratio_dict (dictionary)
ratio condition dictionary, constructed with the build_program_files script and possibly modified by the set_snapshot_weights function if the “snapshot” option is set to True on the “ratio_cond” worksheet of the “settings.xlsx” input spreadsheet.
- orig_range (1d array)
original job range: month slice of the orig_job column array (normally pertaining to a specific month)
- assign_range (1d array)
job assignment range: month slice of the assign_range column array
- eg_range (1d array)
employee group range: month slice of the eg_range column array
- fur_range (1d array)
furlough range: month slice of the fur_range column array
- cap (integer (or whole float))
if a count ratio job assignment is being used, this number represents the number of jobs affected by the conditional assignment. Available jobs above this amount are not affected.
functions.assign_job_counts
Assign job counts to the job count array month slice.
- inputs
- job_count_range (array)
month slice of long job count array
- assign_range (array)
month slice of long job assignment array
- job (integer)
job level number
- this_job_count (integer)
job count allotted for job level
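The assignment step can be sketched with a boolean mask (a minimal assumed implementation; the real function is likely numba-compiled):

```python
import numpy as np

def assign_job_counts(job_count_range, assign_range, job, this_job_count):
    # write the job level's count wherever that job level was assigned
    job_count_range[assign_range == job] = this_job_count

counts = np.zeros(6, dtype=int)
assigns = np.array([1, 2, 2, 1, 3, 2])
assign_job_counts(counts, assigns, 2, 25)
# counts is now [0, 25, 25, 0, 0, 25]
```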
functions.assign_jobs_full_flush_job_changes(nonret_counts, job_counts, num_job_levels)
(Long_Form)
Using the nonret counts for each month:
- determine the long form slice for assignment
- slice the jobs list from the top to create the job assignment column
- create a corresponding furlough column
- create a job count column
Uses the job_counts (job_gain_loss_table function output [0]) to build stovepiped job lists, allowing for job count changes each month, and a furlough status column.
Unassigned employees (not enough jobs) are left at job number zero. This is the full bump and full flush version.
- inputs
- nonret_counts (numpy array)
array containing the number of non-retired employees for each month
- job_counts (numpy array)
array containing the monthly counts of jobs for each job level
- num_job_levels (integer)
the number of job levels in the model (excluding furlough level)
functions.assign_jobs_nbnf_job_changes(df, lower, upper, total_months, job_reduction_months, start_month, condition_list, sdict, tdict, fur_return=False)
(Long_Form)
Uses the job_gain_or_loss_table job count array for job assignments. Job counts may change up or down in any category for any time period. Handles furlough and return of employees, prior rights/conditions and restrictions, and recall of initially furloughed employees.
Inputs are precalculated outside of function to the extent possible. Returns tuple (long_assign_column, long_count_column, orig jobs, fur_data)
- inputs
- df (dataframe)
long-form dataframe with [‘eg’, ‘sg’, ‘fur’, ‘orig_job’] columns.
- lower (array)
ndarray from make_lower_slice_limits function (calculation derived from cumsum of count_per_month function)
- upper (array)
cumsum of count_per_month function
- total_months (integer or float)
sum of count_per_month function output
- job_reduction_months (list)
months in which the number of jobs is decreased from the get_job_reduction_months function
- start_month (integer)
integer representing the month number to begin calculations, likely month of integration when there exists a delayed integration (from settings dictionary)
- condition_list (list)
list of special job assignment conditions to apply, example: [‘prex’, ‘count’, ‘ratio’]
- sdict (dictionary)
the program settings dictionary (produced by the build_program_files script)
- tdict (dictionary)
job tables dictionary (produced by the build_program_files script)
- fur_return (boolean)
model employee recall from furlough if True using recall schedule from settings dictionary (allows call to mark_for_recall function)
Assigns jobs so that original standalone jobs are assigned each month (if available) unless a better job is available through attrition of employees.
Each month loop starts with the lowest job number.
For each month and for each job level:
1. assigns the nbnf (orig) job if the job array (long_assign_column) element is zero (unassigned) and the orig job number is less than or equal to the job level in the current loop, then
2. assigns the job level in the current loop to unassigned slots from top to bottom in the job array (up to the count of that job level remaining after step one above)
Each month range is determined by slicing using the lower and upper inputs. A comparison is made each month between the original job numbers and the current job loop number.
Job assignments are placed into the monthly segment (assign_range) of the long_assign_column.
The long_assign_column eventually becomes the job number (jnum) column in the dataset.
Original job numbers of 0 indicate no original job and are treated as furloughed employees. No jobs are assigned to furloughees unless furlough_return option is selected.
functions.assign_standalone_job_changes(eg, df_align, lower, upper, total_months, job_counts_each_month, total_monthly_job_count, nonret_each_month, job_change_months, job_reduction_months, start_month, sdict, tdict, apply_sg_cond=True)
(Long_Form)
Uses the job_gain_or_loss_table job count array for job assignments. Job counts may change up or down in any category for any time period. Handles furlough and return of employees, prior rights/conditions and restrictions, and recall of initially furloughed employees.
Inputs are precalculated outside of function to the extent possible. Returns tuple (long_assign_column, long_count_column, held_jobs, fur_data, orig_jobs)
- inputs
- eg (integer)
input from an incremental loop which is used to select the proper employee group recall schedule
- df_align (dataframe)
dataframe with [‘sg’, ‘fur’] columns
- num_of_job_levels (integer)
number of job levels in the data model (excluding a furlough level)
- lower (1d array)
ndarray from make_lower_slice_limits function (calculation derived from cumsum of count_per_month function)
- upper (1d array)
cumsum of count_per_month function
- total_months (integer or float)
sum of count_per_month function output
- job_counts_each_month (array)
output of job_gain_loss_table function[0] (precalculated monthly count of jobs in each job category, size (months,jobs))
- total_monthly_job_count (array)
output of job_gain_loss_table function[1] (precalculated monthly total count of all job categories, size (months))
- nonret_each_month (1d array)
output of count_per_month function
- job_change_months (list)
the min start month and max ending month found within the array of job_counts_each_month inputs (find the range of months to apply consideration for any job changes - prevents unnecessary looping)
- job_reduction_months (list)
months in which the number of jobs is decreased (list). from the get_job_reduction_months function
- start_month (integer)
starting month for calculations, likely implementation month from settings dictionary
- sdict (dictionary)
the program settings dictionary (produced by the build_program_files script)
- tdict (dictionary)
job tables dictionary (produced by the build_program_files script)
- apply_sg_cond (boolean)
compute with pre-existing special job quotas for certain employees marked with a one in the sg column (special group) according to a schedule defined in the settings dictionary
Assigns jobs so that original standalone jobs are assigned each month (if available) unless a better job is available through attrition of employees.
Each month loop starts with the lowest job number.
For each month and for each job level:
1. assigns the nbnf (orig) job if the job array (long_assign_column) element is zero (unassigned) and the orig job number is less than or equal to the job level in the current loop, then
2. assigns the job level in the current loop to unassigned slots from top to bottom in the job array (up to the count of that job level remaining after step one above)
Each month range is determined by slicing using the lower and upper inputs.
A comparison is made each month between the original job numbers and the current job loop number.
Job assignments are placed into the monthly segment (assign_range) of the long_assign_column.
The long_assign_column eventually becomes the job number (jnum) column in the dataset.
Original job numbers of 0 indicate no original job and are treated as furloughed employees - no jobs are assigned to furloughees unless furlough_return option is selected.
functions.career_months(ret_input, start_date)
(Short_Form)
Determine how many months each employee will work, including the partial month of retirement.
“ret_input” (retirement dates) may be in the form of a pandas dataframe, pandas series, array, list, or string
Output is a numpy array of integers containing the number of months between the start_date and each date in the ret_input (months from start date to retirement for each employee)
- inputs
- ret_input (dataframe, series, array, list, or string)
retirement dates input
- start_date (string date)
comparative date for the retirement dates input, normally the data model starting date
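A sketch of the month-count calculation using calendar month differences (an assumed implementation; the actual handling of the partial retirement month may differ):

```python
import pandas as pd

def career_months(ret_input, start_date):
    """Whole months from start_date to each retirement date (sketch)."""
    start = pd.Timestamp(start_date)
    rets = pd.to_datetime(pd.Series(ret_input))
    return ((rets.dt.year - start.year) * 12
            + (rets.dt.month - start.month)).to_numpy()

months = career_months(["2014-12-31", "2015-06-30"], "2014-01-31")
# → array([11, 17])
```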
functions.clear_dill_files()
Remove all files from the 'dill' folder. Used when changing case study; avoids the possibility of files from previous calculations being used by the new study.
functions.clip_ret_ages(ret_age_dict, init_ret_age, dates_long_arr, ages_long_arr)
Clip employee ages in each employee's final month to the proper retirement age if the model includes an increasing retirement age over time.
- inputs
- ret_age_dict (dictionary)
dictionary of retirement increase date to new retirement age as defined in settings dictionary
- init_ret_age
initial retirement age prior to any increase
- dates_long_arr (numpy array)
array of month dates (long form, same value during each month)
- ages_long_arr (numpy array)
array of employee ages (long form)
functions.contract_year_and_raise(df, settings_dict)
(Month_Form)
Generate the contract pay year for indexing into the pay table. Pay year is clipped to last year of contract.
Also create an annual assumed raise column applicable to the time period beyond the contract duration. This is a multiplier column with a compounded value each subsequent year. If no raise is elected (via the settings.xlsx input file, “scalars” worksheet), then this column will be all ones. The annual raise percentage is designated on the same worksheet. The input df must be a single column dataframe containing end-of-month dates, one for each month of the data model.
NOTE: this function can accept any number of pay exception periods through the pay_exceptions dictionary, populated by the "pay_exceptions" worksheet values within the settings.xlsx input file; see the program documentation for more information.
- inputs
- df (dataframe)
a single column dataframe containing end-of-month dates, one for each month of the data model
- settings_dict (dictionary)
dictionary of program settings generated by the build_program_files script
functions.convert_to_datetime(date_data, attribute)
Convert a dataframe column, series, list, or string input into an array of datetimes.
- inputs
- date_data (dataframe, series, array, list, or string)
pandas dataframe with a date column containing string dates or datetime objects, pandas series of dates (strings or datetime objects), a list/array of date strings or datetime objects, or a single comma-separated string containing date information.
- attribute (string)
if the date_data type is a dataframe, the name of the column containing the date information. Otherwise, this input is ignored.
functions.convert_to_enhanced(eg_job_counts, j_changes, job_dict)
Convert employee basic job counts to enhanced job counts (includes full-time and part-time job level counts) and convert basic job change schedules to enhanced job change schedules. Returns tuple (enhanced_job_counts, enhanced_job_changes)
- inputs
- eg_job_counts
A list of lists of the basic level job counts for each employee group. Each nested list has a length equal to the number of basic job levels.
example:
[[197, 470, 1056, 412, 628, 1121, 0, 0], [80, 85, 443, 163, 96, 464, 54, 66], [0, 26, 319, 0, 37, 304, 0, 0]]
- j_changes
input from the settings dictionary describing change of job quantity over months of time (list)
example:
[1, [35, 64], 87, [80, 7, 0]]
[job level, [start month, end month], total job count change, [eg allotment of change for standalone calculations]]
- job_dict
conversion dictionary for an enhanced model. This is the "jd" key value from the settings dictionary. It uses the basic job levels as the keys, and lists as values which contain the new full- and part-time job level numbers and the percentage of basic job counts to be converted to full-time jobs.
example:
{1: [1, 2, 0.6], 2: [3, 5, 0.625], 3: [4, 6, 0.65], 4: [7, 8, 0.6], 5: [9, 12, 0.625], 6: [10, 13, 0.65], 7: [11, 14, 0.65], 8: [15, 16, 0.65]}
functions.convert_to_hex(rgba_input)
Convert float rgba color values to string hex color values.
rgba = color values expressed as red, green, blue, and (optionally) alpha float values.
rgba_input may be:
- a single rgba list or tuple
- a list or tuple containing rgba lists or rgba tuples
- a dictionary of key: rgba values
Output is string hex color values in place of the rgba values.
Examples:
input single rgba value:
>>> sample_value = (.5, .3, .2)
>>> convert_to_hex(sample_value)
'#7f4c33'
input list:
>>> sample_list = [[0.65, 0.81, 0.89, 1.0], [0.31, 0.59, 0.77, 1.0],
...                [0.19, 0.39, 0.70, 1.0], [0.66, 0.85, 0.55, 1.0]]
>>> convert_to_hex(sample_list)
['#a5cee2', '#4f96c4', '#3063b2', '#a8d88c']
input dict:
>>> sample_dict = {1: (.65, .45, .45, 1.), 2: [.60, .45, .45, 1.], 3: (.55, .45, .45, 1.)}
>>> convert_to_hex(sample_dict)
{1: '#a57272', 2: '#99593a', 3: '#8c7249'}
- inputs
- rgba_input (tuple, list, or dictionary)
input may be a single list or tuple OR a list of float rgba values as lists or tuples OR a dictionary with values as lists or tuples. Valid string hex values may also be passed as inputs and will be returned unchanged.
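The scalar conversion can be sketched as below; `rgb_to_hex` is a hypothetical helper covering only the single-tuple case (the full function would also dispatch on list and dictionary inputs, and the alpha value is ignored here):

```python
def rgb_to_hex(rgba):
    """Convert a single rgb(a) float tuple to a hex string (alpha ignored)."""
    r, g, b = rgba[:3]
    # scale each 0.0-1.0 float channel to an integer 0-255 and format as hex
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))

rgb_to_hex((.5, .3, .2))
# → '#7f4c33'
```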
functions.copy_excel_file(case, file, return_path_and_df=False, revert=False, verbose=True)
Copy an excel file and add '_orig' to the file name, or restore an excel file from the '_orig' copy.
- inputs
- case (string)
the data model case name
- file (string)
the excel file name without the .xlsx extension
- return_path_and_df (boolean)
if True, return a tuple containing the file path as a string and the worksheet designated by the “file” input as a dataframe
- revert (boolean)
if False, copy the excel file and add '_orig' to the file name. If True, restore the copied file and drop the '_orig' suffix
- verbose (boolean)
if True, print a brief summary of the operation result
functions.count_avail_jobs
Use numba to loop through the job assignment range and count the number of positions in a specified job level already assigned (carried over from the previous month), then subtract that result from the total job level positions count. The result identifies the number of openings available for the current month.
- inputs
- assign_range (array)
monthly slice of job assignment array
- job (integer)
job level being tested
- this_job_count (integer)
total job positions count for the job being tested
functions.count_per_month(career_months_array)
(Month_Form)
Returns the number of employees remaining (not retired) for each month: a count of the career_months_array input (np array) elements that are greater than or equal to each incremental loop month number.
Note: an alternate method to this function is a value count of mnums:
    df_actives_each_month = pd.DataFrame(df_idx.mnum.value_counts())
    df_actives_each_month.columns = ['count']
- input
- career_months_array
output of career_months function. This input is an array containing the number of months each employee will work until retirement.
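A minimal sketch of this count, assuming an employee with k career months is active in months 0 through k-1:

```python
import numpy as np

def count_per_month(career_months_array):
    """Count employees still active (not yet retired) in each model month."""
    cmths = np.asarray(career_months_array)
    # an employee working k months is active in months 0 .. k-1
    return np.array([np.count_nonzero(cmths > m) for m in range(cmths.max())])

counts = count_per_month([3, 1, 2])
# → array([3, 2, 1]); counts.sum() equals total employee-months (6)
```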
functions.create_snum_and_spcnt_arrays(jnums, job_level_count, monthly_population_counts, monthly_job_counts, lspcnt_remaining_only)
Calculates: long_form seniority number ('snum', only active employees), seniority percentage ('spcnt', only active employees), list number ('lnum', includes furloughees), list percentage ('lspcnt', includes furloughees).
Iterate through monthly jobs count data, capturing monthly_job_counts to be used as the denominator for percentage calculations.
This function produces four ndarrays which will make up four columns in the long_form pandas dataset.
Returns tuple (long_snum_array, long_spcnt_array, long_list_array, long_lspcnt_array)
- inputs
- jnums
the long_form jnums result
- job_level_count
number of job levels in model
- monthly_population_counts
count_per_month function output
- monthly_job_counts
total of all jobs each month, derived from the job_gain_loss_table function output (table): np.add.reduce(table, axis=1)
- lspcnt_remaining_only
calculate list percentage based on employees remaining in each month including furloughees, otherwise percentage calculation denominator is the greater of employees remaining (incl fur) or jobs available
functions.distribute(available, weights, cap=None)
Proportionally distribute 'available' according to 'weights'.
usage example:
distribute(334, [2.48, 1])
returns distribution as a list, rounded as integers:
[238, 96]
- inputs
- available (integer)
the count (number) to divide
- weights (list)
relative weighting to be applied to available count for each section. numbers may be of any size, integers or floats. the number of resultant sections is the same as the number of weights in the list.
- cap (integer)
limit distribution total to this amount, if less than the “available” input.
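A minimal sketch reproducing the documented example (an assumed implementation; note that simple nearest-integer rounding may not always sum exactly to the available count, so the real function may apply a remainder correction):

```python
def distribute(available, weights, cap=None):
    """Proportionally split an integer count according to weights (sketch)."""
    if cap is not None:
        available = min(available, cap)
    total = sum(weights)
    # round each proportional share to the nearest integer
    return [int(round(available * w / total)) for w in weights]

distribute(334, [2.48, 1])
# → [238, 96]
```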
functions.eg_quotas(quota, actual, cap=None, this_job_count=None)
Determine the job counts to be assigned to each ratio group during a ratio condition job assignment routine.
- inputs
- quota (list or list-like)
the desired job counts for each employee group
- actual (list or list-like)
the actual job counts for each employee group
- cap (integer (or whole float))
if a count ratio routine is being used, this is the total count of jobs to be affected by the ratio
- this_job_count (integer (or whole float))
the monthly count of the applicable job
-
functions.
find_index_val
(df1, df2, df2_vals, col1=None, col2=None)[source]¶ find values in one dataframe which share an index with given values in another dataframe.
df1 index, df2 index, and the value columns must contain unique values.
- inputs
- df1 (dataframe)
the first dataframe containing values to index match in another dataframe
- df2 (dataframe)
the second dataframe with corresponding index values
- df2_vals (list)
values to match
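The lookup pattern described above can be sketched like this (assumed behavior; the handling of the col1/col2 defaults is omitted):

```python
import pandas as pd

def find_index_val(df1, df2, df2_vals, col1=None, col2=None):
    # Sketch: map each df2 value back to its (unique) index label, then
    # look up col1 in df1 at those same index labels.
    mapper = pd.Series(df2.index.values, index=df2[col2].values)
    idx = mapper.loc[df2_vals].values
    return df1.loc[idx, col1].tolist()

df1 = pd.DataFrame({'name': ['a', 'b', 'c']}, index=[10, 20, 30])
df2 = pd.DataFrame({'val': [100, 200, 300]}, index=[10, 20, 30])
find_index_val(df1, df2, [300, 100], col1='name', col2='val')   # ['c', 'a']
```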
-
functions.
gen_month_skeleton
(month_count_array)[source]¶ Long_Form
Create an array of month numbers with each month number repeating n times for n non-retired employees in each month. i.e. the first month section of the array will be all zeros (month: 0) repeating for the number of non-retired employees. The next section of the array will be all ones (month: 1) repeating for the number of employees remaining in month 1. Output is a 1d ndarray.
This function creates the first column and the basic form of the skeleton dataframe which is the basis for the dataset dataframes.
- inputs
- month_count_array
a numpy array containing the number of employees remaining or not retired for each month. This input is the result of the count_per_month function.
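The repetition described above maps directly onto np.repeat; a sketch of the assumed mechanism:

```python
import numpy as np

def gen_month_skeleton(month_count_array):
    # Sketch: month number i is repeated month_count_array[i] times.
    months = np.arange(len(month_count_array))
    return np.repeat(months, month_count_array.astype(int))

# 5 employees in month 0, 3 remaining in month 1, 2 in month 2:
gen_month_skeleton(np.array([5, 3, 2]))
# array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
```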
-
functions.
gen_skel_emp_idx
(monthly_count_array, career_mths_array, empkey_source_array)[source]¶ Long_Form
For each employee who remains for each month, grab that employee index number.
This index will be the key to merging in other data using data alignment. Inputs are the result of the count_per_month function (np.array) and the result of the career_months function
- inputs
- monthly_count_array (numpy array)
count of non-retired active employees for each month in the model, the output from the count_per_month function.
- career_mths_array (numpy array)
career length in months for each employee, output of the career_months function.
- empkey_source_array (numpy array)
empkey column data as array
Returns tuple (skel_idx_array, skel_empkey_array)
-
functions.
get_job_change_months
(job_changes)[source]¶ extract a sorted list of only the unique months containing a change in any job count as defined within the settings dictionary job change schedules
- input
- job_changes
list of job change schedule lists, normally equal to the j_changes variable from the settings dictionary
-
functions.
get_job_reduction_months
(job_changes)[source]¶ extract a sorted list of only the unique months containing a reduction in any job count as defined within the settings dictionary job change schedules
- input
- job_changes
list of job change schedule lists, normally equal to the j_changes variable from the settings dictionary
-
functions.
get_month_slice
(df, l, h)[source]¶ Convenience function to extract data for a particular month. Input is low and high indexes of target month data (within dataset containing many months)
The input may also be an array (not limited to a dataframe).
- inputs
- df
dataframe (or array) to be sliced
- l
lower index of slice
- h
upper index of slice
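A minimal sketch of the slicing convenience described (the dataframe/array branching shown is an assumption):

```python
import numpy as np

def get_month_slice(df, l, h):
    # Sketch: positional row slice for a dataframe, plain slice otherwise.
    try:
        return df.iloc[l:h]
    except AttributeError:
        return df[l:h]

get_month_slice(np.arange(10), 2, 5)   # array([2, 3, 4])
```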
-
functions.
get_recall_months
(list_of_recall_schedules)[source]¶ extract a sorted list of only the unique months containing a recall as defined within the settings dictionary recall schedules.
- input
- list_of_recall_schedules
list of recall schedule lists, normally equal to the recalls variable from the settings dictionary
-
functions.
job_gain_loss_table
(months, job_levels, init_job_counts, job_changes, standalone=False)[source]¶ Make two arrays of job tally information.
The first array has a row for each month in the model, and a column for each job level (excluding furlough). This array provides a count for each job for each month of the model accounting for changes provided by the job change schedules defined by the settings dictionary. The second array is a one-dimensional array containing the sum for all jobs for each month of the model.
- inputs
- months (integer)
number of months in model
- job_levels (integer)
number of job levels in model (excluding furlough level)
- init_job_counts (tuple of two numpy arrays)
initial job counts. Output from the make_jcnts function, essentially an array of the job count lists for each employee group and an array of the combined counts.
- job_changes (list)
The list of job changes from the settings dictionary.
- standalone (boolean)
if True, use the job count lists for the separate employee groups, otherwise use the combined job count
Returns tuple (job_table, monthly_job_totals)
-
functions.
load_datasets
(other_datasets=['standalone', 'skeleton', 'edit', 'hybrid'])[source]¶ Create a dictionary of proposal names to corresponding datasets. The datasets are generated with the RUN_SCRIPTS notebook. This routine reads the names of the case study proposals from a pickled dataframe (‘dill/proposal_names.pkl’), created by the build_program_files.py script. It then looks for the matching stored datasets within the dill folder. The datasets are loaded into a dictionary, using the proposal names as keys.
The dictionary allows easy reference to datasets from the Jupyter notebook and from within functions.
- input
- other_datasets (list)
list of datasets to load in addition to those computed from the proposals (from the case-specific proposals.xlsx Excel file)
-
functions.
longevity_at_startdate
(ldate_input, start_date, return_as_months=False)[source]¶ (Short_Form)
determine how much longevity (years) each employee has accrued as of the start date
float output is longevity in years (+1 added to reflect current 1-based pay year)
- inputs
- ldate_input (dataframe, series, list, or string)
list of longevity dates in datetime format
- start_date (string date)
comparative date for longevity dates, normally the data model starting date
- return_as_months (boolean)
option to return result as month value instead of year value
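The calculation can be sketched as below. The 365.25-day-year convention and the application of the +1 pay-year offset to the month output are assumptions; the library implementation may count months differently:

```python
import pandas as pd

def longevity_at_startdate(ldate_input, start_date, return_as_months=False):
    # Sketch only - the day-count convention here is an assumption.
    ldates = pd.to_datetime(pd.Series(ldate_input))
    start = pd.to_datetime(start_date)
    years = (start - ldates).dt.days / 365.25 + 1   # +1 for 1-based pay year
    return (years * 12 if return_as_months else years).tolist()

# one year of longevity plus the 1-based pay-year offset -> ~2.0 years
longevity_at_startdate(['2010-01-01'], '2011-01-01')
```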
-
functions.
make_cat_order
(ds, table)[source]¶ make a long-form “cat_order” (global job ranking) column. This function assigns a global job position value to each employee, considering the modeled job level hierarchy and the job count within each level. For example, if a case study contains 3 job levels with 100 jobs in each level, an employee holding a job in the middle of job level 2 would be assigned a cat_order value of 150.
Category order for standalone employee groups is “normalized” to an integrated scale by applying standalone job level percentage (relative position within a job level) to the integrated job level counts. This process allows “apples to apples” comparison between standalone and integrated job progression.
Standalone cat_order will only reflect job levels available within the standalone scenario. If the integrated model contains job levels which do not exist within a standalone employee group model, standalone cat_order results will exclude the respective job level rank segments and will rank the existing standalone data according to the integrated ranking scale.
The routine creates numpy array lookup tables from integrated job level count data for each month of the model. The tables are the source for count and additive information which is used to calculate a rank number within job level and cumulative job count additives.
Month number and job number arrays (from the input ds (dataset)) are used to index into the numpy lookup arrays, producing the count and additive arrays.
A simple formula is then applied to the percentage, count, and additive arrays to produce the cat_order array.
- inputs
- ds (dataframe)
a dataset containing [‘jobp’, ‘mnum’, ‘jnum’] columns
- table (numpy array)
the first output from the job_gain_loss_table function which is a numpy array with total job counts for each job level for each month of the data model
-
functions.
make_decile_bands
(num_bands=40, num_returned_bands=10)[source]¶ creates an array of lower and upper percentile values surrounding a consistent schedule of percentile markers. If the user desires to sample data at every 10th percentile, this function provides selectable bottom and top percentile limits surrounding each 10th percentile, or variable-width sample ranges.
The num_bands input must be a multiple of 5, greater than or equal to 10, and less than 10000.
The num_returned_bands input must be a multiple of 5, equal to or less than the num_bands input, and num_bands/num_returned_bands must have no remainder.
Used for selecting sample employees surrounding deciles (0, 10, 20 etc. percent levels).
Top and bottom bands will be half of normal size.
- inputs
- num_bands
The width of the bands in percentage is determined by the num_bands input. An input of 40 would produce bands 2.5% wide (100/40); top and bottom bands would be 1.25% wide. Ex. 0-1.25%, 8.75-11.25%, … 98.75-100%
- num_returned_bands
number of returned delineated sections. Must be a multiple of 5 less than or equal to the num_bands value with no remainder when divided into the num_bands value. (note: an input of 10 would result in 11 actual segments, one-half band at the top and bottom of list (0% and 100%), and 9 full bands surrounding each decile, 10% to 90%)
-
functions.
make_delayed_job_counts
(imp_month, delayed_jnums, lower, upper)[source]¶ Make an array of job counts to be inserted into the long_form job counts array of the job assignment function. The main assignment function calls this function prior to the implementation month. The array output of this function is inserted into what will become the job count column. These jobs are from the standalone job results. The job count column displays a total monthly count of the job in the corresponding jnum (job number) column.
- inputs
- imp_month (integer)
implementation month, defined by settings dictionary
- delayed_jnums (numpy array)
array of job numbers, normally data from the start of the model through the implementation month
- lower (numpy array)
array of indexes marking the beginning of data for each month within a larger array of stacked, multi-month data
- upper (numpy array)
array of indexes marking the end of data for each month
-
functions.
make_dict_from_columns
(df, key_col, value_col)[source]¶ Make a dictionary from two dataframe columns. One column will be the keys and the other the values.
Unique key column values will be assigned dictionary values. If the key_col input contains duplicates, only the last duplicate key-value pair will exist within the returned dictionary.
- inputs
- df (dataframe)
pandas dataframe containing the columns
- key_col (string (or possibly integer))
dataframe column which will become dictionary keys
- value_col (string (or possibly integer))
dataframe column which will become dictionary values
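The two-column-to-dictionary conversion reduces to a zip (a sketch of the documented behavior, including the last-duplicate-wins rule):

```python
import pandas as pd

def make_dict_from_columns(df, key_col, value_col):
    # Duplicate keys are overwritten, so only the last key-value pair survives.
    return dict(zip(df[key_col], df[value_col]))

df = pd.DataFrame({'k': [1, 2, 2], 'v': ['a', 'b', 'c']})
make_dict_from_columns(df, 'k', 'v')   # {1: 'a', 2: 'c'}
```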
-
functions.
make_eg_pcnt_column
(df, recalc_each_month=False, mnum=0, inplace=True, trim_ones=True, fixed_col_name='eg_start_pcnt', running_col_name='eg_pcnt')[source]¶ make an array derived from the input df reflecting one of the following options:
- Option A:
The percentage of each employee within his/her original employee group for a selected month. The array values will be data-aligned with the df input index. This option is useful for tracking percentile cohorts throughout the model.
- Option B:
The percentage of each employee within his/her original employee group recalculated each month. This has the effect of adjusting each group relative percentage for population changes due to retirements, furlough, etc. This option is useful for monthly percentile cohort comparisons.
This function either adds a column to the input dataframe or returns an array of values, the same length as the input dataframe.
Note: this function’s calculations include any furloughed employees.
Assign to a long-form dataframe (with default month 0 values aligned):
make_eg_pcnt_column(df)
- inputs
- df (dataframe)
pandas dataframe containing an employee group code column (‘eg’) and a month number column (‘mnum’). The dataframe must be indexed with employee number code integers (‘empkey’)
- recalc_each_month (boolean)
- if True:
recalculate separate employee group percentage each month of data model
- if False:
calculate values for one month only - align those values by employee number (empkey) to the entire data model
- mnum (integer)
if recalc_each_month is True, calculate values for this selected month number
- inplace (boolean)
if True, add a column to the input dataframe with the calculated values. If False, return a numpy array of the calculated values.
- trim_ones (boolean)
if True, replace 100% values (1.0) with a value slightly under 1.0 (.9999). This action assists construction of percentile quantiles for membership grouping purposes.
- exclude_fur (boolean)
if True, remove furloughed employees from percentage calculations
- fixed_col_name (string)
manually designated name for dataframe column when recalc_each_month input is False and inplace input is True.
- running_col_name (string)
manually designated name for dataframe column when recalc_each_month input is True and inplace input is True.
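Option A can be illustrated with a simplified sketch. The helper name eg_start_pcnt is hypothetical, and the trim_ones and furlough handling are omitted:

```python
import pandas as pd

def eg_start_pcnt(df, mnum=0):
    # Hypothetical simplified helper: each employee's percentage within
    # his/her employee group at the selected month, data-aligned to the
    # full long-form index by empkey.
    m0 = df[df.mnum == mnum]
    pcnt = (m0.groupby('eg').cumcount() + 1) / \
        m0.groupby('eg')['eg'].transform('count')
    return df.index.map(pd.Series(pcnt.values, index=m0.index))

df = pd.DataFrame({'eg': [1, 1, 2, 1, 1, 2],
                   'mnum': [0, 0, 0, 1, 1, 1]},
                  index=[101, 102, 103, 101, 102, 103])   # empkey index
list(eg_start_pcnt(df))   # [0.5, 1.0, 1.0, 0.5, 1.0, 1.0]
```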
-
functions.
make_group_lists
(df, column_name)[source]¶ this function is used with Excel input to convert string objects and integers into Python lists containing integers. This function is used with the count_ratio_dict dictionary construction. The function works with one column at a time.
Output is a list of lists which may be reinserted into a column of the dataframe.
example:
A    B    C    D
1    6    0    “2,3”
8    4    5    “5”

make_group_lists(df, ["D"])

[[2, 3], [5]]
This function allows the user to type the string 2,3 into an Excel worksheet cell and have it interpreted by seniority_list as [2, 3]
- inputs
- df (dataframe)
dataframe containing Excel employee group codes
- column_name
dataframe column name to convert
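The string-to-list conversion can be sketched as follows (assumed behavior; accepting the column name as a single-item list matches the documented example call):

```python
import pandas as pd

def make_group_lists(df, column_name):
    # Sketch: convert Excel cell contents ("2,3", "5", or plain integers)
    # into lists of integers, one list per row.
    col = column_name[0] if isinstance(column_name, list) else column_name
    result = []
    for value in df[col]:
        if isinstance(value, str):
            result.append([int(x) for x in value.split(',')])
        else:
            result.append([int(value)])
    return result

df = pd.DataFrame({'D': ['2,3', '5']})
make_group_lists(df, ['D'])   # [[2, 3], [5]]
```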
-
functions.
make_intgrtd_from_sep_stove_lists
(job_lists_arr, eg_arr, fur_arr, eg_total_jobs, num_levels, skip_fur=True)[source]¶ Month_Form
Compute an integrated job list built from multiple independent eg stovepiped job lists.
This function is for multiple egs (employee groups) - multiple lists in one job_lists_arr.
Creates an ndarray of job numbers.
Function takes independent job number lists and an array of eg codes which represent the eg ordering in the proposed list.
Job numbers from the separate lists are added to the result array according to the eg_arr order. Jobs on each list do not have to be in any sort of order. The routine simply adds items from the list(s) to the result array slots in list order.
- inputs
- job_lists_arr
array of the input job number arrays. represents the jobs that would be assigned to each employee in a list form. each list within the array will be the length of the respective eg.
- eg_arr
short_form array of eg codes (proposal eg ordering)
- fur_arr
short_form array of fur codes from proposal
- eg_total_jobs
list of length n (for n egs) containing the sum of total jobs available for each eg, form: [n, n, n]
- num_levels
number of job levels in model (excluding furlough level)
- skip_fur (boolean)
ignore or skip furloughs when assigning stovepipe jobs. If True, employees who are originally marked as furloughed are assigned the furlough level number which is 1 greater than the number of job levels. If False, jobs are assigned within each employee group in a stovepipe fashion, including those employees who are marked as furloughed
-
functions.
make_jcnts
(job_count_lists)[source]¶ Make two arrays: 1. array of n lists of job counts for n number of eg job count input lists 2. array of one summation list of first array (total count of all eg jobs) The arrays above will not contain a furlough count. Returns tuple (eg_job_counts, combined_job_count)
- inputs
- job_count_lists
list of the employee job count list(s). If the program is using the enhanced jobs option, this input will be the output of the convert_jcnts_to_enhanced function. Otherwise, it will be the eg_counts variable from the settings dictionary.
Example return:
(array([[ 237,  158,  587, 1373,  352,  739,  495,  330,  784, 1457,    0,  471,  785,    0,    0,    0],
        [  97,   64,  106,  575,   64,  310,  196,  130,  120,  603,   71,   72,  325,   38,   86,   46],
        [   0,    0,   33,  414,   20,  223,    0,    0,   46,  395,    0,   28,  213,    0,    0,    0]]),
 array([ 334,  222,  726, 2362,  436, 1272,  691,  460,  950, 2455,   71,  571, 1323,   38,   86,   46]))
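In numpy terms the construction reduces to stacking the group lists and summing across groups (a sketch of the documented return shape):

```python
import numpy as np

def make_jcnts(job_count_lists):
    # Sketch: stack the per-group job count lists, then sum element-wise
    # across groups to get the combined count of all eg jobs.
    eg_job_counts = np.array(job_count_lists)
    combined_job_count = np.add.reduce(eg_job_counts, axis=0)
    return eg_job_counts, combined_job_count

egs, combined = make_jcnts([[3, 2], [1, 4]])
combined   # array([4, 6])
```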
-
functions.
make_lists_from_columns
(df, columns, remove_zero_values=False, try_integers=False, as_tuples=False)[source]¶ combine columns row-wise into separate lists, return a list of lists
- example:
A    B    C    D
1    6    0    2
8    4    5    3

make_lists_from_columns(df, ["A", "B", "C"])

[[1, 6, 0], [8, 4, 5]]

make_lists_from_columns(df, ["A", "B", "C"], remove_zero_values=True, as_tuples=True)

[(1, 6), (8, 4, 5)]
- inputs
- df (dataframe)
pandas dataframe containing columns to combine
- columns (list)
list of column names
- try_integers (boolean)
if True and all column values are numerical, the output values will be converted to integers
- remove_zero_values (boolean)
if True, remove zero values from list or tuple outputs. The routine checks for zeros as a zero value or a list with a single zero value
- as_tuples (boolean)
if True, output will be a list of tuples instead of a list of lists
-
functions.
make_lower_slice_limits
(month_counts_cumsum)[source]¶ for use when working with individual month data within a larger (stacked) array.
The top of each month’s slice is the cumulative sum value; the bottom of each slice is the corresponding value of this function’s output array. The output is used as input for the nbnf (no bump, no flush) functions.
- inputs
- month_counts_cumsum (numpy array)
cumsum of count_per_month function output (employee count each month)
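The lower limits are simply the cumulative sums shifted by one month (a sketch):

```python
import numpy as np

def make_lower_slice_limits(month_counts_cumsum):
    # Sketch: each month's lower boundary is the prior month's cumulative
    # total; month 0 starts at index 0.
    return np.concatenate(([0], month_counts_cumsum[:-1]))

upper = np.array([5, 8, 10])        # cumsum of employee counts per month
make_lower_slice_limits(upper)      # array([0, 5, 8])
# month 1 data would then be long_array[5:8]
```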
-
functions.
make_original_jobs_from_counts
(jobs_arr_arr, eg_array, fur_array, num_levels)[source]¶ Short_Form
This function grabs jobs from standalone job count arrays (normally stovepiped) for each employee group and inserts those jobs into a proposed integrated list, or a standalone list. Each eg (employee group) is assigned jobs from their standalone list in order top to bottom.
Result is a combined list of jobs with each eg maintaining ordered, independent stovepipe jobs within the combined list of jobs. jobs_arr_arr is an array of arrays, likely output[0] from the make_array_of_job_lists function.
Order of job count arrays within jobs_arr_arr input must match emp group codes order (1, 2, 3, etc.).
If total group counts of job(s) is less than slots available to that group, remaining slots will be assigned (remain) a zero job number (0).
eg_array is list (order sequence) of employee group codes from proposed list with length equal to length of proposed list.
Result of this function is ultimately merged into long form for no bump no flush routine.
employees who are originally marked as furloughed are assigned the furlough level number which is 1 greater than the number of job levels.
- inputs
- jobs_arr_arr (numpy array of arrays)
lists of job counts for each job level within each employee group, each list in order starting with job level one.
- eg_array (numpy array)
employee group (eg) column data from master list source
- fur_array
furlough (fur) column data from master list source
- num_levels
number of job levels (without furlough level) in the model
-
functions.
make_preimp_array
(ds_stand, ds_integrated, imp_high, compute_cat, compute_pay)[source]¶ Create an ordered numpy array of pre-implementation data gathered from the pre-calculated standalone dataset and a dictionary to keep track of the information. This data will be joined with post_implementation integrated data and then copied into the appropriate columns of the final integrated dataset.
- inputs
- ds_stand (dataframe)
standalone dataset
- ds_integrated (dataframe)
dataset ordered for proposal
- imp_high
highest index (row number) from implementation month data (from long-form dataset)
- compute_cat (boolean)
if True, compute and append a job category order column
- compute_pay (boolean)
if True, compute and append a monthly pay column and a career pay column
-
functions.
make_starting_val_column
(df, attr, inplace=True)[source]¶ make an array of values derived from the input dataframe which will reflect the starting value (month zero) of a selected attribute. Each employee will be assigned the zero-month attribute value specific to that employee, duplicated in each month of the data model.
This column allows future attribute analysis with a constant starting point for all employees. For example, retirement job position may be compared to initial list percentage.
assign to long-form dataframe:
df['start_attr'] = make_starting_val_column(df, attr)
- input
- df (dataframe)
pandas dataframe containing the attr input column and a month number column. The dataframe must be indexed with employee number code integers (‘empkey’)
- attr (column name in df)
selected zero-month attribute (column) from which to assign values to the remaining data model months
-
functions.
make_stovepipe_jobs_from_jobs_arr
(jobs_arr, total_emp_count=0)[source]¶ Month_Form
Compute a stovepipe job list derived from the total count of jobs in each job level.
This function is for one eg (employee group) and one jobs_arr (list).
Creates an array of job numbers from a job count list (converted to np.array).
Result is an array with each job number repeated n times for an n job count.
job count list like: job_counts = [334, 222, 701, 2364]
jobs_array = np.array(job_counts)
- inputs
- jobs_arr (numpy array)
job counts starting with job level 1
- total_emp_count
if zero (normal input), sum of jobs_arr elements, otherwise user-defined size of result_jobs_arr
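The stovepipe construction can be sketched with np.repeat (the zero-padding for a user-defined total_emp_count is an assumption):

```python
import numpy as np

def make_stovepipe_jobs_from_jobs_arr(jobs_arr, total_emp_count=0):
    # Sketch: repeat each 1-based job number by its count; optionally pad
    # with zeros (or truncate) to a user-defined employee count.
    job_nums = np.arange(1, len(jobs_arr) + 1)
    stove = np.repeat(job_nums, jobs_arr)
    if total_emp_count:
        result = np.zeros(total_emp_count, dtype=int)
        end = min(len(stove), total_emp_count)
        result[:end] = stove[:end]
        return result
    return stove

make_stovepipe_jobs_from_jobs_arr(np.array([3, 2, 4]))
# array([1, 1, 1, 2, 2, 3, 3, 3, 3])
```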
-
functions.
make_stovepipe_prex_shortform
(job_list, sg_codes, sg_rights, fur_codes)[source]¶ Short_Form
Creates a ‘stovepipe’ job assignment within a single eg including a special job assignment condition for a subgroup. The subgroup is identified with a 1 in the sg_codes array input, originating with the sg column in the master list.
This function applies a pre-existing (prior to the merger) contractual job condition, which is likely the result of a previous seniority integration.
The subset group will have priority assignment for the first n jobs in the affected job category; the remaining jobs are assigned in seniority order.
The subgroup jobs are assigned in subgroup stovepipe order. This function is applicable to a condition with known job counts. The result of this function is used with standalone calculations or combined with other eg lists to form an integrated original job assignment list.
- inputs
- job_list
list of job counts for eg, like [23,34,0,54,…]
- sg_codes
ndarray eg group members entitled to job condition (marked with 1, others marked 0) length of this eg population
- sg_rights
list of lists (from settings dictionary) including job numbers and job counts for condition. Columns 2 and 3 are extracted for use.
- fur_codes
array of ones and zeros, one indicates furlough status
-
functions.
make_tuples_from_columns
(df, col_list, return_as_list=True, date_cols=[], return_dates_as_strings=False, date_format='%Y-%m-%d')[source]¶ Combine row values from selected columns to form tuples. Returns a list of tuples which may be assigned to a new column. The length of the list is equal to the length of the input dataframe. Date columns may be first converted to strings before adding to output tuples if desired.
- inputs
- df (dataframe)
input dataframe
- col_list (list)
columns from which to create tuples
- return_as_list (boolean)
if True, return a list of tuples
- date_cols (list)
list of columns to treat as dates
- return_dates_as_strings (boolean)
if True, for columns within the date_cols input, convert date values to string format
- date_format (string)
string format of converted date columns
-
functions.
mark_for_furlough
(orig_range, fur_range, month, jobs_avail, num_of_job_levels)[source]¶ Assign a fur code to employees, in inverse seniority order, when the count of jobs is less than the count of active employees, and assign the furloughed job level number. note: normally only called during a job change month, though it will do no harm if called in other months
- inputs
- orig_range
current month slice of jobs held
- fur_range
current month slice of fur data
- month
current month (loop) number
- jobs_avail
total number of jobs for each month array, job_gain_loss_table function output [1]
- num_of_job_levels
from settings dictionary, used to mark fur job level as num_of_job_levels + 1
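A simplified sketch of the furlough marking described. The in-place mutation of the month slices and the exact selection of the most junior actives are assumptions:

```python
import numpy as np

def mark_for_furlough(orig_range, fur_range, month,
                      jobs_avail, num_of_job_levels):
    # Sketch: when actives exceed jobs available this month, furlough the
    # excess from the bottom of the range (inverse seniority order).
    active = np.where(fur_range == 0)[0]
    excess = len(active) - jobs_avail[month]
    if excess > 0:
        cut = active[-excess:]                    # most junior actives
        fur_range[cut] = 1                        # mark as furloughed
        orig_range[cut] = num_of_job_levels + 1   # furlough job level

orig = np.array([1, 1, 2, 2, 3])
fur = np.zeros(5, dtype=int)
mark_for_furlough(orig, fur, 0, np.array([3]), 3)
# fur is now [0, 0, 0, 1, 1]; orig is [1, 1, 2, 4, 4]
```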
-
functions.
mark_for_recall
(orig_range, num_of_job_levels, fur_range, month, recall_sched, jobs_avail, standalone=True, eg_index=0, method='sen_order', stride=2)[source]¶ change fur code to non-fur code for returning employees according to selected method (seniority order, every nth furloughee, or random) note: function assumes it is only being called during a recall month
- inputs
- orig_range
original job range
- num_of_job_levels
number of job levels in model, normally from settings dictionary
- fur_range
current month slice of fur data
- month
current month (loop) number
- recall_sched
list(s) of recall schedule (recall amount/month, recall start month, recall end month)
- jobs_avail
total number of jobs for each month array, job_gain_loss_table function output [1]
- standalone (boolean)
This function may be used with both standalone and integrated dataset generation. Set this variable to True for use within standalone dataset calculation, False for integrated dataset calculation routine.
- eg_index (integer)
selects the proper recall schedule for standalone dataset generation, normally from a loop increment. The recall schedule is defined in the settings dictionary. set to zero for an integrated routine (integrated routine uses a global recall schedule)
- method
means of selecting employees to be recalled. The default is by seniority order, most senior recalled first. Other options are:
- stride:
i.e. every nth furloughed employee. (note: there could be multiple strides per month if multiple recall lists are designated).
- random:
use shuffled list of furloughees
- stride
set stride if stride option for recall selected. default is 2.
-
functions.
mark_fur_range
(assign_range, fur_range, job_levels)[source]¶ apply fur code to current month fur_range based on job assignment status
- inputs
- assign_range
current month assignment range (array of job numbers, 0 indicates no job)
- fur_range
current month fur status (1 means furloughed, 0 means not furloughed)
- job_levels
number of job levels in model (from settings dictionary)
-
functions.
max_of_nested_lists
(nested_list, return_min=False)[source]¶ Find the maximum value within a list of lists (or tuples or arrays). The function may optionally return the minimum value within nested containers.
- inputs
- nested_list (list, tuple, or array)
nested container input
- return_min (boolean)
if True, return minimum of nested_list input (vs. max)
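The nested search reduces to a one-level flatten (a sketch of the documented behavior):

```python
def max_of_nested_lists(nested_list, return_min=False):
    # Sketch: flatten one level of nesting, then take the max (or min).
    flat = [item for sub in nested_list for item in sub]
    return min(flat) if return_min else max(flat)

max_of_nested_lists([[1, 9], (4, 2), [7]])        # 9
max_of_nested_lists([[1, 9], (4, 2), [7]], True)  # 1
```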
-
functions.
monotonic
(sequence)[source]¶ test for strictly increasing array-like input. May be used to determine when the need for the no bump, no flush routine is no longer required. If the test is true, and there are no job changes, special rights, or furlough recalls, then a straight stovepipe job assignment routine may be implemented (fast).
- inputs
- sequence
array-like input (list or numpy array ok)
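The strictness test reduces to a one-line numpy comparison (a sketch of the documented behavior):

```python
import numpy as np

def monotonic(sequence):
    # Strictly increasing: every element greater than its predecessor.
    seq = np.asarray(sequence)
    return bool(np.all(seq[1:] > seq[:-1]))

monotonic([1, 2, 5, 9])   # True
monotonic([1, 3, 3, 4])   # False (equal neighbors fail the strict test)
```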
-
functions.
print_settings
()[source]¶ grab the scalar data from the settings dictionary, put it in a dataframe, and then print it for a quick summary of scalar settings dictionary inputs
-
functions.
remove_zero_groups
(ratio_dict)[source]¶ remove data related to a “dummy” group represented by a zero
example:
{2: [([2], [0], [1]), [0, 2, 6], 34, 120]}
becomes:
{2: [([2], [1]), [0, 6], 34, 120]}
- inputs
- ratio_dict (dictionary)
the ratio dictionary produced by the build_program_files script originating from the “ratio_cond” worksheet of the settings.xlsx input file
-
functions.
sample_dataframe
(df, n=None, frac=None, reset_index=False)[source]¶ Get a random sample of a dataframe by rows, with the number of rows in the returned sample defined by a count or fraction input.
- inputs
- df (dataframe)
pandas dataframe for sampling
- n (integer)
If not None, the count of the rows in the returned sample dataframe. The “n” input will override the “frac” input if both are not None. Will be clipped between zero and len(df) if the input exceeds these boundaries.
- frac (float)
If not None, the size of the returned sample dataframe relative to the input dataframe. Will be ignored if the “n” input is not None. Will be clipped between 0.0 and 1.0 if the input exceeds these boundaries. An input of .3 would randomly select 30% of the rows from the input dataframe.
- reset_index (boolean)
If True, reset the output dataframe index
If both the “n” and “frac” inputs are None, a random single row will be returned.
The rows in the output dataframe will be sorted according to original order.
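A sketch built on pandas’ own DataFrame.sample, with the clipping and ordering behavior described above (restoring original order via sort_index is an assumption):

```python
import pandas as pd

def sample_dataframe(df, n=None, frac=None, reset_index=False):
    # Sketch: "n" overrides "frac"; both are clipped to valid bounds;
    # a single random row is returned when both are None.
    if n is not None:
        result = df.sample(n=int(min(max(n, 0), len(df))))
    elif frac is not None:
        result = df.sample(frac=min(max(frac, 0.0), 1.0))
    else:
        result = df.sample(n=1)
    result = result.sort_index()     # restore original row order
    return result.reset_index(drop=True) if reset_index else result
```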
-
functions.
save_and_load_dill_folder
(save_as=None, load_case=None, print_saved=False)[source]¶ Save the current “dill” folder to the “saved_dill_folders” folder, or load a saved dill folder as the “dill” folder if it exists.
This function allows calculated case study pickle files (including the calculated datasets) to be saved to or loaded from a “saved_dill_folders” folder.
The “saved_dill_folders” folder is created if it does not already exist. The load_case input is a case study name. If the load_case input is set to None, the function will only save the current dill folder and do nothing else. If a load_case input is given, but is incorrect or no matching folder exists, the function will only save the current dill folder and do nothing else.
The user may print a list of available saved dill folders (for loading) by setting the print_saved input to True. No other action will take place when this option is set to True.
If an award has conditions which differ from proposed conditions, the settings dictionary must be modified and the dataset rebuilt.
This function allows previously calculated datasets to be quickly retrieved and eliminates continual adjustment of the settings spreadsheet if the user switches between case studies (assuming the award has been determined and no more input adjustment will be made).
- inputs
- save_as (string)
A user-specified folder prefix. If None, the current “dill” folder will be saved using the current case study name as a prefix. If set to a string value, the current dill folder will be saved with the “save_as” string value prefix.
Example with the save_as variable set to “test1”. The existing dill folder would be saved as:
saved_dill_folders/test1_dill_folder
- load_case (string)
The name of a case study. If None, the only action performed will be to save the current “dill” folder to the “saved_dill_folders” folder. If the load_case variable is a valid case study name and a saved dill folder for that case study exists, the saved dill folder will become the current dill folder (contents of the saved dill folder will be copied into the current dill folder). This action will occur after the contents of the current dill folder are copied into the “saved_dill_folders” folder.
- print_saved (boolean)
option to print the saved folder prefixes only. This provides a quick check of the folders available to be loaded. No other action will take place with this option set to True.
-
functions.
set_snapshot_weights
(job, ratio_dict, orig_rng, eg_range)[source]¶ Determine the job distribution ratios to carry forward during the ratio condition application period using actual jobs-held ratios. Likely called at the implementation month by the main job assignment function. Count the number of jobs held by each of the ratio groups for each of the affected job level numbers, and set the weightings in the distribute function accordingly.
- inputs
- ratio_dict (dictionary)
dictionary containing job levels as keys and ratio groups, weightings, month_start and month_end as values.
- orig_rng (numpy array)
month slice of original job array
- eg_range (numpy array)
month slice of employee group code array
-
functions.
squeeze_increment
(data, eg, low_num, high_num, increment)[source]¶ Move members of a selected eg (employee group) within a list according to an increment input (positive or negative) while retaining relative ordering within all eg groups.
- inputs
- data (dataframe)
dataframe with empkey as index which at minimum includes an order column and an eg column
- eg (integer)
employee group number code
- low_num and high_num (integers)
indexes marking the beginning and end of the list zone to be reordered
- increment (integer)
the amount to add to or subtract from the appropriate eg order numbers; the increment can be positive (move down the list) or negative (move up the list, toward zero)
Selected eg order numbers within the selected zone (as a numpy array) are incremented; then the entire group's order numbers are reset within the zone using scipy.stats.rankdata.
The array is then assigned to a dataframe with empkeys as index.
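The increment-and-rerank mechanism can be sketched as below; the function name and signature are assumptions, but the rankdata step mirrors the description above.

```python
import numpy as np
from scipy.stats import rankdata

def squeeze_increment_sketch(order, eg, eg_num, low, high, increment):
    # hypothetical sketch: shift the selected group's order numbers
    # inside the zone, then rerank the whole zone so order numbers
    # stay dense; "ordinal" ranking preserves relative order within
    # each eg group
    order = order.astype(float).copy()
    zone = slice(low, high)
    sel = eg[zone] == eg_num
    zone_vals = order[zone]
    zone_vals[sel] += increment          # positive: down list, negative: up
    order[zone] = rankdata(zone_vals, method="ordinal") + low
    return order
```

A fractional increment (e.g. 1.5) slots the moved members between the untouched order numbers before reranking, which is why the working array is cast to float.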
-
functions.
squeeze_logrithmic
(data, eg, low_value, high_value, log_factor=1.5, put_segment=1, direction='d')[source]¶ Perform a log squeeze (logarithmic-based movement of one eg (employee group)): determine the closest matching indexes within the rng to fit the squeeze, put the affected group in those indexes, then fill in the remaining slots with the other group(s), maintaining original ordering within each group at all times.
- inputs
- data (dataframe)
a dataframe indexed by empkey with at least 2 columns: employee group (eg) and order (order)
- eg (employee code integer)
the employee group to move
- low_value and high_value (integers)
integers marking the boundaries (rng) for the operation (high_value must be greater than low_value)
- log_factor (float)
determines the degree of “logarithmic packing”
- put_segment (float)
allows compression of the squeeze result (values under 1)
- direction (string)
squeeze direction: “u” - move up the list (more senior) “d” - move down the list (more junior)
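The index-selection step behind the log squeeze can be illustrated roughly as follows; this is a sketch of the general technique (logarithmically spaced target slots within a zone), not the module's actual algorithm, and the helper name is hypothetical.

```python
import numpy as np

def log_indexes_sketch(count, low, high, log_factor=1.5, direction="d"):
    # hypothetical sketch: spread `count` target slots across
    # [low, high) with logarithmically varying density, so the
    # squeezed group packs toward one end of the zone
    # (assumes count >= 2 and log_factor > 1)
    base = np.log(np.linspace(1.0, log_factor, count))
    frac = base / base.max()                 # 0..1, log-shaped spacing
    idx = np.unique(low + np.round(frac * (high - low - 1)).astype(int))
    if direction == "u":
        return idx
    return (low + high - 1) - idx[::-1]      # mirror: reverse the packing
```

A larger `log_factor` makes the spacing more uneven (stronger packing); the real routine then fills the remaining zone slots with the other group(s) in their original order.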
-
functions.
starting_age
(dob_input, start_date)[source]¶ Short_Form
Returns decimal age at given date.
“dob_input” (birth dates) may be in the form of a pandas dataframe, pandas series, list, or string
- inputs
- dob_input (dataframe, series, list, or string)
birth dates input
- start_date
comparative date for the birth dates, normally the data model starting date
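A decimal-age calculation of this kind can be sketched with pandas; the helper name and the 365.25-day year convention are assumptions for illustration.

```python
import pandas as pd

def starting_age_sketch(dob_input, start_date):
    # hypothetical sketch: decimal age at start_date for one or more
    # birth dates; pd.Series normalizes a string, list, or series input
    dobs = pd.to_datetime(pd.Series(dob_input))
    start = pd.to_datetime(start_date)
    return ((start - dobs).dt.days / 365.25).values
```

For example, a birth date of 2000-01-01 measured at 2020-01-01 yields 20.0.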
-
functions.
update_excel
(case, file, ws_dict={}, sheets_to_remove=None)[source]¶ Read an Excel file, optionally remove worksheet(s), add worksheets or overwrite worksheets with a dictionary of ws_name, dataframe key-value pairs, and write the Excel file back to disk.
- inputs
- case (string)
the data model case name
- file (string)
the Excel file name without the .xlsx extension
- ws_dict (dictionary)
dictionary of worksheet names as keys and pandas dataframes as values. The items in this dictionary will be passed into the Excel file as worksheets. The worksheet name keys may be the same as some or all of the worksheet names in the Excel file. In the case of matching names, the data from the input dict will overwrite the existing data (worksheet) in the Excel file. Non-overlapping worksheet names/dataframe values will be added as new worksheets.
- sheets_to_remove (list)
a list of worksheet names (strings) representing worksheets to remove from the Excel workbook. It is not necessary to remove sheets which are being replaced by worksheets with the same name.
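The remove/overwrite/append cycle described above can be sketched with openpyxl and pandas; the helper name and path handling are assumptions, while `load_workbook`, `Workbook.remove`, and `ExcelWriter(mode="a", if_sheet_exists="replace")` are real library APIs.

```python
import pandas as pd
from openpyxl import load_workbook

def update_excel_sketch(path, ws_dict, sheets_to_remove=None):
    # hypothetical sketch: drop unwanted worksheets first, then
    # overwrite or append the worksheets supplied in ws_dict
    wb = load_workbook(path)
    for name in (sheets_to_remove or []):
        if name in wb.sheetnames:
            wb.remove(wb[name])
    wb.save(path)
    # mode="a" appends to the existing workbook; matching sheet
    # names are replaced rather than duplicated
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        for name, df in ws_dict.items():
            df.to_excel(writer, sheet_name=name, index=False)
```

Removing sheets and rewriting in two passes keeps the logic simple; sheets being replaced by a same-named entry in `ws_dict` need not appear in `sheets_to_remove`, matching the note above.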