matplotlib_charting module
-
matplotlib_charting.add_pad(list_in, pad=100)
Separate all elements in a monotonic list by a minimum pad value.
Used by plotting functions to prevent overlapping tick labels.
- inputs
- list_in (list)
a monotonic list of numbers
- pad (integer)
the minimum separation required between list elements
If the function cannot produce a list in which every spacing except the last is at least the pad value, the original list is returned. The final spacing (between the last two elements) is permitted to be less than the pad value.
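The padding rule above can be illustrated with a short sketch. This is not the module's source: it assumes a numeric monotonic list, pushes each interior element just far enough to clear its predecessor by pad, keeps the last element fixed, and falls back to the original list if the padded values would overrun the last element. The function name add_pad_sketch is hypothetical.

```python
def add_pad_sketch(list_in, pad=100):
    """Hypothetical illustration of the add_pad contract (not the module's code).

    Every spacing except the last is made at least `pad`; the last element is
    left in place, so the final spacing may be smaller than `pad`. If the
    padded interior would pass the last element, the original list is returned.
    """
    if len(list_in) < 3:
        return list(list_in)
    # Work on ascending values; a descending list is handled by negation.
    descending = list_in[0] > list_in[-1]
    vals = [-v for v in list_in] if descending else list(list_in)

    out = [vals[0]]
    for v in vals[1:-1]:
        # Push each interior element to at least `pad` beyond its predecessor.
        out.append(max(v, out[-1] + pad))

    # Give up (return the original) if the padded interior overruns the end.
    if out[-1] > vals[-1]:
        return list(list_in)
    out.append(vals[-1])
    return [-v for v in out] if descending else out


print(add_pad_sketch([0, 10, 20, 300], pad=100))  # [0, 100, 200, 300]
print(add_pad_sketch([0, 10, 20, 150], pad=100))  # unchanged: [0, 10, 20, 150]
```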
-
matplotlib_charting.age_kde_dist(df, color_list, p_dict, max_age, ds_dict=None, mnum=0, title_size=14, min_age=25, chart_style='darkgrid', xsize=12, ysize=10, image_dir=None, image_format='png')
From the seaborn website: Fit and plot a univariate or bivariate kernel density estimate.
- inputs
- df (dataframe or string)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- color_list (list)
list of colors for employee group plots
- p_dict (dictionary)
employee group (eg) code to string description dictionary for plot labels
- max_age (float)
maximum age to plot (x axis limit)
- ds_dict (dictionary)
output from load_datasets function
- mnum (integer)
month number to analyze
- title_size (integer or float)
text size of chart title
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
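As a rough illustration of the quantity this chart estimates, the sketch below evaluates a one-dimensional Gaussian kernel density estimate with NumPy. The bandwidth uses Scott's rule (sample standard deviation times n**(-1/5)), which is comparable to seaborn's default; seaborn performs its own fitting, and the ages below are made up.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`.

    Illustrative sketch only; seaborn's kdeplot does its own fitting.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule bandwidth for one dimension.
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the density integrates to one.
    return kernels.sum(axis=1) / (n * h)


# Hypothetical ages for one employee group, evaluated over an age range
# similar to the min_age / max_age chart limits.
ages = np.array([31, 35, 38, 42, 44, 47, 51, 55, 58, 61])
grid = np.linspace(25, 65, 401)
density = gaussian_kde_1d(ages, grid)
```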
-
matplotlib_charting.age_vs_spcnt(df, eg_list, mnum, color_list, p_dict, ret_age, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, chart_style='darkgrid', size=20, alpha=0.8, suptitle_size=14, title_size=12, legend_size=12, xsize=10, ysize=8, image_dir=None, image_format='png')
Scatter plot with age on the x axis and list percentage on the y axis.
Note: the input df may be prefiltered to focus on particular attributes, e.g., to include only employees at a certain job level, hired between certain dates, or within a particular age range.
- inputs
- df (string or dataframe)
text name of the input proposal dataset; any dataframe variable is also accepted (if a sliced dataframe subset is desired, for example). Example: the input can be ‘proposal1’ (if that proposal exists) or df[df.age > 50]
- eg_list (list)
list of employee groups to include. Example: [1, 2]
- mnum (int)
month number to study from dataset
- color_list (list)
color codes for plotting each employee group
- p_dict (dict)
dictionary, numerical eg code to string description
- ret_age (integer or float)
x axis limit for the chart
- ds_dict (dict)
variable assigned to the output of the load_datasets function, required when a string dictionary key is used as the df input
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- chart_style (string)
any valid seaborn plotting style
- size (integer)
size of scatter points
- alpha (float)
scatter point alpha (0.0 to 1.0)
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- legend_size (integer or float)
text size of chart legend
- xsize, ysize (integer or float)
plot size in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
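The attr(n)/oper(n)/val(n) triplets act as up to three stacked row filters. Below is a minimal pandas sketch of that idea, assuming each filter keeps the rows where `df[attr] <oper> val` is true and that an attr of None disables that filter. The helper name apply_attr_filters and the OPS lookup table are hypothetical, not taken from the module.

```python
import operator

import pandas as pd

# Hypothetical mapping from operator strings to comparison functions.
OPS = {
    '<': operator.lt, '<=': operator.le, '==': operator.eq,
    '!=': operator.ne, '>=': operator.ge, '>': operator.gt,
}


def apply_attr_filters(df, filters):
    """Apply (attr, oper, val) filters in sequence; attr=None skips a filter."""
    for attr, oper, val in filters:
        if attr is None:
            continue
        df = df[OPS[oper](df[attr], val)]
    return df


df = pd.DataFrame({'age': [35, 52, 61, 47],
                   'ldate': pd.to_datetime(['1995-03-01', '1988-06-15',
                                            '1984-01-31', '2001-09-30']),
                   'jnum': [3, 1, 2, 4]})
# Roughly attr1='age', oper1='>=', val1=45, attr2='ldate', oper2='<',
# val2='1990-12-31', attr3=None (a date given as a string).
subset = apply_attr_filters(df, [('age', '>=', 45),
                                 ('ldate', '<', '1990-12-31'),
                                 (None, '>=', 0)])
```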
-
matplotlib_charting.build_subplotting_order(rows, cols)
Build a list of integers to permit passing through subplots by columns.
Note: only used when looping completes one vertical column before continuing to the next column.
- inputs
- rows, cols (integer)
number of rows and columns in multiple chart output
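A sketch of a column-first ordering, assuming matplotlib's 1-based, row-major subplot numbering (as used by fig.add_subplot(rows, cols, index)); the module's actual return values may differ, and subplotting_order_sketch is a hypothetical name.

```python
def subplotting_order_sketch(rows, cols):
    """Return 1-based subplot indexes that walk down each column in turn."""
    # matplotlib numbers subplots row by row, so the subplot at (r, c)
    # has index r * cols + c + 1.
    return [r * cols + c + 1 for c in range(cols) for r in range(rows)]


order = subplotting_order_sketch(2, 3)  # [1, 4, 2, 5, 3, 6]
```

Looping over such a list with fig.add_subplot(rows, cols, i) fills the first column top to bottom before moving to the next column.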
-
matplotlib_charting.cohort_differential(ds, base, sdict, cdict, adict, measure='ldate', compare_value='2010-12-31', mnum=None, ds_dict=None, single_eg_compare=None, sort_xax_by_measure=False, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, pos_color='g', neg_color='r', pos_alpha=0.25, neg_alpha=0.25, bg_color=None, zero_line_color='m', title_size=16, label_size=14, tick_size=12.5, legend_size=12.5, xsize=14, ysize=10, image_dir=None, image_format='png')
Compare proposed integrated list locations of employees from different groups who share a similar attribute value.
This function is best used with date-type attributes, such as longevity date or date of hire.
The comparative list locations form a continuous list of index locations. For each base employee group attribute value, the comparative location is the last list position within another employee group whose attribute value is less than or equal to that base group value. A variance or differential is then calculated by comparing the base and comparative locations.
Attributes (measures) are sorted within each employee group prior to comparison. The x axis may be arranged to display proposed list ordering or the attribute value range (typically a date range).
Differences in list position are shown with a line above or below zero. One employee group (base) is compared to other group(s) in the proposed list within a selected month. When the line is above zero, it means that the base group cohort at a particular x axis position is on the list ahead of another group cohort by an amount equal to the y displacement of the line. The line colors correspond to the employee group color codes.
The default behavior is to compare the base group with all other groups at once, but single group comparison may be accomplished as well.
When the x axis is set to display list location (not attribute values), the user may designate a compare value. The list location of employees from each group who share the comparison attribute value will be marked on the chart with a color-coded vertical line.
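The location lookup described above can be sketched with numpy.searchsorted. This assumes each group's measure values are already sorted and paired with that group's proposed list positions; the function name, inputs, and sample dates are hypothetical. A positive result means the base employee sits ahead of (at a smaller list number than) the comparison cohort, matching the "above zero" reading described above.

```python
import numpy as np


def cohort_diff_sketch(base_vals, base_locs, other_vals, other_locs):
    """Differential between comparison-group and base-group list locations.

    base_vals / other_vals: sorted measure values (e.g. longevity dates).
    base_locs / other_locs: proposed list positions paired with those values.
    """
    other_vals = np.asarray(other_vals)
    other_locs = np.asarray(other_locs, dtype=float)
    # Index of the last comparison value <= each base value (-1 if none).
    idx = np.searchsorted(other_vals, base_vals, side='right') - 1
    comp_locs = np.where(idx >= 0, other_locs[np.clip(idx, 0, None)], np.nan)
    # Positive: the base employee is ahead of the comparison cohort.
    return comp_locs - np.asarray(base_locs, dtype=float)


base_vals = np.array(['1990-01-31', '1995-06-30', '2001-03-31'],
                     dtype='datetime64[D]')
other_vals = np.array(['1993-05-31', '1995-06-30', '1999-12-31', '2004-08-31'],
                      dtype='datetime64[D]')
res = cohort_diff_sketch(base_vals, [1, 2, 4], other_vals, [3, 5, 6, 8])
# The first base date has no earlier comparison date, so its result is NaN.
```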
- inputs
- ds (dataframe)
dataset for analysis
- base (integer)
employee group number code
- sdict (dictionary)
program settings dictionary
- cdict (dictionary)
program color dictionary
- adict (dictionary)
program attribute dictionary
- measure (string)
attribute column for list location comparison, likely ‘ldate’ or ‘doh’
- compare_value (type to match measure input dtype)
value to mark on chart if “sort_xax_by_measure” input is False. Likely a date string, such as “2001-01-31”
- mnum (integer)
data model month number to study
- ds_dict (dictionary)
dictionary of datasets, likely generated by the “load_datasets” function
- single_eg_compare (integer)
if not None, compare base employee group to this group only
- sort_xax_by_measure (boolean)
if True, use an x axis for the chart based on the selected measure. If False, use list location for the x axis
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- pos_color, neg_color (color value string)
color used for the positive and negative area shading
- pos_alpha, neg_alpha (integer or float)
transparency value assigned to the positive and negative color shading areas (0.0 to 1.0)
- bg_color (color value string)
if not None, the color for the chart background
- zero_line_color (color value string)
color for the zero line
- title_size (integer or float)
text size for the chart title
- label_size (integer or float)
text size for the chart axis labels
- tick_size (integer or float)
text size for the chart tick labels
- legend_size (integer or float)
text size for the chart legend
- xsize, ysize (integer or float)
size of the chart in inches (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.cond_test(df, grp_sel, enhanced_jobs, job_colors, job_dict, basic_jobs=None, ds_dict=None, plot_all_jobs=False, min_mnum=None, max_mnum=None, limit_to_jobs=None, use_and=False, print_count_months=None, print_all_counts=False, plot_job_bands_chart=True, only_target_bands=False, legend_size=14, title_size=16, xsize=8, ysize=8, image_dir=None, image_format='png')
Visualize selected job counts over time, as applicable to a computed condition, with optional printing of certain data.
Primary usage is validation of job assignment conditions by charting the count(s) of job(s) assigned by the program to particular employee groups over time.
The function may also be used to evaluate the distribution of jobs under various proposals. Career progression of employees who enjoy special job rights may be understood particularly well by utilizing the print_all_counts option.
The output is 2 charts. The first chart is a line chart displaying selected job count information over time. The second is a stacked area chart displaying all job counts for the selected group(s) over time.
There are additional optional print outputs. The print_all_counts option will print a dataframe containing job count totals for each month. The print_count_months input is a list of months for which to print only the plotted job counts, primarily for testing purposes.
- inputs
- df (dataframe)
dataset (dataframe) to examine
- grp_sel (list)
integer input(s) representing the employee group code(s) to select for analysis. This argument will also accept the string ‘sg’ to select special job rights group(s). Multiple inputs are normally handled as ‘or’ filters, meaning an input of [1, ‘sg’] would select employee group 1 or any special job rights group. With the ‘use_and’ input, the selection can be changed to include only employees who are both in group 1 and in a special job rights group.
- enhanced_jobs (boolean)
if True, basic_jobs input job levels will be converted to enhanced job levels with reference to the job_dict input, otherwise basic_jobs input job levels will be used
- job_colors (list)
list of color values to use for job plots
- job_dict (dictionary)
dictionary containing basic to enhanced job level conversion data. This is likely the settings dictionary “jd” value.
- basic_jobs (list)
basic job levels to plot. This list will be converted to the corresponding enhanced job list if the enhanced_jobs input is set to True. Defaults to [1] if not assigned.
- ds_dict (dictionary)
dataset dictionary which allows df input to be a string description (proposal name)
- plot_all_jobs (boolean)
option to plot all of the job counts within the input dataset vs only those selected with the basic_jobs input (or as converted to enhanced jobs if enhanced_jobs input is True). The jobs plotted may be filtered by the limit_to_jobs input.
- min_mnum (integer)
integer input, only plot data from this month (mnum) forward. Defaults to zero.
- max_mnum (integer)
integer input, only plot data through selected month (mnum). Defaults to maximum mnum for input data
- limit_to_jobs (list)
a list of jobs to plot, allowing focus on target jobs. Should be a subset of normal output, otherwise no filtering of normal output occurs
- use_and (boolean)
when the grp_sel input has more than one element, require each employee in the filtered analysis dataframe to belong to all grp_sel input sets.
- print_count_months (list)
list of month(s) for printing job counts
- print_all_counts (boolean)
if True, print the entire job count dataframe.
- plot_job_bands_chart (boolean)
if True, plot an area chart beneath the job count chart. The area chart will display all of the jobs available to the selected employee group(s) over time with job band areas
- only_target_bands (boolean)
if True, plot area chart of jobs from job count chart only, vs the default of all job levels
- legend_size (integer or float)
text size of legend labels
- title_size (integer or float)
text size of chart title
- xsize, ysize (integer or float)
size of chart display in inches (width and height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
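The grp_sel / use_and selection and the monthly job counts described above can be sketched with pandas. The column names (mnum, eg, sg, jnum) and the convention that sg == 1 marks a special job rights employee are assumptions for illustration; select_and_count is a hypothetical helper, not part of the module.

```python
import pandas as pd


def select_and_count(df, grp_sel, use_and=False):
    """Filter rows by grp_sel, then count jobs per month (illustrative sketch)."""
    masks = []
    for sel in grp_sel:
        if sel == 'sg':
            masks.append(df['sg'] == 1)   # assumed special job rights flag
        else:
            masks.append(df['eg'] == sel)
    mask = masks[0]
    for m in masks[1:]:
        # 'or' by default; 'and' when use_and is True.
        mask = (mask & m) if use_and else (mask | m)
    # Rows: months; columns: job levels; values: employee counts.
    return df[mask].groupby(['mnum', 'jnum']).size().unstack(fill_value=0)


df = pd.DataFrame({'mnum': [0, 0, 0, 1, 1, 1],
                   'eg':   [1, 2, 2, 1, 2, 2],
                   'sg':   [0, 1, 0, 0, 1, 0],
                   'jnum': [1, 1, 2, 1, 2, 2]})
or_counts = select_and_count(df, [1, 'sg'])                  # eg 1 OR sg
and_counts = select_and_count(df, [2, 'sg'], use_and=True)   # eg 2 AND sg
```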
-
matplotlib_charting.determine_dataset(ds_def, ds_dict=None, return_label=False)
This function permits either a dictionary key (string) or a dataframe variable to be used in functions as a dataframe object.
- inputs
- ds_def (dataframe or string)
A pandas dataframe or a string representing a key for a dictionary which contains dataframe(s) as values
- ds_dict (dictionary)
A dictionary mapping string keys to dataframes, used if the ds_def input is not a dataframe
- return_label (boolean)
If True, return a descriptive dataframe label if the ds_dict was referenced, otherwise return a generic “Proposal” string
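A minimal sketch of the behavior described above, assuming ds_dict values are dataframes keyed by proposal name and using the key itself as the descriptive label; the module's own version and its exact label text may differ, and determine_dataset_sketch is a hypothetical name.

```python
import pandas as pd


def determine_dataset_sketch(ds_def, ds_dict=None, return_label=False):
    """Resolve a dataframe or a ds_dict key to a dataframe (illustrative)."""
    if isinstance(ds_def, pd.DataFrame):
        df, label = ds_def, 'Proposal'   # generic label for a raw dataframe
    else:
        if ds_dict is None:
            raise ValueError('ds_dict is required when ds_def is a string key')
        df, label = ds_dict[ds_def], ds_def   # assumed: key doubles as label
    return (df, label) if return_label else df
```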
-
matplotlib_charting.diff_range(df_list, dfb, measure, eg_list, attr_dict, ds_dict=None, cm_name='Set1', attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, year_clip=2042, show_range=False, range_alpha=0.25, show_mean=True, normalize_y=False, suptitle_size=16, title_size=16, tick_size=13, label_size=16, legend_size=14, chart_style='whitegrid', ysize=6, xsize=11, image_dir=None, image_format='png')
Plot a range of differential attributes or a differential average over time. Individual employee groups and proposals may be selected. Each chart shows the results for one group, with color bands or average lines indicating the results for that group under different proposals. This differs from the usual method of plotting different groups on the same chart.
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- dfb (dataframe, can be proposal string name)
baseline dataset, accepts same input types as df_list above
- measure (string)
differential data to compare
- eg_list (list)
list of integers for employee groups to be included in analysis. Example: [1, 2, 3]. A chart will be produced for each employee group number.
- eg_colors (list)
list of colors to represent different proposal results
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- year_clip (integer)
only plot data up to and including this year
- show_range (boolean)
show a transparent background on the chart representing the range of values for each measure for each proposal
- range_alpha (float)
transparency level for range plotting (0.0 to 1.0)
- show_mean (boolean)
plot a line representing the average of the measure values for the group under each proposal
- normalize_y (boolean)
if measure is ‘spcnt’ or ‘lspcnt’, equalize the range of the y scale on all charts (-.5 to .5)
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of chart tick labels
- label_size (integer or float)
text size of chart x and y axis labels
- legend_size (integer or float)
text size of the legend labels
- chart_style (string)
any valid seaborn plotting style (string)
- xsize, ysize (integer or float)
size of chart in inches (width and height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
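The range bands and mean lines amount to summarizing a per-employee differential by year. A sketch, assuming the proposal and baseline rows are aligned on a shared index and that the proposal carries a `date` column (a hypothetical layout); the proposal-minus-baseline sign convention is also an assumption.

```python
import pandas as pd


def yearly_diff_range(df, dfb, measure, year_clip=2042):
    """Min, max, and mean of (proposal - baseline) for `measure` by year."""
    diff = df[measure] - dfb[measure]          # aligned on the shared index
    years = df['date'].dt.year
    summary = diff.groupby(years).agg(['min', 'max', 'mean'])
    # Keep only years up to and including year_clip.
    return summary.loc[summary.index <= year_clip]


df = pd.DataFrame({'date': pd.to_datetime(['2020-01-31', '2020-01-31',
                                           '2021-01-31', '2021-01-31']),
                   'cat_order': [100, 250, 120, 300]})
dfb = pd.DataFrame({'cat_order': [110, 240, 120, 330]})
summary = yearly_diff_range(df, dfb, 'cat_order')
```

The min and max columns correspond to a show_range band, and the mean column to a show_mean line.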
-
matplotlib_charting.differential_scatter(df_list, dfb, measure, eg_list, attr_dict, color_dict, p_dict, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, prop_order=True, show_scatter=True, show_lin_reg=True, show_mean=True, mean_len=50, dot_size=15, lin_reg_order=15, ylimit=False, ylim=5, suptitle_size=14, title_size=12, legend_size=14, tick_size=11, label_size=12, bright_bg=False, bright_bg_color='#faf6eb', chart_style='whitegrid', xsize=12, ysize=8, image_dir=None, image_format='png')
Plot an attribute differential between datasets.
Datasets may be filtered by other attributes if desired.
Example: plot the difference in cat_order (job rank number) between all integrated datasets vs. standalone for all employee groups, applicable to month 57 (optionally adding one or more pre-filters, such as all employees hired prior to a certain date).
The chart may be set to use proposal order or native list percentage for the x axis.
The scatter markers are selectable on/off, as well as an average line and a linear regression line.
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- dfb (string or variable)
baseline dataset, accepts same input types as df_list above
- measure (string)
attribute to analyze
- eg_list (list)
list of employee group codes
- attr_dict (dictionary)
dataset column name description dictionary
- color_dict (dictionary)
dictionary, generated by the build_program_files script, mapping color list string titles to lists of color values
- p_dict (dictionary)
employee group code number to description dictionary
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- eg_list (list)
a list of employee groups to analyze
- prop_order (boolean)
if True, organize x axis by proposal list order, otherwise use native list percent
- show_scatter (boolean)
if True, draw the scatter chart markers
- show_lin_reg (boolean)
if True, draw linear regression lines
- show_mean (boolean)
if True, draw average lines
- mean_len (integer)
moving average length for average lines
- dot_size (integer or float)
scatter marker size
- lin_reg_order (integer)
the regression line is actually a polynomial regression; lin_reg_order is the degree of the fitting polynomial
- ylimit (boolean)
if True, set chart y axis limit to ylim (below)
- ylim (integer or float)
positive and negative y axis limit (+/- ylim) if ylimit is True
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- legend_size (integer or float)
text size of chart legend labels
- tick_size (integer or float)
text size of x and y tick labels
- label_size (integer or float)
text size of x and y descriptive labels
- bright_bg (boolean)
use a custom color chart background
- bright_bg_color (color value)
chart background color if bright_bg input is set to True
- chart_style (string)
style for chart, valid inputs are any seaborn chart style
- xsize, ysize (integer or float)
size of chart (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
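The average line and the polynomial "linear regression" line can be approximated as a moving average of mean_len points and a least-squares polynomial of degree lin_reg_order. A sketch under those assumptions, using numpy.polynomial.Polynomial.fit (chosen here because it rescales x for better conditioning at high degrees) and synthetic data; the module's smoothing details, such as window alignment, may differ.

```python
import numpy as np
import pandas as pd


def smooth_differential(x, diff, mean_len=50, lin_reg_order=15):
    """Return moving-average and polynomial-fit curves for a differential."""
    # Trailing moving average over mean_len points (NaN until the window fills).
    mean_line = pd.Series(diff).rolling(mean_len).mean().to_numpy()
    # Polynomial "regression" line of degree lin_reg_order.
    poly = np.polynomial.Polynomial.fit(x, diff, deg=lin_reg_order)
    return mean_line, poly(x)


# Synthetic example: x could be list percentage, diff a noisy differential.
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 1, 500)
diff_demo = 0.1 * np.sin(6 * x_demo) + rng.normal(0, 0.02, x_demo.size)
mean_line, fit_line = smooth_differential(x_demo, diff_demo)
```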
-
matplotlib_charting.display_proposals()
Print a list of the proposal names that were generated and stored in the dill folder by the build_program_files script.
no inputs
-
matplotlib_charting.eg_attributes(ds, xmeasure, ymeasure, sdict, adict, cdict, eg_list=None, mnum=None, ret_only=False, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, q_eglist_only=True, xquant_lines=True, x_quantiles=10, xl_alpha=1, xl_ls='dashed', xl_lw=1, xl_color='.7', x_bands=True, xb_fc='.3', xb_alpha=0.09, yquant_lines=True, y_quantiles=10, yl_alpha=1, yl_ls='dashed', yl_lw=1, yl_color='.7', y_bands=True, yb_fc='#66ffb3', yb_alpha=0.09, linestyle='', linewidth=0, markersize=5, marker_alpha=0.7, grid_alpha=0.25, chart_style='ticks', full_xpcnt=True, full_ypcnt=True, xax_rotate=70, label_size=13, qtick_size=12, tick_size=12, border_size=0.5, legend_size=14, title_size=18, y_title_pos=1.12, box_height=0.95, xsize=15, ysize=11, image_dir=None, image_format='png')
Plot attribute data for selected employee group(s).
Chart x and y axes may be any dataset attributes, including date attributes.
Quantile membership for the x and/or y attribute may also be displayed. Membership may be relative to the entire integrated population or only to the employee group(s) selected for display (q_eglist_only input).
- inputs
- ds (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object (dataframe or string)
- xmeasure (string)
attribute to plot on x axis
- ymeasure (string)
attribute to plot on y axis
- sdict (dictionary)
program settings dictionary
- adict (dictionary)
dataset column name description dictionary
- cdict (dictionary)
program colors dictionary
- eg_list (list)
list of employee groups to plot (integer codes)
- mnum (integer)
month number for analysis
- ret_only (boolean)
if True, mnum input is ignored and results are displayed for all employees at retirement
- ds_dict (dictionary)
output of the load_datasets function (dictionary). This keyword argument must be set if a string key is used as the ds input.
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- q_eglist_only (boolean)
if set to True:
if quantile bands are displayed, show membership based on selected employee groups (eg_list input).
if set to False:
if quantile bands are displayed, show membership based on the integrated group population (all groups).
- xquant_lines (boolean)
if True, show quantile membership for x axis attribute
- x_quantiles (integer)
number of quantiles to display if xquant_lines input is True
- xl_alpha (float)
transparency value of x axis quantile lines (0.0 to 1.0)
- xl_ls (string)
x axis quantile lines linestyle (‘dashed’, ‘dotted’, etc.)
- xl_lw (integer or float)
x axis quantile lines line width
- xl_color (string color value)
x axis quantile lines color
- x_bands (boolean)
if True, show a background color within every other x axis quantile membership area
- xb_fc (string color value)
x axis quantile bands background color
- xb_alpha (float)
x axis quantile bands color transparency value (0.0 to 1.0)
- yquant_lines (boolean)
if True, show quantile membership for y axis attribute
- y_quantiles (integer)
number of quantiles to display if yquant_lines input is True
- yl_alpha (float)
transparency value of y axis quantile lines (0.0 to 1.0)
- yl_ls (string)
y axis quantile lines linestyle (‘dashed’, ‘dotted’, etc.)
- yl_lw (integer or float)
y axis quantile lines line width
- yl_color (string color value)
y axis quantile lines color
- y_bands (boolean)
if True, show a background color within every other y axis quantile membership area
- yb_fc (string color value)
y axis quantile bands background color
- yb_alpha (float)
y axis quantile bands color transparency value (0.0 to 1.0)
- markersize (integer or float)
size of chart scatter points
- marker_alpha (integer or float)
transparency setting for plot lines or points (0.0 to 1.0)
- grid_alpha (float)
transparency value for the chart grid corresponding to the x and y attribute values (not the quantile membership lines)
- chart_style (string)
any valid seaborn chart style name
- full_xpcnt (boolean)
if True, show full range percentage (0 to 100 percent) when a percentage attribute is displayed on the x axis
- full_ypcnt (boolean)
if True, show full range percentage (0 to 100 percent) when a percentage attribute is displayed on the y axis
- xax_rotate (integer)
rotation value (in degrees) for the x axis tick labels
- qtick_size (integer or float)
text size of the quantile membership tick labels
- tick_size (integer or float)
text size of the x and y attribute tick labels
- label_size (integer or float)
text size of x and y axis labels
- border_size (integer or float)
width of the chart border line (chart spines)
- legend_size (integer or float)
text size of chart legend
- title_size (integer or float)
text size of chart title
- y_title_pos (float)
vertical position of the chart title when attribute filtering has been applied. (typical values are 1.1 to 1.2)
- box_height (float)
chart height multiplier which slightly shrinks vertical chart area for proper printing (saving) purposes. This input does not affect the displayed values.
- xsize, ysize (integer or float)
plot size in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
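Quantile membership, as used for the x and y quantile bands, can be sketched with pandas.qcut and pandas.cut. In this sketch, the q_eglist_only choice determines which population the quantile edges are computed from; the column names (eg, age), the helper name, and the 1-based band numbering are illustrative assumptions.

```python
import pandas as pd


def quantile_membership(df, measure, eg_list, quantiles=10, q_eglist_only=True):
    """Quantile band (1 = lowest values) of `measure` for rows in eg_list."""
    selected = df[df['eg'].isin(eg_list)]
    # Population that defines the quantile edges.
    pop = selected if q_eglist_only else df
    _, edges = pd.qcut(pop[measure], quantiles, retbins=True)
    # Assign the plotted (selected) rows to the bands.
    bands = pd.cut(selected[measure], edges, labels=False, include_lowest=True)
    return bands + 1


df = pd.DataFrame({'eg':  [1, 1, 1, 1, 2, 2, 2, 2],
                   'age': [30, 40, 50, 60, 62, 64, 66, 68]})
eg_only = quantile_membership(df, 'age', [1], quantiles=2)
whole = quantile_membership(df, 'age', [1], quantiles=2, q_eglist_only=False)
```

Measured against group 1 alone, its employees split evenly between the two halves; measured against the whole integrated population, all of them fall in the lower half.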
-
matplotlib_charting.eg_boxplot(df_list, eg_list, eg_colors, job_clip, attr_dict, measure='spcnt', ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, year_clip=2035, exclude_fur=False, saturation=0.8, chart_style='dark', width=0.7, notch=True, show_whiskers=True, show_xgrid=True, show_ygrid=True, grid_alpha=0.4, grid_linestyle='solid', whisker=1.5, fliersize=1.0, linewidth=0.75, suptitle_size=14, title_size=12, tick_size=11, label_size=12, xsize=12, ysize=8, image_dir=None, image_format='png')
Create a box plot chart displaying ACTUAL attribute values (vs. differential values) from selected dataset(s) for selected employee group(s).
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- eg_list (list)
list of integers for employee groups to be included in analysis. Example: [1, 2, 3]
- measure (string)
attribute for analysis
- eg_colors (list)
list of colors for plotting the employee groups
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output from load_datasets function
- job_clip (float)
if measure is jnum or jobp, limit max y axis range to this value
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- year_clip (integer)
only present results through this year
- exclude_fur (boolean)
remove from the analysis all employees who are furloughed at any time within the data model
- chart_style (string)
chart styling (string), any valid seaborn chart style
- width (float)
plotting width of the boxplot or grouped boxplots for each year. A width of 1 leaves no gap between groups
- notch (boolean)
If True, show boxplots with a notch at median point
- show_xgrid (boolean)
include vertical grid lines on chart
- show_ygrid (boolean)
include horizontal grid lines on chart
- grid_alpha (float)
opacity value for grid lines
- grid_linestyle (string)
examples: ‘solid’, ‘dotted’, ‘dashed’
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of x and y tick labels
- label_size (integer or float)
text size of x and y descriptive labels
- xsize, ysize (integer or float)
width and height of plot in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
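The whisker input (default 1.5) is not described in the list above. Assuming it is passed through to the underlying seaborn/matplotlib boxplot as the whisker reach, the whiskers extend to the most extreme data points within whisker × IQR of the quartiles, and points beyond are drawn as fliers (sized by fliersize). A sketch of that rule, with a hypothetical helper name:

```python
import numpy as np


def whisker_bounds(values, whisker=1.5):
    """Whisker ends and fliers using the Q1/Q3 +/- whisker * IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whisker * iqr, q3 + whisker * iqr
    inside = values[(values >= lo_limit) & (values <= hi_limit)]
    # Whiskers stop at the most extreme data points inside the limits.
    low, high = inside.min(), inside.max()
    fliers = values[(values < lo_limit) | (values > hi_limit)]
    return low, high, fliers


low, high, fliers = whisker_bounds([1, 2, 3, 4, 5, 100])  # low=1, high=5, fliers=[100]
```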
-
matplotlib_charting.eg_diff_boxplot(df_list, dfb, eg_list, eg_colors, job_levels, job_diff_clip, attr_dict, measure='spcnt', comparison='baseline', ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, suptitle_size=14, title_size=12, tick_size=11, label_size=12, year_clip=None, exclude_fur=False, width=0.8, chart_style='dark', notch=True, linewidth=1.0, xsize=12, ysize=8, image_dir=None, image_format='png')
Create a DIFFERENTIAL box plot chart comparing a selected measure from computed integrated dataset(s) with a baseline (likely standalone) dataset or with other integrated datasets.
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- dfb (string or variable)
baseline dataset, accepts same input types as df_list above
- eg_list (list)
list of integers for employee groups to be included in analysis. Example: [1, 2, 3]
- eg_colors (list)
corresponding plot colors for eg_list input
- job_levels (integer)
number of job levels in the data model (excluding furlough)
- job_diff_clip (integer)
if measure is jnum or jobp, limit y axis range to +/- this value
- attr_dict (dictionary)
dataset column name description dictionary
- measure (string)
differential data to compare
- comparison (string)
if ‘p2p’ (proposal to proposal), will compare proposals within the df_list to each other, otherwise will compare proposals to the baseline dataset (dfb)
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of x and y tick labels
- label_size (integer or float)
text size of x and y descriptive labels
- year_clip (integer)
only present results through this year if not None
- exclude_fur (boolean)
remove all employees from analysis who are furloughed within the data model at any time
- use_eg_colors (boolean)
use case-specific employee group colors vs. default colors
- width (float)
plotting width of the boxplot or grouped boxplots for each year. A width of 1 leaves no gap between groups
- chart_style (string)
chart styling (string), any valid seaborn chart style
- notch (boolean)
If True, show boxplots with a notch at median point vs. only a line
- xsize, ysize (integer or float)
plot size in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
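The two comparison modes can be sketched as follows, assuming each dataset has been reduced to a pandas Series of the measure indexed by employee, so subtraction aligns employees automatically. The labels, the proposal-minus-reference sign convention, and the use of unordered pairs for ‘p2p’ are assumptions for illustration; differentials is a hypothetical helper.

```python
from itertools import combinations

import pandas as pd


def differentials(proposals, baseline=None, comparison='baseline'):
    """Map comparison labels to per-employee differential Series."""
    if comparison == 'p2p':
        # Each pair of proposals compared once (proposal a minus proposal b).
        return {f'{a} vs {b}': proposals[a] - proposals[b]
                for a, b in combinations(proposals, 2)}
    # Otherwise compare each proposal with the baseline dataset.
    return {f'{name} vs baseline': s - baseline
            for name, s in proposals.items()}


idx = [101, 102, 103]   # hypothetical employee keys
p1 = pd.Series([10, 20, 30], index=idx)
p2 = pd.Series([12, 18, 30], index=idx)
base = pd.Series([15, 15, 15], index=idx)
vs_base = differentials({'p1': p1, 'p2': p2}, base)
p2p = differentials({'p1': p1, 'p2': p2}, comparison='p2p')
```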
-
matplotlib_charting.eg_multiplot_with_cat_order(df, mnum, measure, xax, job_strs, job_level_colors, job_levels, settings_dict, attr_dict, color_dict, egs=[], ds_dict=None, fur_color=None, exclude_fur=False, plot_scatter=True, s=20, a=0.7, lw=0, job_bands_alpha=0.3, title_size=14, tick_size=12, label_pad=110, chart_style='whitegrid', remove_ax2_border=True, lgd_h_adj=None, xsize=13, ysize=10, image_dir=None, image_format='png')
Plot any dataset attributes as x or y values for comparison.
When “cat_order” is selected as the measure, job category bands are shown.
- inputs
- df (dataframe)
pandas dataframe input
- mnum (integer)
month number for analysis
- measure (string)
dataframe column name (attribute for analysis)
- xax (string)
x axis attribute
- job_strs (list)
list of job descriptions for labels (normally sdict[‘job_strs’])
- job_level_colors (list)
list of colors for job level zones (normally cdict[‘job_colors’])
- job_levels (integer)
number of job levels in model (sdict[‘num_of_job_levels’])
- settings_dict (dictionary)
program job settings dictionary
- attr_dict (dictionary)
program attribute name to attribute description dictionary
- color_dict (dictionary)
color dictionary
- egs (list)
list of employee groups for plotting
- ds_dict (dictionary)
output from load_datasets function
- fur_color (string color value)
if not None, the color for the furlough span
- exclude_fur (boolean)
if True, remove furloughed employees from input data
- plot_scatter (boolean)
if True (default), plot a scatter chart, otherwise plot a line chart
- s (integer or float)
size of scatter markers if the plot_scatter input is True
- a (float)
transparency value for both line plots and scatter plots (0.0 to 1.0)
- lw (integer or float)
width of marker edge lines with a scatter plot
- job_bands_alpha (float)
transparency value for job level color spans
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of chart tick labels
- label_pad (integer)
minimum padding between job description labels that would otherwise overlap
- chart_style (string)
any seaborn plotting style name
- remove_ax2_border (boolean)
if True, remove axis 2 (ax2) chart spines
- xsize, ysize (integer or float)
width and height of chart
- lgd_h_adj (float)
set to a small float value (for example: .02, -.01) to adjust the horizontal position of the chart legend if required. Use negative values to move left, positive values to move right
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.emp_quick_glance(empkey, df, ds_dict=None, title_size=14, tick_size=13, lw=4, chart_style='dark', xsize=8, ysize=48, image_dir=None, image_format='png')
View basic stats for the selected employee and proposal.
A separate chart is produced for each measure.
- inputs
- empkey (integer)
employee number (in data model)
- df (dataframe)
dataset to study, will accept string proposal name
- ds_dict (dictionary)
variable assigned to load_datasets function output
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of chart tick labels
- lw (integer or float)
line width of plot lines
- chart_style (string)
any valid seaborn charting style
- xsize, ysize (integer or float)
size of chart display
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.filter_ds(ds, attr1=None, oper1=None, val1=None, attr2=None, oper2=None, val2=None, attr3=None, oper3=None, val3=None, return_title_string=True)
Filter a dataset (dataframe) by attribute(s).
Filter process is ignored if attr(n) input is None. All attr, oper, and val inputs must be strings. Up to 3 attribute filters may be combined.
Attr, oper, and val inputs are combined and then evaluated as expressions.
If return_title_string is set to True, returns tuple (ds, title_string), otherwise returns ds.
- inputs
- ds (dataframe)
the dataframe to filter
- attr(n) (string)
an attribute (column) to filter. Example: ‘ldate’
- oper(n) (string)
an operator to apply to the attr(n) input. Example: ‘<=’
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- return_title_string (boolean)
If True, returns a string which describes the filter(s) applied to the dataframe (ds)
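A minimal sketch of the filtering behavior described above, assuming the combined attr/oper/val strings are evaluated with pandas `DataFrame.query`. The helper name `filter_ds_sketch` and its `filters` list of `(attr, oper, val)` tuples are hypothetical stand-ins for the attr1/oper1/val1 ... inputs; this is an illustration, not the module's actual implementation.

```python
import pandas as pd


def filter_ds_sketch(ds, filters, return_title_string=True):
    """Hypothetical stand-in for filter_ds.

    filters: up to 3 (attr, oper, val) tuples of strings; a tuple whose
    attr is None is skipped, mirroring the documented behavior.
    """
    parts = []
    for attr, oper, val in filters[:3]:
        if attr is None:
            continue  # filter ignored when attr(n) is None
        parts.append(f"{attr} {oper} {val}")
    if parts:
        # combine all filters with "and" and evaluate them as one expression
        ds = ds.query(" & ".join(f"({p})" for p in parts))
    if return_title_string:
        return ds, ", ".join(parts)
    return ds
```

For example, `filter_ds_sketch(df, [("age", ">=", "40"), ("eg", "==", "1")])` keeps rows where both conditions hold and returns the title string `"age >= 40, eg == 1"`. In this sketch, string or date values would need their own quotes inside val (for example `"'1997-12-31'"`), because the pieces are joined into raw expression text.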
-
matplotlib_charting.group_average_and_median(dfc, dfb, eg_list, eg_colors, measure, job_levels, settings_dict, attr_dict, ds_dict=None, attr1=None, oper1='>=', val1='0', attr2=None, oper2='>=', val2='0', attr3=None, oper3='>=', val3='0', plot_median=False, plot_average=True, compare_to_dfb=True, use_filtered_results=True, show_full_yscale=False, job_labels=True, max_date=None, chart_style='whitegrid', xsize=14, ysize=8, image_dir=None, image_format='png')
Plot group average and/or median for a selected attribute over time for compare and/or base datasets. Standalone data may be used as compare or baseline data.
Results may be further filtered/sliced by up to 3 constraints, such as age, longevity, or job level.
This function can plot basic data such as average list percentage or could, for example, plot the average job category rank for employees hired prior to a certain date who are over or under a certain age, for a selected integrated dataset and/or standalone data (or for two integrated datasets).
- inputs
- dfc (string or dataframe variable)
comparative dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- dfb (string or dataframe variable)
baseline dataset to plot (likely use standalone dataset here for comparison, but may plot and compare any dataset), may be a dataframe variable or a string key from the ds_dict dictionary object
- eg_list (list)
list of integers representing the employee groups to analyze (e.g. [1, 2])
- eg_colors (list)
list of colors for plotting the employee groups
- measure (string)
attribute (column) to compare, such as ‘spcnt’ or ‘jobp’
- job_levels (integer)
number of job levels in the data model
- settings_dict (dictionary)
program settings dictionary generated by the build_program_files script
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
dataset dictionary (variable assigned to the output of load_datasets function)
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- plot_median (boolean)
plot the median of the measure for each employee group
- plot_average (boolean)
plot the average (mean) of the measure for each employee group
- compare_to_dfb (boolean)
plot average dfb[measure] data as dashed line. (likely show standalone data with dfb, or reverse and show standalone as primary and integrated as dfb) (dfb refers to baseline dataframe or dataset)
- use_filtered_results (boolean)
if True, use the same employees from the filtered proposal list. For example, if the dfc list is filtered by age only, the dfb list could be filtered by the same age and return the same employees. However, if the dfc list is filtered by an attribute which diverges from the dfb measurements for the same attribute, a different set of employees could be returned. This option ensures that the same group of employees from both the dfc (filtered first) list and the dfb list are compared. (dfc refers to the comparison proposal, dfb refers to baseline)
- show_full_yscale (boolean)
if measure input is one of these: ‘jnum’, ‘nbnf’, ‘jobp’, ‘fbff’, if True, show all job levels on chart. Otherwise, allow chart to autoscale with plotted data
- job_labels (boolean)
if measure input is one of these: ‘jnum’, ‘nbnf’, ‘jobp’, ‘fbff’, use job text description labels vs. number labels on the y axis of the chart
- max_date (date string)
maximum chart date. If set to None, the maximum chart date will be the maximum date within the list data.
- chart_style (string)
option to specify alternate seaborn chart style
- xsize, ysize (integer or float)
x and y size of chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
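The core aggregation behind this chart can be sketched with a pandas groupby, assuming long-form datasets with ‘empkey’, ‘date’, and ‘eg’ columns plus the measure column. The helper `group_stats_sketch` and its `mask_c` boolean mask (standing in for the attr(n)/oper(n)/val(n) filters) are hypothetical; the sketch also illustrates the use_filtered_results idea of restricting the baseline to the employees selected from the filtered comparison dataset.

```python
import pandas as pd


def group_stats_sketch(dfc, dfb, measure, eg_list, mask_c=None):
    """Hypothetical helper: per-date mean and median of `measure` by group.

    mask_c is an optional boolean mask applied to dfc (standing in for the
    attr(n)/oper(n)/val(n) filters).
    """
    if mask_c is not None:
        dfc = dfc[mask_c]
        # use_filtered_results idea: compare the same employees in dfb
        dfb = dfb[dfb["empkey"].isin(dfc["empkey"])]
    stats = {}
    for name, df in (("dfc", dfc), ("dfb", dfb)):
        df = df[df["eg"].isin(eg_list)]
        stats[name] = df.groupby(["date", "eg"])[measure].agg(["mean", "median"])
    return stats
```

Filtering dfb by employee key rather than by re-applying the same condition avoids the divergence described under use_filtered_results, where an attribute filtered in dfc could select a different set of employees in dfb.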
-
matplotlib_charting.job_count_bands(df_list, eg_list, job_colors, settings_dict, ds_dict=None, emp_list=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, fur_color=None, show_grid=True, max_date=None, plot_alpha=0.75, legend_alpha=0.9, legend_xadj=1.3, legend_yadj=1.0, legend_size=11, title_size=14, tick_size=12, label_size=13, chart_style='darkgrid', xsize=13, ysize=8, image_dir=None, image_format='png')
Area chart representing the count of jobs available over time.
This chart displays the future job opportunities for each employee group with various list proposals.
This is not a comparative chart (for example, with standalone data), it is simply displaying job count outcome over time. However, the results for the employee groups may be compared and measured for equity.
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- eg_list (list)
list of integers for employee groups to be included in analysis example: [1, 2, 3]
- job_colors (list)
list of colors to represent job levels
- settings_dict (dictionary)
program settings dictionary generated by the build_program_files script
- ds_dict (dictionary)
output from load_datasets function
- emp_list (list)
optional list of employee number(s) to plot (empkey attribute)
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- fur_color (color code in rgba, hex, or string style)
custom color to signify furloughed job level band (otherwise, last color in job_colors input will be used)
- max_date (date string)
only include data up to this date. Example input: ‘1997-12-31’
- plot_alpha (float, 0.0 to 1.0)
alpha value (opacity) for area plot (job level bands)
- legend_alpha (float, 0.0 to 1.0)
alpha value (opacity) for legend markers
- legend_xadj, legend_yadj (floats)
adjustment input for legend horizontal and vertical placement
- legend_size (integer or float)
text size of legend labels
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of x and y tick labels
- label_size (integer or float)
text size of x and y descriptive labels
- chart_style (string)
chart styling (string), any valid seaborn chart style
- xsize, ysize (integer or float)
plot size in inches (width and height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
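The data behind a job count area chart can be sketched as a count of rows per date and job level, assuming long-form data with ‘date’, ‘eg’, and ‘jnum’ (job level) columns. The helper `job_counts_by_date` is hypothetical and is not the module's implementation.

```python
import pandas as pd


def job_counts_by_date(df, eg_list=None, job_col="jnum"):
    """Hypothetical helper: employee count per job level for each date.

    Returns a wide dataframe (rows = dates, columns = job levels), the
    shape a stacked area chart needs.
    """
    if eg_list is not None:
        df = df[df["eg"].isin(eg_list)]
    counts = df.groupby(["date", job_col]).size().unstack(fill_value=0)
    # order columns by job level so band colors line up with job_colors
    return counts.sort_index(axis=1)
```

With matplotlib, the result could then be drawn as stacked job level bands with `ax.stackplot(counts.index, counts.T.to_numpy(), alpha=0.75)`, where each row of the transposed array is one job level.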
-
matplotlib_charting.job_count_charts(dfc, dfb, settings_dict, eg_colors, eg_list=None, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, plot_egs_sep=False, plot_total=True, xax='date', year_max=None, chart_style='darkgrid', base_ls='-', prop_ls=':', base_lw=1.6, prop_lw=2.5, suptitle_size=14, title_size=12, total_color='g', xsize=5, ysize=4, image_dir=None, image_format='png')
Line-style charts displaying job category counts over time.
Optionally display employee group results on separate charts or together.
- inputs
- dfc (dataframe)
proposal (comparison) dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- dfb (dataframe)
baseline dataset; proposal dataset is compared to this dataset, may be a dataframe variable or a string key from the ds_dict dictionary object
- settings_dict (dictionary)
program settings dictionary generated by the build_program_files script
- eg_colors (list)
list of color values for plotting the employee groups, length is equal to the number of employee groups in the data model
- eg_list (list)
list of employee group codes to plot. Example: [1, 2]
- ds_dict (dictionary)
variable assigned to load_datasets function output
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- plot_egs_sep (boolean)
if True, plot each employee group job level counts separately
- plot_total (boolean)
if True, include the combined job counts on chart(s)
- xax (string)
x axis groupby attribute, options are ‘date’ or ‘mnum’, default is ‘date’
- year_max (integer)
maximum year to include on the chart. Example: if the input is 2030, the chart would display data from the beginning of the data model through 2030
- base_ls (string)
line style for base job count line(s)
- prop_ls (string)
line style for comparison (proposal) job count line(s)
- base_lw (float)
line width for base job count line(s)
- prop_lw (float)
line width for comparison (proposal) job count lines
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
chart title(s) font size
- total_color (color value)
color for combined job level count from all employee groups
- xsize, ysize (integer or float)
size of chart display in inches (width and height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.job_grouping_over_time(df, eg_list, jobs, job_colors, p_dict, plt_kind='bar', ds_dict=None, rets_only=True, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, time_group='A', display_yrs=40, legend_loc=4, chart_style='darkgrid', suptitle_size=14, title_size=12, legend_size=13, tick_size=11, label_size=13, xsize=12, ysize=10, image_dir=None, image_format='png')
Inverted bar chart display of job counts by group over time. Various filters may be applied to study slices of the datasets.
The ‘rets_only’ option will display the count of employees retiring in each year, grouped by job level.
developer TODO: fix x axis scaling and labeling when quarterly (“Q”) or monthly (“M”) time group option selected.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- eg_list (list)
list of unique employee group numbers within the proposal. Example: [1, 2]
- jobs (list)
list of job label strings (for plot legend)
- job_colors (list)
list of colors to be used for plotting
- p_dict (dictionary)
employee group to string description dictionary
- plt_kind (string)
‘bar’ or ‘area’ (bar recommended)
- ds_dict (dictionary)
output from load_datasets function
- rets_only (boolean)
calculate for employees at retirement age only
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- time_group (string)
group counts/percentages by year (‘A’), quarter (‘Q’), or month (‘M’)
- display_yrs (integer)
when using the bar chart type display, evenly scale the x axis to include the number of years selected for all group charts
- legend_loc (integer)
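matplotlib legend location number code, laid out as the positions appear on the chart:
upper left: 2 | upper center: 9 | upper right: 1
center left: 6 | center: 10 | center right: 7
lower left: 3 | lower center: 8 | lower right: 4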
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- legend_size (integer or float)
text size of chart legend labels
- tick_size (integer or float)
text size of x and y tick labels
- label_size (integer or float)
text size of x and y descriptive labels
- xsize, ysize (integer or float)
size of each chart in inches (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
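Grouping counts by the time_group period can be sketched with pandas periods, assuming a datetime ‘date’ column and a ‘jnum’ job level column. The helper `job_counts_by_period` is hypothetical, the rets_only filtering is not modeled, and the documented ‘A’ code is mapped here to the pandas annual period alias ‘Y’.

```python
import pandas as pd

# documented time_group codes mapped to pandas period aliases
# ("Y" is the pandas annual period alias)
PERIOD_ALIASES = {"A": "Y", "Q": "Q", "M": "M"}


def job_counts_by_period(df, time_group="A", job_col="jnum"):
    """Hypothetical helper: count rows per time period and job level."""
    period = df["date"].dt.to_period(PERIOD_ALIASES[time_group])
    # rows = periods, columns = job levels, missing combinations = 0
    return df.groupby([period, job_col]).size().unstack(fill_value=0)
```

Each row of the result corresponds to one bar (or one x position of an area chart), with one column per job level.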
-
matplotlib_charting.job_level_progression(df, emp_list, through_date, settings_dict, color_dict, eg_colors, band_colors, ds_dict=None, rank_metric='cat_order', chart_style='white', show_implementation_date=True, job_bands_alpha=0.1, max_plots_for_legend=5, xgrid_alpha=0.65, xgrid_linestyle='dotted', ygrid_alpha=0.5, ygrid_linestyle='dotted', tick_size=13, job_descr_size=12.5, job_descr_pad=115, label_size=15, title_size=18, xsize=12, ysize=10, image_dir=None, image_format='png')
Show the career progression of one or more employees through job levels, regardless of actual positioning within the integrated seniority list.
The x axis of this chart represents rank within job category. There is an underlying stacked area chart representing job level bands, adjusted to reflect job count changes over time.
This chart reveals the actual career path, taking into account no bump no flush, special job assignment rights/restrictions, and furlough/recall events.
Due to job assignment factors, the jobs actually held may not correspond for many years to the jobs normally associated with a certain list percentage.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- emp_list (list)
list of empkeys to plot
- through_date (date string)
string representation of y axis date limit, ex. ‘2025-12-31’
- settings_dict (dictionary)
program settings dictionary generated by the build_program_files script
- color_dict (dictionary)
dictionary mapping color list names (strings) to lists of color values, generated by the build_program_files script
- eg_colors (list)
colors to be used for employee line plots corresponding to employee group membership
- band_colors (list)
list of colors to be used for stacked area chart which represent job level bands
- ds_dict (dictionary)
output from load_datasets function
- rank_metric (string)
column name for y axis chart ranking. Currently only ‘cat_order’ is valid.
- chart_style (string)
any valid seaborn plotting chart style name
- show_implementation_date (boolean)
plot a vertical dashed line at the implementation date
- job_bands_alpha (float)
opacity level of background job bands stacked area chart
- max_plots_for_legend (integer)
if the number of plots exceeds this number, reduce the plot linewidth and remove the legend
- xgrid_alpha, ygrid_alpha (float)
transparency value for grid. x and y axis may be set independently
- xgrid_linestyle, ygrid_linestyle (string)
matplotlib line style for grid, such as “dotted” or “dashed”. x and y axis may be set independently
- job_descr_size (integer or float)
font size of job description text labels on right side of chart
- job_descr_pad (integer)
padding to add between job description labels when they would otherwise overlap
- tick_size (integer or float)
font size of tick labels
- label_size (integer or float)
font size of axis labels
- title_size (integer or float)
font size of title
- xsize, ysize (integer or float)
plot size in inches (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.job_time_change(ds_list, ds_base, eg_list, job_colors, job_strs_dict, job_levels, attr_dict, xax, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, marker='o', edgecolor='k', linewidth=0.05, size=25, alpha=0.95, bg_color='#ffffff', x_max=1.02, limit_yax=False, ylimit=40, zeroline_color='m', zeroline_width=1.5, pos_neg_face=True, pos_neg_face_alpha=0.03, legend_job_strings=True, legend_position=1.18, legend_marker_size=130, suptitle_size=16, title_size=14, tick_size=12, chart_style='whitegrid', label_size=13, xsize=12, ysize=10, image_dir=None, image_format='png', experimental=False)
Plots a scatter plot displaying the monthly time-in-job differential, by proposal and employee group. The x axis percentage reflects the first month within each comparative dataset, which will be the same as standalone for all groups unless the data model implementation date occurs at month zero.
- inputs
- ds_list (list)
list of datasets to compare, may be ds_dict (output of load_datasets function) string keys or dataframe variable(s) or mixture of each
- ds_base (string or variable)
baseline dataset, accepts same input types as ds_list above
- eg_list (list)
list of integers for employee groups to be included in analysis example: [1, 2, 3]
- job_levels (integer)
number of job levels in the data model
- job_colors (list)
list of color values for job level plotting
- job_strs_dict (dictionary)
dictionary of job code (integer) to job description label
- attr_dict (dictionary)
dataset column name description dictionary
- xax (string)
list percentage attribute, e.g. spcnt or lspcnt
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- marker (string)
scatter chart matplotlib marker type
- edgecolor (color value)
matplotlib marker edge color
- linewidth (integer or float)
matplotlib marker edge line size
- size (integer or float)
size of markers
- alpha (float)
marker alpha (transparency) value
- bg_color (color value)
background color of chart if not None
- x_max (integer or float)
high limit of chart x axis
- limit_yax (boolean)
if True, restrict the plot y scale to the ylimit value. May be used to prevent outliers from exaggerating chart scaling
- ylimit (integer or float)
y axis limit if limit_yax is True
- zeroline_color (color value)
color for zeroline on chart
- zeroline_width (integer or float)
width of zeroline
- pos_neg_face (boolean)
if True, apply a light green tint to the chart area above the zero line, and a light red tint below the line
- legend_job_strings (boolean)
if True, use job description strings in legend vs. job numbers
- legend_position (float)
controls the horizontal position of the legend
- legend_marker_size (integer or float)
adjusts the size of the legend markers
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of chart tick labels
- xsize, ysize (integer or float)
x and y size of each plot in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
- experimental (boolean)
show additional output under development consisting of a table, heatmap, and bar chart
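The per-employee time-in-job differential plotted here can be sketched as a difference of month counts, assuming one row per employee per month with ‘empkey’ and ‘jnum’ columns. The helper `time_in_job_diff` is hypothetical; the actual function also handles filtering, x axis list percentages, and plotting.

```python
import pandas as pd


def time_in_job_diff(ds, ds_base, job_col="jnum"):
    """Hypothetical helper: months in each job level, proposal minus baseline.

    Positive values mean the employee spends more months in that job level
    under the proposal than under the baseline.
    """
    months = ds.groupby(["empkey", job_col]).size()
    base = ds_base.groupby(["empkey", job_col]).size()
    # job levels held in only one dataset count as 0 months in the other
    return months.sub(base, fill_value=0).unstack(fill_value=0)
```

Rows are employees and columns are job levels, so each nonzero cell would be one candidate marker on the scatter chart, above or below the zero line.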
-
matplotlib_charting.job_transfer(dfc, dfb, eg, job_colors, job_levels, job_strs, p_dict, ds_dict=None, gb_period='M', min_date=None, max_date=None, tgt_jobs_list=None, job_alpha=0.85, chart_style='whitegrid', fur_color=None, draw_face_color=False, draw_grid=True, grid_alpha=0.2, zero_line_color='m', ytick_interval=None, y_limit=None, title_size=14, legend_size=12, xsize=14, ysize=9, image_dir=None, image_format='png')
Plot a differential stacked area chart displaying color-coded job transfer counts over time.
Output chart is actually 2 area charts (one for positive values and one for negative values) displayed on a shared axis.
- inputs
- dfc (dataframe)
proposal (comparison) dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- dfb (dataframe)
baseline dataset; proposal dataset is compared to this dataset, may be a dataframe variable or a string key from the ds_dict dictionary object
- eg (integer)
integer code for employee group
- job_colors (list)
list of colors for job levels, may be value from color dictionary
- job_levels (integer)
number of job levels in data model
- job_strs (list)
list of job descriptions (labels)
- p_dict (dictionary)
dictionary of employee group number codes to verbose string descriptions (normally “p_dict_verbose” from the settings dictionary)
Example:
{0: 'Standalone', 1: 'Acme', 2: 'Southern'}
- ds_dict (dictionary)
output from load_datasets function
- gb_period (string)
group_by period. The default is ‘M’ for monthly; other options are ‘Q’ for quarterly and ‘A’ for annual
- min_date (string date format)
if set, analyze job transfer data from this date forward
- max_date (string date format)
if set, analyze job transfer data up to this date
- tgt_jobs_list (list)
if not None, only plot job level(s) in this list
- job_alpha (float)
chart alpha level for job transfer plotting (0.0 - 1.0)
- chart_style (string)
seaborn plotting library style
- fur_color (color code in rgba, hex, or string style)
custom color to signify furloughed employees (otherwise, last color in job_colors input will be used)
- draw_face_color (boolean)
apply a transparent background to the chart, red below zero and green above zero
- draw_grid (boolean)
show major tick label grid lines
- grid_alpha (float)
opacity setting for grid lines (0.0 - 1.0)
- zero_line_color (color value)
color of the horizontal line at zero
- ytick_interval (integer)
optional manual ytick spacing setting (function has auto-spacing built in)
- y_limit (integer)
optional manual y axis chart limit (enter a positive value only). This input may be used to “lock” vertical scaling (turn off auto-scaling) when comparing gains and losses between proposals and employee groups.
- title_size (integer or float)
chart title text size
- legend_size (integer or float)
chart legend text size
- xsize (integer or float)
horizontal size of chart
- ysize (integer or float)
vertical size of chart
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
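The "two area charts on a shared axis" idea can be sketched by computing the job count differential for one employee group and splitting it into its positive and negative parts. This assumes datetime ‘date’, ‘eg’, and ‘jnum’ columns; the helper `job_transfer_sketch` is hypothetical, and the documented ‘A’ period code is mapped to the pandas annual alias ‘Y’.

```python
import pandas as pd


def job_transfer_sketch(dfc, dfb, eg, gb_period="M"):
    """Hypothetical helper: per-period job count change (dfc minus dfb) for one group.

    Returns (gains, losses): the positive and negative parts of the
    differential, one for each of the two stacked area charts.
    """
    freq = {"A": "Y"}.get(gb_period, gb_period)  # "Y" is the pandas annual alias

    def counts(df):
        df = df[df["eg"] == eg]
        period = df["date"].dt.to_period(freq)
        return df.groupby([period, "jnum"]).size().unstack(fill_value=0)

    # job levels missing from one dataset count as zero jobs there
    diff = counts(dfc).sub(counts(dfb), fill_value=0)
    return diff.clip(lower=0), diff.clip(upper=0)
```

Plotting `gains` and `losses` as two stackplots on the same axes keeps every job level band entirely above or entirely below the zero line, which a single stacked chart of mixed-sign values could not do.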
-
matplotlib_charting.make_color_list(num_of_colors=10, start=0.0, stop=1.0, exclude=None, reverse=False, cm_name_list=['Set1'], return_list=True, return_dict=False, print_all_names=False, palplot_cm_name=False, palplot_all=False)
Utility function to generate list(s) of colors (rgba format) of any length from any section of any matplotlib colormap.
The function can return a list of colors, a dictionary of colormaps to color lists, plot result(s) as seaborn palplot(s), and print out the names of all of the colormaps available.
The end goal of this function is to provide customized color lists for plotting.
- inputs
- num_of_colors (integer)
number of colors to produce for the output color list(s), used within the cm_subsection data calculation
- start (float)
the starting point within the selected colormap to begin the spectrum color selection (0.0 to 1.0), used within the cm_subsection data calculation
- stop (float)
the ending point within the selected colormap to end the spectrum color selection (0.0 to 1.0), used within the cm_subsection data calculation
- exclude (list)
list of 2 floats representing a section of the colormap(s) to remove before calculating the result list(s).
- reverse (boolean)
reverse the color list order which reverses the color spectrum
- cm_name_list (list)
any matplotlib colormap name(s)
- return_list (boolean)
if True, return a list of rgba color codes for the cm_name_list colormap input only, or (if the return_dict input is set to True) a dictionary of all colormap names to all of the resultant corresponding calculated color lists using the cm_subsection data
- return_dict (boolean)
if True (and return_list is True), return a dictionary of all colormap names to all of the resultant corresponding calculated color lists
- print_all_names (boolean)
if True (and return_list is False), print all the names of available matplotlib colormaps
- palplot_cm_name (boolean)
if True (and return_list is set to False), plot a seaborn palplot of the color list produced with the cm_name_list colormap input using the cm_subsection data
- palplot_all (boolean)
if True (and return_list and palplot_cm_name are False), plot a seaborn palplot for all of the color lists produced from all available matplotlib colormaps using the cm_subsection data
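The cm_subsection idea (evenly spaced sample points between start and stop, passed through a colormap) can be sketched as follows. The helper `color_list_sketch` is hypothetical and does not model the exclude, dictionary, or palplot options; it takes the colormap as any callable so it can be exercised without matplotlib.

```python
import numpy as np


def color_list_sketch(cmap, num_of_colors=10, start=0.0, stop=1.0, reverse=False):
    """Hypothetical helper: sample evenly spaced colors from a colormap section.

    cmap is any callable mapping a float in [0, 1] to an RGBA tuple, such as a
    matplotlib Colormap object.
    """
    # evenly spaced sample points between start and stop ("cm_subsection")
    cm_subsection = np.linspace(start, stop, num_of_colors)
    colors = [tuple(cmap(x)) for x in cm_subsection]
    if reverse:
        colors.reverse()  # reverse the color spectrum order
    return colors
```

With matplotlib installed, a real colormap can be passed in, for example `color_list_sketch(matplotlib.colormaps["Set1"], 5, 0.0, 0.5)` to take five colors from the first half of the Set1 colormap.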
-
matplotlib_charting.mark_quantiles(df, quantiles=10)
Add a column to the input dataframe identifying quantile membership as integers (the column is named “quantile”). The quantile membership (category) is calculated for each employee group separately, based on the employee population in month zero.
The output dataframe permits attributes for employees within month zero quantile categories to be analyzed throughout all the months of the data model.
The number of quantiles to create within each employee group is selected by the “quantiles” input.
The function utilizes numpy arrays and functions to compute the quantile assignments, and pandas index data alignment feature to assign month zero quantile membership to the long-form, multi-month output dataframe.
This function is used within the quantile_groupby function.
- inputs
- df (dataframe)
Any pandas dataframe containing an “eg” (employee group) column
- quantiles (integer)
The number of quantiles to create.
example:
If the input is 10, the output dataframe will be a column of integers 1 - 10. The count of each integer will be the same. The first quantile members will be marked with a 1, the second with 2, etc., through to the last quantile, 10.
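A minimal sketch of the month zero quantile assignment, assuming ‘mnum’, ‘eg’, and ‘empkey’ columns with month zero rows already in list order within each employee group. The helper `mark_quantiles_sketch` is hypothetical; it uses a `Series.map` lookup on empkey in place of the index alignment described above, and employees absent from month zero would receive a missing value.

```python
import pandas as pd


def mark_quantiles_sketch(df, quantiles=10):
    """Hypothetical helper: tag each row with its month zero quantile."""
    m0 = df[df["mnum"] == 0]
    # position of each employee within its group, and the size of that group
    pos = m0.groupby("eg").cumcount().to_numpy()
    size = m0["eg"].map(m0["eg"].value_counts()).to_numpy()
    # split each group into `quantiles` nearly equal slices, numbered from 1
    q = pos * quantiles // size + 1
    lookup = pd.Series(q, index=m0["empkey"].to_numpy())
    out = df.copy()
    # carry each employee's month zero label into every month
    out["quantile"] = out["empkey"].map(lookup)
    return out
```

When a group's size is not a multiple of the quantiles input, the integer division in this sketch makes the slice counts differ by at most one.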
-
matplotlib_charting.multiline_plot_by_emp(df, measure, xax, emp_list, job_levels, ret_age, color_list, job_str_list, sdict, attr_dict, ds_dict=None, plot_jobp=False, show_implementation_date=True, through_date=None, pcnt_ylimit=1.0, chart_style='ticks', linewidth=3, line_alpha=0.7, grid_linestyle='dotted', grid_alpha=0.75, legend_size=14, label_size=13, tick_size=13, title_size=18, xsize=12, ysize=9, image_dir=None, image_format='png')
Select example individual employees and plot a career measure from the selected dataset attribute, e.g. list percentage, career earnings, job level, etc.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe or a string key from the ds_dict dictionary object
- measure (string)
dataset attribute to plot. Usually only one attribute to plot, but may be more than one, such as ‘jnum’ and ‘jobp’
- xax (string)
dataset attribute for x axis
- emp_list (list)
list of employee numbers or ids
- job_levels (integer)
number of job levels in model
- ret_age (float)
retirement age (example: 65.0)
- color_list (list)
list of colors for plotting
- job_str_list (list)
list of string job descriptions corresponding to number of job levels
- sdict (dictionary)
program settings dictionary
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output of the load_datasets function, dictionary. This keyword argument must be set if a string key is used as the df input.
- plot_jobp (boolean)
if measure input is ‘jnum’, also plot ‘jobp’ if set to True
- show_implementation_date (boolean)
if True and “xax” input is “date”, plot a vertical line at the implementation date
- chart_style (string)
any seaborn plotting style name
- linewidth (integer or float)
width of chart solid lines
- line_alpha (float)
transparency value of the plotted lines (0.0 to 1.0)
- grid_linestyle (string)
matplotlib line style for grid, such as “dotted” or “solid”
- grid_alpha (float)
transparency value for grid (0.0 to 1.0)
- legend_size (integer or float)
text size of chart legend
- label_size (integer or float)
font size of x and y axis labels
- tick_size (integer or float)
font size of chart tick labels
- title_size (integer or float)
font size of chart title
- xsize, ysize (integer or float)
plot size in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.
numeric_test
(value)[source]¶ determine if a variable is numeric
returns a boolean value
- input
- value
any variable
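Here is one plausible way to implement a numeric check. The sketch assumes that anything float() can convert counts as numeric, so numeric strings such as '3.5' pass and None fails. numeric_test_sketch is a hypothetical name, and the module's numeric_test may apply a different rule.

```python
def numeric_test_sketch(value):
    """Hypothetical sketch: True if float() can convert value, otherwise False."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        # TypeError covers None, lists, dicts; ValueError covers non-numeric strings
        return False
```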
-
matplotlib_charting.
parallel
(df_list, dfb, eg_list, measure, month_list, job_levels, eg_colors, dict_settings, attr_dict, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, left=0, stride_list=None, chart_style='whitegrid', grid_color='.7', suptitle_size=14, title_size=12, facecolor='w', xsize=6, ysize=8, image_dir=None, image_format='png')[source]¶ Compare positional or value differences for various proposals with a baseline position or value for selected months.
The vertical lines represent the different proposed lists, in the order of the df_list input.
- inputs
- df_list (list)
list of datasets to compare, may be ds_dict (output of the load_datasets function) string keys, dataframe variables, or a mixture of both. The order of the list is reflected in the chart x axis labels
- dfb (string or variable)
baseline dataset, accepts the same input types as df_list above
- eg_list (list)
list of employee group integer codes to compare, example: [1, 2]
- measure (string)
dataset attribute to compare
- month_list (list)
list of month numbers for analysis. The function will plot comparative data from each month listed
- job_levels (integer)
number of job levels in data model
- eg_colors (list)
list of colors to represent the employee groups
- dict_settings (dictionary)
program settings dictionary generated by the build_program_files script
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (such as <, >, or ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- left (integer)
integer representing the list comparison to plot on the left side of the chart(s). Zero (0) represents the standalone results and is the default. 1, 2, 3, etc. represent the first, second, third, etc. dataset results in df_list input order
- stride_list (list)
optional list of dataframe strides for plotting every nth result (must be the same length as, and correspond to, eg_list)
- grid_color (string)
string name for horizontal grid color
- facecolor (color value)
chart background color
- xsize, ysize (integer or float)
size of individual subplots (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
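Several functions in this module accept up to three attr(n), oper(n), and val(n) filter triples. The sketch below shows one way to combine such triples into a single boolean mask. It assumes each oper(n) string maps to a standard Python comparison and that a triple with attr set to None is ignored. OPS and apply_filters are hypothetical names, not part of the module.

```python
import operator

import numpy as np
import pandas as pd

# hypothetical mapping of oper(n) strings to comparison functions
OPS = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
       '!=': operator.ne, '>=': operator.ge, '>': operator.gt}


def apply_filters(df, *conditions):
    """Sketch: keep rows that satisfy every (attr, oper, val) triple.

    Triples whose attr is None are skipped, mirroring the attr(n)=None defaults.
    """
    mask = np.ones(len(df), dtype=bool)
    for attr, oper, val in conditions:
        if attr is None:
            continue
        # combine conditions with logical AND
        mask &= OPS[oper](df[attr], val).to_numpy()
    return df[mask]
```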
-
matplotlib_charting.
percent_bins
(eg, base, compare, measure='spcnt', by_year=True, quantiles=20, time_col='date', agg_method='median')[source]¶ Return a tuple of two dataframes containing differential percentage bin counts, one containing positive counts and another containing negative counts.
This function first compares list percentage between two datasets on a grouped time period basis (annual or monthly), then counts the number of employees within specified percentage gain or loss quantiles.
The counts are returned in dataframes with indexes reflecting the quantiles and columns representing the grouped time period.
This function is used in the percent_diff_bins plotting function.
- inputs
- eg (integer)
employee group code
- base (dataframe)
baseline dataframe (dataset) containing a list percentage column
- compare (dataframe)
comparison dataframe (dataset) containing a list percentage column
- measure (string)
dataset percentage attribute column (‘spcnt’ or ‘lspcnt’)
- by_year (boolean)
if True, group employee percentage differentials by year, otherwise by time_col input
- quantiles (integer)
number of quantiles to measure. An input of 20 would translate to quantiles of 5% each (100 / 20).
- time_col (string)
if by_year is False, group percentage differentials by this time unit. Inputs may be “mnum” or “date”.
- agg_method (string)
quantile bin aggregation method. Inputs may be “mean” or “median”
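The following is a simplified sketch of the percent_bins steps for by_year=True. It rests on several assumptions that the description does not state:

- both datasets carry 'empkey', 'eg', 'date', and the measure column
- the measure is a fraction between 0 and 1
- the differential is compare minus base
- agg_method applies to each employee's differentials within a year

percent_bins_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def percent_bins_sketch(eg, base, compare, measure='spcnt', quantiles=20,
                        agg_method='median'):
    """Hypothetical sketch: yearly counts of employees per differential bin.

    Returns (positive_counts, negative_counts), each indexed by bin number
    (1..quantiles) with one column per year.
    """
    cols = ['empkey', 'date', measure]
    b = base.loc[base.eg == eg, cols]
    c = compare.loc[compare.eg == eg, cols]
    merged = b.merge(c, on=['empkey', 'date'], suffixes=('_b', '_c'))
    merged['year'] = pd.to_datetime(merged['date']).dt.year
    merged['diff'] = merged[measure + '_c'] - merged[measure + '_b']
    # one representative differential per employee per year
    yearly = (merged.groupby(['year', 'empkey'])['diff']
              .agg(agg_method).reset_index())
    # bin width is 1 / quantiles, e.g. 20 quantiles -> 0.05 wide bins
    yearly['bin'] = (np.ceil(yearly['diff'].abs() * quantiles)
                     .clip(upper=quantiles).astype(int))
    pos = yearly[yearly['diff'] > 0]
    neg = yearly[yearly['diff'] < 0]
    pos_counts = pos.groupby(['bin', 'year']).size().unstack(fill_value=0)
    neg_counts = neg.groupby(['bin', 'year']).size().unstack(fill_value=0)
    return pos_counts, neg_counts
```

With quantiles=20 each bin is 0.05 (5%) wide, so a differential of 0.125 falls in bin 3.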
-
matplotlib_charting.
percent_diff_bins
(compare, base, eg, measure='spcnt', kind='bar', quantiles=40, num_display_colors=25, area_xax='date', ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, man_plotlim=None, invert_barh=False, chart_style='ticks', cmap_pos='tab20c', cmap_neg='tab20c', zero_line_color='m', bright_bg=False, bg_color='#ffffe6', title_size=14, legend_size=12.5, xsize=16, ysize=10, image_dir=None, image_format='png')[source]¶ Display employee group counts within differential list percentage bins over time.
Chart style options include bar, barh, and area.
Selectable inputs include the number of percentile bins, chart colors and the number of colors in the color cycle representing the bins.
The analysis groups may be targeted by up to three attribute value filters.
- inputs
- compare (dataframe)
comparison dataframe (dataset)
- base (dataframe)
baseline dataframe (dataset)
- eg (integer)
employee group code
- measure (string)
list percentage attribute for comparison (‘spcnt’ or ‘lspcnt’)
- kind (string)
chart style (‘bar’, ‘barh’, or ‘area’)
- quantiles (integer)
the number of differential percentage bins. If the input is 40, each bin width will be 2.5% (100 / 40)
- num_display_colors (integer)
the number of distinct colors to create from the cmap inputs. If the input is less than the number of bins found for display, the displayed colors will cycle (repeat) as necessary.
- area_xax (string)
x axis attribute to use when the kind input is set to ‘area’. Inputs may be ‘mnum’ or ‘date’.
- ds_dict (dictionary)
variable assigned to the output of the load_datasets function. This keyword variable must be set if string dictionary keys are used as inputs for the compare and/or base inputs.
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (such as <, >, or ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- man_plotlim (integer)
if not None, restrict chart differential axis to this value. Otherwise, limit is set by an algorithm.
- invert_barh (boolean)
If True and the ‘kind’ input is set to ‘barh’, invert the chart y axis
- chart_style (string)
any valid seaborn plotting style name
- cmap_pos (string)
any matplotlib colormap name representing colors to be applied to positive chart values
- cmap_neg (string)
any matplotlib colormap name representing colors to be applied to negative chart values
- zero_line_color (color value)
color to be applied to the chart zero line
- bright_bg (boolean)
if True, color the chart background with the ‘bg_color’ color value
- bg_color (color value)
color to use for the chart background if ‘bright_bg’ is True
- title_size (integer or float)
text size for the chart title
- legend_size (integer or float)
text size for the chart legend
- xsize, ysize (integers or floats)
Width and height of chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.
pprint_dict
(dct, marker1='#', marker2='', skip_line=True)[source]¶ print the key-value pairs in a horizontal, organized fashion.
- inputs
- dct (dictionary)
the dictionary to print
- marker1, marker2
prefix and suffix for the dictionary key headers
-
matplotlib_charting.
quantile_bands_over_time
(df, eg, measure, bins=20, ds_dict=None, year_clip=None, kind='area', quantile_ticks=False, cm_name='tab20c', chart_style='ticks', quantile_alpha=0.75, grid_alpha=0.4, custom_start=0.0, custom_finish=1.0, alt_bg_color=False, bg_color='#faf6eb', legend_size=13, label_size=13, xsize=14, ysize=8, image_dir=None, image_format='png')[source]¶ Visualize quantile distribution for an employee group over time for a selected proposal.
This chart answers the question of where the different employee groups will be positioned within the seniority list for future months and years.
Note: this is not a comparative study. It is simply a presentation of resultant percentage positioning.
The chart contains a background grid for reference and may display quantiles as integers or percentages, using a bar or area type display, and includes several chart color options.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- eg (integer)
employee group number
- measure (string)
a list percentage input, either ‘spcnt’ or ‘lspcnt’
- bins (integer)
number of quantiles to calculate and display
- ds_dict (dictionary)
output from load_datasets function
- year_clip (integer)
maximum year to display on chart (requires ‘clip’ input to be True)
- kind (string)
type of chart display, either ‘area’ or ‘bar’
- quantile_ticks (boolean)
if True, display integers along y axis and in legend representing quantiles. Otherwise, present percentages.
- cm_name (string)
colormap name (string), example: ‘Set1’
- chart_style (string)
style for chart output, any valid seaborn plotting style name
- quantile_alpha (float)
alpha (opacity setting) value for quantile plot
- grid_alpha (float)
opacity setting for background grid
- custom_start (float)
custom colormap start level (a section of a standard colormap may be used to create a custom color mapping)
- custom_finish (float)
custom colormap finish level
- alt_bg_color (boolean)
if True, set the background chart color to the bg_color input value
- bg_color (color value)
color for chart background if ‘alt_bg_color’ is True (string)
- legend_size (integer or float)
text size for chart legend
- label_size (integer or float)
text size for chart x and y axis labels
- xsize, ysize (integer or float)
chart size inputs in inches (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.
quantile_groupby
(dataset_list, eg_list, measure, quantiles, eg_colors, band_colors, settings_dict, attr_dict, job_dict, groupby_method='median', xax='date', ds_dict=None, num_cat_order_yticks=10, through_date=None, verbose_title=True, plot_total=True, show_job_bands=True, show_grid=True, plot_implementation_date=True, draw_reserve_levels=False, custom_color=False, cm_name='Set1', start=0.0, stop=1.0, exclude=None, reverse=False, chart_style='whitegrid', remove_ax2_border=True, line_width=1, use_dashed_line_compare=True, bg_color='.98', job_bands_alpha=0.15, line_alpha=0.7, grid_alpha=0.3, title_size=14, tick_size=12, label_size=13, label_pad=110, xsize=12, ysize=10, image_dir=None, image_format='png')[source]¶ Plot representative values of a selected attribute measure for each employee group quantile over time.
Multiple employee groups may be plotted at the same time. Job bands may be plotted as a chart background to display job level progression when the measure input is set to “cat_order”.
Two data models may be plotted and compared on the same chart. Only the first employee group found within the eg_list input will be compared when plotting more than one dataset.
Example use case: plot the average job category rank of each employee quantile group, from the start date through the life of the data model.
The quantile group attribute may be analyzed with any of the following methods:
[mean, median, first, last, min, max]
If the eg_list input contains a single employee group code and the custom_color input is set to “True”, the plotted quantile result lines will be drawn in a spectrum of colors. The following inputs are related to the custom color generation:
[cm_name, start, stop, exclude, reverse]
The above inputs will be used by the make_color_list function located within this module to produce a list of colors with a length equal to the quantiles input. (Please see the docstring for the make_color_list function for further explanation.) If the quantiles input is set to a relatively high value (100-200), the impact on the career profiles of the employee groups is easily discernible when using a qualitative color map.
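The start, stop, exclude, and reverse inputs can be understood as choosing evenly spaced sample points along a colormap. The sketch below computes only those sample points. It is a hypothetical helper, not the module's make_color_list, and it assumes the excluded section is removed before the remaining range is sampled uniformly. Calling a matplotlib Colormap object with the returned floats would give the corresponding RGBA colors.

```python
import numpy as np


def custom_color_positions(n, start=0.0, stop=1.0, exclude=None, reverse=False):
    """Hypothetical sketch: n evenly spaced colormap positions in [start, stop].

    If exclude=[lo, hi] is given, no position falls strictly inside (lo, hi).
    """
    if exclude is None:
        pos = np.linspace(start, stop, n)
    else:
        lo, hi = exclude
        # usable widths below and above the excluded section
        below = max(lo - start, 0.0)
        above = max(stop - hi, 0.0)
        # spread points over the combined usable width, then jump the gap
        t = np.linspace(0.0, below + above, n)
        pos = np.where(t <= below, start + t, hi + (t - below))
    if reverse:
        pos = pos[::-1]
    return pos
```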
- inputs
- dataset_list (list)
A list of long-form dataframes, each of which contains “date” (and “mnum” if xax input is set to “mnum”) and “eg” columns and at least one attribute column for analysis. The normal input is a list of calculated datasets with many attribute columns. The list may only hold one or two datasets.
- eg_list (list)
List of eg (employee group) codes for analysis. The order of the employee codes will determine the z-order of the plotted lines, last employee group plotted on top of the others.
- measure (string)
Attribute column name
- quantiles (integer)
The number of quantiles to create and plot for each employee group in the eg_list input.
- eg_colors (list)
list of color values for plotting the employee groups
- band_colors (list)
list of color values for plotting the background job level color bands when using a measure of ‘cat_order’ with the ‘show_job_bands’ variable set to True
- settings_dict (dictionary)
program settings dictionary generated by the build_program_files script
- attr_dict (dictionary)
dataset column name description dictionary
- job_dict (dictionary)
dictionary containing basic to enhanced job level conversion data. This is likely the settings dictionary “jd” value.
- groupby_method (string)
The method applied to the attribute data within each quantile. The allowable methods are listed in the description above. Default is ‘median’.
- xax (string)
The first groupby level and x axis value for the analysis. This value defaults to “date” which represents each month of the model. Alternatively, “mnum” may be used.
- ds_dict (dictionary)
A dictionary mapping string keys to dataframes, used if the dataset_list input contains string keys rather than dataframes (examples: ‘standalone’, ‘p1’)
- num_cat_order_yticks (int)
approximate number of y axis ticks to display on the left-hand side of the chart when “cat_order” is selected as the “measure” input. The actual number of ticks displayed will be adjusted to match an optimal numerical interval between tick values. This input does not have a linear relationship with the output and may require a significant input change to affect the chart display.
- through_date (date string)
If set as a date string, such as ‘2020-12-31’, only show results up to and including this date.
- verbose_title (boolean)
If True, chart title will use the long descriptions for each employee group from the settings.xlsx input file, proposal_dictionary worksheet. Otherwise, the eg number codes will be used in the title
- plot_total (boolean)
If True, plot a dotted gray line representing the total count of active pilots over time (only when “measure” input is set to “cat_order” and “show_job_bands” input is True)
- show_job_bands (boolean)
If measure is set to “cat_order”, plot properly scaled job level color bands on chart background
- show_grid (boolean)
If True, plot a grid on the chart
- plot_implementation_date (boolean)
If True and the xax argument is set to “date”, plot a dashed vertical line at the implementation date.
- draw_reserve_levels (boolean)
If True and basic job levels have been selected via the settings.xlsx “scalars” worksheet, “enhanced jobs” setting, draw a horizontal red dashed line within each basic job category level representing the upper limit of reserve status
- custom_color (boolean)
If set to True, will permit a custom color spectrum to be produced for plotting a single employee group “cat_order” result (color map is selected with the cm_name input)
- cm_name (string)
The colormap name to be used for the custom color option
- start (float)
The starting point of the colormap to begin a custom color list generation (0.0 to less than 1.0)
- stop (float)
The ending point of the colormap to finish a custom color list generation (greater than 0.0 to 1.0)
- exclude (list)
A list of 2 floats between 0.0 and 1.0 describing a section of the original colormap to exclude from a custom color list generation. (Example: [.45, .55] excludes the middle of the colormap.)
- reverse (boolean)
If True, reverse the sequence of the custom color list
- chart_style (string)
set the chart plot style for ax1 from the available seaborn plotting themes:
[“darkgrid”, “whitegrid”, “dark”, “white”, and “ticks”]
The default is “whitegrid”.
- remove_ax2_border (boolean)
if “cat_order” is set as the measure input and the show_job_bands input is set True, a second axis is generated to be the container for the job level labels. The chart style for ax2 is “white” which avoids unwanted grid lines but includes a black solid chart border by default. This ax2 border may be removed if this input is set to True. (The border may be displayed if the chart_style input (for ax1) is set to “white” or “ticks”).
- line_width (float)
The width of the plotted lines. Default is 1.
- use_dashed_line_compare (boolean)
If True, when comparing output from 2 datasets, plot the second dataset output with a dashed line, otherwise use a solid line
- bg_color (color value)
The background color for the chart. May be a color name, color abbreviation, hex value, or a decimal string between 0 and 1 (shades of gray), such as the default ‘.98’
- job_bands_alpha (float)
If show_job_bands input is set to True and measure is set to “cat_order”, this input controls the alpha or transparency of the background job level bands. (0.0 to 1.0)
- line_alpha (float)
Transparency value of plotted lines (0.0 to 1.0)
- grid_alpha (float)
Transparency value of grid lines (0.0 to 1.0)
- title_size (integer or float)
Font size value for title
- tick_size (integer or float)
Font size value for chart tick (value) labels
- label_size (integer or float)
Font size value for x and y axis labels
- xsize, ysize (integers or floats)
Width and height of chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
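The core grouping step of quantile_groupby can be sketched as follows. The sketch assumes that quantile membership is fixed at month zero (mnum 0) in list order, that the dataset index is the employee key, and that a single employee group is analyzed. quantile_groupby_sketch is a hypothetical name. The groupby_method strings listed above (mean, median, first, last, min, max) are all valid pandas aggregation names.

```python
import numpy as np
import pandas as pd


def quantile_groupby_sketch(ds, eg, measure, quantiles, method='median', xax='date'):
    """Hypothetical sketch: one row per xax value, one column per quantile.

    Assumes 'mnum', 'eg', xax, and measure columns, an employee-key index,
    and month-zero rows stored in list order.
    """
    m0 = ds[(ds.mnum == 0) & (ds.eg == eg)]
    n = len(m0)
    # fixed month-zero quantile number for each employee key
    q = pd.Series(np.arange(n) * quantiles // n + 1, index=m0.index)
    grp = ds[ds.eg == eg].copy()
    grp['quantile'] = q.reindex(grp.index).values
    # aggregate the measure within each (xax, quantile) cell
    return grp.groupby([xax, 'quantile'])[measure].agg(method).unstack()
```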
-
matplotlib_charting.
quantile_years_in_position
(dfc, dfb, job_levels, num_bins, job_str_list, p_dict, color_list, style='bar', plot_differential=True, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, chart_style='darkgrid', grid_alpha=None, custom_color=False, cm_name='Dark2', start=0.0, stop=1.0, fur_color=None, flip_x=False, flip_y=False, rotate=False, gain_loss_bg=False, bg_alpha=0.05, normalize_yr_scale=False, year_clip=30, suptitle_size=14, title_size=12, xsize=12, ysize=12, image_dir=None, image_format='png')[source]¶ stacked bar or area chart presenting the time spent in the various job levels for quantiles of a selected employee group.
- inputs
- dfc (string or dataframe variable)
text name of proposal (comparison) dataset to explore (ds_dict key) or dataframe
- dfb (string or dataframe variable)
text name of baseline dataset to explore (ds_dict key) or dataframe
- job_levels (integer)
the number of job levels in the model
- num_bins (integer)
the total number of segments (divisions of the population) to calculate and display
- job_str_list (list)
a list of strings which correspond with the job levels, used for the chart legend, example: jobs = [‘Capt G4’, ‘Capt G3’, ‘Capt G2’, ….]
- p_dict (dictionary)
dictionary used to convert employee group numbers to text, used with chart title text display
- color_list (list)
a list of color codes for the job level color display
- style (string)
option to select ‘area’ or ‘bar’ to determine the type of chart output. default is ‘bar’.
- plot_differential (boolean)
if True, plot the difference between dfc and dfb values
- ds_dict (dictionary)
variable assigned to the output of the load_datasets function. This keyword variable must be set if string dictionary keys are used as inputs for the dfc and/or dfb inputs.
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (such as <, >, or ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- chart_style (string)
any valid seaborn plotting style name
- custom_color, cm_name, start, stop (boolean, string, float, float)
if custom_color is set to True, create a custom color map from the cm_name color map style. A portion of the color map may be selected for customization using the start and stop inputs.
- fur_color (color code in rgba, hex, or string style)
custom color to signify furloughed employees (otherwise, last color in color_list input will be used)
- flip_x (boolean)
‘flip’ the chart horizontally if True
- flip_y (boolean)
‘flip’ the chart vertically if True
- rotate (boolean)
transpose the chart output
- gain_loss_bg (boolean)
if True, apply a green and red background to the chart in the gain and loss areas
- bg_alpha (float)
the alpha of the gain_loss_bg (if selected)
- normalize_yr_scale (boolean)
set all output charts to have the same x axis range
- year_clip (integer)
max x axis value (years) if normalize_yr_scale is set to True
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- xsize, ysize (integer or float)
size of chart display
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples: ‘svg’, ‘png’
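Below is a rough sketch of the "time spent in job levels by segment" calculation. It assumes one row per employee-month, an employee-key index, a 'jnum' job level column, and segments fixed by month-zero list order. years_in_position is a hypothetical name.

```python
import numpy as np
import pandas as pd


def years_in_position(ds, eg, num_bins):
    """Hypothetical sketch: average years spent in each job level ('jnum')
    by month-zero segment of one employee group.

    Rows of the result are segments 1..num_bins; columns are job levels.
    """
    d = ds[ds.eg == eg]
    m0 = d[d.mnum == 0]
    n = len(m0)
    # month-zero segment number for each employee key
    segments = pd.Series(np.arange(n) * num_bins // n + 1, index=m0.index)
    # months each employee spent in each job level
    months = d.groupby([d.index, 'jnum']).size().unstack(fill_value=0)
    years = months / 12.0
    # average the per-employee years within each segment
    return years.groupby(segments.reindex(years.index).values).mean()
```

For the plot_differential option, one could compute the same table for dfc and dfb and subtract them, for example with comp.sub(base, fill_value=0).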
-
matplotlib_charting.
rows_of_color
(df, mnum, measure_list, eg_colors, jnum_colors, dict_settings, ds_dict=None, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, cols=150, eg_list=None, job_only=False, jnum=1, shrink_to_fit=False, cell_border=True, eg_border_color='.2', job_border_color='.2', chart_style='whitegrid', fur_color=None, empty_color='#737373', suptitle_size=14, title_size=12, legend_size=14, xsize=15, ysize=9, image_dir=None, image_format='png')[source]¶ plot a heatmap with the color of each rectangle representing an employee group, job level, or status.
This chart will show a position snapshot indicating the distribution of employees within the entire population, employees holding a certain job, or a combination of the two.
For example, all employees holding a certain job in month 36 may be plotted with original group delineated by color. Or, all employees from one group may be shown with the different jobs for that group displayed with different colors.
Any other category, such as a special group of furloughed employees, may also be displayed. The input dataframe must have a numerical representation of the selected measure, i.e. furloughed employees indicated by a 1 and all others by a 0.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- mnum (integer)
month number of dataset to analyze
- measure_list (list)
list form input, categorical attributes only, such as employee group number or job number, e.g. [‘jnum’] or [‘eg’]. [‘eg’, ‘fur’] is also valid when highlighting furloughees
- eg_colors (list)
colors to use for plotting the employee groups. the first color in the list is used for the plot ‘background’ and is not an employee group color
- jnum_colors (list)
job level plotting colors, list form
- ds_dict (dictionary)
output from load_datasets function
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (such as <, >, or ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- cols (integer)
number of columns to construct for the heatmap plot
- eg_list (list)
employee group integer code list (if used), example: [1, 2]
- job_only (boolean)
if True, plot only employees holding the job level identified with the jnum input
- jnum (integer)
job level distribution to plot if job_only input is True
- shrink_to_fit (boolean)
if True, adjust the size of the heatmap to match the size of the filtered monthly data. If False, keep the number of cells in the heatmap equal to the starting size of the employee population
- cell_border (boolean)
if True, show a border around the heatmap cells
- eg_border_color (color value)
color of cell border if measure_list includes ‘eg’ (employee group)
- job_border_color (color value)
color of cell border when plotting job information
- chart_style (string)
underlying chart style, any valid seaborn chart style (string)
- fur_color (color code in rgba, hex, or string style)
custom color to signify furloughed employees (otherwise, last color in jnum_colors input will be used)
- empty_color (color value)
cell color for cells with no data
- suptitle_size (integer or float)
text size of chart super title
- title_size (integer or float)
text size of chart title
- legend_size (integer or float)
text size of chart legend
- xsize, ysize (integer or float)
size of chart in inches (width, height)
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
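The heatmap layout can be sketched as reshaping the selected month's categorical codes, in list order, into rows of cols cells and padding the final row with empty cells. This is a hypothetical helper. Its total argument stands in for the shrink_to_fit=False behavior (reserving cells for the starting population size), which is an assumption about how that option works.

```python
import numpy as np


def rows_of_color_grid(codes, cols, total=None, empty=np.nan):
    """Hypothetical sketch: lay out 1-D categorical codes as a 2-D grid.

    codes: codes in list order (e.g. eg or jnum values for one month)
    total: if given, reserve at least this many cells (padding with empty)
    """
    codes = np.asarray(codes, dtype=float)
    size = len(codes) if total is None else max(total, len(codes))
    rows = -(-size // cols)  # ceiling division
    grid = np.full(rows * cols, empty, dtype=float)
    grid[:len(codes)] = codes
    return grid.reshape(rows, cols)
```

The empty (NaN) cells are where a color such as empty_color would apply. The grid could then be drawn with a plotting call such as seaborn.heatmap.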
-
matplotlib_charting.
single_emp_compare
(emp, measure, df_list, xax, job_strs, eg_colors, p_dict, job_levels, attr_dict, ds_dict=None, chart_style='whitegrid', standalone_color='#ff00ff', title_size=14, tick_size=12, label_size=13, legend_size=14, xsize=12, ysize=8, image_dir=None, image_format='png')[source]¶ Select a single employee and compare proposal outcome using various calculated measures.
- inputs
- emp (integer)
empkey for selected employee
- measure (string)
calculated measure to compare, examples: ‘jobp’ or ‘cpay’
- df_list (list)
list of calculated datasets to compare
- xax (string)
dataset column to set as x axis
- job_strs (list)
string job description list
- eg_colors (list)
list of colors to be assigned to line plots
- p_dict (dictionary)
dictionary mapping eg integer codes to eg string descriptions
- job_levels (integer)
number of jobs in the model
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output from load_datasets function
- chart_style (string)
any valid seaborn plotting style
- standalone_color (color value)
color of standalone plot (This function assumes one proposal from each group, any additional proposal is assumed to be standalone)
- title_size (integer or float)
text size of chart title
- tick_size (integer or float)
text size of chart tick labels
- label_size (integer or float)
text size of x and y axis chart labels
- legend_size (integer or float)
text size of chart legend
- xsize, ysize (integer or float)
width and height of output chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.
slice_ds_by_filtered_index
(df, ds_dict=None, mnum=0, attr='age', attr_oper='>=', attr_val=50)[source]¶ filter an entire dataframe by only selecting rows which match the filtered results from a target month. In other words, zero in on a slice of data from a particular month, such as employees holding a specific job in month 25. Then, using the index of those results, find only those employees within the entire dataset as an input for further analysis within the program.
The output may be used as an input to a plotting function or for other analysis. This function may also be used repeatedly with various filters, using output of one execution as input for another execution.
- inputs
- df (dataframe, can be proposal string name)
the dataframe (dataset) to be filtered
- ds_dict (dictionary)
A dictionary mapping string keys to dataframes, used if the df input is not a dataframe
- mnum (integer)
month number of the data to filter
- attr (string)
attribute (column) to use during filter
- attr_oper (string)
operator to use, such as ‘<=’ or ‘!=’
- attr_val (integer, float, date as string, string (as appropriate))
attr limiting value (combined with attr_oper) as string
- Example filter:
jnum >= 7 (in mnum month)
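The filtering logic can be sketched in a few lines, assuming the dataset index is the employee key repeated for every month. slice_by_filtered_index and OPS are hypothetical names used for illustration.

```python
import operator

import pandas as pd

# hypothetical mapping of operator strings to comparison functions
OPS = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
       '!=': operator.ne, '>=': operator.ge, '>': operator.gt}


def slice_by_filtered_index(df, mnum=0, attr='age', attr_oper='>=', attr_val=50):
    """Sketch: all rows (every month) for employees who pass the filter in month mnum."""
    month = df[df.mnum == mnum]
    # employee keys that satisfy the condition in the target month
    keep = month.index[OPS[attr_oper](month[attr], attr_val).to_numpy()]
    # select those employees across the entire dataset
    return df[df.index.isin(keep)]
```

The output has the same structure as the input, so it can be passed back in with a different filter for repeated use.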
-
matplotlib_charting.
stripplot_dist_in_category
(df, job_levels, full_time_pcnt, eg_colors, band_colors, job_strs, attr_dict, p_dict, ds_dict=None, rank_metric='cat_order', mnum=None, attr1=None, oper1='>=', val1='0', attr2=None, oper2='>=', val2='0', attr3=None, oper3='>=', val3='0', bg_alpha=0.12, fur_color=None, show_part_time_lvl=True, size=3, alpha=1, title_size=14, label_pad=110, label_size=13, tick_size=12, xsize=4, ysize=12, image_dir=None, image_format='png')[source]¶ visually display employee group distribution concentration within accurately sized job bands for a selected month.
This chart reveals how evenly or unevenly the employee groups share the jobs available within each job category.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- job_levels (integer)
number of job levels in the data model
- full_time_pcnt (float)
percentage of each job level which is full time
- eg_colors (list)
list of colors for eg plots
- band_colors (list)
list of colors for background job band colors
- job_strs (list)
list of job strings for job description labels
- attr_dict (dictionary)
dataset column name description dictionary
- p_dict (dictionary)
dictionary mapping eg codes to group string labels
- ds_dict (dictionary)
output from load_datasets function
- rank_metric (string)
rank attribute (currently only accepts ‘cat_order’)
- mnum (integer)
month number - if not None, analyze data from this month
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (such as <, >, or ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- bg_alpha (float)
color alpha for background job level color
- fur_color (color code in rgba, hex, or string style)
custom color to signify furloughed job band area (otherwise, last color from band_colors list will be used)
- show_part_time_lvl (boolean)
if True, draw a line within each job band representing the boundary between full-time and part-time jobs when using a basic jobs only data model (set this input to False when using an enhanced job data model)
- size (integer or float)
size of density markers
- alpha (float)
alpha of density markers (0.0 to 1.0)
- title_size (integer or float)
text size of chart title
- label_size (integer or float)
text size of x and y descriptive labels
- tick_size (integer or float)
text size of x and y tick labels
- xsize, ysize (integer or float)
width and height of chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
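The job bands in this chart are sized by the number of jobs in each level, and the show_part_time_lvl line splits each band into full-time and part-time portions. The module's own band code is not shown here; the following is a minimal sketch under stated assumptions: job counts are ordered from the most senior level down, full_time_pcnt is a fraction from 0.0 to 1.0 (not a 0 to 100 percentage), and full-time jobs occupy the upper part of each band. The helper name job_band_edges is hypothetical.

```python
from itertools import accumulate


def job_band_edges(job_counts, full_time_frac):
    """Return a (top, bottom, boundary) tuple of list positions for each job band.

    Hypothetical helper, not part of matplotlib_charting. Assumptions:
    job_counts[i] is the number of jobs in job level i + 1, ordered from the
    most senior level down; full_time_frac is a fraction from 0.0 to 1.0; and
    full-time jobs fill the upper part of each band.
    """
    if not 0.0 <= full_time_frac <= 1.0:
        raise ValueError("full_time_frac must be between 0.0 and 1.0")
    bottoms = list(accumulate(job_counts))  # running job total marks each band's lower edge
    tops = [0] + bottoms[:-1]               # each band starts where the previous one ends
    # Assumed full-time/part-time boundary: full_time_frac of the way down each band.
    boundaries = [top + full_time_frac * count
                  for top, count in zip(tops, job_counts)]
    return list(zip(tops, bottoms, boundaries))
```

For example, job_band_edges([100, 200, 80], 0.75) returns [(0, 100, 75.0), (100, 300, 250.0), (300, 380, 360.0)]. Each (top, bottom) pair could then be shaded with matplotlib's Axes.axhspan and each boundary drawn with Axes.axhline.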
-
matplotlib_charting.
stripplot_eg_density
(df, mnum, eg_colors, ds_dict=None, mnum_order=True, attr1=None, oper1='>=', val1=0, attr2=None, oper2='>=', val2=0, attr3=None, oper3='>=', val3=0, dot_size=3, chart_style='whitegrid', bg_color='white', title_size=12, suptitle_size=14, xsize=5, ysize=10, image_dir=None, image_format='png')
Plot a stripplot showing the density distribution of non-retired employees for each employee group separately at the selected month. The stripplot positions the remaining employees according to either the selected month's or the initial month's integrated list order (controlled by the “mnum_order” input).
Note: To analyze job category distribution density, use the “stripplot_dist_in_category” plotting function.
The input dataframe (df) may be a dictionary key (string) or a pandas dataframe.
The input dataframe may be filtered by attributes using the attr(x), oper(x), and val(x) inputs.
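Each attr(n), oper(n), val(n) triple describes one column filter, and a triple is inactive while its attr(n) input is None. The module's filtering code is not reproduced here; the following is a minimal, hypothetical sketch of how such triples could be applied, using a list of dictionaries in place of dataframe rows. The names OPS and apply_filters, and the exact set of supported operator strings, are assumptions.

```python
import operator

# Assumed mapping from oper(n) strings to Python comparison functions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every active (attr, oper, val) triple.

    Hypothetical helper: rows is a list of dicts standing in for dataframe
    rows, and a triple whose attr is None is skipped, mirroring the
    attr1=None, attr2=None, attr3=None defaults.
    """
    active = [(attr, OPS[oper], val) for attr, oper, val in filters
              if attr is not None]
    return [row for row in rows
            if all(compare(row[attr], val) for attr, compare, val in active)]
```

The same operator functions return boolean masks when applied to a pandas column, so a dataframe version could select rows with df[OPS[oper](df[attr], val)] for each active triple.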
- inputs
- df (string or dataframe)
text name of an input proposal dataset; any dataframe variable is also accepted (for example, if a sliced dataframe subset is desired). Example: the input can be ‘proposal1’ (if that proposal exists) or df[df.age > 50]
- mnum (integer)
view data for employees remaining (not yet retired) within this data model month number
- eg_colors (list)
color codes for plotting each employee group
- ds_dict (dictionary)
output from load_datasets function
- mnum_order (boolean)
if True, plot list position in month selected with the “mnum” input, otherwise plot according to initial integrated list position
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (integer, float, date as string, string (as appropriate))
attr(n) limiting value (combined with oper(n)) as string
- dot_size (integer or float)
size of stripplot markers
- bg_color (color value)
chart background color
- title_size (integer or float)
chart title text size
- suptitle_size (integer or float)
text size of chart suptitle
- xsize, ysize (integer or float)
width and height of chart in inches
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
-
matplotlib_charting.
to_percent
(decimal, position, precision=0)
Custom matplotlib axis tick-label formatter that displays values as percentages.
Ignores the passed-in position variable. Only the tick labels are rescaled (multiplied by 100); the default tick locations are unchanged.
- inputs
- decimal (axis values)
axis value passed in by matplotlib (no user input)
- position
ignored
- precision (integer)
number of decimals in output percentage labels
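A formatter with this signature receives a tick value and its position from matplotlib and returns the label text. The following is a minimal sketch of such a function written for illustration; it is not the module's actual source, and the one_decimal name is hypothetical.

```python
from functools import partial


def to_percent(decimal, position, precision=0):
    """Format an axis value such as 0.25 as a percentage label like '25%'.

    The position argument is accepted only because matplotlib's
    FuncFormatter passes it; it is ignored here.
    """
    return f"{decimal * 100:.{precision}f}%"


# Hypothetical example: fix the precision so the formatter keeps the
# two-argument (value, position) form that FuncFormatter expects.
one_decimal = partial(to_percent, precision=1)
```

With matplotlib, the formatter could be attached with ax.yaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(one_decimal)); FuncFormatter calls it once per tick with the tick value and position.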
-
matplotlib_charting.
violinplot_by_eg
(df, measure, ret_age, cdict, attr_dict, ds_dict=None, mnum=0, linewidth=1.5, attr1=None, oper1='>=', val1='0', attr2=None, oper2='>=', val2='0', attr3=None, oper3='>=', val3='0', scale='count', saturation=1.0, title_size=12, chart_style='darkgrid', xsize=12, ysize=10, image_dir=None, image_format='png')
From the seaborn website: Draw a combination of boxplot and kernel density estimate.
A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.
- inputs
- df (dataframe)
dataset to examine, may be a dataframe variable or a string key from the ds_dict dictionary object
- measure (string)
attribute to plot
- ret_age (float)
retirement age (example: 65.0)
- cdict (dictionary)
color dictionary for plotting palette
- attr_dict (dictionary)
dataset column name description dictionary
- ds_dict (dictionary)
output from load_datasets function
- mnum (integer)
month number to analyze
- linewidth (integer or float)
width of line surrounding each violin plot
- attr(n) (string)
filter attribute or dataset column as string
- oper(n) (string)
operator (e.g. <, >, ==) for attr(n) as string
- val(n) (string, integer, float, date as string as appropriate)
attr(n) limiting value (combined with oper(n)) as string
- scale (string)
From the seaborn website: The method used to scale the width of each violin. If ‘area’, each violin will have the same area. If ‘count’, the width of the violins will be scaled by the number of observations in that bin. If ‘width’, each violin will have the same width.
- saturation (float)
Proportion of the original color saturation. Large patches often look better with slightly desaturated colors, but set this to 1.0 if you want the plot colors to perfectly match the input color spec.
- title_size (integer or float)
text size of chart title
- image_dir (string)
if not None, name of a directory in which to save an image of the chart output. If the directory does not exist, it will be created.
- image_format (string)
file extension string for a saved chart image if the image_dir input is not None
Examples:
‘svg’, ‘png’
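The three scale options quoted above differ only in how each violin's density curve is normalised before it is drawn. The following simplified sketch illustrates those three normalisations on precomputed density values; it is a hypothetical illustration of the behavior described in the seaborn text, not seaborn's own code, and the scale_violins and _normalise names are assumptions.

```python
def _normalise(curve, divisor):
    """Divide every density value in one curve by the same constant."""
    return [v / divisor for v in curve]


def scale_violins(densities, counts, scale="count"):
    """Rescale KDE curves so each peak gives a violin's relative half-width.

    densities: one list of evaluated KDE values per violin (each assumed to
    integrate to 1); counts: number of observations behind each curve.
    """
    if scale == "area":
        # One shared divisor keeps every violin's area equal (1 / peak).
        peak = max(max(d) for d in densities)
        return [_normalise(d, peak) for d in densities]
    if scale == "width":
        # Each curve is divided by its own peak, so all violins are equally wide.
        return [_normalise(d, max(d)) for d in densities]
    if scale == "count":
        # Peak width becomes the group's count relative to the largest group.
        biggest = max(counts)
        return [_normalise(d, max(d) * biggest / n)
                for d, n in zip(densities, counts)]
    raise ValueError("scale must be 'area', 'count', or 'width'")
```

Under 'count', a group with half as many observations as the largest group is drawn at half the maximum width.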