October 8th, 2018
This version adds functionality within many of the scripts and plotting functions, updates the plotting functions for compatibility with matplotlib 3.0, adjusts the editor tool code for compatibility with the bokeh plotting library, and corrects a few bugs.
Script and non-plotting functions updates:
- modify build_program_files.py script to allow edited list order from proposals.xlsx to be constructed properly with a “new_order” column vs an “idx” column
- modify compute_measures.py script to accept edited proposal orderings from proposals.xlsx
- update reports.py script functions retirement_charts and annual_charts to be compatible with matplotlib 2.2 (this prevents the previous behavior of automatic plotting of the final calculated charts within jupyter notebook)
- corrected bug in build_program_files.py script when using basic jobs (non-enhanced)
- update comment cells in RUN_SCRIPTS.ipynb notebook
- update the anon_master and anon_pay_table functions (in the functions module) to use the “sheet_name” keyword parameter with the pandas read_excel function, due to a keyword rename within pandas
- add docstring to hex_dict function
- remove ipywidgets from program requirements
Plotting function updates:
- improve quantile_groupby plotting function. Now two datasets may be compared for the same employee group. Update STATIC_PLOTTING.ipynb notebook with correct variable inputs and new plotting example. Add chart example to documentation gallery.
- update stripplot_eg_density plotting function (removed “attr_dict” input and improved chart title labels)
- update quantile_groupby plotting function (add “verbose_title” option, add “plot_total” option, correct a bug when the “through_date” input was greater than the maximum data model date)
- update job_transfer plotting function so that title shows verbose employee group name instead of an employee group code number
- update stripplot_eg_density plotting function to permit display of list order relative to selected month order or initial integrated list order and also improve the chart labels
- enhance title display in the quantile_years_in_position plotting function
- add code to handle situation when filtering results are an empty dataset in the differential_scatter plotting function
- update quantile_groupby plotting function to include auto-yscale tick spacing when “cat_order” is selected for the “measure” input. This prevents the plotting library from selecting arbitrary tick spacing.
Editor tool updates:
- update animate function callbacks within the editor_function to align with change in bokeh api version 0.12.17+
- adjust the height of the bokeh TextInput widget within the editor_function.py module to a less-than-optimal height in order to maintain usability. The bokeh TextInput widget currently lacks proper sizing functionality; when that functionality is implemented, the txt_height variable will be readjusted.
- remove the global variable from callbacks within the editor_function and the bk_basic_interactive function and replace with a new class object
- update editor tool layout spacers
- refactor editor tool periodic_callback code for compatibility with bokeh update
April 18th, 2018
This is a minor update with changes for compatibility with matplotlib 2.2 and minor code tweaks to allow a wider range of user scenarios.
- change references to “Vega20c” matplotlib colormap to “tab20c”
- change matplotlib tick parameters from “on” and “off” to “True” and “False”
- add ax.margins(x=0) to plotting code where needed
- update build_program_files.py to allow cases without any furloughed employees
- update contract_year_and_raise function to allow compensation data without any pay exceptions
- update distribute function
- update group_average_and_median plotting function to permit proper plotting when default job level scaling interval is less than one
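The matplotlib 2.2 compatibility changes above can be illustrated with a minimal sketch (the figure setup here is illustrative toy code, not from the program):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# matplotlib 2.2+ requires booleans for tick parameters,
# not the old "on"/"off" strings
ax.tick_params(axis="x", bottom=False, labelbottom=False)

# remove the default horizontal padding at both ends of the x axis
ax.margins(x=0)
```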
February 26th, 2018
This update refactored the job assignment routine used when a ratio condition is applied, added a time in job differential study to the reports module, and applied miscellaneous code and docstring cleanup.
Users may elect to capture an existing job distribution ratio (between the employee groups) to be applied during the effective condition time period for both capped and unrestricted ratio job assignment. The input spreadsheet settings.xlsx “ratio_cond” and “ratio_cond_capped_count” worksheets now contain an additional column (“snapshot”) for selecting this option. The “excel input files” section of the documentation has been updated. Code changes related to the new ratio job assignment routine:
- update set_snapshot_weights function
- update assign_cond_ratio function
- update distribute function
- remove assign_cond_ratio_capped function
- add eg_quotas function
- update build_program_files script
- update converter script
- refactor remove_zero_groups function
This version adds to the built-in reporting capability of seniority_list with the new job_diff_to_excel function. The function will calculate the time difference (in months) each employee would spend in each job level between data models. The results are presented as a formatted spreadsheet stored within the reports folder. The hex_dict function was added to support the formatting requirements for the spreadsheet output.
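The month-differencing idea behind job_diff_to_excel can be sketched as follows. The “empkey” and “jnum” column names follow the data model terms used elsewhere in these notes; the frames below are toy data, not the function's actual implementation:

```python
import pandas as pd

def months_in_job(df):
    # count the months each employee spends in each job level
    return df.groupby(["empkey", "jnum"]).size().unstack(fill_value=0)

# toy monthly data: one row per employee per month
base = pd.DataFrame({"empkey": [1, 1, 2, 2], "jnum": [1, 1, 1, 2]})
prop = pd.DataFrame({"empkey": [1, 1, 2, 2], "jnum": [1, 2, 2, 2]})

# positive values: more months in that job level under the proposal
diff = months_in_job(prop).sub(months_in_job(base), fill_value=0)
```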
The NumPy “in1d” function has been replaced with the NumPy “isin” function throughout as recommended by NumPy.
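The practical difference between the two functions is that np.isin preserves the shape of its first argument, while np.in1d always returns a flattened result:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# np.isin keeps the 2x2 shape of "a";
# np.in1d would have returned a flat length-4 array
mask = np.isin(a, [2, 4])
# mask → [[False, True], [False, True]]
```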
Hard-coded values used during development were removed or updated within the violinplot_by_eg and the eg_multiplot_with_cat_order functions.
Some formatting of function docstrings was updated to improve the output format of the web and pdf documentation.
January 12th, 2018
The documentation has now been updated for the new editor tool and the old version of the editor has been removed.
A new interactive_plotting.py module has been added to the program, along with a companion INTERACTIVE_PLOTTING.ipynb notebook file. Only one interactive chart is included at this point.
Revision highlights include:
- the editor zone delineation for each chart area has been changed from a bokeh rect glyph to a box annotation. The vertical spread of the zone will now always extend to the limits of the chart areas
- a correction was made to the edit zone cursor line conversion calculation when using a “running” xtype x axis
- the “proposal” dropdown selection on the “proposal_save” panel will now automatically change to “edit” when a squeeze is performed
- added styling control for the edit zone
- added code to handle data model months with no data when extra filters have been applied
- renamed the PLOTTING notebook to STATIC_PLOTTING to accommodate the new INTERACTIVE_PLOTTING notebook
December 23rd, 2017
The editor tool has been completely rewritten and is now implemented as a local web server application within the notebook using the Bokeh plotting library. This first release version is now included with the program but is not yet supported with documentation. A revised user guide is forthcoming. The documentation related to the editor tool will be incrementally revised over the next several weeks. Much of the current documentation can be applied to the new tool.
Other improvements with this revision include:
- updated assign_standalone_job_changes function
- fixed old editor tool display functionality following ipywidgets update, though performance when using the cursor sliders is less than ideal
- changed all pandas “read_excel” parameters from “sheetname” to “sheet_name” for compatibility with future versions of pandas
- added editor_dict to the build_program_files.py script which provides initial values for the new editor tool display and will store editor tool values during and between sessions
- added convert_to_hex function which converts rgba values (such as those produced by the make_color_list function) to string hex color values
- added the find_nearest and cross_val functions for use with the editor tool p1 and p2 cursor equivalent position feature (p1 and p2 are the bokeh chart figures)
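Minimal sketches of two of the helpers above, written from the descriptions here rather than the program's actual code:

```python
import numpy as np

def convert_to_hex(rgba):
    # hypothetical sketch: rgba floats (0-1 range) to a hex color string
    r, g, b = (int(round(c * 255)) for c in rgba[:3])
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

def find_nearest(arr, value):
    # hypothetical sketch: index of the array element closest to value
    arr = np.asarray(arr)
    return int(np.abs(arr - value).argmin())
```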
September 25th, 2017
This update includes coding updates which improve the computational efficiency of the program, resulting in a 10-15% reduction in the time required to compute a dataset.
General changes were made throughout the entire code base to increase computational speed wherever possible:
- numpy.arange() to range()
- numpy.sum(<condition>) to numpy.count_nonzero(<condition>)
- numpy.array(dataframe_column) to dataframe_column.values
- max(array) to array.max()
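Illustrative before/after forms of these substitutions, using toy data:

```python
import numpy as np
import pandas as pd

arr = np.arange(1000)
col = pd.Series(arr)

# built-in range() instead of np.arange() in pure-Python loops
total = sum(range(10))

# count_nonzero on a boolean mask is faster than np.sum(mask)
n_big = np.count_nonzero(arr > 500)

# .values accesses the underlying ndarray without an extra copy
vals = col.values

# the ndarray method is faster than the Python max() builtin
highest = arr.max()
```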
Applied fast numba jit (just in time compiling) to the following refactored functions:
Replaced standard numpy expressions used for job counting and job count column assignment with two new numba-optimized functions:
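A numba-optimized job-counting routine of this kind might look like the following sketch. The count_in_job name is illustrative, not from the source, and the fallback decorator keeps the example runnable when numba is not installed:

```python
import numpy as np

try:
    from numba import njit  # just-in-time compilation when available
except ImportError:
    def njit(func):
        # no-op fallback so the example runs without numba
        return func

@njit
def count_in_job(jnum_arr, job):
    # count employees currently holding the given job level;
    # an explicit loop like this compiles to fast machine code
    n = 0
    for j in jnum_arr:
        if j == job:
            n += 1
    return n
```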
Improved the performance of the following functions through the use of line profiling and refactoring:
Updated the standalone.py script to use the create_snum_and_spcnt_arrays function for faster generation of the snum, spcnt, lnum, and lspcnt columns.
Other improvements were made to the program which are not related to reducing computation time:
Added the find_squeeze_vals function and incorporated it within the editor tool. The new function permits editor squeezing (a visual exercise based on displayed data) when future month data is displayed to the user. Future month cursor line position is converted to the equivalent original list positions for use within the squeeze algorithm.
Added an experimental section to the job_time_change plotting function. The PLOTTING notebook was updated accordingly.
Removed the no longer used “orig” output from assign_jobs_nbnf_job_changes function.
Changed code reference from “qtr” to the semantically correct “qntl” for use within the summary reports charts output.
Restored “full_flush” job assignment functionality with updates to the assign_jobs_full_flush_job_changes function.
Added a sort routine to the eg_count settings dictionary value creation routine within the build_program_files script to ensure continuity with other program calculations.
Removed functions which have been superseded and are no longer used:
August 23rd, 2017
This update includes a major editor tool upgrade.
added editor tool absolute value display
Previously, only a differential comparison of attribute values between a baseline and comparative dataset was possible. Now the actual values, initially from the comparative and then the edited dataset (after the first edit), may be displayed. This option allows the user to directly analyze the distribution of equity and opportunity within the merged operation of integration proposals.
added editor tool additional display filtering
The user may now show only results for targeted subsets of the merged population, allowing rapid analysis of certain list attribute cohorts. For example, this feature permits additional outcome evaluation for employees who may have limited years remaining in their careers or employees belonging to a special job assignment category.
extensive updates to the editor tool documentation and the editor tool function docstring
updated EDITOR_TOOL notebook to incorporate the new editor tool functionality
added find_index_val function to functions module
improved excel input file documentation
- added sections on job level hierarchy and the “hours” worksheet preparation, both within the “pay_tables.xlsx format guide”.
June 21st, 2017
editor tool stylistic update
- replaced the independent “junior” and “senior” slider controls with a single, easier to use range selector slider tool
- increased the width of the sliders for easy value selection
- applied a “flex” sizing method to the controls which allows the tool to auto-adjust the width of the controls to match the available screen size
- various other styling added
May 23rd, 2017
new dataset reports capability
This update includes a new reports module. General statistics may be generated quickly for all calculated datasets, providing a broad overview of how each proposed integrated list will affect employees from each work group. This process provides useful absolute and comparative information for targeted attributes. The statistics are converted to excel spreadsheets and chart images, stored within the reports folder.
Data is produced for the targeted metrics both at retirement and on an annual basis.
The charts are smaller and of lower image quality than the charts produced by the dedicated plotting functions included with seniority_list. This is done to reduce the time required to generate the hundreds of charts in the output. If the user desires better quality charts for the general overview charts, a larger chart size may be designated through a function input.
added a new “quick report” section to the documentation covering the new reporting capability
added a new example REPORTS notebook to the program. This notebook provides code examples for the new reporting capability and will generate summary spreadsheets and chart images for the current case study when it is executed.
updated the ds_dictionary creation routine - the output is now a dataset name/dataset key-value pair, replacing the previous dataset name/(dataset, dataset name) tuple values.
May 13th, 2017
- combined career_months_df_in and career_months_list_in into one function, career_months
- add convert_to_datetime function
- add pcnt_format function and update plotting code to incorporate the change
- improve code relating to saving chart images
- consolidate “imp_date” and “implementation_date” references
- update the code that groups data according to “empkey” attribute due to a version change in the pandas library
- update pandas “parallel coordinates” import due to a version change in the pandas library
- add the eg_attributes plotting function. This function replaces the multiline_plot_by_eg plotting function. This new function is able to plot any attribute (including date attributes) on either the x or y axis and introduces quantile membership lines and bands.
- remove multiline_plot_by_eg plotting function and eval_strings function
- docstring updates throughout
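The consolidated career_months function described above might be sketched as follows, assuming the function computes months from a start date until each employee reaches retirement age (a hypothetical implementation based on this description, not the program's code):

```python
import pandas as pd

def career_months(dob, start_date, ret_age=65):
    # hypothetical sketch: whole months from start_date until
    # each employee reaches the retirement age
    dob = pd.to_datetime(pd.Series(dob))
    start = pd.Timestamp(start_date)
    ret_date = dob + pd.DateOffset(years=ret_age)
    return ((ret_date.dt.year - start.year) * 12
            + (ret_date.dt.month - start.month))
```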
April 30th, 2017
Improvements with charts, plotting:
- updated the multiline_plot_by_eg, multiline_plot_by_emp, job_level_progression, and quantile_years_in_position plotting functions
- numerous updates and improvements to chart styling control for many plotting functions
Expanded pay exception capability:
- refactor contract_year_and_raise function to permit any number of pay exception periods
- add new “pay_exceptions” worksheet in the settings.xlsx input file
- update make_skeleton.py script to use the new pay_exceptions method
New anonymizing functions:
added capability to anonymize input data with the following new functions:
Each of the above functions generates random substitute data for the related input data column. These “helper” functions were combined into the following functions, which can anonymize the master.xlsx and pay_tables.xlsx files all at once, in place.
New sampling ability:
- added the sample_dataframe function, which returns a random sample of a dataframe (by rows), with the quantity of rows selected by the user
New excel-related functions:
- added downloadable pdf version of the program documentation
- formatted function definitions for proper presentation within the pdf document
Program coding improvement:
- added “if __name__ == “__main__”:” execution protection to all scripts
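The guard added to each script follows the standard Python idiom:

```python
def main():
    # script work goes here
    return "running as a script"

if __name__ == "__main__":
    # this branch executes only when the file is run directly,
    # not when the script is imported by another module
    print(main())
```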
Some older developmental files and settings references that were no longer needed remained within the code base.
- removed several developmental functions
- remove several items from the settings dictionary
- remove several rows from the “scalars” worksheet within the settings.xlsx file
April 19th, 2017
This update version focused on updating the visualization capabilities of seniority_list.
refactor job_transfer plotting function for speed and added features
- the updated function is approximately 25 times faster
- added ability to plot only targeted job level(s)
- new y scale limit option
- new min and max date options
add new percent_bins function and corresponding percent_diff_bins plotting function
- plots count of employees in list percentile change bins over time
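The binning behind percent_bins can be sketched with a numpy histogram (a hypothetical implementation written from the description above; the bin size is an assumed default):

```python
import numpy as np

def percent_bins(start_pcnt, end_pcnt, bin_size=0.05):
    # hypothetical sketch: histogram of list-percentile change,
    # binned in bin_size increments between -100% and +100%
    change = np.asarray(end_pcnt) - np.asarray(start_pcnt)
    edges = np.arange(-1.0, 1.0 + bin_size, bin_size)
    counts, _ = np.histogram(change, bins=edges)
    return counts, edges
```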
add new cohort_differential plotting function
- analyze differences between list locations for employees with equivalent attribute values but from different groups
add code to all notebooks for an automatic wide display
update multiline_plot_by_emp plotting function to permit simultaneous display of “jnum” and “jobp” attributes
update multiline_plot_by_eg plotting function to permit plotting of values at retirement for all employees
add ability to plot individual employee progression lines with job_count_bands plotting function
update all plotting function code to matplotlib object-oriented style
update many plotting function chart legend generation routines
add capability to save charts as images (including SVG format)
update PLOTTING notebook to incorporate new plotting functions/features
April 1st, 2017
remove “example_chart” option from plotting functions
add exception types to most try/except blocks throughout program
remove “master_name” argument from join_inactives.py script
update join_inactives.py script to permit input from editor tool output list order
update assign_jobs_nbnf_job_changes function:
- reduce the number of arguments for the main integrated job assignment function
- add job table dictionary to the function arguments
- eliminate the “this_job_col” variable within monthly loop
reduce and simplify the arguments for the assign_standalone_job_changes function, and use settings dictionary and job table dictionary as arguments
add the add_zero_col function to the functions module. This function will add a column of zeros as the first column of a 2D numpy array
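The add_zero_col helper can be sketched in a few lines (a minimal version consistent with the description above):

```python
import numpy as np

def add_zero_col(arr):
    # prepend a column of zeros to a 2D array, matching its dtype
    zeros = np.zeros((arr.shape[0], 1), dtype=arr.dtype)
    return np.concatenate((zeros, arr), axis=1)

out = add_zero_col(np.array([[1, 2], [3, 4]]))
```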
move the code to generate the dict_job_tables.pkl dictionary file from the make_skeleton.py script to the build_program_files.py script for consistency with other generated files
add a section within the build_program_files.py script to create a loop_check array. This boolean array will prevent unnecessary looping during the job assignment routine when all remaining employees have already been assigned. Reduces “Sample3” dataset generation times by approximately 5%.
update RUN_SCRIPTS and PLOTTING notebooks
March 20th, 2017
This update improved the flexibility of the ratio-based conditional job assignment routines. Inputs for these routines are now designated on individual worksheets within the settings.xlsx input file. Conditions may include any combination of jobs, weightings, and employee groupings.
refactor build_program_files.py script:
- change month time span ranges to sets
- remove references to condition durations and month ranges, as sets have replaced these inputs
- add new dictionary generation routine used with input from the ratio_cond worksheet in settings.xlsx.
- remove code related to count_cond, ratio_cond, and quota_dict.
update converter.py to handle the basic to enhanced conversion of new ratio-condition related dictionaries and remove code no longer needed.
eliminate many arguments for the assign_jobs_nbnf_job_changes function and replace with a settings dictionary argument.
refactor variable preparation sections within the assign_jobs_nbnf_job_changes function for use with the new dictionaries and month sets loaded from the settings dictionary when ratio-based conditions are selected.
refactor the assign_cond_ratio_capped and assign_cond_ratio job assignment functions. The new functions are simpler and more flexible in terms of inputs. Both functions accept a new dictionary argument, built from input worksheets which have been reformatted.
refactor the set_ratio_cond_dict function and rename it as set_snapshot_weights. The function modifies the weightings within the ratio_dict dictionary for all jobs at once to match existing job counts for a target month.
add a “cap” argument to the distribute function. The cap argument allows the function to be used within a ratio count-capped conditional job assignment routine.
modify the distribute_vacancies_by_weights function for simplicity and precision. This function is no longer used and may be removed at a future date.
the quota_dict and count_ratio_condition worksheets were removed from the settings.xlsx input file. These worksheets were replaced with the new ratio_count_capped_cond worksheet.
the format of the ratio_cond worksheet in settings.xlsx was updated for use with the new assign_cond_ratio function.
The job table generation has now been centralized within the make_skeleton.py script. The job tables are now stored as a dictionary within the dill folder permitting one-time calculation and universal program access.
- add create job tables routine to make_skeleton.py and store tables as a dictionary, dill/dict_job_tables.pkl. Additionally, the j_changes and jcnts_arr variables are stored within the dictionary.
- remove job table generation routines from individual plotting functions within the matplotlib_charting.py script, the standalone.py script, and the compute_measures.py script. Replace all by reading the stored dill/dict_job_tables.pkl dictionary.
Finally, a new utility function was added which prints the contents of dictionaries in an organized, landscape fashion.
- add pprint_dict function to the matplotlib_charting module.
March 9th, 2017
- Change documentation references from configuration file to settings dictionary.
- Remove make_pay_tables_from_excel.py script. This script is now incorporated within the build_program_files.py script
- Change references throughout code from eg_dict to renamed p_dict.
- Create the dill folder with the build_program_files.py script if it does not exist. An empty dill folder is no longer part of the original program files.
- Modify clear_dill_files function to check for the existence of the dill folder before executing.
- Add proposal name argument test and exception messages to compute_measures.py and join_inactives.py scripts.
- Add add_editor_list_to_excel function to matplotlib_charting module. This function will add an edited proposal list order (output of editor tool) to the proposals.xlsx input file, as a new worksheet named edit. The edited proposal list order may be preserved in this fashion and permits an easy way to reproduce the corresponding dataset.
- Add code to remove stored pickle files prior to overwriting for a speed improvement.
- Add a return_min option to the max_of_nested_lists function.
- Extensive updates to the matplotlib_charting and functions module docstrings, defining input types and function descriptions.
- Refactored cond_test plotting function for improved capability and output.
- Add count_ratio_dict worksheet to settings.xlsx input file. This worksheet will eventually replace the count_ratio_condition and the quota_dict worksheets as the count ratio condition code is updated.
- Add code to the build_program_files.py script to read the new count_ratio_dict worksheet.
- Add code to the convert function within the converter module to convert the data from the count_ratio_dict for an enhanced job level model when appropriate.
- Delete function make_intlists_from_columns.
- Modify function make_lists_from_columns to handle deleted function above.
- Add make_group_lists function. This function is used with Excel input (specifically worksheet cells) to convert string objects (ex. “2,3”) and integers into Python lists containing integers. This function is used with the count_ratio_dict dictionary construction.
- Add make_eg_pcnt_column function. Create an array of values which may be added to the input dataframe as a column reflecting the starting percentage of each employee within his/her original employee group at month zero.
- Add make_starting_val_column function. Create an array of values which may be added to the input dataframe as a column reflecting the starting value (month zero) of a selected attribute for each employee for every month (repeating values for successive months, indexed and unchanging for each employee).
- Add save_and_load_dill_folder function. Save the current dill folder to the saved_dill_folders folder (created if it does not already exist), or load a saved dill folder as the dill folder if it exists. This function allows previously calculated pickle files (including the datasets) to be loaded into the dill folder for quick review, providing convenient switching between previously calculated case study files.
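Hypothetical sketches of three of the functions above, assuming the “empkey”, “eg”, and “mnum” data model columns (illustrative implementations written from the descriptions, not the program's actual code):

```python
import pandas as pd

def make_group_lists(vals):
    # hypothetical sketch: convert worksheet cell values such as
    # "2,3" or 4 into Python lists of integers
    return [[int(s) for s in str(v).split(",")] for v in vals]

def make_eg_pcnt_column(df):
    # hypothetical sketch: starting percentage of each employee
    # within his/her employee group at month zero, repeated monthly
    m0 = df[df.mnum == 0].copy()
    m0["pcnt"] = ((m0.groupby("eg").cumcount() + 1)
                  / m0.groupby("eg")["eg"].transform("size"))
    return df["empkey"].map(m0.set_index("empkey")["pcnt"])

def make_starting_val_column(df, attr):
    # hypothetical sketch: month-zero value of attr for each
    # employee, repeated for every month
    return df["empkey"].map(df[df.mnum == 0].set_index("empkey")[attr])

# toy two-month data model with two employee groups
df = pd.DataFrame({"empkey": [1, 2, 3, 1, 2, 3],
                   "mnum":   [0, 0, 0, 1, 1, 1],
                   "eg":     [1, 1, 2, 1, 1, 2],
                   "spcnt":  [0.10, 0.20, 0.30, 0.15, 0.25, 0.35]})
eg_pcnt = make_eg_pcnt_column(df)
start_spcnt = make_starting_val_column(df, "spcnt")
```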
February 6th, 2017
This version is a major update. All inputs for the program are now read solely from spreadsheet workbooks - the configuration files have been completely eliminated. This change was made to make it easier for non-programmers to interact with seniority_list and to generally simplify the work flow when setting up the program for a particular case study and for further parameter modifications in the course of analysis. The new workbook containing the information previously held within the config files is named settings.xlsx and is located within the excel folder.
The data from the new settings.xlsx spreadsheet is stored in three dictionaries which serve as a fast data source for operations.
- Settings dictionary - essentially contains all of the information previously located in the configuration files.
- Color dictionary - a new source of color lists for plotting.
- Attribute dictionary - a collection of dataset column name descriptions used for plotting titles and labels.
The dictionary generation process has been incorporated within the build_program_files script, adding to the other generated data files and compensation table data. The dictionaries are stored in the dill folder as separate files.
When beginning a new case study, the user will now simply create a new case study folder within the excel folder and paste copies of the sample workbooks into it. The user will then go through each spreadsheet and modify the contents as appropriate to the new case study.
The old case_files folder and its contents are no longer used or needed. The old config.py file in the main seniority_list folder has been eliminated as well.
An added bonus of this update is the availability of a wide-range of chart plotting color schemes. The new color dictionary is created with multiple color lists as values and matplotlib colormap names as keys. All matplotlib colormaps are now available at all times. Each color list is automatically generated with a length equal to the number of job levels in the data model + 1. This supplies a color for each job level plus an additional color for a furlough level.
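The color-list generation can be sketched as follows. This is a simplified stand-in for the program's make_color_list routine, sampling evenly spaced colors from a named matplotlib colormap:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

def make_color_list(cmap_name, num_of_colors):
    # sample evenly spaced colors across the colormap and
    # convert each to a hex string
    cmap = plt.get_cmap(cmap_name)
    return [to_hex(cmap(i / max(num_of_colors - 1, 1)))
            for i in range(num_of_colors)]

job_levels = 8
# one color per job level plus one for the furlough level
colors = make_color_list("tab20c", job_levels + 1)
```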
All scripts and functions were updated to utilize the new dictionaries with many functions receiving additional arguments and additional docstring descriptions for even more control and customization of analysis output.
Four new functions were developed to assist with the spreadsheet to python conversion.
These functions are essentially “helper” functions used within the build_program_files script and are contained within the functions module.
Two new plotting-related functions were built as well.
The make_color_list function is able to perform multiple tasks, from producing a custom color list to plotting an example of every matplotlib colormap. It is used within the build_program_files script to produce the color dictionary.
The add_pad function automatically spaces chart labels when they would otherwise overlap one another. It has been incorporated within several plotting functions.
The new plotting functions are located within the matplotlib_charting module.
January 15th, 2017
- added a metric (attribute) description dictionary, “m_dict”, to general configuration file. This dictionary will provide labels for many of the plotting functions.
- refactored the delayed implementation methodology to use standalone data stored within a numpy array, generated by a new function, make_preimp_array. The new method allows any pre-implementation attributes to be transferred to the integrated dataset and is simpler than earlier code.
- refactored the “cat_order” attribute generation by employing a new function, make_cat_order. The new function is faster than the old method and correctly restricts standalone results to available standalone job levels.
- removed enhanced job level conditional variable assignment from case-specific configuration files and replaced with the new convert function. The new function is contained within a new module, converter.py, which is imported by the case-specific file(s). Only basic job level conditional job assignment data will be entered into the case-specific configuration files now. The basic level data will be automatically converted to enhanced data as appropriate.
December 31st, 2016
added slice_ds_by_index_array function to matplotlib_charting module and example to the PLOTTING notebook (subsequently renamed to slice_ds_by_filtered_index).
- filter an entire dataframe by selecting only the rows which match the filtered results from a target month. In other words, zero in on a slice of data from a particular month, such as employees holding a specific job in month 25. Then, using the index of those results, find only those employees within the entire dataset as an input for further analysis within the program.
- The output may be used as an input to a plotting function or for other analysis. This function may also be used repeatedly with various filters, using output of one execution as input for another execution.
improved the make_decile_bands function and docstring.
updated case_template.py file variable names for simplicity.
refactored some hard-coding found within the pre-existing condition section within the compute_measures.py script. This change will prepare any employee group(s) for special rights calculations.
added numerous function docstring improvements, primarily input variable descriptions.
refactored gen_skel_emp_idx function so that it now generates a long-form employee index array in addition to the idx_array. The make_skeleton.py script was updated to use this new output.
refactored the align_fill_down function, removing one input.
added numerous comments in many of the program files.
combined the convert_jcnts_to_enhanced and convert_job_changes_to_enhanced functions into one new function, convert_to_enhanced. The list_builder.py script was updated to use the new function, along with some plotting functions.
refactored cond_test plotting function, allowing much more flexible job assignment validation.
added mark_quantiles plotting function. This function is used by the quantile_groupby function below.
added quantile_groupby plotting function.
This function permits the user to group the members of selected employee group(s) into equally-sized sections, or quantiles, and track the attributes of those groups over time using various groupby methods. The available methods are as follows (default is median):
[mean, median, first, last, min, max]
For example, an input of 40 for the quantiles input would equate to 40 sections of the initial employee group population, each representing 2.5% of the group. The progression of these group segments will be calculated and plotted, maintaining the original members of each segment. Quantile calculations for separate groups are independent of one another, but may be tracked through an integrated dataset for robust comparison of outcomes.
If the user selects “cat_order” (job category numerical ranking), color bands representing the various job levels may be displayed as a chart background. This provides the user with a clear visualization of the way the employee group would progress through the various job levels over time under various list ordering proposals.
Examples of the quantile_groupby plotting function have been added to the PLOTTING notebook.
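The quantile grouping described above can be sketched with pandas qcut on toy data (the dataframe and column names here are illustrative, not the program's data model code):

```python
import numpy as np
import pandas as pd

# toy month-zero data: 12 employees with a list-percentage attribute
df = pd.DataFrame({"empkey": range(12),
                   "spcnt": np.linspace(0.05, 1.0, 12)})

# divide the group into 4 equal-sized quantiles (25% each)
df["qtile"] = pd.qcut(df.index, 4, labels=False)

# track a groupby statistic (median here) for each quantile
medians = df.groupby("qtile")["spcnt"].median()
```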
extensive narrative, definitions, and examples have been added to the “user guide” section of the documentation.
November 27th, 2016
upgraded the editor tool function.
- The editor tool will now automatically use the edited dataset for the recursive editing routine. The initial “compare_ds_text” dataset reference will now only take effect when an edited dataset does not exist.
- The process may be interrupted and reset with a new “reset” argument.
- The function will default to the first dataset proposal if the “compare_ds_text” input is invalid.
- The title of the differential chart will now reference the dataset being compared to the baseline.
many minor code improvements.
continual work on the program documentation, particularly the operational overview and the user guide.
October 20th, 2016
Refactored make_pay_tables_from_excel.py script.
- The requirements for the input Excel workbook related to compensation have been greatly simplified. Only two worksheets are necessary, one containing basic job level hourly rates and another with monthly pay hours per level and job description labels.
- Enhanced job tables are now automatically prepared when appropriate. This is controlled by the config.py enhanced_jobs variable.
- Furlough job levels are now added automatically as the bottom level within each annual grouping of pay data.
- Total monthly compensation tables may be ordered by a select pay year and longevity level.
- The script now creates a new Excel-format file with worksheets containing the calculated pay tables utilized for the case study, pay_table_data.xlsx. Sorted pay tables may be examined and the sort basis changed if desired. The workbook also contains other worksheets pertaining to the job level order used within the model. The file is stored in an auto-generated, case-study named folder within the “reports” folder.
- Other features added as described within the user guide.
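The core rate-times-hours calculation behind the pay tables can be sketched as follows. The DataFrames stand in for the two required worksheets (normally read with pd.read_excel), and all column names and values are illustrative assumptions:

```python
import pandas as pd

# stand-ins for the two required worksheets: basic hourly rates per job
# level, and monthly pay hours per level with job description labels
rates = pd.DataFrame({'level': [1, 2, 3],
                      'hourly_rate': [150.0, 120.0, 95.0]})
hours = pd.DataFrame({'level': [1, 2, 3],
                      'pay_hours': [85, 80, 78],
                      'description': ['Capt B777', 'Capt A320', 'F/O B777']})

# total monthly compensation per level = hourly rate * monthly pay hours
pay = rates.merge(hours, on='level')
pay['monthly'] = pay.hourly_rate * pay.pay_hours

# order the table by monthly compensation, as the script can do for a
# selected pay year and longevity level
pay = pay.sort_values('monthly', ascending=False).reset_index(drop=True)
```

In the actual script this result would be written out to the pay_table_data.xlsx workbook for examination.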
The join_inactives.py script now stores its Excel file output within the “reports” folder, next to the pay data file mentioned above.
October 5th, 2016
- Added the job_count_bands plotting function to the library of built-in plotting functions included with seniority_list. This function returns a chart which displays progressive counts of job opportunities available to selected employee group(s) under selected list order proposal(s) as an area chart with bands of different colors representing job levels. The input data may be filtered by up to three attributes, so that analysis may target particular population segments, as described in the previous version summary.
- Continued work developing the “user guide” section of the documentation.
September 28th, 2016
- This version includes a major update to the plotting functions and changes the way datasets are loaded for analysis.
- Most built-in plotting functions now have a three-layer filtering capability. This permits simple drill-down into the dataset for further insight. (Note: This is an added user-friendly convenience feature only. The capability to pre-filter datasets existed prior to this update but required additional programming knowledge to use.) Analysis of specific subsets of the datasets is now straightforward and much more convenient. For example, a filtered dataset containing only employees above a certain age, with a minimum longevity value, who hold a certain job would be trivial to select with this new capability. For most plotting functions, a filtered subset may be viewed for a particular model month as well. This added filtering capability is handled with the new filter_ds function. The filter_ds function checks for attribute filtering arguments and uses them to filter the datasets prior to analysis within the various plotting functions.
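A minimal sketch of the three-layer filtering idea follows; the real filter_ds signature, argument names, and supported operators may differ:

```python
import operator
import pandas as pd

# supported comparison operators, expressed as strings
_OPS = {'==': operator.eq, '!=': operator.ne, '<': operator.lt,
        '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def filter_ds(df, attr1=None, oper1='==', val1=None,
              attr2=None, oper2='==', val2=None,
              attr3=None, oper3='==', val3=None):
    """Apply up to three attribute filters to a dataset before plotting.

    Each (attr, oper, val) triplet narrows the dataset further; unused
    triplets are skipped.  (A sketch of the technique only.)
    """
    for attr, oper, val in ((attr1, oper1, val1),
                            (attr2, oper2, val2),
                            (attr3, oper3, val3)):
        if attr is not None:
            df = df[_OPS[oper](df[attr], val)]
    return df
```

Example: `filter_ds(ds, 'age', '>', 50, 'lyear', '>=', 10, 'jnum', '==', 2)` would select employees over age 50 with at least 10 longevity years who hold job level 2 (attribute names hypothetical).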
- The way that pickled datasets are read for use in the program has been updated. The names of the case study proposal worksheets are read from the source Excel workbook (proposals.xlsx). The program then looks for the matching datasets within the dill folder and loads them into a dictionary, using the proposal names as keys. Labels associated with the datasets are generated at the same time. These labels are used in the plotting functions. This functionality is provided with the new load_datasets function.
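The lookup-and-load idea might be sketched like this. The pickle file naming scheme and the function signature are assumptions; in the program the proposal names come from the worksheet names of proposals.xlsx rather than being passed in directly:

```python
from pathlib import Path
import pandas as pd

def load_datasets(proposal_names, dill_dir='dill'):
    """Load the calculated dataset for each proposal name into a dict,
    keyed by proposal name.  Missing datasets are simply skipped.

    In the program, proposal_names would come from something like
    pd.ExcelFile('excel/<case>/proposals.xlsx').sheet_names.
    """
    ds_dict = {}
    for name in proposal_names:
        pkl = Path(dill_dir) / f'ds_{name}.pkl'   # naming scheme assumed
        if pkl.exists():
            ds_dict[name] = pd.read_pickle(pkl)
    return ds_dict
```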
- Another new function allows flexible dataset variable input for nearly all of the plotting functions. The determine_dataset function allows inputs to be a string key referencing the dictionary output of the load_datasets function, or any variable representing a pandas dataframe.
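A sketch of the flexible-input idea (the real determine_dataset function may handle additional cases):

```python
import pandas as pd

def determine_dataset(ds_input, ds_dict):
    """Return a dataframe from either a string key into the load_datasets
    dictionary or a dataframe variable passed in directly.  (A sketch.)
    """
    if isinstance(ds_input, str):
        return ds_dict[ds_input]
    if isinstance(ds_input, pd.DataFrame):
        return ds_input
    raise TypeError('expected a proposal name string or a pandas DataFrame')
```

This lets plotting functions accept `'p1'` and a raw dataframe interchangeably.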
- All program files and notebooks were updated to handle the new methods described above and the plotting functions documentation was revised.
September 3rd, 2016
- Removed “fur.pkl”, “sg.pkl”, and “active_each_month.pkl” file generation from the build_program_files.py script. These files were no longer needed.
- Consolidated the two standalone dataset scripts into one. This eliminated the standalone_with_job_changes.py and standalone_no_job_changes.py scripts in favor of the new script, standalone.py.
- Refactored config.py to create a job change schedule reflecting no job changes when the compute_with_job_changes option is False. This allows the job changes routine to run with all dataset calculations, adding simplicity and eliminating unnecessary code.
- Updated the pay_tables.xlsx Excel file by removing worksheets which are no longer needed.
- Added a quota_dict section to the basic job level configuration section of sample3.py and the case_template.py files.
- Removed the “actives_only” option in the config.py file. All datasets will now include any furloughed employees and will not incorporate other inactive employees.
- Modifications made with other program files to accommodate the removal of the “actives_only” option.
August 31st, 2016
- Add clear_dill_files function, used by “auto-cleaning” below.
- Add “auto-cleaning” of dill folder when case_study config input is changed. This prevents residual files from a previous study coexisting with new case study files within the “dill” folder.
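The auto-cleaning routine might look roughly like this (a sketch; the actual clear_dill_files implementation may differ):

```python
import glob
import os

def clear_dill_files(dill_dir='dill'):
    """Remove all pickle files from the dill folder.

    Called when the case_study config input changes, so that stale files
    from a previous case study cannot coexist with new case study files.
    """
    for pkl in glob.glob(os.path.join(dill_dir, '*.pkl')):
        os.remove(pkl)
```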
- Add auto-generated sample employee and employee list to PLOTTING notebook. This will pick median employees from any list(s) for use with sample plotting.
- Moved one-time editor tool ipywidget config command to last cell in EDITOR_TOOL notebook. A recent update to ipywidgets required this command to be run one time. The user will uncomment the code, run the cell, then re-comment the code.
- Updated case_template.py file to match recent upgrades.
- Add documentation for website user guide relating to input file naming conventions and file locations.
August 30th, 2016
- Simplified structure of config.py module. Users will have a much clearer understanding of modifiable vs. imported variables from the case-specific config file. Sections designed for user-modifiable inputs are now clearly delineated.
- Added pay table related configuration file inputs which will be imported from the case-specific config file including option for a future raise and/or temporary pay scale exceptions.
- Modified contract_pay_year_and_raise function to accept the customized pay-related inputs from config file.
- Added the plotting function eg_boxplot. This function will plot actual attribute ranges for employee groups over time as boxplots.
August 26th, 2016
- This update is a collection of minor edits, docstring additions, notebook adjustments, and refactoring.
- EDITOR notebook… a recent update to ipywidgets requires a one-line configuration command for proper operation. Added a cell within the notebook to accomplish that requirement. (Note: the update broke the button colors)
- PLOTTING notebook… adjusted variable inputs within notebook cells to match minor configuration file color list changes
- Updated chart labeling for the group_average_and_median function
- Improved rows_of_color plotting function; users may now select any job level combined with any employee group(s) for any month, and may also display other categories such as furlough or other special groups
- Added documentation for several plotting functions
- Removed standalone or furlough colors from case-specific configuration color lists. Now these additional colors are added when needed from within a function.
August 12th, 2016
- Adjusted editor function widget positioning, and made a minor code adjustment to permit compatibility with the anaconda ipywidgets version (which lags significantly behind the latest version, though the editor retains full functionality).
- Added group_average_and_median plotting function. This function permits plotting of group average and/or median for a selected attribute over time for a main and secondary dataset. Standalone data may be used as main or secondary data. The attributes may be further filtered/sliced by up to 3 constraints, such as age, longevity, or job level. This function can plot basic data such as average list percentage or could, for example, plot the average job category rank for employees hired prior to a certain date who are over or under a certain age, for a selected integrated dataset and/or standalone data (or for two integrated datasets).
August 9th, 2016
- job_time_change function may now display job numbers or custom job labels (from case-specific config file).
- Added EDITOR_TOOL.ipynb notebook to repository.
- Eliminated need for “edit_mode” input within general configuration file. The program will now use edit mode whenever the editor tool is used.
August 4th, 2016
- Refactored parallel plotting function to handle any number of datasets and any number of employee groups.
- Added new plotting function job_time_change. This function compares the amount of time, in months, spent in various jobs under different list proposals. The information is presented only for employees who experience a change. Any number of datasets, employee groups, and job levels may be selected for analysis.
- Added documentation for multiple plotting functions
July 31st, 2016
- Added cat_order attribute (job rank number) to standalone dataset. The cat_order for each independent group is normalized to be accurate for the integrated group. This allows direct comparison with integrated job levels.
- Refactored compute_measures script so that standalone cat_order data is merged with integrated cat_order data when a delayed implementation exists. This new capability can be visualized for individual employees with the job_level_progression plotting function.
July 26th, 2016
- Refactored eg_diff_boxplot function to allow any number of datasets to be compared with standalone data or with each other. Employee groups for analysis may now be selected and the plot colors will be correct for the group(s). Option added to exclude employees who will be furloughed at any point within the data model, reducing or eliminating outlier data for some attribute measures.
July 17th, 2016
- Added additional filtering capability to the editor tool. Filtering may now be accomplished with monthly data combined with additional attribute selection.
- Added reset_editor function to restore the editor if invalid filter attributes are selected, leading to an exception. Exception handling will be added as my available developer time permits.
July 16th, 2016
- Added option to increase employee retirement age. The retirement age may be raised with specified increments at designated times in the future.
- New function clip_ret_ages sets proper retirement age for employees in their retirement month when the model includes a retirement age increase.
- Added a “ret_mark” column during the skeleton file creation routine which is passed to the calculated datasets. The ret_mark column will indicate “1” when an employee is in their last working month. This is helpful for filtering or plotting retirement data when the datasets contain multiple retirement ages.
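The ret_mark calculation can be sketched with a groupby transform (column names assumed from context; not the program's exact code):

```python
import pandas as pd

def add_ret_mark(df):
    """Mark each employee's last working month with 1 (all other months 0).

    Assumes a long-form skeleton with 'empkey' and 'mnum' (model month)
    columns.  With multiple retirement ages in the model, filtering on
    ret_mark is simpler than comparing against each retirement age.
    """
    last_mnum = df.groupby('empkey')['mnum'].transform('max')
    df['ret_mark'] = (df.mnum == last_mnum).astype(int)
    return df
```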
July 10th, 2016
- Altered standalone dataset generation scripts to accept any number of employee groups.
- Modified differential scatter plotting function to accept any number of proposals and employee groups
- Replaced numpy “unique” function with pandas “unique” function throughout the code for speed improvement
July 7th, 2016
- Changed “cat_order” attribute calculation method to a groupby operation vs. a sort and resort yielding 10-15 percent reduction in total dataset compilation time.
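The groupby approach can be illustrated as follows: a single sort plus a per-month cumulative count replaces sorting the whole dataset and re-sorting it back afterward. Column names here are hypothetical:

```python
import pandas as pd

# hypothetical long-form dataset: one row per employee per model month,
# with job level (jnum) and integrated list order (new_order)
df = pd.DataFrame({'mnum':      [0, 0, 0, 1, 1, 1],
                   'jnum':      [2, 1, 1, 2, 1, 2],
                   'new_order': [3, 1, 2, 3, 1, 2]})

# rank within each month by job level, breaking ties by list order;
# index alignment writes the result back into the original row order
df['cat_order'] = (df.sort_values(['jnum', 'new_order'])
                     .groupby('mnum').cumcount() + 1)
```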
July 6th, 2016
- Added a case_files folder to the project. This will be the home for data specific files belonging to a particular integration case. The general config file will import the case-specific information and also allow other general options to be added and used by seniority_list. This arrangement will permit multiple cases to be available for analysis, easily selected with one input within the general config file.
- Moved the special condition job assignment data out of the general config file and into the case-specific file(s).
- Moved the notebooks from the notebooks folder to the seniority_list folder and deleted the notebooks folder. This allows the notebooks to run without import issues.
- Added information to the “installation” and “user guide” sections of the documentation. Much more to come.
July 4th, 2016
- Added cond_test function. Used to visualize selected job counts applicable to computed job assignment condition. Primary usage is testing, though the function can chart any job level(s).
- Added single_emp_compare function. Select a single employee and compare proposal outcomes using various calculated measures.
- Add installation page to documentation.
- Add notebooks folder to project with “Plotting” and “Run_Scripts” jupyter notebook files.
- Minor code cleanup.
June 24th, 2016
- Added plotting functions job_count_charts and emp_quick_glance. Updated the quantile_years_in_position plot layout and added helper function build_subplotting_order.
June 18th, 2016
- Initial work for config, job assign, and data source refactor. Initial spreadsheet list data, compensation information, and order proposals will be contained within case-specific folders within the “excel” folder and will be selected with a config file variable. Basic program files are generated from these properly formatted source spreadsheets. Other case data such as job counts, job changes, conditions, and recall schedules will be contained within case-specific python modules.
- Function module docstring cleanup
June 12th, 2016
- Split align function into two functions: align_next and align_fill_down. Month-to-month data alignment is now accomplished with numpy index alignment vs. pandas dataframe alignment. The new align_next function replaces the old align function and is primarily used during the job assignment portion of the dataset generation scripts. Net result is an overall 40-50 percent reduction in the time required for dataset generation.
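The numpy index-alignment technique can be sketched like this (a simplified illustration, not the program's exact implementation):

```python
import numpy as np

def align_next(this_idx, next_idx, values):
    """Carry one month's values forward to the next month's row positions
    using numpy integer indexing instead of pandas dataframe alignment.

    this_idx and next_idx are arrays of employee index numbers for two
    consecutive months (next_idx is a subset of this_idx after attrition);
    values is aligned with this_idx.
    """
    # scatter each employee's value into a lookup array by employee index
    lookup = np.full(this_idx.max() + 1, -1, dtype=values.dtype)
    lookup[this_idx] = values
    # gather the values back out in next month's employee order
    return lookup[next_idx]
```

Because both steps are plain integer-array indexing, no label matching is performed, which is where the speedup over dataframe alignment comes from.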
- Other minor code improvements throughout and additions to function documentation.
June 5th, 2016
- Added find_row_orphans and compare_dataframes functions to the list_builder script. These functions are used to compare dataframe columns and/or entire dataframes. They are able to pinpoint differences within large datasets very quickly, which is particularly helpful during the master list data construction phase.
June 3rd, 2016
- Added sort_eg_attributes, build_list, sort_and_rank, and names_to_integers functions to the list_builder script. List proposals may now be rapidly constructed from sample or case master lists. One or more attribute columns may be selected as list order inputs and a “hybrid” ordering achieved by applying variable weightings to those columns.
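The weighted-rank idea behind hybrid list building can be sketched as follows (the real build_list signature, attribute columns, and weighting scheme may differ):

```python
import pandas as pd

def build_list(master, attrs, weights):
    """Construct a 'hybrid' list ordering by ranking each chosen attribute
    column as a percentile and combining the ranks with weights.

    master is a master list dataframe; attrs and weights are parallel
    sequences.  (A sketch of the technique only.)
    """
    hybrid = sum(master[a].rank(pct=True) * w
                 for a, w in zip(attrs, weights))
    out = master.assign(hybrid=hybrid).sort_values('hybrid')
    out['new_order'] = range(1, len(out) + 1)
    return out.reset_index(drop=True)
```

Adjusting the weights shifts the resulting order between the pure attribute orderings, which is how a "hybrid" list is achieved.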
May 28th, 2016
- Added list_builder script and the prepare_master_list function. This is the first step toward manual list building using various attribute weighting, merging, and sorting. This feature is considered a convenience tool only. It may be used for initial list building and ordering prior to analysis and further editing.
May 27th, 2016
- Added a sample pay table file to the sample_data folder. Sample pay tables may now be generated from the sample file for use with the sample datasets. The sample file simulates a typical Excel input file with pay scale information.
- Replaced the previous pay table generation script with a modified version. The script converts the Excel workbook to Python pickle files for use within the program, either with real or sample data.
May 26th, 2016
- Added sample master list and sample proposals (both in Excel format) to the sample_data folder. These files can be the source for testing the operation of the program and creating sample datasets when the “sample_mode” option within the configuration file is set to “True”. Sample pay-related files will be added soon. Updated several other scripts, including significant updates to the config file, so they will operate with the sample data.
- Added print_config_selections function. The function provides a quick report of configuration file selections in a dataframe format.
May 19th, 2016
- Added build_files script. Build supporting files from initial Excel file input such as master data list, proposal orderings, last month percent, etc.
- Added standalone_no_job_changes script. Used with most basic dataset creation. This file is rarely used but available if a dataset without any job changes over time is desired.
- Other coding changes to format the program to accept a wider range of list input
May 16th, 2016
- Added join_inactives script. Edited or active-employee-only lists may now be merged into the original master list, which contains all employees including inactive employees (such as those on sick leave, in supervisory roles, etc.). The inactives may be attached either to the “just senior” active cohort of their employee group or to the “just junior” cohort via an argument option. The resulting list will be sorted and numbered in the new list order.
May 14th, 2016
- Added range_diff plotting function which computes and displays aggregate differential data over time, comparing proposal results with standalone data.
- Modified compute_measures script. A master data file will now be reordered by a specific proposal list order or an order from the editor tool instead of storing separate data files for each proposal.
May 12th, 2016
- Added eg_multiplot_with_cat_order function. Adds flexible x y plotting for most attributes with special color bands and scaling when cat_order is the selected measure. The function is able to select certain employee groups for independent views.
May 2nd, 2016
- Added multiple controls to the editor interface making the tool easier to use. “One click” recalculation with chart updating is now enabled.
May 1st, 2016
- Added editor function. Editor is an interactive, visual list editing tool for use within the Jupyter notebook. This tool can be used to remove list distortions using comparative data.
April 26th, 2016
- Added job_transfer function.
April 22nd, 2016
- Added edit mode to config file in preparation for visual span selector editing tool.
- Minor documentation edits including adding proper table format for documentation format.
- New differential option added to quantile_years_in_position plotting function along with other plot output options.
- Added new quantile_bands_in_position plotting function.
April 15th, 2016
- Initial commit.