Software Implementation

This page describes the bca4abm software implementation and how to contribute.

The implementation starts with the ActivitySim framework, which serves as the foundation for the software. The framework, as briefly described below, includes features for data pipeline management, expression handling, testing, etc. Built upon the framework are additional core components for benefits calculation.

ActivitySim Framework

bca4abm is implemented in the ActivitySim framework. As summarized here, being implemented in the ActivitySim framework means:

  • Overall Design

    • Implemented in Python, making heavy use of the vectorized C/C++ backend libraries in pandas and numpy

    • Vectorization instead of for loops when possible

    • Runs sub-models that solve Python expression files that operate on data tables

  • Data Handling

    • Inputs are in CSV format, with the exception of settings

    • CSVs are read in as pandas tables and stored in an intermediate HDF5 binary file that is used for data I/O throughout the model run

    • Key outputs are written to CSV files

  • Key Data Structures

    • pandas.DataFrame - A data table with rows and columns, similar to an R data frame, Excel worksheet, or database table

    • pandas.Series - a vector of data, a column in a DataFrame table or a 1D array

    • numpy.array - an N-dimensional array of items of the same type, such as a matrix

  • Model Orchestrator

    • ORCA is used for running the overall model system and for defining dynamic data tables, columns, and injectables (functions). ActivitySim wraps ORCA functionality to make a Data Pipeline tool, which allows for restarting at any model step.

  • Expressions

    • Model expressions are in CSV files and contain Python expressions, mainly pandas/numpy expressions that operate on the input data tables. This means model calculations can be changed without modifying Python code.

  • Code Documentation

  • Testing

    • A protected master branch that can only be written to after tests have passed

    • pytest for tests

    • Travis CI for building and testing with each commit
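A minimal sketch of the expression-driven, vectorized style described above, using plain pandas and Python's built-in eval rather than ActivitySim's own expression evaluator (which supports more syntax). The table columns and the two (target, expression) pairs are hypothetical; in a real model they would come from an expression CSV file.

```python
import pandas as pd

# Hypothetical trip table: one row per trip record
trips = pd.DataFrame({
    "trips": [2.0, 1.0, 3.0],
    "ivt_base": [30.0, 20.0, 15.0],
    "ivt_build": [25.0, 20.0, 10.0],
})

# Hypothetical (target, expression) pairs, as they might appear in an expression CSV
spec = [
    ("ivt_savings", "df.ivt_base - df.ivt_build"),
    ("minutes_saved", "df.trips * df.ivt_savings"),
]

for target, expression in spec:
    # Each expression works on whole columns at once (vectorized),
    # so there is no Python loop over the rows of the table.
    trips[target] = eval(expression, {}, {"df": trips})

print(trips["minutes_saved"].tolist())  # [10.0, 0.0, 15.0]
```

The loop runs once per spec row, not once per data row; the per-row work happens inside pandas/numpy.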

Common

Software components common to both ABM and four-step model usage.

bca4abm

bca4abm.bca4abm.calc_rows_per_chunk(chunk_size, df, spec, extra_columns=0, trace_label=None)

Simple rows_per_chunk calculator for chunking calls to assign_variables.

ActivitySim’s chunk.rows_per_chunk method handles the main logic, including a missing/zero chunk size.

Parameters
chunk_size : int
df : pandas DataFrame
spec : pandas DataFrame
extra_columns : int, optional
trace_label : str, optional
Returns
num_rows : int
effective_chunk_size : int
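The chunking arithmetic can be sketched as follows. This is a hypothetical reimplementation (calc_rows_per_chunk_sketch is not part of bca4abm), assuming that chunk_size is a budget measured in table cells and that each spec row adds one result column per data row; ActivitySim's chunk.rows_per_chunk may account for memory differently.

```python
import pandas as pd


def calc_rows_per_chunk_sketch(chunk_size, df, spec, extra_columns=0):
    """Hypothetical sketch: how many rows of df to evaluate per chunk.

    Assumes chunk_size is a budget in table cells (rows x columns) and that
    each spec row adds one result column per data row.
    Returns (num_rows, effective_chunk_size).
    """
    num_rows = len(df)
    # Columns per row: input columns + one per spec expression + extras
    row_size = max(df.shape[1] + len(spec) + extra_columns, 1)

    if not chunk_size:
        # Missing or zero chunk size: everything goes in a single chunk
        return num_rows, num_rows * row_size

    # At least one row per chunk, at most all rows
    rows_per_chunk = min(max(chunk_size // row_size, 1), num_rows)
    return rows_per_chunk, rows_per_chunk * row_size


df = pd.DataFrame({"a": range(1000), "b": 0.0, "c": 0.0})
spec = pd.DataFrame({"Target": ["t1", "t2", "t3", "t4", "t5"], "Expression": ["0"] * 5})
print(calc_rows_per_chunk_sketch(2500, df, spec, extra_columns=2))  # (250, 2500)
print(calc_rows_per_chunk_sketch(0, df, spec, extra_columns=2))     # (1000, 10000)
```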
bca4abm.bca4abm.eval_and_sum(assignment_expressions, df, locals_dict, group_by_column_names=None, df_alias=None, chunk_size=0, trace_rows=None)

Evaluate assignment_expressions against df and sum the results. If a list of group_by_column_names is specified, the results are summed by group; for example, grouping by the coc column names returns sums grouped by community of concern.

Parameters
assignment_expressions
df
locals_dict
group_by_column_names : array of str

list of names of the columns to group by (e.g. coc_column_names of trip_coc_end)

df_alias : str

assign_variables df_alias (name of df in assignment_expressions)

chunk_size : int
trace_rows : array of bool

array indicating which rows in df are to be traced
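A simplified sketch of the evaluate-then-sum idea, without chunking or tracing. eval_and_sum_sketch and the example columns (coc_low_income, trips, ivt_savings) are hypothetical, and plain (target, expression) pairs stand in for the spec table. In this sketch, later expressions can refer to earlier targets by name.

```python
import pandas as pd


def eval_and_sum_sketch(assignment_expressions, df, locals_dict,
                        group_by_column_names=None, df_alias="df"):
    """Hypothetical sketch: evaluate (target, expression) pairs against df and
    sum each target, either overall or by group. No chunking or tracing."""
    env = dict(locals_dict)
    env[df_alias] = df
    results = pd.DataFrame(index=df.index)
    for target, expression in assignment_expressions:
        results[target] = eval(expression, {}, env)
        # Let later expressions refer to this target by name
        env[target] = results[target]

    if group_by_column_names:
        # Attach the group columns and sum every target within each group
        with_groups = pd.concat([df[group_by_column_names], results], axis=1)
        return with_groups.groupby(group_by_column_names).sum()
    return results.sum()


trips = pd.DataFrame({
    "coc_low_income": [1, 0, 1, 0],
    "trips": [1.0, 2.0, 3.0, 4.0],
    "ivt_savings": [2.0, 1.0, 0.5, 0.0],
})
exprs = [
    ("minutes_saved", "df.trips * df.ivt_savings"),
    ("value", "minutes_saved * vot / 60.0"),
]
print(eval_and_sum_sketch(exprs, trips, {"vot": 30.0}))
print(eval_and_sum_sketch(exprs, trips, {"vot": 30.0}, ["coc_low_income"]))
```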

bca4abm.bca4abm.read_assignment_spec(fname)

Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions, targets, and expressions.

The CSV is required to have a header with column names. For example:

Description,Target,Expression,Silos

Parameters
fname : str

Name of a CSV spec file.

Returns
spec : pandas.DataFrame

The description column is dropped from the returned data and the expression values are set as the table index.
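A sketch of reading a spec with the header shown above, following the documented behavior (Description dropped, Expression used as the index). read_assignment_spec_sketch and the example rows are hypothetical, and the Silos column is simply carried through with placeholder values. Expressions containing commas would need to be quoted in the CSV.

```python
import io

import pandas as pd


def read_assignment_spec_sketch(csv_file):
    """Hypothetical sketch: read a Description,Target,Expression[,...] CSV,
    drop the Description column and index the table by Expression."""
    spec = pd.read_csv(csv_file, skipinitialspace=True)
    # Tolerate stray whitespace in the header row
    spec.columns = [column.strip() for column in spec.columns]
    spec = spec.drop(columns=["Description"])
    return spec.set_index("Expression")


csv_text = """Description,Target,Expression,Silos
Travel time savings,ivt_savings,df.ivt_base - df.ivt_build,s1
Value of time savings,value,ivt_savings * vot,s1
"""
spec = read_assignment_spec_sketch(io.StringIO(csv_text))
print(spec)
```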

bca4abm.bca4abm.scalar_assign_variables(assignment_expressions, locals_dict)

Evaluate a set of variable expressions from a spec in the context of a given data table.

Python expressions are evaluated in the context of this function using Python’s eval function. Users should take care that these expressions evaluate to a scalar.

Parameters
assignment_expressions : pandas sequence of str
locals_dict : Dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @.

Returns
variables : pandas.DataFrame

Will have the index of df and columns of exprs.
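A hypothetical sketch of scalar assignment: each expression is evaluated with Python's eval against locals_dict, a leading @ is stripped, and each result is added to the environment so later rows can use it. Returning a plain dict rather than a DataFrame, and treating expressions with and without @ the same way, are simplifications made here; the variable names are illustrative only.

```python
import numbers


def scalar_assign_variables_sketch(assignment_expressions, locals_dict):
    """Hypothetical sketch: evaluate (target, expression) pairs that must each
    produce a scalar; returns a dict of target -> value."""
    env = dict(locals_dict)
    variables = {}
    for target, expression in assignment_expressions:
        expression = expression.strip()
        if expression.startswith("@"):
            # '@' marks an expression evaluated against locals_dict
            expression = expression[1:]
        value = eval(expression, {}, env)
        if not isinstance(value, numbers.Number):
            raise ValueError(f"{target}: expression did not evaluate to a scalar")
        variables[target] = value
        env[target] = value  # later expressions may use this target
    return variables


result = scalar_assign_variables_sketch(
    [("vot_per_hour", "@hourly_wage * 0.5"),
     ("vot_per_minute", "@vot_per_hour / 60.0")],
    {"hourly_wage": 24.0},
)
print(result)
```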

aggregate_trips

bca4abm.processors.aggregate_trips.aggregate_trips_processor(aggregate_trips_spec, settings, data_dir)

Compute aggregate trips benefits

The data manifest contains a list of trip count files (one for base, one for build) along with their corresponding in-vehicle time (ivt), operating cost (aoc), and toll skims.

Since the skims are all aligned numpy arrays, we can express their benefit calculation as vector computations in the aggregate_trips_spec.
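The benefit formula itself is not given here, so the sketch below assumes the common rule-of-half consumer surplus calculation, applied elementwise to aligned base/build trip matrices and in-vehicle time skims. All arrays are hypothetical 3-zone examples; operating cost and toll skims would be handled the same way.

```python
import numpy as np

# Hypothetical 3-zone trip tables (rows = origins, columns = destinations)
base_trips = np.array([[0.0, 10.0, 5.0], [8.0, 0.0, 4.0], [6.0, 2.0, 0.0]])
build_trips = np.array([[0.0, 12.0, 5.0], [8.0, 0.0, 6.0], [6.0, 2.0, 0.0]])

# Hypothetical in-vehicle time skims in minutes, aligned with the trip tables
base_ivt = np.array([[1.0, 20.0, 30.0], [20.0, 1.0, 15.0], [30.0, 15.0, 1.0]])
build_ivt = np.array([[1.0, 15.0, 30.0], [20.0, 1.0, 10.0], [30.0, 15.0, 1.0]])

# Rule of half (an assumption here, not confirmed by the bca4abm docs):
# benefit = 0.5 * (base trips + build trips) * (base time - build time),
# computed for every O-D cell at once because the arrays are aligned.
ivt_benefit = 0.5 * (base_trips + build_trips) * (base_ivt - build_ivt)

print(ivt_benefit.sum())  # 80.0 (trip-minutes saved)
```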

bca4abm.processors.aggregate_trips.logger = <Logger bca4abm.processors.aggregate_trips (WARNING)>

Aggregate trips processor

ABM Processors

Software components for ABM model usage.

abm_results

auto_ownership

bca4abm.processors.abm.auto_ownership.auto_ownership_processor(persons_merged, auto_ownership_spec, auto_ownership_settings, coc_column_names, chunk_size, trace_hh_id)

Compute auto ownership benefits

bca4abm.processors.abm.auto_ownership.logger = <Logger bca4abm.processors.abm.auto_ownership (WARNING)>

Auto ownership processor

demographics

bca4abm.processors.abm.demographics.logger = <Logger bca4abm.processors.abm.demographics (WARNING)>

Demographics processor

person_trips

bca4abm.processors.abm.person_trips.logger = <Logger bca4abm.processors.abm.person_trips (WARNING)>

Person trips processor

bca4abm.processors.abm.person_trips.person_trips_processor(trips_with_demographics, person_trips_spec, person_trips_settings, coc_column_names, settings, chunk_size, trace_hh_id)

Compute disaggregate trips benefits

physical_activity

bca4abm.processors.abm.physical_activity.logger = <Logger bca4abm.processors.abm.physical_activity (WARNING)>

Physical activity processor

bca4abm.processors.abm.physical_activity.physical_activity_processor(trips_with_demographics, persons_merged, physical_activity_trip_spec, physical_activity_person_spec, physical_activity_settings, coc_column_names, settings, chunk_size, trace_hh_id)

Compute physical activity benefits

Physical activity benefits generally accrue if the net physical activity for an individual exceeds a certain threshold. We calculate individual physical activity based on trips, so we need to compute trip activity and then sum up to the person level to calculate benefits. We chunk trips by household id to ensure that all of a person's trips are in the same chunk.
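The chunking argument can be illustrated with a small pandas sketch. The column names, the 20-minute threshold, and the per-person benefit value are hypothetical; the point is that trips are grouped by household before summing to persons, so no person's trips are split across chunks.

```python
import numpy as np
import pandas as pd

# Hypothetical trips with active (walk/bike) minutes per trip
trips = pd.DataFrame({
    "hh_id": [1, 1, 1, 2, 2, 3],
    "person_id": [10, 10, 11, 20, 20, 30],
    "active_minutes": [15.0, 10.0, 5.0, 20.0, 20.0, 8.0],
})

THRESHOLD_MINUTES = 20.0   # hypothetical activity threshold
BENEFIT_PER_PERSON = 1.0   # hypothetical benefit per qualifying person


def person_benefits(trip_chunk):
    # Sum trip-level activity up to the person level...
    minutes = trip_chunk.groupby("person_id")["active_minutes"].sum()
    # ...then apply the threshold to each person's total
    return (minutes > THRESHOLD_MINUTES) * BENEFIT_PER_PERSON


# Chunk by household id so that all of a person's trips land in the same chunk
household_chunks = np.array_split(trips["hh_id"].unique(), 2)
benefits = pd.concat(
    [person_benefits(trips[trips["hh_id"].isin(chunk)]) for chunk in household_chunks]
)
print(benefits)
```

If the trips were split by row instead, person 10's two trips (15 and 10 minutes) could end up in different chunks, and neither partial sum would exceed the threshold.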

Four Step Processors

Software components for four-step model usage.

aggregate_demographics

bca4abm.processors.four_step.aggregate_demographics.aggregate_demographics_processor(zone_hhs, aggregate_demographics_spec, settings, trace_od)
Parameters
zone_hhs : orca table

input zone demographics

bca4abm.processors.four_step.aggregate_demographics.logger = <Logger bca4abm.processors.four_step.aggregate_demographics (WARNING)>

Aggregate demographics processor

Each row in the data table to be solved is an origin zone; this processor calculates communities of concern (COC) / market segments based on mf.cval.csv.

aggregate_zone

bca4abm.processors.four_step.aggregate_zone.aggregate_zone_processor(zones, trace_od)

zones : orca table

zone data from the base and build scenario dat files, combined into a single dataframe indexed by ZONE, with column names prefixed with base_ or build_

bca4abm.processors.four_step.aggregate_zone.logger = <Logger bca4abm.processors.four_step.aggregate_zone (WARNING)>

Aggregate zone processor

Each row in the data table to be solved is an origin zone; this processor calculates zonal auto ownership differences as well as the differences in the destination choice logsums (ma.<purpose|income>dcls.csv). Maybe the ma.<purpose|income>dcls.csv files should be added to mf.cval.csv before input to the BCA tool?
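A sketch of a combined base/build zone table and the zonal differences described above. The column names (autos, dcls) and values are hypothetical placeholders for the auto ownership and destination choice logsum fields.

```python
import pandas as pd

# Hypothetical zone data for each scenario, indexed by ZONE
base = pd.DataFrame(
    {"autos": [100.0, 80.0, 50.0], "dcls": [-1.25, -0.75, -2.0]},
    index=pd.Index([1, 2, 3], name="ZONE"),
)
build = pd.DataFrame(
    {"autos": [98.0, 80.0, 47.0], "dcls": [-1.0, -0.75, -1.5]},
    index=pd.Index([1, 2, 3], name="ZONE"),
)

# One table with base_/build_ prefixed columns, one row per origin zone
zones = pd.concat([base.add_prefix("base_"), build.add_prefix("build_")], axis=1)

# Zonal differences, build minus base
zones["autos_diff"] = zones["build_autos"] - zones["base_autos"]
zones["dcls_diff"] = zones["build_dcls"] - zones["base_dcls"]
print(zones[["autos_diff", "dcls_diff"]])
```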

aggregate_od

class bca4abm.processors.four_step.aggregate_od.ODSkims(omx_file_path, name, zone_index, transpose=False, cache_skims=True)

Wrapper for skim arrays to facilitate use of skims by aggregate_od_processor

Parameters
skims_dict : dict

empty dict to cache skims read from file

omx : open omx file object

this is only used to load skims on demand that were not preloaded

length : int

number of zones in the skim to return in the skim matrix, in case the skims contain additional external zones that should be trimmed out so that the skim array has the correct shape to match the (flattened) O-D tiled columns in the OD dataframe

transpose : bool

whether to transpose the matrix before flattening (i.e. act as a D-O instead of an O-D skim)
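A sketch of the trimming, transposing, and flattening that the length and transpose parameters describe. od_skim_values is a hypothetical helper, not the ODSkims API, and it assumes any external zones are stored after the internal zones, so trimming keeps the first length rows and columns.

```python
import numpy as np


def od_skim_values(skim, length, transpose=False):
    """Hypothetical helper: trim a square skim to its first `length` zones,
    optionally transpose it (D-O instead of O-D), and flatten it row-major,
    so element i * length + j lines up with the O-D row for origin i and
    destination j (with transpose=True, that element is skim[j, i])."""
    matrix = np.asarray(skim)[:length, :length]
    if transpose:
        matrix = matrix.T
    return matrix.flatten()  # row-major (C order) copy


# Hypothetical 4-zone skim where the last zone is an external zone
skim = np.arange(16).reshape(4, 4)
print(od_skim_values(skim, 3).tolist())                  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
print(od_skim_values(skim, 3, transpose=True).tolist())  # [0, 4, 8, 1, 5, 9, 2, 6, 10]
```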

bca4abm.processors.four_step.aggregate_od.create_zone_matrices(model_settings, zones)

Create ODSkims look-alikes that have identical values for all zone origins/destinations.

That is, zone values are either repeated (origin_zone_matrices) or tiled (dest_zone_matrices) to expand zone columns into ODSkims-style flattened arrays.
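The repeat/tile distinction maps directly onto numpy.repeat and numpy.tile. The zone values below are hypothetical; the sketch shows that, for origin-major flattened O-D arrays, repeating gives each row the origin zone's value and tiling gives each row the destination zone's value.

```python
import numpy as np

# Hypothetical per-zone values (e.g. one zone column) for 3 zones
zone_values = np.array([10.0, 20.0, 30.0])
num_zones = len(zone_values)

# Origin-style: each zone's value repeated once per destination
origin_values = np.repeat(zone_values, num_zones)
# Destination-style: the whole zone vector tiled once per origin
dest_values = np.tile(zone_values, num_zones)

print(origin_values.tolist())  # [10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 30.0, 30.0, 30.0]
print(dest_values.tolist())    # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```

In a flattened origin-major array, position i * num_zones + j is origin i and destination j, so origin_values holds origin i's value there and dest_values holds destination j's value.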

bca4abm.processors.four_step.aggregate_od.logger = <Logger bca4abm.processors.four_step.aggregate_od (WARNING)>

Aggregate OD processor

Each row in the data table to be solved is an OD pair; this processor calculates trip differences. It requires access to the input zone tables, the COC coding, trip matrices, and skim matrices. The new OD_aggregate_manifest.csv file tells this processor what data it can use and how to reference it. The following input data tables are required: assign_mfs.omx, the inputs and results of the zone aggregate processor, and skims_mfs.omx.
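A sketch of an O-D data table with one row per OD pair and a trip difference column. The zone ids and trip matrices are hypothetical stand-ins for data that would be read from assign_mfs.omx; the rows follow the same origin-major order as a flattened matrix.

```python
import numpy as np
import pandas as pd

zone_ids = np.array([1, 2, 3])  # hypothetical zone ids
n = len(zone_ids)

# Hypothetical base and build trip matrices (rows = origins, columns = destinations)
base_trips = np.array([[0.0, 10.0, 5.0], [8.0, 0.0, 4.0], [6.0, 2.0, 0.0]])
build_trips = np.array([[0.0, 12.0, 5.0], [8.0, 0.0, 6.0], [6.0, 2.0, 0.0]])

# One row per OD pair, in the same origin-major order as matrix.flatten()
od = pd.DataFrame({
    "orig": np.repeat(zone_ids, n),
    "dest": np.tile(zone_ids, n),
    "base_trips": base_trips.flatten(),
    "build_trips": build_trips.flatten(),
})
od["trip_diff"] = od["build_trips"] - od["base_trips"]
print(od[od["trip_diff"] != 0])
```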

Contribution Guidelines

bca4abm development follows the same development guidelines as ActivitySim.

Release Notes

  • v0.4 - first release

  • v0.5 - add Python 3.5+ support

  • v0.6 - update 4step example