Software Implementation

This page describes the bca4abm software implementation and how to contribute.

The implementation starts with the ActivitySim framework, which serves as the foundation for the software. The framework, as briefly described below, includes features for data pipeline management, expression handling, testing, etc. Built upon the framework are additional core components for benefits calculation.

ActivitySim Framework

bca4abm is implemented in the ActivitySim framework. As summarized here, being implemented in the ActivitySim framework means:

  • Overall Design

    • Implemented in Python, making heavy use of the vectorized C/C++ backends of pandas and numpy

    • Vectorization instead of for loops when possible

    • Runs sub-models that solve Python expression files that operate on data tables

  • Data Handling

    • Inputs are in CSV format, with the exception of settings

    • CSV inputs are read into pandas tables and stored in an intermediate HDF5 binary file that is used for data I/O throughout the model run

    • Key outputs are written to CSV files

  • Key Data Structures

    • pandas.DataFrame - A data table with rows and columns, similar to an R data frame, Excel worksheet, or database table

    • pandas.Series - a vector of data, a column in a DataFrame table or a 1D array

    • numpy.array - an N-dimensional array of items of the same type, such as a matrix

  • Model Orchestrator

    • ORCA is used for running the overall model system and for defining dynamic data tables, columns, and injectables (functions). ActivitySim wraps ORCA functionality to make a Data Pipeline tool, which allows for restarting at any model step.

  • Expressions

    • Model expressions are in CSV files and contain Python expressions, mainly pandas/numpy expressions that operate on the input data tables. This helps avoid modifying Python code when making changes to the model calculations.

  • Code Documentation

  • Testing

    • A protected master branch that can only be written to after tests have passed

    • pytest for tests

    • TravisCI for building and testing with each commit
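To make the vectorization and expression-file points above concrete, here is a minimal sketch. It computes the same per-row quantity three ways: with a row-by-row loop, with a whole-column pandas operation, and by evaluating an expression string with the table bound to a local name, roughly the way a row of an expression file is evaluated. The table, its column names, and the expression are hypothetical; ActivitySim's own evaluator handles many more cases than this.

```python
import numpy as np
import pandas as pd

# Hypothetical trip table; column names are illustrative only.
trips = pd.DataFrame({
    "base_ivt": [10.0, 25.0, 40.0],
    "build_ivt": [8.0, 25.0, 30.0],
    "trips": [100.0, 50.0, 20.0],
})

# Row-by-row loop: the style the framework tries to avoid.
loop_savings = []
for _, row in trips.iterrows():
    loop_savings.append((row["base_ivt"] - row["build_ivt"]) * row["trips"])

# Vectorized equivalent: whole-column operations run in pandas/numpy C code.
vector_savings = (trips.base_ivt - trips.build_ivt) * trips.trips

# The same calculation as an expression string, evaluated with the table
# bound to a local name, much as one row of an expression file might be.
expr = "(trips.base_ivt - trips.build_ivt) * trips.trips"
expr_savings = eval(expr, {"np": np}, {"trips": trips})

# All three approaches agree.
assert np.allclose(loop_savings, vector_savings)
assert np.allclose(vector_savings, expr_savings)
```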


Software components common to both ABM and four-step model usage.


bca4abm.bca4abm.calc_rows_per_chunk(chunk_size, df, spec, extra_columns=0, trace_label=None)

Simple rows_per_chunk calculator for chunking calls to assign_variables.

ActivitySim’s chunk.rows_per_chunk method handles the main logic, including a missing or zero chunk size.

df : pandas DataFrame
spec : pandas DataFrame
extra_columns : int, optional
trace_label : str, optional
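The exact formula used by ActivitySim's chunk.rows_per_chunk is not reproduced here. The sketch below only illustrates the general idea under an assumed cost model (memory per row proportional to the number of columns that will exist once the spec is evaluated) and shows how the resulting row budget can be used to slice a table into chunks. Both function names are hypothetical.

```python
import pandas as pd


def rows_per_chunk_sketch(chunk_size, df, spec, extra_columns=0):
    """Hypothetical rows-per-chunk estimate.

    Assumes memory per row scales with df's columns, plus one column per
    spec row, plus extra_columns. A missing or zero chunk_size means
    "no chunking", so every row goes into a single chunk.
    """
    num_rows = len(df)
    if not chunk_size:
        return max(num_rows, 1)
    row_size = len(df.columns) + len(spec.index) + extra_columns
    return max(1, min(num_rows, chunk_size // max(row_size, 1)))


def iter_chunks(df, rows_per_chunk):
    """Yield consecutive row slices of df with at most rows_per_chunk rows."""
    for start in range(0, len(df), rows_per_chunk):
        yield df.iloc[start:start + rows_per_chunk]


# Usage: 2 df columns + 3 spec rows + 1 extra column = 6 units per row.
df = pd.DataFrame({"a": range(10), "b": range(10)})
spec = pd.DataFrame({"expression": ["a + b", "a * 2", "b - 1"]})
rows = rows_per_chunk_sketch(20, df, spec, extra_columns=1)  # 20 // 6 == 3
sizes = [len(chunk) for chunk in iter_chunks(df, rows)]       # [3, 3, 3, 1]
```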
bca4abm.bca4abm.eval_and_sum(assignment_expressions, df, locals_dict, group_by_column_names=None, df_alias=None, chunk_size=0, trace_rows=None)

Evaluate assignment_expressions against df and sum the results. If a list of group_by_column_names is specified, the sums are computed by group (e.g. group by the COC column names to return sums by community of concern).

group_by_column_names : array of str

list of names of the columns to group by (e.g. coc_column_names of trip_coc_end)


df_alias

name by which df is referenced in assignment_expressions (passed through to assign_variables as df_alias)

trace_rows : array of bool

array indicating which rows in df are to be traced
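A minimal sketch of the evaluate-then-sum pattern follows. It assumes the assignment expressions arrive as a list of (target, expression) pairs, evaluates each with Python's eval while binding the table to df_alias, and sums the results either overall or by the group-by columns. The function name, the input format, and the example COC column are hypothetical; the real eval_and_sum delegates evaluation and chunking to ActivitySim's assign_variables machinery, which this does not replicate.

```python
import numpy as np
import pandas as pd


def eval_and_sum_sketch(assignment_expressions, df, locals_dict,
                        group_by_column_names=None, df_alias="df"):
    """Hypothetical sketch: evaluate (target, expression) pairs, then sum."""
    env = dict(locals_dict)
    env[df_alias] = df  # expressions refer to the table by this name
    results = pd.DataFrame(index=df.index)
    for target, expression in assignment_expressions:
        # Series results align on df's index; scalars broadcast to all rows.
        results[target] = eval(expression, {}, env)
    if not group_by_column_names:
        # Overall sums, returned as a Series indexed by target name.
        return results.sum()
    # Attach the grouping columns, then sum each target within each group.
    grouped = results.join(df[group_by_column_names])
    return grouped.groupby(group_by_column_names).sum()


trips = pd.DataFrame({
    "coc_poverty": [1, 0, 1, 0],  # hypothetical COC flag
    "base_time": [10.0, 20.0, 30.0, 40.0],
    "build_time": [8.0, 20.0, 25.0, 35.0],
})
spec = [("time_savings", "np.maximum(trips.base_time - trips.build_time, 0)")]
totals = eval_and_sum_sketch(spec, trips, {"np": np}, df_alias="trips")
by_coc = eval_and_sum_sketch(spec, trips, {"np": np}, ["coc_poverty"],
                             df_alias="trips")
```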


Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions, targets, and expressions.

The CSV is required to have a header with column names. For example:



Name of a CSV spec file.


The description column is dropped from the returned data and the expression values are set as the table index.
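The column names used in this sketch (Description, Target, Expression) are an assumption for illustration. It reads a small in-memory spec CSV with pandas, drops the description column, and sets the expression values as the table index, following the behavior described above. The function name and the example rows are hypothetical.

```python
import io

import pandas as pd


def read_spec_sketch(csv_source, description_name="Description",
                     expression_name="Expression"):
    """Hypothetical spec reader: drop descriptions, index by expression."""
    # skipinitialspace tolerates "Description, Target, Expression" headers.
    spec = pd.read_csv(csv_source, skipinitialspace=True)
    spec = spec.drop(columns=[description_name])
    return spec.set_index(expression_name)


SPEC_CSV = """Description,Target,Expression
Base time,base_time,trips.base_ivt + trips.base_ovt
Time savings,time_savings,base_time - build_time
"""
spec = read_spec_sketch(io.StringIO(SPEC_CSV))
```

Note that expressions containing commas would need to be quoted in the CSV.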

bca4abm.bca4abm.scalar_assign_variables(assignment_expressions, locals_dict)

Evaluate a set of variable expressions from a spec in the context of a given data table.

Python expressions are evaluated in the context of this function using Python’s eval function. Users should take care that these expressions result in a scalar.

assignment_expressions : pandas sequence of str

locals_dict : dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @


Will have the index of df and columns of exprs.
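As an illustration of scalar evaluation, the sketch below evaluates (target, expression) pairs with Python's eval in an environment built from locals_dict, strips a leading @, and rejects non-scalar results. As an assumption of this sketch, each result is also added to the environment so later expressions can use earlier targets. The function name, the settings values, and the annualization factor are hypothetical.

```python
import numpy as np


def scalar_assign_sketch(assignment_expressions, locals_dict):
    """Hypothetical sketch: evaluate (target, expression) pairs to scalars."""
    env = dict(locals_dict)
    results = {}
    for target, expression in assignment_expressions:
        # This sketch evaluates "@" and plain expressions the same way.
        source = expression[1:] if expression.startswith("@") else expression
        value = eval(source, {}, env)
        if not np.isscalar(value):
            raise ValueError(f"{target!r} did not evaluate to a scalar: {value!r}")
        env[target] = value  # later expressions may reference this target
        results[target] = value
    return results


settings = {"np": np, "value_of_time": 15.0, "hours_saved": 1200.0}
spec = [
    ("time_benefit", "@value_of_time * hours_saved"),
    ("annual_benefit", "@time_benefit * 250"),  # hypothetical annualization
]
result = scalar_assign_sketch(spec, settings)
```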


bca4abm.processors.aggregate_trips.aggregate_trips_processor(aggregate_trips_spec, settings, data_dir)

Compute aggregate trips benefits

The data manifest contains a list of trip count files (one for base, one for build) along with their corresponding in-vehicle time (ivt), operating cost (aoc), and toll skims.

Since the skims are all aligned numpy arrays, we can express their benefit calculation as vector computations in the aggregate_trips_spec.
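The aggregate_trips_spec itself is not shown here, so the following is only an assumed example of the kind of vector computation the text describes. It applies the rule of half, a common consumer surplus approximation in benefit-cost analysis, cell by cell to aligned base and build numpy matrices. The matrices, the value of time, and the choice of formula are hypothetical, not a confirmed description of bca4abm's calculation.

```python
import numpy as np

# Hypothetical 2-zone matrices standing in for the manifest's trip and skim files.
base_trips = np.array([[10.0, 20.0], [30.0, 40.0]])
build_trips = np.array([[12.0, 20.0], [30.0, 44.0]])
base_ivt = np.array([[5.0, 15.0], [15.0, 5.0]])   # minutes
build_ivt = np.array([[5.0, 12.0], [14.0, 5.0]])  # minutes

VALUE_OF_TIME = 0.25  # hypothetical dollars per minute


def rule_of_half_benefit(base_trips, build_trips, base_cost, build_cost):
    """Rule-of-half consumer surplus change, computed element-wise."""
    return 0.5 * (base_trips + build_trips) * (base_cost - build_cost)


# Because the arrays are aligned, one expression covers every O-D cell.
ivt_benefit = rule_of_half_benefit(base_trips, build_trips,
                                   base_ivt * VALUE_OF_TIME,
                                   build_ivt * VALUE_OF_TIME)
total = ivt_benefit.sum()
```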

Aggregate trips processor

ABM Processors

Software components for ABM model usage.



bca4abm.processors.abm.auto_ownership.auto_ownership_processor(persons_merged, auto_ownership_spec, auto_ownership_settings, coc_column_names, chunk_size, trace_hh_id)

Compute auto ownership benefits

Auto ownership processor


Demographics processor


Person trips processor

bca4abm.processors.abm.person_trips.person_trips_processor(trips_with_demographics, person_trips_spec, person_trips_settings, coc_column_names, settings, chunk_size, trace_hh_id)

Compute disaggregate trips benefits


Physical activity processor

bca4abm.processors.abm.physical_activity.physical_activity_processor(trips_with_demographics, persons_merged, physical_activity_trip_spec, physical_activity_person_spec, physical_activity_settings, coc_column_names, settings, chunk_size, trace_hh_id)

Compute physical activity benefits

Physical activity benefits generally accrue if the net physical activity for an individual exceeds a certain threshold. We calculate individual physical activity based on trips, so we need to compute trip activity and then sum up to the person level to calculate benefits. We chunk trips by household id to ensure that all of a person's trips are in the same chunk.
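The chunking argument above can be sketched with pandas: chunks are formed from whole households, so a per-chunk group-by on person id already yields each person's complete total, and the threshold can be applied directly. The column names, the activity minutes, the 30-minute threshold, and the helper name household_chunks are all hypothetical.

```python
import pandas as pd

# Hypothetical trip records; column names and values are illustrative.
trips = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2, 3],
    "person_id": [11, 11, 12, 21, 21, 31],
    "active_minutes": [20.0, 15.0, 10.0, 5.0, 5.0, 40.0],
})

ACTIVITY_THRESHOLD = 30.0  # hypothetical minutes needed to earn a benefit


def household_chunks(trips, households_per_chunk):
    """Yield trip chunks that never split a household across chunks."""
    hh_ids = trips["household_id"].unique()
    for start in range(0, len(hh_ids), households_per_chunk):
        chunk_ids = hh_ids[start:start + households_per_chunk]
        yield trips[trips["household_id"].isin(chunk_ids)]


person_totals = []
for chunk in household_chunks(trips, households_per_chunk=2):
    # A person's trips are all in this chunk, so this sum is already complete.
    person_totals.append(chunk.groupby("person_id")["active_minutes"].sum())

person_activity = pd.concat(person_totals)
beneficiaries = person_activity[person_activity > ACTIVITY_THRESHOLD]
```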

Four Step Processors

Software components for four-step model usage.


bca4abm.processors.four_step.aggregate_demographics.aggregate_demographics_processor(zone_hhs, aggregate_demographics_spec, settings, trace_od)
zone_hhs : orca table

input zone demographics

Aggregate demographics processor

Each row in the data table to solve is an origin zone, and this processor calculates communities of concern (COC) / market segments based on mf.cval.csv.


bca4abm.processors.four_step.aggregate_zone.aggregate_zone_processor(zones, trace_od)

zones : orca table

zone data for the base and build scenario dat files, combined into a single dataframe with column names prefixed with base_ or build_ and indexed by ZONE

Aggregate zone processor

Each row in the data table to solve is an origin zone, and this processor calculates zonal auto ownership differences as well as the differences in the destination choice logsums (ma.<purpose|income>dcls.csv). Should the ma.<purpose|income>dcls.csv files be added to mf.cval.csv before input to the bca tool?
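Because base and build zone data share one dataframe distinguished only by the base_ and build_ prefixes, zonal differences can be computed by pairing columns on their unprefixed names. The sketch below assumes a build minus base sign convention and uses hypothetical column names (autos, workdcls) and a hypothetical helper name; it is not the processor's actual expression file.

```python
import pandas as pd


def build_minus_base(zones):
    """Return build_<col> - base_<col> for every column present in both scenarios."""
    base_cols = {c[len("base_"):] for c in zones.columns if c.startswith("base_")}
    build_cols = {c[len("build_"):] for c in zones.columns if c.startswith("build_")}
    diffs = pd.DataFrame(index=zones.index)
    for name in sorted(base_cols & build_cols):
        diffs[name + "_diff"] = zones["build_" + name] - zones["base_" + name]
    return diffs


# Hypothetical combined zone table indexed by ZONE.
zones = pd.DataFrame({
    "base_autos": [100, 250],
    "build_autos": [95, 260],
    "base_workdcls": [-1.20, -0.80],
    "build_workdcls": [-1.10, -0.80],
}, index=pd.Index([1, 2], name="ZONE"))

diffs = build_minus_base(zones)
```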


class bca4abm.processors.four_step.aggregate_od.ODSkims(omx_file_path, name, zone_index, transpose=False, cache_skims=True)

Wrapper for skim arrays to facilitate use of skims by aggregate_od_processor

skims_dict : empty dict to cache skims read from file
omx : open omx file object

this is only used to load skims on demand that were not preloaded

length : int

number of zones to keep in the returned skim matrix; if the skims contain additional external zones, these are trimmed out so that the skim array has the correct shape to match the (flattened) O-D tiled columns in the od dataframe

transpose : bool

whether to transpose the matrix before flattening (i.e. act as a D-O instead of an O-D skim)
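A minimal sketch of the trim, transpose, flatten, and cache behavior described for ODSkims follows. To stay self-contained it reads from a plain dict of numpy arrays instead of an open OMX file, and it exposes skims through __getitem__; the class name, its constructor arguments, and that access pattern are assumptions, not the real ODSkims interface.

```python
import numpy as np


class ODSkimsSketch:
    """Hypothetical stand-in for ODSkims backed by an in-memory dict."""

    def __init__(self, source, length, transpose=False, cache_skims=True):
        self.source = source          # stands in for the open omx file
        self.length = length          # number of zones to keep
        self.transpose = transpose    # act as a D-O skim instead of O-D
        self.cache_skims = cache_skims
        self.skims_dict = {}          # cache of already flattened skims

    def __getitem__(self, key):
        if key in self.skims_dict:
            return self.skims_dict[key]
        # Trim extra (e.g. external) zones so the shape matches the O-D table.
        matrix = np.asarray(self.source[key])[:self.length, :self.length]
        if self.transpose:
            matrix = matrix.T
        flat = matrix.flatten()  # row-major copy: origin-major O-D order
        if self.cache_skims:
            self.skims_dict[key] = flat
        return flat


# Usage: a 3-zone skim whose last zone is external, trimmed to 2 zones.
raw = np.arange(9).reshape(3, 3)
od_skims = ODSkimsSketch({"ivt": raw}, length=2)
do_skims = ODSkimsSketch({"ivt": raw}, length=2, transpose=True)
print(od_skims["ivt"])  # [0 1 3 4]
print(do_skims["ivt"])  # [0 3 1 4]
```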

bca4abm.processors.four_step.aggregate_od.create_zone_matrices(model_settings, zones)

ODSkims look-alikes that have identical values for all origin/destination zones

i.e. we either repeat (origin_zone_matrices) or tile (dest_zone_matrices) zone values to expand zone columns into ODSkims-style flattened arrays
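The repeat and tile distinction can be shown with numpy. For O-D pairs flattened in row-major (origin-major) order, np.repeat gives each pair its origin zone's value and np.tile gives it its destination zone's value. The zone values here are hypothetical.

```python
import numpy as np

zone_values = np.array([100.0, 200.0, 300.0])  # hypothetical per-zone values
n = len(zone_values)

# O-D pairs in row-major order: (1,1), (1,2), (1,3), (2,1), ...
origin_matrix = np.repeat(zone_values, n)  # value of each pair's origin zone
dest_matrix = np.tile(zone_values, n)      # value of each pair's destination zone

# Equivalent to flattening an n x n matrix whose rows are constant.
assert (np.broadcast_to(zone_values[:, None], (n, n)).flatten() == origin_matrix).all()
```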

Aggregate OD processor

Each row in the data table to solve is an OD pair, and this processor calculates trip differences. It requires access to the input zone tables, the COC coding, trip matrices, and skim matrices. The new OD_aggregate_manifest.csv file tells this processor what data it can use and how to reference it. The following input data tables are required: assign_mfs.omx, the inputs and results of the zone aggregate processor, and skims_mfs.omx.
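A small sketch of the one-row-per-OD-pair layout: trip matrices are flattened into columns, the origin zone's COC flag is attached with np.repeat, and trip differences are summed by COC status. The two-zone matrices, the single COC flag, and the build minus base convention are hypothetical stand-ins for the manifest-driven inputs described above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-zone inputs standing in for assign_mfs.omx trip matrices
# and the zone table's community of concern (COC) flag.
base_trips = np.array([[10.0, 5.0], [8.0, 12.0]])
build_trips = np.array([[11.0, 7.0], [8.0, 10.0]])
zone_coc = np.array([1, 0])  # 1 = zone is in a COC

n = len(zone_coc)
od = pd.DataFrame({
    "orig": np.repeat(np.arange(1, n + 1), n),
    "dest": np.tile(np.arange(1, n + 1), n),
    "trip_diff": (build_trips - base_trips).flatten(),
    "orig_coc": np.repeat(zone_coc, n),
})

# One row per O-D pair; sum the trip differences by origin COC status.
diff_by_coc = od.groupby("orig_coc")["trip_diff"].sum()
```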

Contribution Guidelines

bca4abm development follows the same development guidelines as ActivitySim.

Release Notes

  • v0.4 - first release

  • v0.5 - add Python 3.5+ support

  • v0.6 - update 4step example