Massachusetts Travel Study Data User Guide

Author

Resource Systems Group, Inc.

Published

May 21, 2026

A reusable reference for study design, survey documentation, delivered data structure, weighting, codebook metadata, and analyst workflows.

1 Overview

The Massachusetts Department of Transportation (MassDOT) contracted with RSG to conduct the 2024-2025 Massachusetts Travel Study (MTS), a statewide survey designed to collect demographically and geographically representative travel behavior data from 15,140 households across the Commonwealth. The survey exceeded this target and collected data from 18,122 households. Since the last Massachusetts Household Travel Survey was conducted in 2011, the transportation landscape, inclusive of infrastructure, services, and travel behaviors, has changed substantially. MassDOT conducted the MTS to gain a better understanding of changing travel patterns and mode choice, as well as to inform planning efforts and tools, including the statewide travel demand model.

1.1 Study Geography

The study sampled 0.6% of Massachusetts households and was geographically and demographically representative of the Commonwealth’s population.

To support an adequate sample across the state, the survey team used Massachusetts’ 13 sample geographies for stratification. These geographies align with the Commonwealth’s MPO and regional planning areas and are summarized in Table 1.

Sample Geography | MPO / Regional Planning Body | Regional Description | Sample-Frame Households
Berkshire | Berkshire Regional Planning Commission | Berkshire County and surrounding western Massachusetts communities. | 56,078
Boston Region | Boston Region MPO / MAPC | Greater Boston core and inner suburban communities. | 1,315,052
Cape Cod | Cape Cod Commission | Barnstable County and the Cape Cod region. | 99,969
Central Massachusetts | Central Massachusetts Regional Planning Commission | Worcester-area communities in central Massachusetts. | 229,416
Franklin | Franklin Regional Council of Governments | Franklin County and nearby western Massachusetts communities. | 31,234
Martha's Vineyard | Martha's Vineyard Commission | The island region of Martha's Vineyard. | 6,899
Merrimack Valley | Merrimack Valley Planning Commission | Northeastern Massachusetts communities centered on the Merrimack Valley. | 137,029
Montachusett | Montachusett Regional Planning Commission | North-central Massachusetts communities in the Montachusett region. | 98,602
Nantucket | Nantucket Planning and Economic Development Commission | The island region of Nantucket. | 4,659
Northern Middlesex | Northern Middlesex Council of Governments | Lowell-area and nearby communities in northern Middlesex County. | 113,727
Old Colony | Old Colony Planning Council | South Shore and Plymouth County communities in the Old Colony region. | 142,628
Pioneer Valley | Pioneer Valley Planning Commission | Connecticut River Valley communities in western Massachusetts. | 244,794
Southeastern Massachusetts | Southeastern Regional Planning and Economic Development District | South Coast and southeastern Massachusetts communities. | 260,908
Household totals come from the MassDOT sampling plan and reflect the ABS sample frame, not achieved completes.
Table 1: Massachusetts sample geographies used for statewide stratification.

These geographies are shown in Figure 1 below.

Section 2 provides additional details about the sample design.

Figure 1: Map of the Survey Region, Showing Sample Geographies by Block Group

1.2 Study Timeline

Data collection for the 2024-2025 Massachusetts Travel Study started in May 2024 and continued through June 2025 over three fielding periods, detailed in Table 2.

Survey Task | Timeline
Survey Design (Sample planning, survey website programming, invitation development) | January 2024 - April 2024
Data Collection - Spring 2024 (Sending invitations, data monitoring, and adjustments) | May 2024 - July 2024
Data Collection - Fall 2024 (Sending invitations, data monitoring, and adjustments) | September 2024 - November 2024
Data Collection - Winter/Spring 2025 (Sending invitations, data monitoring, and adjustments) | January 2025 - June 2025
Data Preparation (Data cleaning and weighting, finalizing dashboard, final reporting) | June 2025 - June 2026
Table 2: Survey task timeline

Table 3 displays the number of completed households by year and month of first travel date. Travel data collection was limited in the summer, when school was out of session, and in the winter during peak holiday travel periods.

Completed Households by Month and Year of First Travel Date
Month | 2024 | 2025
January | 0 | 1,390
February | 0 | 276
March | 0 | 473
April | 0 | 1,980
May | 1,246 | 522
June | 2,461 | 0
July | 0 | 0
August | 0 | 0
September | 2,719 | 0
October | 4,188 | 0
November | 386 | 0
December | 0 | 0
Incomplete households are excluded from this summary.
Table 3: Completed Households by First Travel Date

1.3 Data Collection

Survey data were collected through a mixed-mode design that combined:

  • Smartphone-based travel diary (rMove®): Participants recorded travel via smartphone app in real-time for up to seven consecutive days.
  • Web-based travel diary (rMove for Web): Participants reported travel via a web survey on one assigned weekday.
  • Call center interviews: Participants reported travel by phone to a call center for one assigned weekday, and their responses were recorded in the web-based travel diary (rMove for Web).

Each household first completed a recruit survey describing household composition, demographics, and vehicles, followed by a travel diary describing all trips made during the assigned day(s).

Table 4 shows a count of completed households by survey mode.

Completed Households by Survey Mode
Survey Mode | Completed Households | Percentage
Web-based Diary (rMove for Web) | 8,660 | 55.4%
Smartphone App (rMove app) | 6,410 | 41.0%
Call Center Interview (rMove for Web) | 571 | 3.7%
Total | 15,641 |
Table 4: Completed Households by Survey Mode

Section 3 provides further details about the survey instrument and question content.

2 Sample Design

The Massachusetts Travel Study used a probability-based, geographically stratified sample of households across Massachusetts.

The primary sampling method was probability address-based sampling (ABS). Massachusetts census block groups were stratified by key demographic features, and within each resulting segment, households were randomly selected and invited into the study by mail.

2.1 Sampling Framework

Sampling Frame

The survey used a United States Postal Service (USPS) address-based sampling (ABS) frame that includes all Massachusetts residential addresses, excluding group quarters such as dormitories, prisons, or assisted living facilities. Each sampled address represented a single household eligible for recruitment. The ABS frame provided complete statewide coverage and supported stratification by geography, land-use density, and socioeconomic characteristics.

Primary Sampling Unit

The primary sampling unit was the household, selected through random sampling from the ABS frame. All household members reported person-level and trip-level details, but only one member (the “primary respondent”) completed the recruit survey on behalf of the household.

Although the primary sampling unit was the household, the collected data also represent the behavior of individual persons. Participants who reported by smartphone provided data across multiple days, yielding several days of travel and daily activity records per person.

Surveyable Population

Not all household members were surveyable. Only persons related to Person 1 (the primary respondent) were considered surveyable. Non-surveyable members (e.g., guests, visitors, or unrelated roommates) did not have trip or day completion requirements and were excluded from household-level completeness determinations.

Non-surveyable members can be identified in the person table using the surveyable variable. These members do not contribute to the household’s completeness status. They count towards the total number of household members but are not weighted (see Section 5.4.2.2).
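As a minimal sketch of how an analyst might separate surveyable from non-surveyable members, the example below filters a small set of hypothetical person records. Only the variable name surveyable comes from this guide; the 1/0 coding, the other field names, and the sample records are assumptions for illustration, so check the codebook for the delivered coding.

```python
# Hypothetical person records. Only the column name `surveyable` is taken
# from the guide; the 1/0 coding and the other fields are assumptions.
persons = [
    {"hh_id": 1, "person_id": 101, "surveyable": 1},
    {"hh_id": 1, "person_id": 102, "surveyable": 1},
    {"hh_id": 1, "person_id": 103, "surveyable": 0},  # e.g., unrelated roommate
]


def split_household(persons, hh_id):
    """Return (total household size, list of surveyable members)."""
    members = [p for p in persons if p["hh_id"] == hh_id]
    surveyable_members = [p for p in members if p["surveyable"] == 1]
    return len(members), surveyable_members


# Non-surveyable members still count toward household size,
# but they are left out of the surveyable list.
size, surveyable_members = split_household(persons, 1)
print(size, [p["person_id"] for p in surveyable_members])
```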

2.2 Target Completions

The study’s goal was 15,140 completed household surveys statewide, distributed across MPO areas. The study achieved 18,122 completed households, exceeding the statewide target while maintaining proportionality across MPO geographies.

Sampling by Season and Wave

Sampling was conducted in three main fielding periods: Spring 2024, Fall 2024, and Winter/Spring 2025. Each wave included households from across the study geography, with adjustments to invitation pacing and oversampling emphasis based on observed response patterns and representativeness in earlier periods. This adaptive approach helped the final sample maintain geographic balance and demographic coverage across the state.

2.3 Stratification and Oversampling

The sample design combined statewide address-based sampling with geographic stratification and targeted oversampling. Within each MPO geography, sampled block groups were assigned to one of four mutually exclusive strata so the study could improve representation of groups that are typically harder to recruit while also increasing sample for key policy questions.

Sample strata

  1. General population. Block groups that did not qualify for any of the targeted oversampling strata.
  2. Rural population. Block groups that did not qualify for other oversampling strata and had fewer than 150 people per square kilometer.
  3. Hard-to-reach oversample. Block groups with at least 30% of households earning less than $25,000 per year, at least 60% of households identified as Hispanic and/or BIPOC, or at least 15% of households speaking limited English.
  4. Walk/Bike/Transit oversample. Block groups within the Boston Region MPO with transit access density classified as CBD or Dense Urban.

The sample plan identified 270 block groups that qualified for both the hard-to-reach and walk/bike/transit strata. Those block groups were assigned to the hard-to-reach stratum, which the plan anticipated would have the lower response rate. That overlap rule made the final strata mutually exclusive for sample management and weighting. Table 5 summarizes the resulting block groups, households, and adults in each sample stratum.

Sample Stratum | Number of BGs | Total Households | Total Adults | Adults per Household
Walk/Bike/Transit | 327 | 183,508 | 354,282 | 1.9
Hard-to-reach | 1,330 | 670,542 | 1,337,712 | 2.0
Rural | 475 | 250,685 | 517,967 | 2.0
General | 2,923 | 1,636,260 | 3,380,998 | 2.0
Total | 5,055 | 2,740,995 | 5,590,959 | 2.0
Table 5: Survey region households and adults by sample stratum.
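The stratum definitions and the overlap rule described above can be sketched as a simple priority check. This is an illustrative reading of the rules, not the sampling plan's actual code: the field names, the representation of shares as fractions, and any transit density label other than "CBD" and "Dense Urban" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    # All field names are illustrative assumptions, not delivered variables.
    share_hh_income_under_25k: float  # fraction of households earning < $25,000
    share_hh_hispanic_bipoc: float    # fraction identified as Hispanic and/or BIPOC
    share_hh_limited_english: float   # fraction speaking limited English
    people_per_sq_km: float
    in_boston_region_mpo: bool
    transit_density_class: str        # "CBD", "Dense Urban", or another label


def assign_stratum(bg: BlockGroup) -> str:
    """Assign one of the four mutually exclusive sample strata."""
    hard_to_reach = (
        bg.share_hh_income_under_25k >= 0.30
        or bg.share_hh_hispanic_bipoc >= 0.60
        or bg.share_hh_limited_english >= 0.15
    )
    walk_bike_transit = bg.in_boston_region_mpo and bg.transit_density_class in {
        "CBD",
        "Dense Urban",
    }
    # Overlap rule: block groups qualifying for both oversamples go to hard-to-reach.
    if hard_to_reach:
        return "Hard-to-reach"
    if walk_bike_transit:
        return "Walk/Bike/Transit"
    # Rural applies only when no oversampling stratum was triggered.
    if bg.people_per_sq_km < 150:
        return "Rural"
    return "General"


both = BlockGroup(0.35, 0.20, 0.05, 8000.0, True, "CBD")
print(assign_stratum(both))
```

Checking the hard-to-reach condition first is what makes the strata mutually exclusive in this sketch.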

The hard-to-reach oversample increased representation for lower-income, BIPOC, and limited-English block groups, while the walk/bike/transit oversample increased representation for dense Boston-area block groups where multimodal travel was expected to be more common. Together, these design choices improved analytic coverage without changing the fact that the final weighted dataset represents the statewide household population. Table 6 summarizes the reference households, invitations sent, and invitation rates used across geographies and sample strata.

Geography | Sample Stratum | Invitations Sent | Reference Households | Invitation Rate
Berkshire | General | 3,755 | 22,471 | 16.7%
Berkshire | Hard-to-reach | 4,826 | 10,343 | 46.7%
Berkshire | Rural | 6,793 | 23,264 | 29.2%
Boston Region | General | 146,945 | 754,832 | 19.5%
Boston Region | Hard-to-reach | 171,751 | 355,304 | 48.3%
Boston Region | Rural | 5,870 | 21,408 | 27.4%
Boston Region | Walk/Bike/Transit | 28,061 | 183,508 | 15.3%
Cape Cod | General | 31,328 | 79,927 | 39.2%
Cape Cod | Hard-to-reach | 4,927 | 7,617 | 64.7%
Cape Cod | Rural | 4,499 | 12,425 | 36.2%
Central Massachusetts | General | 26,344 | 133,520 | 19.7%
Central Massachusetts | Hard-to-reach | 35,015 | 57,589 | 60.8%
Central Massachusetts | Rural | 11,583 | 38,307 | 30.2%
Franklin | General | 1,477 | 10,124 | 14.6%
Franklin | Hard-to-reach | 1,230 | 4,258 | 28.9%
Franklin | Rural | 4,395 | 16,852 | 26.1%
Martha's Vineyard | General | 3,559 | 3,128 | 113.8%
Martha's Vineyard | Rural | 3,678 | 3,771 | 97.5%
Merrimack Valley | General | 22,319 | 86,361 | 25.8%
Merrimack Valley | Hard-to-reach | 40,407 | 42,244 | 95.7%
Merrimack Valley | Rural | 4,504 | 8,424 | 53.5%
Montachusett | General | 14,498 | 53,897 | 26.9%
Montachusett | Hard-to-reach | 7,321 | 12,125 | 60.4%
Montachusett | Rural | 11,911 | 32,580 | 36.6%
Nantucket | General | 5,474 | 3,032 | 180.5%
Nantucket | Rural | 2,856 | 1,627 | 175.5%
Northern Middlesex | General | 22,001 | 86,555 | 25.4%
Northern Middlesex | Hard-to-reach | 22,886 | 23,834 | 96.0%
Northern Middlesex | Rural | 1,356 | 3,338 | 40.6%
Old Colony | General | 37,417 | 103,891 | 36.0%
Old Colony | Hard-to-reach | 28,394 | 31,685 | 89.6%
Old Colony | Rural | 3,182 | 7,052 | 45.1%
Pioneer Valley | General | 22,925 | 129,544 | 17.7%
Pioneer Valley | Hard-to-reach | 48,752 | 71,373 | 68.3%
Pioneer Valley | Rural | 10,740 | 43,877 | 24.5%
Southeastern Massachusetts | General | 50,296 | 168,978 | 29.8%
Southeastern Massachusetts | Hard-to-reach | 36,651 | 54,170 | 67.7%
Southeastern Massachusetts | Rural | 13,365 | 37,760 | 35.4%
Table 6: Reference households, invitations sent, and invitation rates by geography and sample stratum.

Across all waves, the study sent 903,291 invitations. Relative to the reference household counts in each sample segment, the statewide invitation rate was 23.7% in the general stratum, 60.0% in the hard-to-reach stratum, 33.8% in the rural stratum, and 15.3% in the walk/bike/transit stratum. These realized invitation rates show that the field effort emphasized hard-to-reach households statewide while also maintaining targeted coverage of rural areas and dense multimodal areas in the Boston Region.

In a small number of segments, the cumulative invitation rate exceeded 100%. This reflects repeated fielding across waves relative to the segment’s reference household count and should be interpreted as the total invitation effort rather than as unique-household coverage of the sampling frame.
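To make the invitation rate arithmetic concrete, the short sketch below recomputes the rate for three segments using the invitation and reference household counts from Table 6. The function name and the dictionary layout are illustrative choices, not part of the delivered data.

```python
def invitation_rate(invitations_sent, reference_households):
    """Cumulative invitations divided by the segment's reference household count."""
    return invitations_sent / reference_households


# (invitations sent, reference households) copied from Table 6.
segments = {
    ("Berkshire", "General"): (3755, 22471),
    ("Nantucket", "General"): (5474, 3032),
    ("Boston Region", "Walk/Bike/Transit"): (28061, 183508),
}

for (geography, stratum), (sent, reference) in segments.items():
    rate = invitation_rate(sent, reference)
    # A rate above 100% reflects repeated fielding across waves,
    # not unique-household coverage of the frame.
    note = " (repeated fielding across waves)" if rate > 1 else ""
    print(f"{geography} / {stratum}: {rate:.1%}{note}")
```

Run as written, this reproduces the 16.7%, 180.5%, and 15.3% rates shown in Table 6 for these segments.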

Recruitment Channels

Households were recruited primarily through mailed invitation letters that directed sampled addresses to the study website and provided their survey access information. Follow-up reminder postcards were sent to nonresponding households, and the project team also used website support, email, and phone follow-up where appropriate to help households complete the study.

Participation Modes

Eligible households could participate via:

  • rMove smartphone app (seven-day diary),
  • rMove for Web (one-day diary), or
  • Call-center interview (one-day diary).

Mode assignment depended on household technology access and preference; all modes followed identical survey logic and data validation.

Incentives

The study used different incentive amounts by participation mode and sample stratum. rMove incentives were paid per adult; online and call-center incentives were paid per household.

  • General population — rMove: $25 per adult.
  • General population — web or call center: $15 per household.
  • Hard-to-reach population — rMove: $35 per adult.
  • Hard-to-reach population — web or call center: $25 per household.

Monitoring and Response Tracking

RSG maintained a real-time survey monitoring dashboard accessible to the Massachusetts Travel Study team throughout data collection.

The dashboard provided:

  • Response rates by segment and demographic subgroup,
  • Comparison to American Community Survey (ACS) benchmarks, and
  • Progress toward study targets

This tool supported geographic balance and demographic representativeness through adaptive field management.

Analyst Tip: Interpreting Response Rates

Differences in observed response rates across MPOs or demographic strata reflect design priorities, not data quality. The weighting process fully corrects for these differences, so analysts should rely on weighted data for representativeness.

2.4 Representativeness and Nonresponse

Post-survey comparisons to ACS and model control totals indicated that the achieved sample closely reflected the statewide household population by:

  • Income group,
  • Household size,
  • Vehicle availability, and
  • Land-use context.

Residual differences were addressed through weighting adjustments (see Section 5).

3 Survey Instrument

The Massachusetts Travel Study collected detailed information about households, people, vehicles, and daily travel through a unified instrument designed for use across multiple reporting platforms. Each mode implemented the same core survey logic, ensuring results are directly comparable across participants and modes.

3.1 Survey Modes and Language Support

The survey instrument was administered through three participation modes:

  • Smartphone-based travel diary (rMove): Participants recorded travel via smartphone app in real time for up to seven consecutive days.
  • Web-based travel diary (rMove for Web): Participants reported travel via a web survey on one assigned weekday.
  • Call center interviews: Participants reported travel by phone to a call center for one assigned weekday, and their responses were recorded in the web-based travel diary (rMove for Web).

The survey instrument was available in English and Spanish. The call center also supported participation in Portuguese, Chinese, Haitian Creole, Vietnamese, and Russian.

3.2 Recruit Survey

The recruit survey established household eligibility and collected baseline information to assign travel days and tailor diary prompts. Key modules included:

  • Household composition and member roster
  • Demographics (age, gender, race/ethnicity, income, employment, student status)
  • Housing characteristics (type, tenure, vehicles available)
  • Technology access and preferred reporting mode

The recruit survey was completed by the primary respondent (Person 1) on behalf of all household members. The primary respondent was also responsible for ensuring that all eligible household members completed their assigned travel diary.

3.3 Travel Diary

The travel diary collected information about travel made on the assigned reporting day or days. In the smartphone app diary, participants reviewed passively collected travel and completed prompted trip surveys. In rMove for Web and the call center diaries, respondents reported travel directly through the prompted diary instrument.

Across modes, the travel diary collected:

  • Day-begin and day-end location confirmation
  • Trip destinations, purposes, travel modes, and timing
  • Access, transfer, and egress details for transit trips
  • Companion and escort activity (where applicable)
  • Reasons for no travel on the assigned day (where applicable)

Adults could report their own travel, while proxy reporting was used for children (under 18 years) and other eligible household members.

Analyst Tip: Multi-Day Travel Data

Only participants who used the rMove smartphone app recorded travel for multiple days, including weekends. The standard weights represent Monday, Tuesday, Wednesday, and Thursday travel. For analyses that compare travel across Monday through Sunday, use the alternate day-of-week weights described in the weighting chapter and analyst handbook.

3.4 Daily Surveys

The daily survey collected additional context about each reporting day, including:

  • Deliveries and pickups (e-commerce activity)
  • Telecommuting activity
  • Attitudinal questions
  • School attendance and activities

Some questions were repeated across all travel days; others (e.g., attitudinal questions) were asked only once. Household-level questions were asked of the primary respondent, while person-level questions were directed to each individual respondent.

When respondents completed the travel diary via browser or call center, “daily” questions were consolidated into a single survey following the travel diary.

3.5 Travel Date Assignment

Households were assigned one of the study’s weighted travel weekdays (Monday, Tuesday, Wednesday, or Thursday) during the study period. Households participating via rMove were assigned a seven-day reporting period beginning with the assigned start date. Households participating via web or call center reported travel for one assigned day and completed the survey after that travel date.

3.6 Questionnaire

The survey instrument covered the standard household travel survey modules needed for household, person, vehicle, travel day, trip, location, and tour delivery tables. It also included modules that were especially important for MassDOT’s analysis needs, including deliveries, telecommuting, school and work travel context, and household roster detail.

Question wording and skip logic were aligned across smartphone, web, and call center participation so that the delivered analysis variables remain comparable across modes. Table 7 summarizes the major topic areas covered by the instrument.

Topic Area | What the Instrument Collected
Household | Household composition, vehicles, income, home context, and respondent assignment
Person | Demographics, employment, student status, technology access, and proxy reporting
Travel day | Assigned travel date, begin/end-of-day location, no-travel confirmation, and daily context
Trip | Destinations, purposes, modes, timing, transfers, and related trip details
School and work context | Commuting, telework, school attendance, and related routine travel context
Special topics | Deliveries and pickups, attitudinal questions, and study-specific follow-up items
Table 7: Major survey instrument topic areas.

4 Data Processing

4.1 Overview

This section describes the procedures used to transform raw household travel survey data – collected from participants’ smartphones and survey responses – into clean, analysis-ready tables. The process was designed to preserve the integrity of participants’ reported travel while correcting errors, filling gaps, and enriching the data with geographic and analytical variables that support modeling and planning applications.

Data processing occurred in four phases:

  1. Automated Processing – Raw survey records were copied into a structured working environment, trips were routed on the road network, and an automated classifier flagged trips that required human attention.
  2. Analyst Review – Trained data analysts reviewed flagged trips using a web-based interface, correcting errors in trip start and end points, splitting trips that contained unreported stops, joining trip fragments that were incorrectly separated, and removing invalid trips.
  3. Post-Review Processing – After analyst review, the data underwent a second round of automated processing that cleaned remaining issues, assigned geographic identifiers, imputed trip purposes where necessary, and performed a second pass of transit trip unlinking.
  4. TICTOC Processing – The processed unlinked trip data underwent additional treatment through TICTOC (Trip Imputation, Coordination, and Tour Organization Compiler), which prepared household travel survey data for travel forecasting by imputing selected missing trips, coordinating joint household travel, organizing unlinked trips into linked trips and tours, and adding model-facing attributes to the household, person, day, trip, linked trip, and tour outputs.

Each phase is described in detail in the sections that follow.

4.2 Automated Processing

When a participant completed a travel day, the smartphone application transmitted a set of raw records to the survey platform. These records included household and person characteristics from the recruitment survey, GPS traces from the phone’s location sensors, and the participant’s own descriptions of their trips – where they went, how they traveled, and why.

Automated processing began by copying these raw records into a working environment where they could be modified without affecting the original data. Several operations were then performed in sequence.

Data Completion and Household Disposition

The system evaluated whether each participant had provided sufficient data for their assigned travel days. Households that had completed all assigned travel periods were marked as complete. Households with insufficient data – such as those that uninstalled the app before their travel period ended – were flagged and could be excluded from further processing depending on the study’s sample requirements.

Trip Classification

After routing, an automated classifier examined each trip to determine whether it required analyst review. The classifier evaluated a set of rules based on trip characteristics – for example, whether the trip had an unusually high speed for its reported mode, whether it appeared to duplicate another trip, or whether its start and end times overlapped with other trips by the same person.

Trips that passed all checks were considered clean and did not require review. Trips that failed one or more checks were flagged for analyst attention and assigned to the review queue.
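The rule-based flagging described above might look something like the following sketch. It covers only the three example checks named in the text (implausible speed, possible duplicate, overlapping times). The speed ceilings, the exact-match duplicate test, the field names, and the flag labels are all hypothetical; the production classifier's rules and thresholds are not documented here.

```python
from dataclasses import dataclass

# Hypothetical speed ceilings in km/h; the study's actual thresholds are not given.
MAX_SPEED_KMH = {"walk": 10.0, "bike": 40.0, "car": 160.0}


@dataclass
class Trip:
    person_id: int
    mode: str
    depart_s: int      # seconds since an arbitrary reference time
    arrive_s: int
    distance_m: float


def flags_for(trip, all_trips):
    """Return the sorted rule names this trip fails; an empty list means clean."""
    flags = set()
    duration_h = (trip.arrive_s - trip.depart_s) / 3600
    if duration_h > 0:
        speed_kmh = trip.distance_m / 1000 / duration_h
        if speed_kmh > MAX_SPEED_KMH.get(trip.mode, float("inf")):
            flags.add("high_speed")
    for other in all_trips:
        if other is trip or other.person_id != trip.person_id:
            continue
        if (other.depart_s, other.arrive_s) == (trip.depart_s, trip.arrive_s):
            flags.add("possible_duplicate")
        elif other.depart_s < trip.arrive_s and trip.depart_s < other.arrive_s:
            flags.add("time_overlap")
    return sorted(flags)


trips = [
    Trip(1, "walk", 0, 600, 5000.0),     # 5 km in 10 minutes on foot
    Trip(1, "car", 500, 1500, 8000.0),   # starts before the walk ends
    Trip(1, "car", 2000, 3000, 8000.0),  # passes every check
]
review_queue = [t for t in trips if flags_for(t, trips)]
print(len(review_queue))
```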

Transit Unlinking

Trips reported by participants as a single transit journey were automatically separated into their component segments: the walk, bike, or drive to the transit stop (the access leg); the ride on the transit vehicle itself; and the walk, bike, or drive from the alighting stop to the final destination (the egress leg). This separation was based on routing data that identified where the mode of travel changed.

To do this, the system used the Google Routes API (Google API) to identify the most likely path between the trip’s origin and destination, then classified each segment of that path as walk, bike, drive, or transit based on the routing profile. This step produced separate trip records for each segment of a transit journey. Only rMove-recorded trips were subject to transit unlinking; manually added trips and trips recorded through the online survey instrument were not processed through this step. The Post-Review Processing step described later performs a second round of transit unlinking that applies to all trip types, including those not processed through this initial automated unlinking.
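The idea of separating a journey wherever the travel mode changes can be sketched as below. The input is a hypothetical list of routed steps with a mode and start and end times; it is not the Google Routes API response format, and the role labels are illustrative. A production implementation would likely also split consecutive transit steps on different lines, which this sketch does not attempt.

```python
from itertools import groupby


def unlink(steps):
    """Collapse consecutive same-mode steps into one segment per mode change."""
    segments = []
    for mode, group in groupby(steps, key=lambda s: s["mode"]):
        group = list(group)
        segments.append(
            {"mode": mode, "start_s": group[0]["start_s"], "end_s": group[-1]["end_s"]}
        )
    return segments


def label_roles(segments):
    """Tag each segment as access, transit, transfer, or egress."""
    transit_idx = [i for i, s in enumerate(segments) if s["mode"] == "transit"]
    for i, seg in enumerate(segments):
        if seg["mode"] == "transit":
            seg["role"] = "transit"
        elif not transit_idx:
            seg["role"] = "non-transit"
        elif i < transit_idx[0]:
            seg["role"] = "access"
        elif i > transit_idx[-1]:
            seg["role"] = "egress"
        else:
            seg["role"] = "transfer"
    return segments


# Hypothetical routed steps for one reported transit journey.
steps = [
    {"mode": "walk", "start_s": 0, "end_s": 300},
    {"mode": "transit", "start_s": 300, "end_s": 900},
    {"mode": "transit", "start_s": 900, "end_s": 1500},
    {"mode": "walk", "start_s": 1500, "end_s": 1800},
]
segments = label_roles(unlink(steps))
print([(s["role"], s["start_s"], s["end_s"]) for s in segments])
```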

4.3 Analyst Review

After automated processing, trained analysts reviewed all flagged trips using a web-based editing tool that displayed each trip on a map alongside its GPS trace and survey responses. The goal of analyst review was to help the final trip table accurately reflect the travel that actually occurred, correcting errors that automated processing could not resolve.

Analysts performed four primary types of edits:

  • Dropping invalid trips. Some GPS traces were recorded as trips by the application when no actual travel occurred – for example, when a phone drifted in a parking garage or when a brief walk to the mailbox was detected. Analysts removed these records.

  • Joining trip fragments. Occasionally a single trip was recorded as two or more separate trips due to GPS signal loss (for example, when a participant entered a tunnel or a large building). Analysts merged these fragments back into the trip they represented.

  • Splitting trips with unreported stops. When a participant made an intermediate stop during a trip – such as stopping for coffee on the way to work – but the application recorded it as a single trip, analysts split it into two trips with the correct stop location and times.

  • Reviewing transit trips. Analysts verified that transit trips had been correctly separated into access, transit, and egress segments, and adjusted the segment boundaries if the automated unlinking produced incorrect results.

After analyst review, households with no remaining flagged trips were marked as ready for post-review processing.

4.4 Post-Review Processing

Post-review processing transformed the analyst-reviewed trip records into the final tables that make up the delivered dataset. This phase involved extensive cleaning, quality checks, and enrichment of the data. Steps included table construction, trip cleaning, location processing, distance derivation, geographic enrichment, and purpose assignment and imputation. Each of these steps is described in detail below.

Table Construction

The survey platform stores data in a “normalized” (long-format) database structure optimized for data collection. Post-review processing reshaped these records into the “denormalized” (wide-format), analysis-ready tables familiar to data users: household, person, day, trip, and vehicle. During this step, variables were renamed to standard conventions and identifiers were standardized across tables.
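As a rough illustration of the long-to-wide reshaping and renaming described above, the sketch below pivots hypothetical long-format rows (one row per household and variable) into one record per household. The variable names, the rename map, and the row layout are assumptions and do not reflect the platform's actual schema.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (household, variable).
long_rows = [
    {"hh_id": 1, "variable": "num_people", "value": 3},
    {"hh_id": 1, "variable": "num_vehicles", "value": 2},
    {"hh_id": 2, "variable": "num_people", "value": 1},
]

# Hypothetical mapping from platform names to standardized names.
RENAMES = {"num_people": "hh_size"}


def to_wide(rows, id_col):
    """Pivot long rows into one wide record per identifier, applying renames."""
    wide = defaultdict(dict)
    for row in rows:
        name = RENAMES.get(row["variable"], row["variable"])
        wide[row[id_col]][name] = row["value"]
    return [{id_col: key, **values} for key, values in sorted(wide.items())]


household_table = to_wide(long_rows, "hh_id")
print(household_table)
```

Variables missing for a household are simply absent from its record in this sketch; a delivered table would typically carry an explicit missing-value code instead.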

Post-Review Trip Cleaning

Post-review cleaning addressed a range of data quality issues that remained after analyst review. The cleaning process proceeded through several sub-steps:

  1. Missing coordinates. A small number of trips may have missing origin or destination coordinates. Where possible, these were filled from nearby GPS points in the trip’s trace.

  2. Travel period enforcement. Trips that fell outside the participant’s assigned travel period were removed. The travel day boundary was set at 3:00 AM rather than midnight, so a trip departing at 1:00 AM was assigned to the previous calendar day’s travel.

  3. Zero or negative duration. Trips with nonsensical durations were removed.

  4. Transit segment processing. Transit trips were separated into their component legs using the routing data produced during automated processing. Each segment received its own origin, destination, departure time, and arrival time.

  5. Spatial gap cleaning. When the destination of one trip and the origin of the next trip were far apart but the trips themselves appeared to be near-duplicates (similar origins and destinations, similar times), one of the duplicate trips was removed. This addressed situations where overlapping device recordings or delayed survey submissions produced redundant trip records.

  6. Overlapping trip resolution. Trips whose time windows overlapped were resolved through an iterative set of rules that favored completed surveys over incomplete ones, longer trips over shorter ones, and trips with reasonable speeds over those with extreme speeds.

  7. TNC trip correction. In some cases, analysts split ride-hailing trips (such as Uber or Lyft) into separate segments during review. Because a ride-hailing trip is a single journey from the passenger’s perspective, these segments were automatically merged back together.

  8. Loop trip splitting. A loop trip is one where the participant departs from and returns to the same location – for example, a jog around the neighborhood or a drive to run multiple errands that ends back at home. When the GPS trace revealed a clear outbound and return path, the loop was split at the point farthest from the origin. This produced two trips: one outbound and one return. This is important for modeling because the outbound and return portions of a loop trip often serve different purposes or pass through different areas.

  9. Dwell time calculation. The time spent at each destination (the interval between arriving at one location and departing for the next trip) was calculated and stored as dwell_mins and dwell_time_hr.
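Two of the steps above lend themselves to a short worked example: the 3:00 AM travel day boundary and the dwell time calculation. The sketch below is one plausible way to express them; the function names are illustrative, and only the 3:00 AM boundary and the output names dwell_mins and dwell_time_hr come from this guide.

```python
from datetime import date, datetime, timedelta

DAY_START_HOUR = 3  # the travel day starts at 3:00 AM rather than midnight


def travel_date(depart):
    """Calendar date of the travel day that a departure time belongs to."""
    return (depart - timedelta(hours=DAY_START_HOUR)).date()


def dwell_minutes(arrive, next_depart):
    """Minutes between arriving at a destination and departing on the next trip."""
    return (next_depart - arrive).total_seconds() / 60


# A 1:00 AM departure on October 16 counts toward the October 15 travel day.
late_night = travel_date(datetime(2024, 10, 16, 1, 0))
print(late_night == date(2024, 10, 15))

# Arrive at work at 8:30 AM and leave at 5:00 PM.
dwell_mins = dwell_minutes(datetime(2024, 10, 16, 8, 30), datetime(2024, 10, 16, 17, 0))
dwell_time_hr = dwell_mins / 60
print(dwell_mins, dwell_time_hr)
```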

Proxy and Copied Trips

In a household travel survey, not every household member carries a smartphone or directly reports their own travel. Young children, for example, typically do not have their own devices. Instead, an adult household member, called a proxy, reports travel on the child’s behalf. In most cases, this means the child was traveling with the adult; the child’s trip record was therefore created as a copy of the adult’s trip.

These copied trips were created during automated processing (before analyst review) and were preserved through all subsequent processing steps. A copied trip has the same GPS trace, origin, destination, departure time, arrival time, and distance as the trip it was copied from – only the person identifier differs. The flag copied_from_proxy identifies these records.

Analysts should be aware that copied trips will produce identical geometries and travel times for multiple household members. This is expected and correct: it reflects the fact that those individuals were traveling together. Proxy-copied trips are distinct from TICTOC joint trip imputations, which are created later in processing based on a different set of rules (see Section 4.5).
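To illustrate what a copied trip looks like relative to its source, the sketch below duplicates a hypothetical adult trip record for a child. The flag name copied_from_proxy comes from this guide; the 1/0 coding, the other field names, and the sample values are assumptions.

```python
import copy


def copy_trip_for(trip, person_id):
    """Create a proxy-copied trip: identical attributes, different person."""
    copied = copy.deepcopy(trip)
    copied["person_id"] = person_id
    copied["copied_from_proxy"] = 1  # assumed coding: 1 = copied, 0 = reported
    return copied


# Hypothetical adult trip record.
adult_trip = {
    "person_id": 101,
    "depart_time": "2024-10-16T08:00:00",
    "arrive_time": "2024-10-16T08:20:00",
    "distance_m": 4200.0,
    "copied_from_proxy": 0,
}
child_trip = copy_trip_for(adult_trip, 103)

# Everything except the person identifier and the flag matches the source trip.
differing = sorted(k for k in adult_trip if adult_trip[k] != child_trip[k])
print(differing)
```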

Location Processing and Distance Derivation

Raw GPS traces can contain hundreds of individual location points for each trip. Location processing cleaned these traces and prepared them for distance and duration calculations.

The location-processing step included:

  • Removing erroneous points, including points flagged as untrustworthy by rMove.
  • Eliminating duplicate points from the GPS trace.
  • Imputing start and end points where the GPS trace did not perfectly align with the trip’s reported departure and arrival.

After cleaning, the trace data was used to recalculate distance and duration measures for GPS-tracked trips.

  • distance_m was calculated as the sum of straight-line distances between consecutive points along the cleaned GPS trace. This trace-based distance represents the approximate path the traveler followed. Units: meters.
  • distance_beeline_m was calculated as the direct straight-line distance from origin to destination. Unlike distance_m, which depends on the available trip geometry, distance_beeline_m is calculated consistently from the trip origin and destination and is retained for comparison and quality assurance. Units: meters.
  • distance_miles is derived from distance_m by converting meters to miles (1 mile = 1,609.34 meters). Units: miles.
  • duration_s was recalculated from the cleaned GPS timestamps. Units: seconds.
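The relationship between these fields can be sketched as follows. The sketch assumes haversine (great-circle) distance between consecutive points, as indicated in Figure 2, and a mean Earth radius of 6,371 km. The function names and the sample trace are illustrative, not part of the delivered data or the processing code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius
METERS_PER_MILE = 1609.34   # conversion factor stated in this guide


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trace_distance_m(points):
    """Sum of point-to-point distances along a cleaned trace."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# Hypothetical cleaned trace as (latitude, longitude) pairs.
trace = [(42.3601, -71.0589), (42.3611, -71.0570), (42.3625, -71.0575)]

distance_m = trace_distance_m(trace)
distance_beeline_m = haversine_m(*trace[0], *trace[-1])
distance_miles = distance_m / METERS_PER_MILE
```

Because the trace follows the path actually traveled, distance_m is never shorter than distance_beeline_m for the same trip; a large ratio between the two is one simple check for circuitous or noisy traces.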

For trips without a usable GPS trace, including participant-added trips, analyst-added trips, and trips collected through the online survey instrument, distance_m could not be calculated from observed trace points. These trips were processed separately using origin-destination network routing, described below.

Origin-Destination Routing for Non-GPS Trips

Origin-destination routing was used for trips that had only a reported origin and destination and no full GPS trace. This included manually added trips and trips recorded through the online survey instrument.

To estimate a realistic path distance for these trips, origins and destinations were routed through the Open Source Routing Machine (OSRM), a routing engine built on OpenStreetMap data.

The routing process used:

  • Origin and destination coordinates as the required inputs.
  • Mode-specific routing profiles for automobile, bicycle, and pedestrian travel.
  • Shortest feasible network paths between each trip’s endpoints.

The routing step produced distance_m, a network-based path distance.
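For analysts who want to reproduce a routed distance, a request to an OSRM server can be sketched as follows. The URL pattern and the routes[0].distance response field follow OSRM's HTTP route service. The server address, the profile name, the helper function names, and the sample response values are assumptions for illustration; the study's own OSRM configuration is not documented here.

```python
def osrm_route_url(base_url, profile, origin, destination):
    """Build an OSRM route request from (lat, lon) origin and destination."""
    (o_lat, o_lon), (d_lat, d_lon) = origin, destination
    # OSRM expects longitude,latitude order, with points separated by ';'.
    coords = f"{o_lon},{o_lat};{d_lon},{d_lat}"
    return f"{base_url}/route/v1/{profile}/{coords}?overview=false"


def routed_distance_m(response):
    """Return the first route's distance in meters, or None if no route."""
    if response.get("code") != "Ok" or not response.get("routes"):
        return None
    return response["routes"][0]["distance"]


# Hypothetical local server and profile name; both depend on the deployment.
url = osrm_route_url(
    "http://localhost:5000", "driving",
    origin=(42.3601, -71.0589), destination=(42.3736, -71.1097),
)

# Illustrative, trimmed response body (not real routing output).
sample_response = {"code": "Ok", "routes": [{"distance": 5312.4, "duration": 742.1}]}
distance_m = routed_distance_m(sample_response)
```

The request itself is left to the caller (for example with urllib.request), so the sketch runs without a network connection.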

For GPS-tracked trips, distance_m values were derived from the cleaned GPS trace by summing point-to-point distances along the observed path. For trips without GPS traces, OSRM origin-destination routing provided an analogous path-based distance rather than a simple straight-line distance.

As a result, the delivered distance_m field represents the best available path-distance estimate for each trip:

  • GPS-tracked trips: observed trace distance.
  • Trips with only origin and destination coordinates: routed network distance.

The separate distance_beeline_m field provides a consistent straight-line origin-to-destination distance for comparison and quality assurance.

%%{init: {"theme":"base","flowchart":{"htmlLabels":true,"curve":"basis","nodeSpacing":36,"rankSpacing":52},"themeVariables":{"fontFamily":"Bai Jamjuree, Arial, sans-serif"}}}%%
flowchart TB

    subgraph ALL["<b>All trips</b>"]
        direction LR

        subgraph GPS["<b>GPS-Tracked Trips</b>"]
            direction TB
            TRACE("<span style='font-size:1.02em; font-weight:700;'>Cleaned GPS trace</span><br/><span style='font-size:0.88em; font-weight:400;'>(cleaned location points)</span>"):::gpsNode
            SUMHAV("<span style='font-size:1.02em; font-weight:700;'>Summed trace distance</span><br/><span style='font-size:0.88em; font-weight:400;'>point-to-point haversine distances</span>"):::gpsNode
            TRACE --> SUMHAV
        end

        BEELINE("<span style='font-size:1.02em; font-weight:700;'>distance_beeline_m</span><br/><span style='font-size:0.88em; font-weight:400;'>= haversine(origin,<br/>destination)</span>"):::beelineNode

        subgraph NONGPS["<b>Non-GPS Trips</b>"]
            direction TB
            OD("<span style='font-size:1.02em; font-weight:700;'>Origin + Destination</span><br/><span style='font-size:0.88em; font-weight:400;'>coordinates only</span>"):::manualNode
            OSRM("<span style='font-size:1.02em; font-weight:700;'>OSRM shortest path</span><br/><span style='font-size:0.88em; font-weight:400;'>network route</span>"):::manualNode
            OD --> OSRM
        end
    end

    DM("<span style='font-size:1.05em; font-weight:700;'>distance_m</span><br/><span style='font-size:0.89em; font-weight:400;'>path-distance estimate</span>"):::outputNode
    MILES("<span style='font-size:1.03em; font-weight:700;'>distance_miles</span><br/><span style='font-size:0.89em; font-weight:400;'>= distance_m / 1,609.34</span>"):::outputNode
    SPEED("<span style='font-size:1.03em; font-weight:700;'>speed_mph</span><br/><span style='font-size:0.89em; font-weight:400;'>= distance_miles /<br/>(duration_s / 3,600)</span>"):::outputNode

    SUMHAV --> DM
    OSRM --> DM
    DM --> MILES
    MILES --> SPEED

    classDef gpsNode fill:#FFF7E8,stroke:#F4A300,color:#232323,font-family:Bai Jamjuree,stroke-width:2.6px,font-weight:bold,fill-opacity:0.98
    classDef manualNode fill:#FFF0EB,stroke:#E94B2E,color:#232323,font-family:Bai Jamjuree,stroke-width:2.6px,font-weight:bold,fill-opacity:0.98
    classDef beelineNode fill:#F6F6F6,stroke:#9CA3AF,color:#232323,font-family:Bai Jamjuree,stroke-width:2.4px,font-weight:bold,fill-opacity:0.98
    classDef outputNode fill:#F7F7F7,stroke:#9CA3AF,color:#232323,font-family:Bai Jamjuree,stroke-width:2.4px,font-weight:bold,fill-opacity:0.98

    style ALL fill:#FCFCFB,stroke:#A8A29E,color:#232323,fill-opacity:0.42,stroke-width:2px
    style GPS fill:#FFF7E8,stroke:#F4A300,color:#232323,fill-opacity:0.24,stroke-width:2.2px
    style NONGPS fill:#FFF0EB,stroke:#E94B2E,color:#232323,fill-opacity:0.24,stroke-width:2.2px

    linkStyle 0 stroke:#F4A300,stroke-width:4px,stroke-linecap:round
    linkStyle 1 stroke:#E94B2E,stroke-width:4px,stroke-linecap:round
    linkStyle 2 stroke:#F4A300,stroke-width:4px,stroke-linecap:round
    linkStyle 3 stroke:#E94B2E,stroke-width:4px,stroke-linecap:round
    linkStyle 4 stroke:#6B7280,stroke-width:4px,stroke-linecap:round
    linkStyle 5 stroke:#6B7280,stroke-width:4px,stroke-linecap:round

Figure 2: Trip distance derivation for GPS-tracked and manually added trips.

Table 8 summarizes the distance source used for each major trip type in the delivered data.

Distance derivation by trip type
| Trip Type | distance_m Source | distance_beeline_m | Notes |
|---|---|---|---|
| GPS-tracked trips | Cleaned GPS trace | Haversine O-D | Observed path distance from cleaned trace points |
| Manually added trips | OSRM origin-destination network route | Haversine O-D | Only origin and destination were available; no trace |
| Online survey trips | OSRM origin-destination network route | Haversine O-D | Only origin and destination were available; no trace |
| Split loop legs | Cleaned GPS trace, where trace geometry was available | Haversine O-D per leg | Each leg was processed independently |
| Unlinked transit legs | Cleaned GPS trace, where trace geometry was available | Haversine O-D per leg | May have been sparse if portions of the trip occurred underground |
| Synthetic access/egress | NA | NA | Zero-distance placeholder |
distance_m was a path-distance measure: cleaned trace distance for GPS-tracked trips and OSRM origin-destination network distance for trips without usable trace geometry. distance_beeline_m was calculated consistently as the direct origin-to-destination distance and was provided for comparison and quality assurance.
Table 8: Distance derivation by trip type.

Geographic Enrichment

Trip origins and destinations, home locations, and habitual work and school locations were assigned to U.S. Census geographic units through spatial point-in-polygon joins. Table 9 summarizes the identifiers added during this step.

Geographic variables added during spatial enrichment
| Variable Pattern | Geography | Applied To |
|---|---|---|
| *_bg_2020 | Census Block Group (2020) | Home, work, school, trip O/D |
| *_puma_2022 | Public Use Microdata Area (2022) | Home, work, school, trip O/D |
| *_county | County (derived from block group) | Home, work, school, trip O/D |
| *_state | State (derived from block group) | Home, work, school, trip O/D |
| o_in_region, d_in_region | Study region boundary | Trip O/D |
Table 9: Geographic variables added during spatial enrichment.

These identifiers enable geographic analysis at multiple levels without requiring users to perform their own spatial operations.
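Table 9 describes the county and state identifiers as derived from the block group. A minimal sketch of that derivation, assuming the block group identifier is stored as the standard 12-digit Census GEOID (2-digit state, 3-digit county, 6-digit tract, 1-digit block group). The function name is hypothetical, and whether the delivered *_county fields hold the 5-digit state-plus-county code or another form is an assumption to check against the codebook.

```python
def split_block_group_geoid(bg_geoid):
    """Split a 12-digit block group GEOID into its parent geographies."""
    # Restore leading zeros that are lost if the GEOID was read as a number.
    geoid = str(bg_geoid).zfill(12)
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {bg_geoid!r}")
    return {
        "state": geoid[:2],     # state FIPS code
        "county": geoid[:5],    # state + county FIPS code
        "tract": geoid[:11],    # state + county + tract
        "block_group": geoid,   # full block group GEOID
    }


# Hypothetical value of a *_bg_2020 field for a location in Massachusetts.
parts = split_block_group_geoid("250250303001")
```

Here the state code is 25 (Massachusetts) and the county code is 25025. Reading GEOID columns as text rather than numbers avoids losing leading zeros for states with FIPS codes below 10.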

Purpose Assignment and Imputation

Each trip in the dataset has a destination purpose (d_purpose) describing why the traveler went to that location, for example, going to work, shopping, or returning home. Respondents report the destination purpose in the trip survey. The origin purpose (o_purpose) is generally derived from the previous trip’s destination purpose, reflecting the activity the traveler was engaged in before departing.

Purpose assignment involves several steps:

  1. Purpose cleaning. Purposes on split loop trips were corrected: the return leg was assigned the purpose of the location the traveler was returning to (typically the same as the purpose two trips prior). Unlinked transit segments were assigned a purpose of “change mode.”

  2. Purpose categorization. Detailed purpose codes from the survey were grouped into broader purpose categories (e.g., “work,” “school,” “shopping,” “social/recreation”) to support aggregate analysis.

  3. Open-ended purpose classification. When a participant selected “other” as the trip purpose and provided a free-text description, that description could be assigned to one of the standard purpose categories used in the dataset using a language model. This step reduced the number of uncategorized “other” trips while preserving a consistent set of purpose categories for analysis. If no suitable standard category could be identified, the trip remained classified as “other.”

  4. Habitual location matching. Trip endpoints were compared to known home, work, and school locations. If a trip’s destination fell within 100 meters of the participant’s home, it was classified as a home location; similar thresholds applied to work and school. When a trip was reported as “work” but the destination was far from any known work location, it was reclassified as “work-related” to distinguish between commute trips and trips to secondary work sites.

  5. Purpose imputation. A rules-based algorithm identified trips whose reported purposes appeared inconsistent with their locations or with the surrounding trip sequence and corrected them where appropriate. For example, a trip ending at the participant’s home but reported as “shopping” would be reclassified as a home trip. This processing could include location-based corrections, derived values for analyst-split trips, and broader imputations when reported purposes were missing or implausible. The imputation algorithm iterated across related trips to resolve chains of dependencies.
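The habitual location matching in step 4 can be sketched as a distance test against each known location. The 100-meter home buffer comes from the text above. The work and school buffers, the function names, the order in which locations are checked, and the handling of persons with no known workplace are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius
HOME_BUFFER_M = 100         # threshold stated in step 4
OTHER_BUFFER_M = 100        # assumed: "similar thresholds" for work and school


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_habitual(dest, home=None, work=None, school=None):
    """Return 'home', 'work', or 'school' if dest is within the buffer."""
    candidates = [("home", home, HOME_BUFFER_M),
                  ("work", work, OTHER_BUFFER_M),
                  ("school", school, OTHER_BUFFER_M)]
    for label, location, buffer_m in candidates:
        if location is not None and haversine_m(*dest, *location) <= buffer_m:
            return label
    return None


def adjust_work_purpose(reported, dest, work):
    """Reported 'work' away from the known workplace becomes 'work-related'."""
    if reported != "work" or work is None:
        return reported  # assumption: nothing to compare against
    return "work" if match_habitual(dest, work=work) == "work" else "work-related"


home = (42.3601, -71.0589)
near_home = (42.36055, -71.0589)   # roughly 50 m north of home
far_away = (42.3700, -71.0600)     # roughly 1.1 km away
```

With these sample points, match_habitual(near_home, home=home) returns "home", while a trip reported as "work" that ends at far_away is relabeled "work-related" when the known workplace is elsewhere.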

Table 10 summarizes the main purpose-assignment and imputation outcomes reflected in the delivered trip data.

Purpose assignment and imputation outcomes
| Label | Description |
|---|---|
| Reported | Purpose as reported by participant, unchanged |
| Location-corrected | Reported purpose conflicted with proximity to a habitual location (e.g., reported 'work' but location is home) |
| AI-classified | Participant selected 'other' and provided text; the text was assigned to a standard purpose category using a language model when automated coding was used |
| Split loop | Return leg of a split loop trip; purpose set to match the location being returned to |
| Algorithm-imputed | Purpose assigned by the iterative imputation algorithm based on location type, dwell time, and trip sequence |
| Linked transit | Purpose set to 'change mode' during transit trip linking |
| Incomplete survey | Trip survey was not completed; purpose defaulted to 'other' |
| Browser/proxy | Trip was not processed through imputation (browser-move or non-participant copy) |
These categories describe how trip purposes may remain as reported or be modified during processing.
Table 10: Purpose assignment and imputation outcomes.

Delivered purpose columns. After processing, the trip table includes both the originally reported purpose fields and the final delivered purpose fields for origins and destinations. The final detailed-purpose columns and final purpose-category columns are paired outputs of the same processing pipeline: they are delivered together and intended to remain consistent with one another. The reported fields preserve the pre-imputation values for comparison. Table 11 summarizes those delivered purpose fields.

Delivered purpose columns on the trip table
| Column | Content |
|---|---|
| d_purpose / o_purpose | Final imputed detailed purpose code. Use these columns when detailed-purpose distinctions are needed. |
| d_purpose_category / o_purpose_category | Grouped category paired with the final imputed purpose. Derived from the same imputation as *_purpose, not from a separate downstream recode. |
| d_purpose_reported / o_purpose_reported | Originally reported detailed purpose before reclassification and imputation. Provided for comparison and quality assurance. |
| d_purpose_category_reported / o_purpose_category_reported | Grouped category corresponding to the originally reported purpose. |
Use the final *_purpose or *_purpose_category columns for analysis, depending on the level of detail needed. These final columns are designed to stay in sync; the _reported columns preserve the pre-imputation values.
Table 11: Delivered purpose columns on the trip table.

In most cases, the final and reported purposes are identical. They differ only when processing reclassified or imputed purpose values. The final detailed-purpose and purpose-category fields are intended to agree with each other, though open-ended “other” purposes may require additional analyst caution.
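A simple way to see how often processing changed a purpose is to compare the final and reported columns row by row. The sketch below uses the delivered column names d_purpose and d_purpose_reported; the sample records, the text labels standing in for purpose codes, and the helper name are illustrative.

```python
# Hypothetical trip records; real purpose fields hold codes from the codebook.
trips = [
    {"trip_id": 1, "d_purpose": "work", "d_purpose_reported": "work"},
    {"trip_id": 2, "d_purpose": "home", "d_purpose_reported": "shopping"},
    {"trip_id": 3, "d_purpose": "school", "d_purpose_reported": "school"},
    {"trip_id": 4, "d_purpose": "work-related", "d_purpose_reported": "work"},
]


def purpose_changed(trip, prefix="d"):
    """True when the final purpose differs from the reported purpose."""
    return trip[f"{prefix}_purpose"] != trip[f"{prefix}_purpose_reported"]


changed_ids = [t["trip_id"] for t in trips if purpose_changed(t)]
share_changed = len(changed_ids) / len(trips)
```

The same comparison works for origin purposes by passing prefix="o", provided missing values on first trips are handled first.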

Origin purpose on first trips. Because o_purpose is derived from the previous trip’s destination purpose, the first trip of each person’s travel period has no preceding trip to draw from. In the post-review processed data, o_purpose for first trips will be missing (NA).

During TICTOC processing, origin purposes were recalculated after trip imputation. TICTOC set o_purpose to the previous trip’s d_purpose only when the trip’s origin was spatially consistent with the previous trip’s destination (i.e., within a configurable distance buffer). First trips of the day and trips with a spatial gap from the previous destination retained their existing o_purpose value and were not overwritten. Analysts filtering or tabulating on o_purpose should account for these missing values on first trips.

Mode Type Assignment

The survey asked respondents to select all modes used on each trip from a checkbox list. Respondents could select as many modes as applied. The first four selections are preserved in the delivered unlinked trip table as mode_1, mode_2, mode_3, and mode_4. These columns are unordered: mode_1 is simply the first-reported mode, not a primary or dominant one. For most analyses, including mode share, use mode_type rather than the mode_n columns directly.

mode_type is derived by applying a priority hierarchy across all populated mode_n columns on each trip. When a respondent selected more than one mode, the mode with the highest priority value wins. For example, if a respondent selected both walk and transit, the trip is assigned mode_type = 13 (Transit), because transit outranks walk in the hierarchy. In the rare cases where a respondent selected more than four modes, the mode_type assignment may not correspond to the first four mode_n values.

mode_priority records the numeric priority of the winning mode, and is useful for confirming which mode was selected when a trip has multiple mode_n values populated.
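The hierarchy can be sketched as a lookup followed by a maximum. The crosswalk below is a small subset of Table 12, and 995 is treated as missing. The function name and the handling of codes absent from the crosswalk are assumptions.

```python
# Subset of the Table 12 crosswalk: detailed mode value -> mode_type.
MODE_TYPE = {
    1: 1,    # Walk
    2: 2,    # Standard bicycle
    49: 6,   # Uber, Lyft, or other app-based ride service
    6: 8,    # Household vehicle 1
    23: 13,  # Local bus
    30: 13,  # Subway
    41: 14,  # Intercity rail
}
MISSING = 995  # missing/not-applicable code


def assign_mode_type(mode_values):
    """Return the highest mode_type among the populated mode_n values."""
    groups = [MODE_TYPE[m] for m in mode_values
              if m is not None and m != MISSING and m in MODE_TYPE]
    # Assumption: trips with no usable mode value get no mode_type.
    return max(groups) if groups else None


walk_plus_bus = assign_mode_type([1, 23, MISSING, None])
car_plus_walk = assign_mode_type([6, 1])
```

Under Table 12, a trip reporting walk (1) and local bus (23) resolves to 13 (Transit), and a trip reporting a household vehicle and walk resolves to 8 (Car).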

Table 12 shows the full crosswalk of detailed survey mode codes to mode_type groups, in priority order.

Detailed mode to mode_type crosswalk
mode_type Detailed Mode Value Detailed Mode
Walk
1 1 Walk (or jog/wheelchair)
1 43 Skateboard or rollerblade
Bike
2 2 Standard bicycle (my household's)
2 3 Borrowed bicycle (e.g., a friend's)
2 4 Other rented bicycle
2 56 Other personal bicycle (e.g., cargo, tandem, etc.)
2 82 Electric bicycle (my household's)
2 103 Bicycle or e-bicycle
2 107 Micromobility (e.g., scooter, moped, skateboard)
Bike Share
3 69 Bike-share - standard bicycle
3 70 Bike-share - electric bicycle
Scooter Share
4 73 Moped-share (e.g., Scoot)
4 74 Segway
4 83 Scooter-share (e.g., Bird, Lime)
Taxi
5 36 Regular taxi (e.g., Yellow Cab)
5 60 Other hired car service (e.g., black car, limo)
Tnc
6 49 Uber, Lyft, or other smartphone-app ride service
6 106 Uber/Lyft, taxi or car service
Other
7 5 Other
7 27 Paratransit/Dial-A-Ride (e.g., The RIDE)
7 44 Golf cart
7 45 ATV
7 75 Other
7 77 Personal scooter or moped (not shared)
7 80 Other boat (e.g., kayak)
7 81 Snowmobile
7 104 Other
Car
8 6 Household vehicle 1
8 7 Household vehicle 2
8 8 Household vehicle 3
8 9 Household vehicle 4
8 10 Household vehicle 5
8 11 Household vehicle 6
8 12 Household vehicle 7
8 13 Household vehicle 8
8 14 Household vehicle 9
8 15 Household vehicle 10
8 16 Other vehicle in household
8 17 Rental car
8 22 Other vehicle (not my household's)
8 33 Car from work
8 34 Friend/relative/colleague's car
8 47 Other motorcycle in household
8 54 Other motorcycle (not my household's)
8 68 Cable car or streetcar
8 100 Household vehicle (or motorcycle)
8 101 Other vehicle (e.g., friend's car, rental, carshare, work car)
Car Share
9 18 Carshare service (e.g., Zipcar)
9 59 Peer-to-peer car rental (e.g., Turo)
9 76 Carpool match (e.g., Waze Carpool)
School Bus
10 24 School bus
Shuttle Vanpool
11 21 Vanpool
11 26 Other private shuttle/bus (e.g., a hotel's, an airport's)
11 38 University/college shuttle/bus
11 62 Employer-provided shuttle/bus
Ferry
12 78 Other public ferry or water taxi
12 79 Vehicle ferry (took vehicle on board)
Transit
13 23 Local bus
13 28 Other bus
13 30 Subway
13 39 Light rail
13 42 Other rail
13 55 Express/commuter bus
13 58 Commuter rail
13 61 Rapid transit bus (BRT)
13 102 Bus, shuttle, or vanpool
13 105 Rail (e.g., train, subway)
Ld Passenger
14 25 Intercity bus (e.g., Greyhound)
14 31 Airplane/helicopter
14 41 Intercity rail (e.g., Amtrak)
Higher mode_type values take priority when multiple modes are reported on a single trip. mode_type on unlinked transit access and egress legs is assigned by the routing engine rather than from the participant's original survey response.
Table 12: Detailed mode_n to mode_type crosswalk, in priority order.

Secondary Transit Trip Unlinking

Initial pre-review processing used the Google Routes API to split rMove-recorded transit trips into their component legs. To split manually added, analyst-added, and online survey trips into access, transit, and egress legs, a secondary unlinking process was applied to the unlinked trip table after analyst review. This unlinking step identified transit trips without user-recorded or Google-derived leg splits and applied a set of rules to create “synthetic” access and egress legs where needed. This ensured that all transit trips had a consistent structure for downstream processing, even if the original survey response did not capture the full set of legs.

The unlinking algorithm applied a set of rules based on:

  • Whether consecutive trips had a short dwell time between them (consistent with a transfer rather than a true stop)
  • Whether the mode sequence suggested access/transit/egress
  • Whether the destination purpose was “change mode” (indicating a transfer point)

Unlinking returned flags that identify each trip’s role in the linked journey: is_access, is_egress, is_transit_leg, and is_primary_leg (the highest-priority mode segment).

For transit trips that were missing an access or egress segment – for example, because the walk to the bus stop was too short to be detected – a synthetic zero-distance leg was created as a placeholder to maintain a consistent data structure. These synthetic legs were flagged with transit_quality_flag values of “SA” (synthetic access) or “SE” (synthetic egress).

Mode codes on unlinked transit legs. When a transit trip was separated into access, transit, and egress segments, the mode on each segment was reassigned to reflect the actual travel mode of that segment (e.g., walk for an access leg, bus for the transit leg) based on the routing engine’s classification. As part of this reassignment, the original survey-reported mode codes (mode_1 through mode_4) on affected segments were cleared and set to a missing/not-applicable value (995). Analysts querying the detailed mode columns on transit access or egress legs will therefore find 995 rather than a survey mode code; the mode_type column contains the correct grouped mode for each segment.

Derived Travel Variables

After all cleaning and linking was complete, the following derived variables were calculated:

  • distance_miles – trip distance converted from meters to miles
  • speed_mph – derived from distance and duration
  • duration_minutes – duration in minutes
  • depart_hour, depart_minute, arrive_hour, arrive_minute – time components for modeling software
  • travel_dow – day of week
  • mode_type – grouped mode category (e.g., car, transit, walk, bike). See Section 4.4.8.
  • mode_priority – the numeric priority of the highest-priority mode used on the trip
  • speed_flag – indicator for trips with speeds exceeding plausible thresholds for their mode
  • teleport – indicator for spatial discontinuities between consecutive trips (destination of one trip is far from origin of the next)
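The speed-related variables can be sketched as follows. The speed formula matches Figure 2 (distance_miles divided by duration in hours). The speed thresholds, the text mode labels, and the function name are hypothetical; the thresholds used in processing are not listed in this guide.

```python
# Hypothetical plausibility ceilings in miles per hour, by grouped mode.
SPEED_LIMIT_MPH = {"walk": 10, "bike": 40, "car": 100}


def derive_travel_variables(trip):
    """Compute duration_minutes, speed_mph, and speed_flag for one trip."""
    hours = trip["duration_s"] / 3600
    speed = trip["distance_miles"] / hours if hours > 0 else None
    limit = SPEED_LIMIT_MPH.get(trip["mode"])
    flagged = speed is not None and limit is not None and speed > limit
    return {
        "duration_minutes": trip["duration_s"] / 60,
        "speed_mph": speed,
        "speed_flag": int(flagged),
    }


plausible = derive_travel_variables(
    {"mode": "walk", "distance_miles": 3.0, "duration_s": 1800})
implausible = derive_travel_variables(
    {"mode": "walk", "distance_miles": 12.0, "duration_s": 1800})
```

A 3-mile walk in 30 minutes gives 6 mph and is not flagged; a 12-mile "walk" in the same time gives 24 mph and is flagged under these illustrative thresholds.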

Driver status. The driver variable indicates whether the trip-maker was the driver or a passenger for automobile trips. During processing, this variable was adjusted in two cases. First, if a person reported as “driver” was under the minimum driving age or did not hold a driver’s license, they were reclassified as a passenger. Second, if an automobile trip included exactly one licensed household member, that person was imputed as the driver regardless of their original response. These corrections maintained consistency between the driver variable and the person table’s age and license fields. Analysts performing auto occupancy or driver/passenger analyses should be aware that some driver values reflect these corrections rather than the original survey response.
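The two driver corrections can be sketched as follows. The minimum driving age of 16, the field names, the text values "driver" and "passenger", and the order in which the two rules are applied are assumptions; the delivered driver variable uses the codes documented in the codebook.

```python
MIN_DRIVING_AGE = 16  # assumed minimum driving age


def correct_driver(travelers):
    """Apply the two driver-status corrections to one automobile trip.

    travelers: household members on the trip, each a dict with
    person_id, age, has_license, and driver ('driver' or 'passenger').
    """
    out = [dict(t) for t in travelers]  # leave the input unchanged

    # Rule 1: a reported driver who cannot legally drive becomes a passenger.
    for t in out:
        can_drive = t["has_license"] and t["age"] >= MIN_DRIVING_AGE
        if t["driver"] == "driver" and not can_drive:
            t["driver"] = "passenger"

    # Rule 2: exactly one licensed member on the trip is imputed as the driver.
    licensed = [t for t in out if t["has_license"]]
    if len(licensed) == 1:
        licensed[0]["driver"] = "driver"
    return out


party = [
    {"person_id": 1, "age": 40, "has_license": True, "driver": "passenger"},
    {"person_id": 2, "age": 10, "has_license": False, "driver": "driver"},
]
corrected = correct_driver(party)
```

In this example the child's "driver" response is changed to passenger, and the only licensed adult is imputed as the driver.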

Quality Assurance

A comprehensive suite of automated quality checks was applied to the final tables, producing an HTML diagnostic report and a CSV file of test results. Checks included verification of referential integrity across tables (every trip belongs to a valid person and day), consistency of trip counts, plausibility of speeds and distances, and completeness of required fields. A draft codebook documenting all variables and their value labels was also generated automatically.

4.5 TICTOC Processing

TICTOC (Trip Imputation, Coordination, and Tour Organization Compiler) prepared household travel survey data for travel forecasting by imputing selected missing trips, coordinating joint household travel, organizing unlinked trips into linked trips and tours, and adding model-facing attributes to the household, person, day, trip, linked trip, and tour outputs. Before organizing trips into linked trips and tours, TICTOC performed location-purpose correction, imputed missing joint trips for household members who traveled together but did not independently report the trip, and imputed selected child school trips when survey responses indicated school attendance but no corresponding trip was present. Each of these steps is described in detail in the sections that follow.

Joint Trip Detection and Imputation

Household travel surveys rely on individual participants to report their own travel. When household members travel together – for example, a parent driving children to school – each person’s trips should appear in the data. In practice, non-participant household members (particularly young children) often have missing or incomplete trip records.

TICTOC addresses this in two steps:

  1. Detecting joint trips. For each household, the system examined all pairs of household members and identified trips that overlapped in both time and space. Trips were considered joint if they departed from and arrived at similar locations within similar time windows.

  2. Imputing missing joint trips. When one household member had a reported trip that indicated joint travel with another member who did not have a corresponding trip, a new trip record was created for the missing member. The imputed trip was created from the host trip and populated using project-specific column-action rules that specified which attributes were copied directly from the host trip, which were filled with person-specific values (such as the target person’s demographics), and which received default or sampled values. Imputation occurred only when:

    • The host trip indicated joint travel with the target person
    • The imputed trip would not overlap with the target person’s existing trips
    • The imputed trip would not create a spatial discontinuity (teleport) in the target person’s travel chain
    • The host trip had not been dropped or flagged as invalid
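The four conditions can be sketched as a single eligibility check. This is an illustration under stated assumptions, not the TICTOC implementation: times are minutes since midnight, locations are projected (x, y) coordinates in meters, the 500-meter teleport buffer is hypothetical, and all field and function names are invented for the example.

```python
import math

TELEPORT_BUFFER_M = 500  # hypothetical maximum gap between consecutive trips


def overlaps(a, b):
    """True when two trips overlap in time."""
    return a["depart"] < b["arrive"] and b["depart"] < a["arrive"]


def can_impute(host, target_id, target_trips):
    """Check whether a host trip may be copied to the target person."""
    # Host must be valid and must indicate joint travel with the target.
    if host.get("dropped") or target_id not in host["joint_with"]:
        return False
    # Imputed trip must not overlap any of the target's existing trips.
    if any(overlaps(host, t) for t in target_trips):
        return False
    # Imputed trip must connect to the target's neighboring trips.
    earlier = [t for t in target_trips if t["arrive"] <= host["depart"]]
    later = [t for t in target_trips if t["depart"] >= host["arrive"]]
    if earlier:
        prev = max(earlier, key=lambda t: t["arrive"])
        if math.dist(prev["d_xy"], host["o_xy"]) > TELEPORT_BUFFER_M:
            return False
    if later:
        nxt = min(later, key=lambda t: t["depart"])
        if math.dist(host["d_xy"], nxt["o_xy"]) > TELEPORT_BUFFER_M:
            return False
    return True


host_trip = {"depart": 480, "arrive": 500, "o_xy": (0, 0), "d_xy": (3000, 0),
             "joint_with": {2}, "dropped": False}
child_trips = [{"depart": 900, "arrive": 920, "o_xy": (3000, 50), "d_xy": (0, 0)}]
eligible = can_impute(host_trip, 2, child_trips)
```

Here the child's only reported trip starts 50 meters from where the host trip ends, so copying the host trip would not create a teleport and the check passes.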

School Trip Imputation

Children’s school trips are among the most commonly underreported trip types in household travel surveys, particularly for younger children who do not carry smartphones. TICTOC imputed school trips when survey responses indicated that a child attended school and no corresponding school trip was present in the data. Trips were not imputed when the available responses indicated that the child did not attend school, attended a different or unknown school location, was home-schooled, or was not a student.

Imputed school trips used the child’s reported home and school locations from the person table, the household’s reported usual school travel mode where available, and sampled departure times. Other trip attributes – such as occupancy, driver status, and mode details – were derived from project-specific rules rather than copied from another person’s trip. This distinguished school trip imputation from joint trip imputation, where most attributes came from a host trip.

Trip Linking

After trip imputation, TICTOC organized trips into linked trips. Each linked trip was constructed from one or more unlinked trips: where a “change mode” destination purpose indicated a transfer, consecutive segments were combined into a single journey. The TICTOC-derived linked trip records were used downstream for tour organization, origin-destination analysis, and weighting. Each trip_linked record summarized its unlinked trip records into a single journey with the origin of the first segment, the destination of the last segment, and journey-level mode, distance, and duration.
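The linking rule can be sketched as a single pass over one person's unlinked trips in time order: a segment whose destination purpose is "change mode" is joined to the segment that follows it. The field names, the "change_mode" label, and the treatment of a trailing change-mode segment are assumptions; journey-level mode is assigned separately.

```python
def summarize(segments):
    """Collapse consecutive unlinked segments into one linked trip."""
    return {
        "origin": segments[0]["origin"],
        "destination": segments[-1]["destination"],
        "d_purpose": segments[-1]["d_purpose"],
        "distance_m": sum(s["distance_m"] for s in segments),
        "n_segments": len(segments),
    }


def link_trips(unlinked):
    """Group time-ordered unlinked trips into linked trips."""
    linked, current = [], []
    for segment in unlinked:
        current.append(segment)
        if segment["d_purpose"] != "change_mode":
            linked.append(summarize(current))
            current = []
    if current:
        # Assumption: a trailing change-mode segment forms its own linked trip.
        linked.append(summarize(current))
    return linked


# Hypothetical walk-bus-walk commute followed by a direct trip home.
unlinked = [
    {"origin": "Home", "destination": "Stop A", "d_purpose": "change_mode", "distance_m": 400},
    {"origin": "Stop A", "destination": "Stop B", "d_purpose": "change_mode", "distance_m": 5200},
    {"origin": "Stop B", "destination": "Work", "d_purpose": "work", "distance_m": 300},
    {"origin": "Work", "destination": "Home", "d_purpose": "home", "distance_m": 6100},
]
linked = link_trips(unlinked)
```

The four unlinked rows produce two linked trips: Home to Work (three segments, 5,900 meters) and Work to Home (one segment).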

Figure 3 illustrates this linked trip structure for downstream TICTOC processing.

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    subgraph UNLINKED["Unlinked Trips (trip table)"]
        direction LR
        T1["Trip 1: Walk
        Home -> Bus Stop
        purpose: change_mode
        is_access: 1"]
        T2["Trip 2: Bus
        Stop A -> Stop B
        purpose: change_mode
        is_transit_leg: 1"]
        T3["Trip 3: Walk
        Bus Stop -> Work
        purpose: work
        is_egress: 1"]
        T1 --> T2 --> T3
    end

    subgraph LINKED["Linked Trip (trip_linked table)"]
        LT["Linked Trip
        Mode: Transit (bus)
        Home -> Work
        Access: Walk | Egress: Walk
        Distance: sum of all segments"]
    end

    UNLINKED --> LINKED

    classDef linked_trip fill:#F68B1F,stroke:#C66916,color:#000000,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
    classDef unlinked_trip fill:#E4572E,stroke:#BA3F21,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92

    style UNLINKED fill:#E4572E,stroke:#BA3F21,color:#ffffff,fill-opacity:0.14,stroke-width:1.75px
    style LINKED fill:#F68B1F,stroke:#C66916,color:#000000,fill-opacity:0.16,stroke-width:1.75px

    linkStyle default stroke:#475569,stroke-width:2.25px

    class T1,T2,T3 unlinked_trip
    class LT linked_trip

Figure 3: Unlinked transit segments summarized into a single linked transit trip for downstream TICTOC processing.
Note

Multi-leg transit trips and intermediate transfer segments.

The example above shows a simple three-segment journey (access → transit → egress). Some linked trips include multiple transit vehicles — for example: bike to bus stop, bus leg, short transfer, second bus leg.

For these multi-leg linked trips:

  • is_access is assigned only to the first leg, when it is non-transit and immediately precedes a transit leg.
  • is_egress is assigned only to the last leg, when it is non-transit and immediately follows a transit leg.
  • An intermediate non-transit segment between two transit legs — such as a walk or bike transfer between buses — will appear as its own row in trip_unlinked under the same linked_trip_id, but will carry none of these flags: it is not is_access, not is_egress, and not is_transit_leg.

These intermediate segments are present in the unlinked trip table, but they should be treated as an occasional byproduct of the Google Routes API unlinking process or of unusually detailed participant trip recording, rather than as a guaranteed analytical construct. Their detailed mode fields (mode_1 through mode_4) are typically set to 995 (not applicable); mode_type reflects the routing engine’s classification (usually walk or bike).

To identify intermediate transfer-like segments in trip_unlinked, filter to rows where all four conditions hold:

  • is_transit == 1 — part of a transit linked trip
  • is_transit_leg == 0 — not the transit vehicle leg
  • is_access == 0
  • is_egress == 0

Time and distance for these segments are included in the linked trip totals in trip_linked.
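The four-condition filter can be sketched as follows. The flag names are those used in trip_unlinked; the sample rows, the leg field, and the function name are illustrative.

```python
def is_intermediate_transfer(row):
    """True for a non-transit segment between two transit legs."""
    return (row["is_transit"] == 1
            and row["is_transit_leg"] == 0
            and row["is_access"] == 0
            and row["is_egress"] == 0)


# Hypothetical linked trip: bike access, bus, walk transfer, second bus.
segments = [
    {"leg": "bike to stop", "is_transit": 1, "is_transit_leg": 0, "is_access": 1, "is_egress": 0},
    {"leg": "first bus", "is_transit": 1, "is_transit_leg": 1, "is_access": 0, "is_egress": 0},
    {"leg": "walk transfer", "is_transit": 1, "is_transit_leg": 0, "is_access": 0, "is_egress": 0},
    {"leg": "second bus", "is_transit": 1, "is_transit_leg": 1, "is_access": 0, "is_egress": 0},
]
transfers = [s["leg"] for s in segments if is_intermediate_transfer(s)]
```

Only the walk between the two buses passes the filter; the bike leg is excluded because it carries is_access.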

Linked Trip Mode Assignment

TICTOC assigned a single mode to each linked trip through a two-step process that operated on the raw survey mode values (mode_n columns) across all constituent unlinked trips — not on the mode_type variable derived during post-review processing.

Step 1: Group raw survey modes. TICTOC collected every populated mode_n value across all unlinked segments belonging to the linked trip and mapped each value to an intermediate mode group using the project-configurable crosswalk in Table 13. For example, a respondent who selected “Local bus” (value 23) and “Walk” (value 1) on separate segments would contribute both LOCAL and WALK to the mode group set. All unique mode groups present across all segments were collected into a single set for use in Step 2.

Step 1: Survey mode value to mode group crosswalk
Each raw mode_n value is mapped to an intermediate group before the hierarchy is applied
Survey Mode Value Survey Mode Label
SCHOOLBUS
24 School bus
LONGDIST
25 Intercity bus (e.g., Greyhound)
31 Airplane/helicopter
41 Intercity rail (e.g., Amtrak)
REGIONAL
55 Express/commuter bus
58 Commuter rail
78 Other public ferry or water taxi
79 Vehicle ferry (took vehicle on board)
LOCAL
23 Local bus
26 Other private shuttle/bus (e.g., a hotel's, an airport's)
28 Other bus
30 Subway
38 University/college shuttle/bus
39 Light rail/trolley
42 Other rail
61 Rapid transit bus (BRT)
62 Employer-provided shuttle/bus
102 Bus, shuttle, or vanpool
105 Rail (e.g., train, subway)
DRIVE
6 Household vehicle 1
7 Household vehicle 2
8 Household vehicle 3
9 Household vehicle 4
10 Household vehicle 5
11 Household vehicle 6
12 Household vehicle 7
13 Household vehicle 8
14 Household vehicle 9
15 Household vehicle 10
16 Other vehicle in household
17 Rental car
18 Carshare service (e.g., Zipcar)
21 Vanpool
22 Other vehicle (not my household's)
27 Medical transportation service
33 Car from work
34 Friend/relative/colleague's car
47 Other motorcycle in household
54 Other motorcycle (not my household's)
59 Peer-to-peer car rental (e.g., Turo)
76 Carpool match (e.g., Waze Carpool)
100 Household vehicle (or motorcycle)
101 Other vehicle (e.g., friend's car, rental, carshare, work car)
BIKE
2 Standard bicycle (my household's)
3 Borrowed bicycle (e.g., a friend's)
4 Other rented bicycle
56 Other personal bicycle (e.g., cargo, tandem, etc.)
82 Electric bicycle (my household's)
103 Bicycle or e-bicycle
PERSONAL MOBILITY
43 Skateboard or rollerblade
44 Golf cart
45 ATV
74 Segway
77 Personal scooter or moped (not shared)
80 Other boat (e.g., kayak)
81 Snowmobile
83 Scooter-share (e.g., Bird, Lime)
107 Micromobility (e.g., scooter, moped, skateboard)
TNC
36 Regular taxi (e.g., Yellow Cab)
49 Uber, Lyft, or other smartphone-app ride service
60 Other hired car service (e.g., black car, limo)
106 Uber/Lyft, taxi or car service
200 Paratransit/Dial-A-Ride (e.g., The RIDE)
SHARED
69 Bike-share - standard bicycle
70 Bike-share - electric bicycle
WALK
1 Walk (or jog/wheelchair)
OTHER
5 Other
75 Other
104 Other
995 Missing Response
DRIVE is an intermediate group only and does not appear directly as a linked trip mode. It is further resolved into SOV, HOV2, or HOV3 in Step 2 based on vehicle occupancy.
Table 13: Step 1: Survey mode values to intermediate mode groups.

Step 2: Apply the mode hierarchy. Given the set of mode groups collected in Step 1, TICTOC walked down the priority-ordered hierarchy in Table 14 and assigned the linked trip the first mode group present in the set. This meant that a more “significant” mode always took precedence: a trip that included any transit segment would be classified as LOCAL or REGIONAL transit regardless of how many walk segments accompanied it, and a long-distance trip would outrank a local transit trip.

The one exception to a simple group-wins rule is DRIVE. When DRIVE was present in the mode group set, the final linked_trip_mode (SOV, HOV2, or HOV3) was determined by the maximum number of travelers (num_travelers) reported across all constituent unlinked trips: one traveler yields SOV, two travelers yield HOV2, and three or more yield HOV3.

If all mode_n values were missing across every segment of the linked trip, the linked_trip_mode was set to missing. This is the case for incomplete survey responses where no mode information is available. Analysts can filter out these records when analyzing mode share or apply imputation rules as needed; these records are unweighted and therefore automatically excluded from weighted analyses.

Step 2: Linked trip mode hierarchy
TICTOC assigns the first matching mode group present in the set
Linked Trip Mode Priority (1 = highest)
SCHOOLBUS 1
LONGDIST 2
REGIONAL 3
LOCAL 4
HOV3 (3 or more person occupancy vehicle) 5
HOV2 (2-person occupancy vehicle) 6
SOV (Single-occupancy vehicle) 7
BIKE 8
PERSONAL MOBILITY 9
TNC 10
SHARED 11
WALK 12
OTHER 13
HOV3, HOV2, and SOV all derive from the DRIVE mode group. The split is determined by the maximum num_travelers across all constituent unlinked trips: 1 = SOV, 2 = HOV2, 3+ = HOV3. The priority order is project-configurable.
Table 14: Step 2: Linked trip mode hierarchy.

Example. Consider a linked transit trip consisting of three unlinked segments: a walk to the bus stop (mode_n value 1, mapped to WALK), a local bus ride (mode_n value 23, mapped to LOCAL), and a walk from the stop to the destination (mode_n value 1, mapped to WALK). The mode group set is {WALK, LOCAL}. TICTOC checks the hierarchy from the top: SCHOOLBUS is not present, LONGDIST is not present, REGIONAL is not present, and LOCAL is present. The linked trip is assigned mode LOCAL.
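
The two-step logic can be sketched in a few lines of Python. This is an illustrative sketch, not the TICTOC implementation: the crosswalk is only an excerpt of Table 13, the input structure (a list of dictionaries with `modes` and `num_travelers`) is an assumption, and the function name `assign_linked_trip_mode` is hypothetical.

```python
# Excerpt of the Table 13 crosswalk (a few values only, for illustration).
MODE_GROUP = {1: "WALK", 2: "BIKE", 6: "DRIVE", 23: "LOCAL", 49: "TNC", 58: "REGIONAL"}

# Table 14 priority order, highest priority first.
HIERARCHY = [
    "SCHOOLBUS", "LONGDIST", "REGIONAL", "LOCAL", "HOV3", "HOV2", "SOV",
    "BIKE", "PERSONAL MOBILITY", "TNC", "SHARED", "WALK", "OTHER",
]


def assign_linked_trip_mode(segments):
    """Sketch: `segments` is a list of dicts, each with `modes` (the populated
    mode_n values, None when missing) and `num_travelers` (assumed layout)."""
    # Step 1: collect the unique mode groups across all segments.
    groups = {MODE_GROUP[m] for seg in segments for m in seg["modes"] if m is not None}
    if not groups:
        return None  # no mode information on any segment
    # DRIVE is resolved using the maximum num_travelers across segments.
    if "DRIVE" in groups:
        groups.discard("DRIVE")
        max_travelers = max(seg["num_travelers"] for seg in segments)
        groups.add("SOV" if max_travelers <= 1 else "HOV2" if max_travelers == 2 else "HOV3")
    # Step 2: the first group in the hierarchy that is present wins.
    return next(g for g in HIERARCHY if g in groups)


walk_bus_walk = [
    {"modes": [1], "num_travelers": 1},
    {"modes": [23], "num_travelers": 1},
    {"modes": [1], "num_travelers": 1},
]
drive_to_rail = [
    {"modes": [6], "num_travelers": 1},
    {"modes": [58], "num_travelers": 1},
]
carpool = [{"modes": [6], "num_travelers": 2}]
no_mode = [{"modes": [None], "num_travelers": 1}]

print(assign_linked_trip_mode(walk_bus_walk))
print(assign_linked_trip_mode(drive_to_rail))
```

Resolving DRIVE into an occupancy class before walking the hierarchy lets the three vehicle modes sit at their own priority positions, so a drive-access commuter rail trip still resolves to REGIONAL.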

Linked Trip Purpose Assignment

TICTOC assigned a single destination purpose to each linked trip from the final unlinked segment’s d_purpose. Because the intermediate segments of a transit journey carry d_purpose = "change mode" by convention, the last segment’s destination purpose reflects the true trip destination — where the traveler actually ended up and why.

Origin purpose on the linked trip was taken from the first segment’s o_purpose, following the same convention used for unlinked trips.

As a result, analysts working with trip_linked can use d_purpose and o_purpose directly for trip-purpose summaries without needing to filter out “change mode” records, which are an artifact of the unlinked segment structure and do not appear on linked trip records.
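
A minimal sketch of this rule, assuming the unlinked segments of one linked trip are available in travel order as dictionaries. The purpose labels other than "change mode" are illustrative, and the function name is hypothetical.

```python
def linked_trip_purposes(segments):
    """Take o_purpose from the first segment and d_purpose from the last."""
    return {
        "o_purpose": segments[0]["o_purpose"],
        "d_purpose": segments[-1]["d_purpose"],
    }


# Walk to the stop, ride the bus, walk to the workplace.
segments = [
    {"o_purpose": "home", "d_purpose": "change mode"},
    {"o_purpose": "change mode", "d_purpose": "change mode"},
    {"o_purpose": "change mode", "d_purpose": "work"},
]
purposes = linked_trip_purposes(segments)
print(purposes)
```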

Tour Organization

A tour is a sequence of trips that begins and ends at the same anchor location. Most tours are home-based – the traveler departs from home, makes one or more stops, and returns home. At-work subtours begin and end at the workplace (for example, leaving work for lunch and returning). Open-jawed tours occur when a travel day begins or ends away from home, so that the full home-to-home circuit is not completed within the observed day.

TICTOC organized all of a person’s daily trips into tours by:

  1. Identifying anchor locations (departures from and returns to home, or to work for subtours)
  2. Grouping intermediate trips into the tour they belong to
  3. Scoring candidate primary destinations based on activity duration, purpose, and trip characteristics using configurable scoring functions (see Section 4.5.4.1 below).
  4. Assigning a tour purpose based on the primary destination
  5. Identifying sub-tours within parent tours
  6. Classifying each person’s daily activity pattern (e.g., “mandatory” for days with work/school tours, “non-mandatory” for discretionary travel only, “home” for days with no travel)
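
The first two steps can be illustrated with a simplified sketch that closes a tour whenever a trip arrives at the anchor location. It assumes each trip carries `origin` and `destination` labels (hypothetical field names) and deliberately leaves out subtour detection, primary-destination scoring, and daily activity pattern classification.

```python
def split_into_tours(trips, anchor="home"):
    """Group one person's ordered daily trips into tours (simplified sketch)."""
    tours, current = [], []
    for trip in trips:
        current.append(trip)
        if trip["destination"] == anchor:  # a return to the anchor closes the tour
            tours.append(current)
            current = []
    if current:  # the day ended away from the anchor
        tours.append(current)
    return [
        {
            "trips": tour,
            # Closed tours start and end at the anchor; others are open-jawed.
            "closed": tour[0]["origin"] == anchor and tour[-1]["destination"] == anchor,
        }
        for tour in tours
    ]


closed_day = [
    {"origin": "home", "destination": "work"},
    {"origin": "work", "destination": "lunch"},
    {"origin": "lunch", "destination": "work"},
    {"origin": "work", "destination": "home"},
]
open_day = [
    {"origin": "work", "destination": "grocery"},
    {"origin": "grocery", "destination": "home"},
]
closed_tours = split_into_tours(closed_day)
open_tours = split_into_tours(open_day)
print(len(closed_tours), closed_tours[0]["closed"], open_tours[0]["closed"])
```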

Figure 4 illustrates how TICTOC organized unlinked trips into tours and subtours, using one closed example and one open-jawed example.

Home-based work tour with at-work subtour:

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    HOME(("Home")) -- "Trip 1: Car" --> WORK["Work"]
    WORK -- "Trip 2: Walk" --> LUNCH["Lunch"]
    LUNCH -- "Trip 3: Walk" --> WORK
    WORK -- "Trip 4: Car" --> HOME

    classDef household fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef person fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef tour fill:#FDD835,stroke:#C2A200,color:#000000,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90

    linkStyle default stroke:#475569,stroke-width:2.4px

    class HOME household
    class WORK person
    class LUNCH tour

Open-jawed tour (day begins away from home):

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    WORK(("Work<br/>(day starts here)")) -- "Trip 1: Car" --> GROCERY["Grocery"]
    GROCERY -- "Trip 2: Car" --> HOME(("Home"))

    classDef household fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef person fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef tour fill:#FDD835,stroke:#C2A200,color:#000000,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90

    linkStyle default stroke:#475569,stroke-width:2.4px

    class HOME household
    class WORK person
    class GROCERY tour

Figure 4: Examples of home-based and open-jawed tours used in TICTOC tour organization.

Tour Purpose Scoring and Primary Destination

Tour purpose was determined by the primary destination — the most “important” stop on the tour.
Because a tour includes multiple trips, one destination was selected as the representative stop, and its purpose became the tour purpose.

For tours with mandatory travel (work or school), primary-destination selection was straightforward: the mandatory stop took precedence.

Complexity was higher for discretionary tours. For example, a tour might include two shopping stops (e.g., grocery store and auto shop), where purpose alone did not clearly identify which stop was primary.

TICTOC resolved this using a weighted penalty method:

  • Each candidate destination was scored using a decay function of duration, stratified by purpose.
  • The destination with the minimum penalty score was selected as primary.
  • Duration was defined as:
    • time spent at the destination (dwell time), plus
    • duration of the preceding trip, plus
    • duration of the subsequent trip.

This approach gave more favorable scores to destinations reached via longer trips or with longer dwell times, reflecting greater relative importance within the tour.

Purpose priority was built into scoring:

  • Mandatory destinations (work, school) received more favorable base scores (lower penalties) than discretionary activities, ensuring a work or school stop was selected whenever present.
  • Discretionary destinations were differentiated by the duration-based decay function, with scoring functions stored in project-configurable files delivered with the data.

The assigned tour purpose was taken from the primary destination’s purpose category (e.g., “work,” “school,” “shop”).

At-work subtours — sequences that departed from and returned to the workplace within a parent tour — were identified separately and assigned their own tour purpose using the same scoring logic.
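
The selection rule can be sketched as follows. The actual scoring functions live in the project-configurable files delivered with the data and are not reproduced here; the exponential decay, the base penalties, and the scale values below are all invented for illustration, as are the field and function names.

```python
import math

# Hypothetical parameters: mandatory purposes get a lower base penalty.
BASE_PENALTY = {"work": 0.0, "school": 0.0}  # all other purposes default to 1.0
DECAY_MINUTES = {"shop": 60.0}               # purpose-specific scale, default 90.0


def penalty(stop):
    """Lower is better. Duration = dwell time + preceding trip + subsequent trip."""
    duration = stop["dwell_min"] + stop["prev_trip_min"] + stop["next_trip_min"]
    scale = DECAY_MINUTES.get(stop["purpose"], 90.0)
    # The decay term lies in (0, 1], so a mandatory stop always beats a discretionary one.
    return BASE_PENALTY.get(stop["purpose"], 1.0) + math.exp(-duration / scale)


def primary_destination(stops):
    """Select the candidate destination with the minimum penalty."""
    return min(stops, key=penalty)


grocery = {"name": "grocery", "purpose": "shop", "dwell_min": 20, "prev_trip_min": 10, "next_trip_min": 5}
auto_shop = {"name": "auto shop", "purpose": "shop", "dwell_min": 90, "prev_trip_min": 15, "next_trip_min": 15}
office = {"name": "office", "purpose": "work", "dwell_min": 5, "prev_trip_min": 5, "next_trip_min": 5}

print(primary_destination([grocery, auto_shop])["name"])
print(primary_destination([grocery, auto_shop, office])["name"])
```

In this sketch the auto shop wins between the two shopping stops because its combined duration is longer, and the work stop wins whenever it is present, regardless of duration.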

Additional TICTOC Outputs

TICTOC appended model-facing fields to the household, person, day, and trip tables, and produced new linked trip and tour tables. Key additions include:

  • Daily activity pattern (daily_activity_pattern) on the day table, classifying each person-day as mandatory, non-mandatory, or home
  • Tour identifiers (tour_id, tour_num) on trips and the tour table
  • Linked trip identifiers (linked_trip_id) connecting unlinked trip segments to their linked trip record
  • Stop counts on tours, indicating the number of intermediate stops
  • Joint travel indicators identifying which trips were taken with other household members
  • Escorting attributes identifying trips where one household member accompanies another
  • Imputation flags distinguishing reported trips from imputed joint and school trips (see Section 4.6 for details)
  • Summary diagnostics documenting imputation rates, tour distributions, and data quality metrics

4.6 Reference: Flags and Classifications

The following tables provide reference definitions for flags and classification variables used throughout the delivered tables.

Trip Flag Reference

Table 15 summarizes the main trip-level flags included in the delivered trip table.

Trip-level flags in the delivered trip table
Flag Values Meaning Filtering Guidance
browser 0/1 Trip created via browser survey (not GPS) Exclude for GPS-quality analysis
added_trip 0/1 Trip manually added by analyst or participant No GPS trace; OD-routed distance
split_loop 0/1 Trip created by splitting a loop trip Original loop no longer exists
unlinked_trip 0/1 Trip is a segment of a transit journey Use trip_linked for O-D analysis
is_primary_leg 0/1 Highest-priority mode leg in a linked trip Use to avoid double-counting linked trips
is_access / is_egress 0/1/995 Role in a linked transit trip (995 = not applicable) --
is_synthetic_transit_leg 0/1 Placeholder leg for missing access/egress Distance and duration are NA
speed_flag 0/1 Speed exceeds plausible threshold for mode Review or exclude
teleport 0/1 Gap >= 250m between destination and next trip's origin May indicate missing trip
copied_from_proxy 0/1 Trip record copied from a proxy reporter Same trace as reporter's trip
Table 15: Trip-level flags in the delivered trip table.
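
To make the teleport definition concrete, the sketch below measures the gap between each trip's destination and the next trip's origin and compares it with the 250 m threshold. The guide does not state how the gap was measured, so the haversine (great-circle) distance is an assumption, as are the coordinate field names and the sample coordinates.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def gap_flags(trips, threshold_m=250.0):
    """One boolean per consecutive trip pair: True when the gap meets the threshold."""
    return [
        haversine_m(a["d_lat"], a["d_lon"], b["o_lat"], b["o_lon"]) >= threshold_m
        for a, b in zip(trips, trips[1:])
    ]


# Invented coordinates: the second trip starts where the first ended,
# but the third trip starts several hundred meters from the second trip's end.
trips = [
    {"o_lat": 42.3555, "o_lon": -71.0605, "d_lat": 42.3601, "d_lon": -71.0589},
    {"o_lat": 42.3601, "o_lon": -71.0589, "d_lat": 42.3736, "d_lon": -71.1097},
    {"o_lat": 42.3770, "o_lon": -71.1167, "d_lat": 42.3876, "d_lon": -71.0995},
]
flags = gap_flags(trips)
print(flags)
```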

TICTOC Imputation Flags

Additional flags identify imputed and modified records. Table 16 summarizes the additional TICTOC-specific fields used to identify imputed and coordinated records.

TICTOC-specific flags and identifiers
Field Description
imputed_record_type Indicates whether the trip is reported (0), imputed as a joint trip, or imputed as a school trip
imputed_host_trip For joint trip imputations, the trip_id of the household member's trip that served as the basis for the imputed record
imputed_joint_trip Flag indicating whether this trip was created through joint trip imputation
joint_trip_id Identifier grouping household members who traveled together on the same trip
daily_activity_pattern Person-day classification: mandatory, non-mandatory, or home
These fields allow analysts to distinguish reported travel from imputed travel and to identify joint-travel episodes.
Table 16: TICTOC-specific flags and identifiers.

Joint Travel Taxonomy

TICTOC classifies joint travel at both the trip and tour level:

  • Non-joint: Trip or tour made by the person alone
  • Partially joint: Some but not all segments of the tour include another household member
  • Fully joint: All segments of the tour are shared with another household member
  • Joint tour participants: Identifiers linking all household members sharing a tour
  • Escorting: Trips where the primary purpose is to transport another household member (e.g., driving a child to school); further classified by whether the escort makes a dedicated round trip or chains the escort with other activities

4.7 Delivered Data Products

Table 17 summarizes the delivered tables and their units of observation.

Delivered tables
Table Records Unit of Observation Source
Household One per household Household Survey + processing
Person One per person Person Survey + processing
Day One per person per travel day Person-day Survey + processing + TICTOC
Vehicle One per household vehicle Vehicle Survey
Trip One per unlinked trip (includes imputed joint and school trips) Trip Survey + processing + TICTOC
Trip Linked One per linked journey Linked trip TICTOC
Location GPS trace points per trip Location point Survey app + processing
Tour One per tour Tour TICTOC
Joint Tour Participant One per person per joint tour Person-tour TICTOC
All tables are accompanied by a codebook listing every variable, its data type, and value labels.
Table 17: Delivered tables.

The codebook is described in Section 7. Weighted versions of these tables are produced separately by the weighting process and documented in Section 5.


For questions about specific variables, processing decisions, or data quality metrics, please contact the project team.

5 Weighting

This section summarizes the weighting and expansion procedures used in the Massachusetts Travel Study dataset. The goal of weighting is to expand the survey sample so that it represents the full resident population of Massachusetts.

The Massachusetts Travel Study leveraged two related weighting workflows. The standard weights represent travel behavior on a typical weekday, using Monday through Thursday travel. The day-of-week weights represent travel behavior on each day of the week, including Friday, Saturday, and Sunday. Both workflows used the same general approach, but they differed in the records included, the weighting geographies, and the way the final weights should be used in analysis.

Note: What are survey weights?

To produce statistics that represent an entire population without surveying every household or individual, survey researchers assign weights to each completed observation. In household travel surveys, the survey weight indicates how many people, households, days, or trips in the population a given respondent or record is estimated to represent. By applying these weights, analysts can generate regional estimates even when the sample is only a small fraction of the full population.

5.1 Overview of Weighting Goals

The weighting process aligns weighted survey estimates with external population totals and distributions across key household, person, day, and trip characteristics. Weighting corrects for differential sampling, differences in survey completion across demographic groups, and systematic differences in trip reporting that arise from the method respondents used to report their travel, such as smartphone app, web diary, or call center.

For the Massachusetts Travel Study, this process produced two sets of final weights. The standard weighting process expanded the survey sample to represent travel on an average weekday across Monday, Tuesday, Wednesday, and Thursday. The day-of-week weighting process expanded the survey sample separately for each day across Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. The day-of-week weights are most useful when an analysis explicitly depends on the day of week, such as comparing weekday and weekend travel or estimating travel totals for a specific day.

The day-of-week workflow built on the standard weighting workflow. Both workflows began with initial expansion, adjusted household weights to demographic targets, accounted for day-pattern reporting differences by diary platform, derived person, day, and trip weights, and then applied trip-level adjustments. The main difference was that the day-of-week workflow repeated the relevant weighting steps separately by day, so the weighted records represent Monday travel, Tuesday travel, Saturday travel, and so on, rather than one average weekday.

Table 18 summarizes the practical difference between the two weighting workflows.

Dimension Standard weighting Day-of-week weighting
Travel days represented Monday through Thursday as an average weekday Monday through Sunday, weighted separately by day
Recommended use Default weighted workflow for typical weekday summaries and most standard reporting Day-specific, weekday/weekend, and weekend travel analysis
Geographic controls Custom client-defined weighting zones developed for the project Broader Boston / Not Boston weighting groups
Completion basis Complete eligible weekday travel days Complete travel data for the specific day being weighted
Day weights Person weights are divided across complete eligible weekdays Day weights are equal to the person weight for that day
Trip weights Trip totals represent an average weekday Trip totals represent a specific day of week
Table 18: Comparison of standard and day-of-week weighting workflows.
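
The "Day weights" row can be illustrated with a small sketch. It shows only the allocation idea (a person weight split evenly across complete Monday through Thursday days for the standard workflow, and carried over unchanged for a single day in the day-of-week workflow) and ignores the later adjustment rounds. Field and function names are hypothetical.

```python
STANDARD_DAYS = {"Mon", "Tue", "Wed", "Thu"}


def standard_day_weights(person_weight, days):
    """Split a person weight evenly across complete, eligible weekdays."""
    eligible = [d["complete"] and d["dow"] in STANDARD_DAYS for d in days]
    n = sum(eligible)
    # When no day is eligible, every entry is 0.0 and no division occurs.
    return [person_weight / n if ok else 0.0 for ok in eligible]


def day_of_week_day_weight(person_weight_for_day, complete):
    """Day-of-week workflow: the day weight equals that day's person weight."""
    return person_weight_for_day if complete else 0.0


days = [
    {"dow": "Mon", "complete": True},
    {"dow": "Tue", "complete": True},
    {"dow": "Wed", "complete": False},
    {"dow": "Fri", "complete": True},
]
standard = standard_day_weights(300.0, days)
print(standard)
```

Because the standard day weights sum back to the person weight, weighted day totals describe one average weekday rather than the number of days observed.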

5.2 What the Weights Represent

Across all steps, the weighting process produces final weights at multiple analytic levels. These weights allow the survey records to represent households, people, person-days, trips, linked trips, and tours in the study area.

  • Household weight: expands each surveyed household to represent households in the study area.
  • Person weight: expands each person to represent the population of persons.
  • Day weight: expands each complete person-day to represent daily travel, with the interpretation depending on whether the standard or day-of-week weighting workflow is used.
  • Unlinked trip weight: expands individual trip segments, including transit access, transfer, and egress legs.
  • Linked trip weight: expands complete trips between an origin and destination, with intermediate transit transfers combined into a single trip.
  • Tour weight: expands sequences of linked trips that begin and end at the same location.

For this study, the standard final weights represent a typical weekday based on Monday through Thursday travel. Standard day weights represent average weekday person-days, and standard trip, linked trip, and tour weights represent travel on an average weekday.

When the analytic question is explicitly about differences across Monday through Sunday, use the alternate day-of-week weighting workflow described in Section 16. The day-of-week weights produce a separate set of weights for each day of the week, so weighted estimates can represent Monday travel, Tuesday travel, weekend travel, or other day-specific comparisons.

Note: Some Weights are Zero

The final dataset may contain weights equal to zero. When a weight is equal to zero, it means that the record is present in the delivered data, but was not eligible to receive that particular weight.

Records may receive zero weights for the following reasons:

  • Partially complete records. For example, if a household participated for seven days but only provided three days of complete diary data, the incomplete days would be retained in the delivered data but would not receive positive day weights.
  • For households with children, days without complete proxy-reported child travel. For the standard weights, households with children needed complete reported travel for children on the weighted travel day. Some additional household days involving children may therefore receive zero standard weights if they do not meet the standard completion rules. The day-of-week weights use a relaxed child-completion rule because children’s travel was proxy-reported for only one day.
  • For standard weights, days outside of the standard “typical weekday” definition. The standard weights represent typical weekday travel based on Monday through Thursday. Friday, Saturday, and Sunday records are therefore not eligible to receive positive standard day, trip, linked trip, or tour weights.
  • For day-of-week weights, day-specific eligibility. The day-of-week weights are assigned separately for each day of the week. A record with a zero standard weight may still receive a positive day-of-week weight if it meets the completion criteria for that specific day.

Analysts should treat zero weights as specific to the weight being used. A zero value does not necessarily mean that the record is invalid or unusable for all analyses.
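
A short sketch of how zero weights behave in practice, using invented records and hypothetical field names (`day_weight`, `num_trips`). Zero-weight records contribute nothing to weighted sums, but they should be dropped before counting the unweighted sample behind an estimate.

```python
# Invented person-day records (not survey data).
days = [
    {"day_id": 1, "day_weight": 120.0, "num_trips": 4},
    {"day_id": 2, "day_weight": 0.0, "num_trips": 2},  # present, but not eligible for this weight
    {"day_id": 3, "day_weight": 80.0, "num_trips": 1},
]

# Keep only records eligible for the weight in use.
weighted = [d for d in days if d["day_weight"] > 0]

total_days = sum(d["day_weight"] for d in weighted)
total_trips = sum(d["day_weight"] * d["num_trips"] for d in weighted)
trip_rate = total_trips / total_days  # weighted trips per person-day
sample_n = len(weighted)              # unweighted records behind the estimate

print(round(trip_rate, 2), sample_n)
```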

5.3 Inputs to Weighting

The Massachusetts Travel Study used two primary inputs in the weighting process: survey data and population target data. These same general inputs supported both the standard weights and the day-of-week weights, but the eligible survey records, weighting geographies, and control totals differed between the two sets of weights.

Survey Data

The survey data consisted of cleaned household, person, day, and trip records that met the completion criteria for weighting. The records eligible for weighting depended on whether the standard weights or day-of-week weights were being created.

For the standard weights, households were included if they provided complete data for at least one Monday, Tuesday, Wednesday, or Thursday travel day. These records support estimates of travel on an average weekday.

For the day-of-week weights, survey records were evaluated separately for each day of the week. A household was included for a given day only when it provided complete data for that specific day. As a result, the set of records that receive positive day-of-week weights can differ across Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Before weighting, missing demographic values needed for weighting were imputed where possible. These included income, gender, race, and ethnicity. The imputed values were used to reduce missingness in the demographic control variables used during weighting.

Population Target Data

The target data provided the household and person control totals used to adjust the survey records to the study area population. Demographic weighting targets were developed from the 2023 ACS 1-year Public Use Microdata Sample (PUMS). Selected auxiliary inputs used in imputation, including block-group income distributions, were drawn from 2023 ACS 5-year data.

The target data provided total household and population counts for each weighting geography, as well as detailed demographic distributions used as control totals in weighting. These controls included household characteristics, person characteristics, and selected travel-related controls where appropriate.

Weighting Geographies

The standard weights and day-of-week weights used different weighting geographies.

The standard weights used custom weighting zones developed for this study (Figure 5). These zones were based on MPO geographies, with smaller areas grouped where needed to maintain enough sample for stable weighting. The weighting zones were designed to balance the need for geographic specificity with the need for stable estimation across a range of demographic and travel behavior targets.

Figure 5: Standard weighting zone groups.

The day-of-week weights used broader geographic groups because estimating weights separately for each day reduces the available sample size. For this reason, the day-of-week weights were developed using a simpler Boston / Not Boston geography (Figure 6). In the weighting memo, the “Boston” group is described as the four inner-most commuter rings; the remaining areas are included in the “Not Boston” group.

Figure 6: Day-of-week weighting zones used for alternate weekday- and weekend-specific weights.

Analyst Tip: Interpreting Weighting Geography

The practical implication is important for analysis. Standard weights are calibrated to the custom weighting zones used for the typical weekday workflow. Day-of-week weights are calibrated to broader Boston / Not Boston geographies by day. Estimates are generally more stable when summarized at or above the geography used in weighting. Estimates for smaller geographies, or geographies that cut across weighting zones, should be interpreted with additional caution and should be accompanied by checks of unweighted sample size and weight variability (e.g., standard errors, design effects, or effective sample size).

Targets

Targets are the specific demographic and household distributions that the weighting procedure seeks to align between the survey sample and the underlying population estimates. For the Massachusetts Travel Study, targets were defined using ACS PUMS data to promote statistical representativeness across the weighting geographies.

Each target represents a key dimension of the study area’s population and travel behavior that is important for accurate expansion of survey data to reflect the total population. Target variables span household characteristics, person-level attributes, and selected travel-related controls. At the highest level, the weighting process was constrained to match the total number of households and total number of persons in the study area and within the relevant weighting geographies.

  • Total households in the study area: approximately 2,816,000
  • Total persons in the study area: approximately 6,760,613

The standard and day-of-week workflows use similar target concepts, but some day-of-week categories are combined to improve stability after the sample is split by day. For example, the day-of-week process uses broader geographic controls and combines selected target levels where the day-specific sample is smaller.

Household-level targets

Table 19 summarizes the household-level target categories used in the two weighting workflows.

Variable Standard weighting categories Day-of-week weighting categories
Household size 1 person; 2 people; 3 people; 4 people; 5 people or more Same as standard weighting
Income Under $25,000; $25,000-$49,999; $50,000-$74,999; $75,000-$99,999; $100,000-$199,999; $200,000 or more Same as standard weighting
Workers 0 workers; 1 worker; 2 workers; 3 workers or more 0 workers; 1 worker; 2 workers or more
Vehicles No vehicles; at least one vehicle and fewer vehicles than drivers age 16 or older; vehicles greater than or equal to drivers Same as standard weighting
Presence of children 0 children; 1 or more children Same as standard weighting
Total households Total households by weighting geography Total households by weighting geography and day
Table 19: Household-level weighting targets by workflow.

Person-level targets

Table 20 summarizes the person-level target categories used in the two weighting workflows.

Variable Standard weighting categories Day-of-week weighting categories
Gender Male; female Same as standard weighting
Age Under 5; 5-15; 16-17; 18-24; 25-44; 45-64; 65 or older Under 5; 5-17; 18-24; 25-44; 45-64; 65 or older
Worker status Full-time worker; part-time worker; non-worker Same as standard weighting
Commute mode Work from home; walk; bike; transit; other (including auto); not applicable Work from home; walk; bike; transit; other; not applicable
University student status University student; not a university student Same as standard weighting
Educational attainment Some college education; no college education Same as standard weighting
Race African American; Asian Pacific; White; Other Same as standard weighting
Ethnicity Hispanic; Non-Hispanic Same as standard weighting
Total persons Total persons by weighting geography Total persons by weighting geography and day
Table 20: Person-level weighting targets by workflow.

Travel-related controls

The standard weights included a regional transit trip target to address overrepresentation of transit trips in the survey data. The day-of-week weights did not use the same transit-trip control target.

Combined Weighting Targets

The categories listed above summarize the household- and person-level controls used in weighting. Analysts can use them as a quick reference for the dimensions and levels at which the weighted survey was calibrated to known population totals.

In practice, the weighting controls do two things simultaneously:

  • match the total households and total persons in each weighting zone group; and
  • match the marginal distributions of these household and person characteristics within each weighting zone group.

The controls do not guarantee that every cross-classification of those characteristics is perfectly represented. For example, age and income may each match target distributions, while age by income may still reflect sampling variability.
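
The difference between matching margins and matching joint distributions can be seen in a toy example. This section does not name the calibration algorithm, so the sketch uses simple raking (iterative proportional fitting) purely as a stand-in; the categories, targets, and function name are invented.

```python
def rake(records, targets, iterations=50):
    """Toy raking: scale weights so each field's weighted totals match its targets."""
    for _ in range(iterations):
        for field, levels in targets.items():
            totals = {}
            for r in records:
                totals[r[field]] = totals.get(r[field], 0.0) + r["weight"]
            for r in records:
                r["weight"] *= levels[r[field]] / totals[r[field]]
    return records


def margin(records, field, level):
    """Weighted total for one level of one field."""
    return sum(r["weight"] for r in records if r[field] == level)


# One invented respondent per age-by-income cell, equal starting weights.
records = [
    {"age": "young", "income": "low", "weight": 1.0},
    {"age": "young", "income": "high", "weight": 1.0},
    {"age": "old", "income": "low", "weight": 1.0},
    {"age": "old", "income": "high", "weight": 1.0},
]
targets = {"age": {"young": 60.0, "old": 40.0}, "income": {"low": 50.0, "high": 50.0}}
rake(records, targets)

young_low = next(r["weight"] for r in records if r["age"] == "young" and r["income"] == "low")
print(margin(records, "age", "young"), margin(records, "income", "low"), young_low)
```

Both margins match their targets, and the young, low-income cell ends up with a weight of 30. A population with the same margins could just as easily contain 40 such households (with 20, 10, and 30 in the other cells), and nothing in the margin-only controls would detect the difference.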

Some target categories were simplified, combined, or selectively applied to maintain stable estimation in smaller geographies and to avoid over-constraining the weighting process. Analysts should therefore interpret these categories as the effective levels at which the survey was calibrated to known population totals.

Analyst Tip: What Weights Can and Cannot Correct

Weighting targets define the population dimensions used to calibrate the survey. These controls make the marginal distributions (e.g., age, gender, income groups) in the weighted data match known population totals. However, there are important limitations:

1. Weighting improves representativeness only within defined categories.
Estimates are most reliable at the level of the weighting targets. More detailed breakdowns (e.g., finer income bins) were not controlled and may still reflect sampling variability or bias. In practice, the target categories define the finest level at which estimates can be safely disaggregated.

2. Joint distributions are not guaranteed to match the population.
Weights align individual targets, not combinations of them. For example, age and race may each match population totals, but age x race may still be misrepresented. Be cautious with highly disaggregated cross-tabulations.

3. Non-targeted variables and small cells may be unstable.
Variables not included in weighting controls are not explicitly bias-corrected. Small or sparse groups remain unstable after weighting, especially when weights are large or variable. We recommend checking cell sizes and relative standard errors (RSEs) before interpreting results, especially when sample sizes are small.

4. Weighting does not correct measurement error.
Targets adjust who is represented, not what was reported. Misreporting or limitations in survey design (e.g., coarse mode categories) are not fixed through weighting.

Other useful diagnostics include the effective sample size, which reflects the equivalent number of equally weighted observations, and the design effect, which captures how weighting inflates variance (see Section 5.6.3 below).

Bottom line:
Weighting improves representativeness along specific dimensions, but it does not guarantee reliable estimates for all subgroups. Use targets as a guide to where estimates are most trustworthy.
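
The second limitation above (matching margins does not imply matching joint distributions) can be illustrated with a toy example. The sketch below is in Python and uses made-up counts for two hypothetical two-category variables; none of the numbers come from the study.

```python
# Hypothetical 2x2 example: first key is an age group, second is a race group.
# Counts are invented for illustration only.
population = {("young", "A"): 40, ("young", "B"): 10,
              ("old", "A"): 10, ("old", "B"): 40}
weighted_sample = {("young", "A"): 25, ("young", "B"): 25,
                   ("old", "A"): 25, ("old", "B"): 25}


def margin(table, axis):
    """Sum a joint table over the other variable to get one marginal total."""
    totals = {}
    for key, count in table.items():
        totals[key[axis]] = totals.get(key[axis], 0) + count
    return totals


age_match = margin(population, 0) == margin(weighted_sample, 0)
race_match = margin(population, 1) == margin(weighted_sample, 1)
joint_match = population == weighted_sample

print("age margins match:", age_match)
print("race margins match:", race_match)
print("joint cells match:", joint_match)
```

Both margins agree, yet every joint cell differs, which is why cross-tabulations of two controlled variables still deserve a sample-size check.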

5.4 Weighting Process

The flow chart below summarizes the full weighting workflow described in this section. The sections that follow explain each step in more detail.

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%

flowchart TD

  Survey[("Survey Data<br/>(households, persons, days, trips)")]
  Census[("Target Data<br/>(ACS PUMS and project controls)")]
  Targets[["Household and Person Targets"]]
  Base{"Base Weight Estimation"}
  P1{"Round 1:<br/>Demographic Reweighting"}
  DP{"Day-Pattern Modeling"}
  DayTargets[["Day-Pattern Targets"]]
  P2{"Round 2:<br/>Day-Pattern Reweighting"}
  PersonDay{"Person and Day Weight Derivation"}
  TripAdj{"Round 3:<br/>Trip Adjustment"}
  FinalWeights[/"Final Household, Person,<br/>Day, Trip, Linked Trip,<br/>and Tour Weights"/]

  Survey --> Base
  Census --> Targets
  Base --> P1
  Targets --> P1
  P1 --> DP
  DP --> DayTargets
  Targets --> P2
  DayTargets --> P2
  P2 --> PersonDay
  PersonDay --> TripAdj
  TripAdj --> FinalWeights

  classDef source fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef control fill:#1B9E77,stroke:#15785B,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef process fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef weight fill:#695CB4,stroke:#4B4180,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92

  linkStyle default stroke:#475569,stroke-width:2.25px

  class Survey,Census source
  class Targets,DayTargets control
  class Base,P1,DP,P2,PersonDay,TripAdj process
  class FinalWeights weight

Base Weights

Weighting began with base weights, which reflected the probability that a household was included in the survey. For each sample segment, RSG calculated a base weight as the inverse of the probability of inclusion, which depended on both the probability of selection and the probability of response. For segment s with H_s total households and R_s responding households, the base weight can be understood as:

\[ w_s = \frac{H_s}{R_s} \]

Base weights provided the initial expansion from the sample to the population and served as the seed weights for subsequent rounds of weighting adjustments. For the day-of-week workflow, the same concept was applied separately by day. If a segment had a different number of complete records on Monday than on Tuesday, then the initial expansion could differ by day.
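
As a minimal sketch of the base-weight formula above, the Python snippet below computes H_s / R_s for two hypothetical segments. The segment names and counts are invented for illustration and are not study values.

```python
def base_weight(total_households, responding_households):
    """Base weight for one segment: total households per responding household."""
    if responding_households <= 0:
        raise ValueError("a segment needs at least one responding household")
    return total_households / responding_households


# Hypothetical segments: (H_s total households, R_s responding households).
segments = {"segment_a": (5000, 50), "segment_b": (12000, 80)}

base_weights = {name: base_weight(h, r) for name, (h, r) in segments.items()}

# Each responding household in a segment carries the same base weight, so the
# weights in a segment sum back to that segment's household total.
expanded_totals = {name: base_weights[name] * r for name, (h, r) in segments.items()}
print(base_weights)
print(expanded_totals)
```

In the day-of-week workflow the same calculation would be repeated per day, with R_s counting only the complete records for that day.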

Round 1 Weighting: Adjusting for Demographic Bias

Round 1 weighting used PopulationSim to adjust base weights so that weighted survey estimates matched demographic control totals derived from ACS PUMS. PopulationSim performed constrained entropy maximization, adjusting household weights as little as necessary to match a set of household- and person-level targets.

Note: What is Entropy Maximization?

Entropy maximization is a statistical method used to adjust survey weights so that the weighted survey data matches known population totals, such as the number of households, adults, workers, or children in a region.

The key idea is simple: change the initial weights as little as possible while forcing the final weighted totals to match external control totals. Groups that were underrepresented in the sample receive higher weights, while groups that were overrepresented receive lower weights. This approach preserves the structure of the collected data while helping the survey reflect the population.
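
To make the idea concrete, the sketch below uses iterative proportional fitting (raking), a simpler and closely related technique that also starts from seed weights and rescales them until weighted totals match control totals. It is not PopulationSim's solver and does not handle household- and person-level controls simultaneously; the records, variable names, and targets are all hypothetical.

```python
def rake(records, seed_weights, controls, iterations=50):
    """Rescale seed weights until weighted totals match each control margin.

    records:  list of dicts, one per household
    controls: {variable: {category: target_total}}
    """
    weights = list(seed_weights)
    for _ in range(iterations):
        for variable, targets in controls.items():
            # Current weighted total in each category of this variable.
            totals = {category: 0.0 for category in targets}
            for record, weight in zip(records, weights):
                totals[record[variable]] += weight
            # Scale every record so its category hits the target.
            for i, record in enumerate(records):
                category = record[variable]
                weights[i] *= targets[category] / totals[category]
    return weights


# Hypothetical households and controls (both margins sum to 400).
households = [
    {"size": "1", "vehicles": "0"},
    {"size": "1", "vehicles": "1+"},
    {"size": "2+", "vehicles": "0"},
    {"size": "2+", "vehicles": "1+"},
]
seed = [100.0, 100.0, 100.0, 100.0]
targets = {"size": {"1": 150.0, "2+": 250.0},
           "vehicles": {"0": 120.0, "1+": 280.0}}

final = rake(households, seed, targets)
print([round(w, 2) for w in final])
```

Households in categories that fall short of their target are scaled up and the rest are scaled down, which mirrors the behavior described in the note.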

For standard weighting, this reweighting process was applied to the Monday-through-Thursday records together to represent an average weekday. For day-of-week weighting, weights were estimated to match the household- and person-level targets for each day of the week. In the end, a household could have different weights for different days, depending on which days of complete travel data were available and how those records fit the day-specific controls.

The output of Round 1 consisted of target-optimized household weights. These weights aligned the survey with demographic targets and served as inputs to the day-pattern adjustment described below.

Round 2 Weighting: Adjusting for Day-Pattern Bias

Survey trip rates differed across diary platforms, in part because smartphone app users tended to report more complete travel than online diary or call center respondents. To address this issue, RSG applied a day-pattern adjustment before finalizing household, person, and day weights.

RSG classified each person-day into one of three mutually exclusive day-pattern categories: made no trips, made mandatory trips, or made only non-mandatory trips. Mandatory trips are trips to work, work-related activities, school, or school-related activities. The day-pattern model estimated how likely each person-day was to fall into each of these categories after accounting for demographic characteristics and diary platform.

For standard weighting, the day-pattern adjustment represented the Monday-through-Thursday average weekday. For day-of-week weighting, the same general procedure was applied separately by day, with an additional day-of-week term in the model. The resulting day-pattern targets were added to a second PopulationSim run so that the final household weights accounted for both demographic targets and diary-platform reporting differences.

The table below shows the general direction of the day-pattern adjustment for the day-of-week workflow. For Monday through Thursday, the adjustment reduces the share of no-travel days for online diary and call center respondents and increases the share of days with reported travel. Online and call center diaries were collected Monday through Thursday, so no analogous adjustment is needed for Friday through Sunday trip records.

| Day | Day type | Call center before | Call center after | Online diary before | Online diary after | Smartphone |
|---|---|---|---|---|---|---|
| Mon | No trips | 26% | 20% | 22% | 13% | 12% |
| Mon | Made mandatory trips | 18% | 23% | 42% | 42% | 45% |
| Mon | Made only non-mandatory trips | 56% | 57% | 36% | 45% | 42% |
| Tue | No trips | 26% | 21% | 22% | 13% | 12% |
| Tue | Made mandatory trips | 18% | 23% | 44% | 43% | 48% |
| Tue | Made only non-mandatory trips | 55% | 56% | 35% | 44% | 40% |
| Wed | No trips | 25% | 20% | 21% | 12% | 11% |
| Wed | Made mandatory trips | 19% | 24% | 44% | 44% | 48% |
| Wed | Made only non-mandatory trips | 56% | 56% | 35% | 44% | 41% |
| Thu | No trips | 25% | 19% | 21% | 12% | 11% |
| Thu | Made mandatory trips | 21% | 27% | 43% | 43% | 47% |
| Thu | Made only non-mandatory trips | 54% | 54% | 36% | 45% | 41% |
Table 21: Illustrative day-pattern adjustment by weekday and diary platform.

Adjusting Person and Day Weights

After household weights were finalized, person and day weights were derived from the household weights. Person weights were created by assigning the household weight to each household member. Because the survey does not collect travel diaries from unrelated household members, unrelated persons received a person weight of zero and their weight was redistributed evenly among the remaining related household members.

Day weights were then assigned to complete person-days. This was one of the most important differences between the standard and day-of-week workflows. In standard weighting, a person with multiple complete eligible weekdays had their person weight divided across those complete days, so the resulting day records collectively represented that person’s average weekday contribution. In day-of-week weighting, the weights were already specific to a day of week, so the day weight was equal to the person weight for that day.
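
The derivation above can be sketched in Python as follows. This is one reading of the text, not the production code: it assumes the zeroed-out weight of unrelated members is split evenly across related members, and that standard day weights divide the person weight evenly across complete days. All identifiers and numbers are hypothetical.

```python
def person_weights(hh_weight, members):
    """members: list of (person_id, is_related) tuples for one household."""
    related = [pid for pid, is_related in members if is_related]
    unrelated_count = len(members) - len(related)
    if not related:
        return {pid: 0.0 for pid, _ in members}
    # Weight given up by unrelated members, shared evenly by related members.
    extra = hh_weight * unrelated_count / len(related)
    return {pid: (hh_weight + extra) if is_related else 0.0
            for pid, is_related in members}


def standard_day_weights(person_weight, complete_days):
    """Standard workflow: split the person weight across complete weekdays."""
    return {day: person_weight / len(complete_days) for day in complete_days}


household = [("p1", True), ("p2", True), ("p3", True), ("p4", False)]
weights = person_weights(120.0, household)
p1_days = standard_day_weights(weights["p1"], ["Mon", "Tue"])

print(weights)
print(p1_days)
```

In the day-of-week workflow no split is needed, because the person weight is already specific to a single day.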

Round 3 Weighting: Adjusting for Trip-Type Reporting Bias

The final weighting step corrected for under-reporting of specific trip types across diary platforms. Trip records were grouped into work, school, and other trip categories. For each trip type, RSG estimated a weighted model predicting the number of trips per person-day and used the model to calculate a trip adjustment factor.

The adjustment factor was applied to unlinked trip weights, using the final day weight as the starting point. The adjustment was designed to account for under-reporting of stops in the trip diary, such as a brief stop that a respondent forgot to record. For day-of-week weighting, the same adjustment concept was applied to the day-specific weights. Because online diary and call center records were collected only Monday through Thursday, the adjustment was relevant only to those days and platforms.

The table below summarizes the trip adjustment factors used in the day-of-week workflow.

| Trip type | Online diary adjustment | Call center adjustment |
|---|---|---|
| Work trips | 1.62 | 1.51 |
| School trips | 1.87 | 1.00 |
| Other trips | 1.50 | 1.15 |
Table 22: Day-of-week trip adjustment factors by trip type and diary platform.
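
A minimal Python sketch of how factors like those in Table 22 could be applied is shown below. The factor values are copied from the table; the key names, the function, and the treatment of platforms not listed in the table (a factor of 1.0, for example for smartphone records) are assumptions made for illustration.

```python
# Factors from Table 22, keyed by (trip type, diary platform).
TRIP_FACTORS = {
    ("work", "online"): 1.62, ("work", "call_center"): 1.51,
    ("school", "online"): 1.87, ("school", "call_center"): 1.00,
    ("other", "online"): 1.50, ("other", "call_center"): 1.15,
}


def adjusted_trip_weight(day_weight, trip_type, platform):
    """Start from the day weight and scale by the trip adjustment factor.

    Assumption: combinations not listed in Table 22 are left unadjusted.
    """
    factor = TRIP_FACTORS.get((trip_type, platform), 1.0)
    return day_weight * factor


online_work = adjusted_trip_weight(100.0, "work", "online")
call_school = adjusted_trip_weight(100.0, "school", "call_center")
phone_other = adjusted_trip_weight(100.0, "other", "smartphone")
print(online_work, call_school, phone_other)
```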

After the unlinked trip weights were adjusted, linked trip weights and tour weights were calculated from the updated trip weights. Linked trip weights represent complete origin-to-destination trips, while tour weights represent sequences of linked trips that begin and end at the same location.

5.5 Weighted Totals

The final weights expand the survey to population totals at the household, person, day, trip, linked trip, and tour levels.

Table 23 shows the total weighted households, persons, person-days, and trips for the standard weekday workflow. These totals represent the overall scale of travel in the study area on an average weekday across Monday through Thursday.

Final Weighted Totals by Analysis Level
Weight Level Weighted Total
Household 2,814,595.3
Person 6,759,611.8
Day 6,759,611.8
Trip 30,078,666.6
Table 23: Weighted totals by analysis level - standard weekday weights

Table 24 summarizes the day-specific totals available for the alternate day-of-week workflow. These totals represent the scale of travel in the study area on each specific day of the week, with separate estimates for Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday travel.

Day-of-Week Weighted Totals
Day Weight Level Weighted Total
Monday Day 6,751,877.4
Monday Trip 30,625,904.0
Tuesday Day 6,770,579.5
Tuesday Trip 31,791,642.6
Wednesday Day 6,765,954.5
Wednesday Trip 32,235,056.1
Thursday Day 6,764,262.0
Thursday Trip 32,082,955.5
Friday Day 6,745,710.1
Friday Trip 33,560,235.3
Saturday Day 6,739,056.6
Saturday Trip 33,110,358.2
Sunday Day 6,746,910.2
Sunday Trip 27,489,425.3
Table 24: Weighted totals by analysis level and day of week - day-of-week weights
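
One way to read Table 24 is to divide weighted trips by weighted person-days to get an average trip rate per person-day for each day of the week. The Python sketch below does that arithmetic with the totals copied from the table; the variable names are illustrative.

```python
# (weighted person-days, weighted trips) copied from Table 24.
day_totals = {
    "Monday": (6751877.4, 30625904.0),
    "Tuesday": (6770579.5, 31791642.6),
    "Wednesday": (6765954.5, 32235056.1),
    "Thursday": (6764262.0, 32082955.5),
    "Friday": (6745710.1, 33560235.3),
    "Saturday": (6739056.6, 33110358.2),
    "Sunday": (6746910.2, 27489425.3),
}

# Average weighted trips per person-day, by day of week.
trip_rates = {day: trips / days for day, (days, trips) in day_totals.items()}

for day, rate in trip_rates.items():
    print(f"{day}: {rate:.2f} trips per person-day")

highest_day = max(trip_rates, key=trip_rates.get)
lowest_day = min(trip_rates, key=trip_rates.get)
```

The same ratio computed from Table 23 (30,078,666.6 trips over 6,759,611.8 person-days) gives roughly 4.45 trips per person-day for the standard weekday weights.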

5.6 Additional Guidance for Analysts

Choosing the Right Weight

Different analyses require different weight types. Analysts should select the weight that matches both the level of measurement and the weighting workflow.

For most typical weekday summaries, analysts should use the standard weights. Standard household, person, day, trip, linked trip, and tour weights are appropriate when the research question is about average weekday travel, especially for summaries that are not intended to distinguish Monday from Tuesday or weekday from weekend behavior.

Day-of-week weights should be used when the analytic question is explicitly about a specific day of week or about differences across days. Examples include comparing Saturday and Sunday travel, estimating Friday trip rates, or comparing weekday and weekend mode share. When using day-of-week weights, analysts should filter to the relevant day or group of days before applying the corresponding day-of-week weight. A Monday estimate should use Monday records and Monday weights; a Saturday estimate should use Saturday records and Saturday weights.

At each level, the same unit-matching principle applies:

  • Household weights should be used when households are the unit of analysis or when studying household-level characteristics.
  • Person weights should be used for demographic characteristics, person-level behaviors, and analyses where individuals, not days or trips, are the unit.
  • Day weights should be used when analyzing person-day travel behavior, including day patterns, trip rates, and average daily travel.
  • Trip weights should be used when analyzing trips.
  • Linked trip weights should be used when analyzing complete origin-to-destination linked trips.
  • Tour weights should be used when analyzing tours.

Using the wrong weight type can lead to biased estimates. For example, applying person weights to trip tables will underestimate total travel, while using day-of-week weights without filtering to the intended day can mix together records that represent different target days.
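
The filter-then-weight pattern described above can be sketched in Python as follows. The column names `travel_dow`, `mode`, and `trip_weight_dow` are hypothetical placeholders, not delivered field names, and the records are invented.

```python
def weighted_shares(records, category_key, weight_key):
    """Weighted share of each category, using the weight that matches the unit."""
    totals = {}
    for record in records:
        category = record[category_key]
        totals[category] = totals.get(category, 0.0) + record[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical trip records carrying a day-of-week trip weight.
trips = [
    {"travel_dow": "Saturday", "mode": "walk", "trip_weight_dow": 100.0},
    {"travel_dow": "Saturday", "mode": "drive", "trip_weight_dow": 300.0},
    {"travel_dow": "Sunday", "mode": "drive", "trip_weight_dow": 500.0},
]

# Step 1: filter to the target day. Step 2: apply that day's trip weight.
saturday_trips = [t for t in trips if t["travel_dow"] == "Saturday"]
saturday_mode_share = weighted_shares(saturday_trips, "mode", "trip_weight_dow")
print(saturday_mode_share)
```

Skipping the filter would let the Sunday record contribute to a Saturday estimate, which is the mixing problem the paragraph above warns about.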

What the Weights Can and Cannot Correct

The weighting process corrects for several forms of bias:

  • differences in sampling likelihood across geographies;
  • differential response rates across demographic groups;
  • reporting differences across diary platforms, including smartphone, online diary, and call center reporting; and
  • under-reporting of specific trip types.

However, weighting cannot correct for every possible source of error. It cannot fully correct misreported or miscoded trip purposes, missing data not captured through imputation, recall errors unrelated to diary platform, GPS or routing errors, or sparse samples in very small geographies or rare population groups.

The day-of-week weights also have a specific limitation. Because the sample is split by day, each day-specific weighting run has fewer records than the standard Monday-through-Thursday workflow. The broader Boston / Not Boston geography and combined target categories improve stability, but they do not make all day-specific subgroup estimates equally reliable. Analysts should interpret highly granular day-of-week estimates with caution.

Design Effects and Effective Sample Size

Unequal weights reduce the statistical precision of estimates compared with a simple random sample of the same size. This reduction is summarized by the design effect (DEFF) and the effective sample size (ESS). DEFF reflects how much weight variability inflates variance; ESS reflects the size of an unweighted sample that would yield equivalent precision.

Because day-of-week weights are estimated separately by day, they may have larger variance than the standard weights, particularly for Friday, Saturday, Sunday, or small subgroups. As a rule of thumb, when DEFF exceeds 2.0, analysts should expect a noticeable loss of precision, especially when estimates are based on small subgroups where limited sample size and weight variability compound.
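
The guide does not state the formulas behind these diagnostics. A common choice is Kish's approximation, where DEFF = 1 + CV² and ESS = n / DEFF = (Σw)² / Σw²; the published household values appear consistent with it (1 + 1.18² ≈ 2.39 against a reported DEFF of 2.38). The Python sketch below assumes that approximation and, as a further assumption, uses only positively weighted records.

```python
def kish_diagnostics(weights):
    """Kish approximations for CV, design effect, and effective sample size."""
    positive = [w for w in weights if w > 0]
    n = len(positive)
    if n == 0:
        raise ValueError("no positively weighted records")
    total = sum(positive)
    sum_of_squares = sum(w * w for w in positive)
    ess = total ** 2 / sum_of_squares      # effective sample size
    deff = n / ess                         # design effect from unequal weights
    cv = max(deff - 1.0, 0.0) ** 0.5       # coefficient of variation of weights
    return {"n": n, "cv": cv, "deff": deff, "ess": ess}


equal = kish_diagnostics([2.0, 2.0, 2.0, 2.0])
unequal = kish_diagnostics([1.0, 1.0, 1.0, 3.0])
print(equal)
print(unequal)
```

Equal weights give a design effect of 1 and an effective sample size equal to the record count; any spread in the weights pushes DEFF above 1 and ESS below n.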

Weight Quality Diagnostics by Analysis Level
Weight Level CV DEFF ESS
Household 1.18 2.38 6,521.83
Person 1.22 2.48 11,917.58
Day 1.60 3.57 13,727.30
Trip 1.81 4.26 46,971.97
Table 25: Weight quality diagnostics by analysis level

The standard diagnostics in Table 25 summarize the default weekday workflow. Table 26 shows the day-specific diagnostics for the alternate day-of-week weights. These values are especially helpful when comparing the relative stability of Monday-through-Thursday estimates with Friday, Saturday, or Sunday estimates.

Day-of-Week Weight Quality Diagnostics
Day Weight Level CV DEFF ESS
Monday Day 1.36 2.85 4,461.10
Monday Trip 1.44 3.08 17,038.78
Tuesday Day 1.33 2.78 5,760.86
Tuesday Trip 1.45 3.10 21,093.64
Wednesday Day 1.39 2.93 5,444.14
Wednesday Trip 1.56 3.42 19,229.54
Thursday Day 1.37 2.88 5,253.48
Thursday Trip 1.49 3.22 19,562.40
Friday Day 1.37 2.88 3,557.98
Friday Trip 1.44 3.08 16,416.02
Saturday Day 1.36 2.86 3,574.00
Saturday Trip 1.42 3.00 17,224.92
Sunday Day 1.37 2.88 3,528.14
Sunday Trip 1.44 3.06 13,759.76
Table 26: Day-of-week weight quality diagnostics by day of week and analysis level

Distribution of Weights

Weight variability differs by dataset level and by weighting workflow. Household and person weights reflect the demographic and geographic calibration process. Day weights reflect the way person weights are assigned to complete travel days. Trip weights inherit day-weight variability and incorporate trip-type adjustment factors.

In the day-of-week workflow, weight distributions also differ by day. Friday, Saturday, and Sunday generally have fewer records because online diary and call center travel data were collected Monday through Thursday, while Friday through Sunday records come from smartphone respondents. This smaller sample size contributes to larger day-of-week weights and greater uncertainty for some estimates. Figure 7 visualizes those differences across dataset levels.

Figure 7: Weight Distributions by Dataset Level

Analysts should be cautious when conducting analyses in which a small number of high-weight observations dominate the estimates. This caution is especially important for day-specific estimates, small geographies, rare subgroups, and cross-tabulations with many categories.
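
One simple check for this kind of dominance is the share of a weighted total that comes from the few largest weights. The Python sketch below is an illustrative diagnostic, not part of the delivered tooling; the function name, the choice of k, and the example weights are assumptions.

```python
def top_weight_share(weights, k=5):
    """Share of the weighted total contributed by the k largest weights."""
    positive = sorted((w for w in weights if w > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return 0.0
    return sum(positive[:k]) / total


# Hypothetical subgroup: ten records, one of which carries a very large weight.
subgroup_weights = [10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
share_top_1 = top_weight_share(subgroup_weights, k=1)
print(f"largest record contributes {share_top_1:.0%} of the weighted total")
```

A high share from one or two records suggests the estimate would move substantially if those respondents had answered differently.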

Geographic Considerations and Small-Area Estimates

Because the weights were calibrated to specific weighting geographies, those are the geographies at which the weighted data are most internally consistent. For standard weighting, the relevant geography is the custom client-defined weighting zone structure. For day-of-week weighting, the relevant geography is the broader Boston / Not Boston structure.

This does not prevent analysts from summarizing the data to other geographies, but it does affect interpretation. Cities, towns, neighborhoods, corridors, and other analyst-defined geographies are not individually controlled unless they align with the weighting geographies. Weighted totals for those areas may not match external benchmarks, and estimates may be driven by a small number of high-weight records.

For fine-scale estimates, analysts should check the unweighted sample size, the number of households or persons contributing to the estimate, and the distribution of weights. In some cases, pooling multiple areas, pooling multiple days, or reporting estimates at a broader geography may be more appropriate.

Small Population Groups

Rare population groups, rare travel behaviors, and highly specific day-by-geography combinations may have limited representation in the weighted data. This can include groups such as zero-vehicle households, transit commuters, active transportation users, university students, or weekend travelers in a small geography.

For these groups, analysts should consider pooling response categories, pooling across geographies or days where conceptually appropriate, reporting uncertainty measures, or using model-based estimation techniques. The day-of-week weights improve the ability to analyze daily variation in travel, but they do not eliminate the need to evaluate sample size and precision.

5.7 Summary

Weighting for the Massachusetts Travel Study dataset follows a structured and incremental process. Base weights correct for sample design. Round 1 adjustments correct for demographic nonresponse. Round 2 adjustments correct for day-pattern reporting bias. Round 3 adjustments correct for trip-type under-reporting. Together, these steps yield household, person, day, trip, linked trip, and tour weights for estimating population-level travel behavior.

The standard weights should be treated as the default workflow for typical weekday analysis. The day-of-week weights should be used when the research question depends on a specific day of week or on comparisons across days, including weekday/weekend analysis. In either workflow, analysts should match the weight to the analytic unit, use the geography and target structure as a guide to stable interpretation, and check sample size and weight variability before interpreting small or highly detailed estimates.

6 Dataset Overview

This section describes the prepared tables available for analysis and how they relate to one another. It uses the prepared hts object and the current settings.yml configuration.

6.1 Data Structure

Household travel survey data are hierarchical: although the primary sampling unit (see Section 2.1.2) is the household, the data also describe the behavior of individual persons. Participants who reported via the rMove smartphone app provided data across multiple days, so a single person can contribute several days of travel and activity records.

In the delivered dataset, this hierarchical structure shows up in the form of multiple tables that link to one another using stable identifiers (typically columns ending in _id). Figure 8 summarizes the relationship among the prepared tables.

Diagram showing how prepared study tables relate to each other.
Figure 8: Data linkages across prepared tables.

6.2 Summary of Data Tables

Table 27 lists the prepared tables, their units of observation, and their primary identifiers.

| Table Name | Record Unit | Primary ID(s) | Weight Column | What's in the Table |
|---|---|---|---|---|
| Household (`hh`) | Household | `hh_id` | `hh_weight` | One record per household, with household-level attributes (e.g., sampling/strata fields and household characteristics) used for analysis and weighting. |
| Person (`person`) | Person | `person_id` | `person_weight` | One record per person in each household, including demographic attributes and person-level variables used for analysis and weighting. |
| Day (`day`) | Person-day | `day_id` | `day_weight` | One record per person-day (a single survey day for a person), used for day-based analysis such as trip rates and daily metrics. |
| Vehicle (`vehicle`) | Vehicle | `vehicle_id` | `hh_weight` | One record per household vehicle (when delivered), including vehicle identifiers and vehicle characteristics used in vehicle-based analysis and joins. |
| Location (`location`) | GPS point on a trip | `trip_id`, `collect_time` | | One record per place/location reference (when delivered), often used to store geocoded attributes or repeated location metadata linked to trips, tours, or activities. |
| Unlinked Trip (`trip_unlinked`) | Person-trip | `trip_id` | `trip_weight` | One record per unlinked trip segment (when delivered), typically representing each movement between stops; used for detailed mode/path and trip-chaining analysis. |
| Linked Trip (`trip_linked`) | Person-trip | `linked_trip_id` | `linked_trip_weight` | One record per linked trip (when delivered), typically aggregating unlinked segments into a single journey between primary origin and destination. |
| Tour (`tour`) | Person-tour | `tour_id` | `tour_weight` | One record per tour (when delivered), grouping trips into an out-and-back sequence anchored at home or a primary location; used for tour-based analysis. |
Table 27: Summary of prepared tables

6.3 Trip Unit of Measure: Person-Trips

Understanding how travel events are represented in the dataset is essential for correctly interpreting trip and tour outputs. This section describes the structure of person-trip records, how shared travel is represented, and what that means for later analyses.

What is a Person-Trip?

Trips are represented at the person-trip level: each row is a single travel event made by a single person. If multiple household members traveled together, the shared movement appears as multiple records, one per participating person.

The guide’s default trip table is trip_unlinked. See the Analyst Handbook for examples that build from this table.

Replication of Shared Trips

When household members travel together (for example, carpooling, walking together, or biking as a group), the data include:

  • one person-trip record per traveler, and
  • a unique trip identifier for each person-trip record.

It is therefore common to observe:

  • identical origin/destination coordinates,
  • nearly identical start/end times,
  • matching modes, and
  • matching purposes

across members of the same household. These patterns often indicate shared travel, not data duplication errors.

The prepared trip tables contain records only for surveyed persons. Non-household travelers (for example, friends, coworkers, carpool partners, or other companions) may be captured indirectly through trip-level metadata fields such as num_non_hh_travelers and num_hh_travelers.

For MassDOT, hh_member_* columns on the trip table should be treated as supplementary household co-travel metadata rather than a definitive record of co-travel. Processed, imputed co-travel information is stored in the joint_trip_id and joint_tour_id fields, which are the primary source of information about shared travel in the dataset.
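
Because shared travel is replicated across person-trip records, counting rows counts person-trips, not distinct movements. The Python sketch below shows one way to count distinct movements using joint_trip_id. It assumes that -1 marks a record with no joint-trip assignment and that joint_trip_id is unique only within a household; both are assumptions, and the records are invented.

```python
def count_distinct_movements(trips):
    """Count solo person-trips once each and each shared movement once."""
    joint_keys = set()
    solo_trips = 0
    for trip in trips:
        if trip["joint_trip_id"] == -1:  # assumption: -1 means no joint trip
            solo_trips += 1
        else:
            joint_keys.add((trip["hh_id"], trip["joint_trip_id"]))
    return solo_trips + len(joint_keys)


# Hypothetical records: two household members share one movement.
trips = [
    {"hh_id": 1, "person_id": 101, "trip_id": 1001, "joint_trip_id": 7},
    {"hh_id": 1, "person_id": 102, "trip_id": 2001, "joint_trip_id": 7},
    {"hh_id": 1, "person_id": 101, "trip_id": 1002, "joint_trip_id": -1},
]

person_trip_count = len(trips)
movement_count = count_distinct_movements(trips)
print(person_trip_count, movement_count)
```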

Example Structure of Joint-Trip Records

When joint-trip identifiers are available, Table 28 shows one example of how the same shared movement appears across multiple person-trip records.

Example joint trip records (joint_trip_id = -1)
hh_id person_id day_id trip_id joint_trip_id depart_date depart_hour depart_minute arrive_date arrive_hour arrive_minute o_lat o_lon d_lat d_lon hh_member_1 hh_member_2 hh_member_3 hh_member_4 hh_member_5 hh_member_6 hh_member_7 hh_member_8
24000089 2400008901 240000890101 2400008901001 -1 2024-06-11 14 20 2024-06-11 14 27 42.54132 -70.87968 42.57236 -70.85492 1 0 0 0 0 0 0 0
24000089 2400008901 240000890101 2400008901002 -1 2024-06-11 16 2 2024-06-11 16 15 42.57236 -70.85492 42.54132 -70.87968 1 0 0 0 0 0 0 0
24000089 2400008902 240000890201 2400008902001 -1 2024-06-11 8 0 2024-06-11 8 5 42.54132 -70.87968 42.54144 -70.88607 0 1 0 0 0 0 0 0
24000089 2400008902 240000890201 2400008902002 -1 2024-06-11 9 20 2024-06-11 9 25 42.54144 -70.88607 42.54626 -70.88079 0 1 0 0 0 0 0 0
24000089 2400008902 240000890201 2400008902003 -1 2024-06-11 9 40 2024-06-11 9 45 42.54626 -70.88079 42.54132 -70.87968 0 1 0 0 0 0 0 0
24000089 2400008902 240000890201 2400008902004 -1 2024-06-11 11 20 2024-06-11 11 30 42.54132 -70.87968 42.55242 -70.87901 0 1 0 0 0 0 0 0
24000089 2400008902 240000890201 2400008902005 -1 2024-06-11 12 5 2024-06-11 12 15 42.55242 -70.87901 42.54132 -70.87968 0 1 0 0 0 0 0 0
24000122 2400012201 240001220101 2400012201001 -1 2024-06-11 12 39 2024-06-11 13 1 42.34075 -71.15493 42.33640 -71.17110 1 0 0 0 0 0 0 0
24000122 2400012201 240001220101 2400012201002 -1 2024-06-11 15 55 2024-06-11 16 15 42.33663 -71.17125 42.34063 -71.15506 1 0 0 0 0 0 0 0
24000122 2400012201 240001220101 2400012201036 -1 2024-06-11 16 31 2024-06-11 16 31 42.34075 -71.15493 42.34075 -71.15493 1 0 0 0 0 0 0 0
Table 28: Example joint-trip records.

6.4 Record Counts

Table 29 summarizes the delivered record counts and the number of positively weighted records by table.

| Table | Records | Weight Column | Weighted Records | Percent Weighted |
|---|---|---|---|---|
| Household (`hh`) | 18,122 | `hh_weight` | 15,552 | 85.8% |
| Person (`person`) | 37,616 | `person_weight` | 29,560 | 78.6% |
| Day (`day`) | 134,187 | `day_weight` | 49,028 | 36.5% |
| Vehicle (`vehicle`) | 25,849 | `hh_weight` | 21,669 | 83.8% |
| Location (`location`) | 8,607,225 | | NA | NA |
| Unlinked Trip (`trip_unlinked`) | 468,018 | `trip_weight` | 200,120 | 42.8% |
| Linked Trip (`trip_linked`) | 419,469 | `linked_trip_weight` | 173,983 | 41.5% |
| Tour (`tour`) | 160,091 | `tour_weight` | 68,007 | 42.5% |
| Value Labels (`value_labels`) | 2,422 | | NA | NA |
| Variable List (`variable_list`) | 567 | | NA | NA |
Table 29: Record counts and weighted records by table

6.5 Household Completion Status

The delivered MassDOT data include both complete and incomplete households. For most descriptive and inferential analyses, the recommended analytic universe is the set of households where hts$hh$is_complete == 1.

This household-level completeness rule is different from trip-level completion flags. In particular, trip_unlinked$is_complete describes trip-record completion or usability, not whether the household belongs in the complete-household analytic universe. When an analysis starts from person-, day-, trip-, vehicle-, or tour-level records, use the household table to define the complete-household universe and then carry that restriction down through hh_id.

Table 30 shows how many delivered records belong to complete versus incomplete households across the main prepared tables.

Table Household Completion Status Records
hh Complete household 15,641
hh Incomplete household 2,481
person Complete household 31,255
person Incomplete household 6,361
day Complete household 96,370
day Incomplete household 37,817
trip_unlinked Complete household 411,573
trip_unlinked Incomplete household 56,445
trip_linked Complete household 366,186
trip_linked Incomplete household 53,283
tour Complete household 139,240
tour Incomplete household 20,851
vehicle Complete household 21,770
vehicle Incomplete household 4,079
Table 30: Delivered records by household completion status.

For lower-level analysis tables, the simplest workflow is:

  • create complete_hh_ids from hts$hh
  • filter households directly with dplyr::filter(is_complete == 1)
  • filter lower-level tables with dplyr::filter(hh_id %in% complete_hh_ids)
  • use trip-level is_complete only when the question is specifically about trip completion or trip usability
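
The guide's own workflow uses dplyr on the hts object. For readers working outside R, the same steps can be sketched in Python with plain lists of dictionaries; the records below are invented, and only the column names hh_id and is_complete come from the text above.

```python
# Hypothetical stand-ins for hts$hh and hts$trip_unlinked.
hh = [
    {"hh_id": 1, "is_complete": 1},
    {"hh_id": 2, "is_complete": 0},
    {"hh_id": 3, "is_complete": 1},
]
trip_unlinked = [
    {"trip_id": 11, "hh_id": 1},
    {"trip_id": 21, "hh_id": 2},
    {"trip_id": 12, "hh_id": 1},
    {"trip_id": 31, "hh_id": 3},
]

# Step 1: define the complete-household universe from the household table.
complete_hh_ids = {row["hh_id"] for row in hh if row["is_complete"] == 1}

# Step 2: filter households directly on is_complete.
complete_hh = [row for row in hh if row["is_complete"] == 1]

# Step 3: carry the restriction down to a lower-level table through hh_id.
complete_trips = [row for row in trip_unlinked if row["hh_id"] in complete_hh_ids]

print(len(complete_hh), len(complete_trips))
```

The trip-level is_complete flag is deliberately not used here, because it describes trip usability and not household completeness.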

6.6 Data Types and Considerations

The dataset includes variables that behave differently in analysis. Understanding these patterns helps avoid common mistakes in summaries and models.

Categorical Variables

Categorical variables store labels rather than magnitudes. They include binary fields, nominal fields (no inherent order), ordinal fields (with a natural order), and many count-like fields that are top-coded or otherwise treated as binned categories.

Note: Use the Codebook to Order Categories

When you build a table, chart, or derived factor from a categorical variable, use codebook$value_labels as the source of truth for both labels and ordering.
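
A minimal Python sketch of that practice follows. It assumes the value-label table has columns for variable name, stored value, label, and category order; the column names (`variable`, `value`, `label`, `val_order`), the variable `example_var`, and all records are hypothetical.

```python
from collections import Counter

# Hypothetical value-label rows for one categorical variable.
value_labels = [
    {"variable": "example_var", "value": 3, "label": "High", "val_order": 3},
    {"variable": "example_var", "value": 1, "label": "Low", "val_order": 1},
    {"variable": "example_var", "value": 995, "label": "Missing Response", "val_order": 4},
    {"variable": "example_var", "value": 2, "label": "Medium", "val_order": 2},
]


def ordered_counts(values, value_labels, variable):
    """Tabulate stored codes using codebook labels in codebook order."""
    rows = sorted(
        (row for row in value_labels if row["variable"] == variable),
        key=lambda row: row["val_order"],
    )
    counts = Counter(values)
    return [(row["label"], counts.get(row["value"], 0)) for row in rows]


responses = [3, 1, 1, 2, 995, 3, 3]
table = ordered_counts(responses, value_labels, "example_var")
print(table)
```

Sorting by the codebook order, not by label text or stored code, keeps ordinal categories in their intended sequence.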

Continuous Numeric Variables

Continuous numeric variables represent numeric measures where arithmetic operations are meaningful (for example, distances, durations, travel time, or speed). These fields can have wide ranges and may include extreme values.

Note: Top-coded count variables

Some variables that look numeric (for example age brackets or capped household sizes) should be treated as categorical in analysis when they represent binned values rather than true continuous measures.

The table below summarizes the configured outlier diagnostics used to review the tails of selected numeric variables.

Outlier diagnostics.

| Table | Variable | Min | Max | P01 | P99 | IQR | Lower bound | Upper bound | Outliers | % outliers | Worst gap | Severity | Suggested handling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| day | num_trips | 0 | 68 | 0 | 15 | 5 | −8 | 12 | 3,061 | 2.3% | 56 | Moderate | Consider trimming >= 99th pct. |
| person | num_trips | 0 | 310 | 0 | 70 | 17 | −24 | 44 | 2,560 | 6.8% | 266 | High | Trim or winsorize >= 95th pct. |
| trip | distance_meters | 0 | 19,872,573 | 93 | 111,517 | 8,568 | −11,723 | 22,549 | 49,331 | 10.8% | 19,850,024 | High | Trim or winsorize >= 95th pct. |
| trip | distance_miles | 0 | 12,348 | 0 | 69 | 5 | −7 | 14 | 49,331 | 10.8% | 12,334 | High | Trim or winsorize >= 95th pct. |
| trip | duration_minutes | 0 | 8,008 | 0 | 158 | 16 | −18 | 46 | 34,690 | 7.6% | 7,962 | High | Trim or winsorize >= 95th pct. |
| trip | duration_seconds | 1 | 480,464 | 1 | 9,472 | 979 | −1,102 | 2,814 | 34,281 | 7.5% | 477,650 | High | Trim or winsorize >= 95th pct. |
| trip | dwell_mins | 0 | 9,162 | 0 | 2,367 | 311 | −462 | 783 | 52,110 | 11.1% | 8,379 | High | Trim or winsorize >= 95th pct. |
| trip | speed_mph | 0 | 14,766,979 | 0 | 539 | 21 | −26 | 57 | 19,239 | 4.2% | 14,766,922 | High | Trim or winsorize >= 95th pct. |
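
The guide does not state how the bounds were computed. The published values look consistent with the common 1.5 × IQR rule (Tukey fences), with the worst gap equal to the maximum minus the upper bound (for example, 68 − 12 = 56 for day-level num_trips). The Python sketch below implements that rule under those assumptions, using invented values.

```python
from statistics import quantiles


def tukey_fences(values, multiplier=1.5):
    """Lower and upper outlier bounds from the interquartile range."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# Hypothetical trip counts with one extreme value.
trip_counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]

lower, upper = tukey_fences(trip_counts)
outliers = [v for v in trip_counts if v < lower or v > upper]
worst_gap = max(trip_counts) - upper

print(lower, upper, outliers, worst_gap)
```

Whether flagged values should be trimmed, winsorized, or kept depends on the analysis; long but genuine trips can be legitimate observations.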

Missing Values

Table 31 summarizes the configured missing-value codes and their labels in the codebook.

| Code | Label(s) in codebook | # Variables |
|---|---|---|
| -1 | Not imputable; Missing | 13 |
| 995 | Missing Response; 995 | 310 |
| 996 | Never; None; None (I do not drive a vehicle) | 7 |
| 997 | Other; Other/prefer to self-describe; Other (e.g., boat, RV, van); Other vehicle | 12 |
| 998 | Don't know | 3 |
| 999 | Prefer not to answer | 8 |
Table 31: Configured missing-value codes and codebook labels.
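
Because these codes are stored as numbers, they can distort numeric summaries if left in place. The Python sketch below drops a set of codes before averaging. Which codes to exclude is an analyst decision: the default set here is an assumption, and it leaves out 996 and 997 because their labels (such as "Never" or "Other") can be substantive responses.

```python
from statistics import mean

# Assumed default: codes whose labels indicate no usable response.
DEFAULT_MISSING = frozenset({-1, 995, 998, 999})


def drop_missing(values, missing_codes=DEFAULT_MISSING):
    """Remove None and configured missing-value codes from a numeric field."""
    return [v for v in values if v is not None and v not in missing_codes]


# Hypothetical numeric field with missing-value codes mixed in.
reported = [2, 3, 995, 999, 4, -1]

naive_mean = mean(reported)                 # inflated by the 995 and 999 codes
clean_mean = mean(drop_missing(reported))   # based on the three valid values

print(round(naive_mean, 2), clean_mean)
```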

7 Codebook

The codebook is the primary reference for understanding the variables, response categories, question and response logic, and data structures used in this dataset.

A clear codebook is essential for reproducible analysis. It helps analysts identify the meaning of each variable, understand where it appears in the dataset, distinguish categorical variables from numeric or top-coded fields, verify skip logic and valid values, and interpret coded values consistently across tables.

Note: Use the Codebook First

Start here whenever you need to answer any of these questions:

  • What table contains this variable?
  • What does this field mean?
  • Who was asked this question, and under what conditions should I expect a value?
  • Is this field categorical, numeric, top-coded, or part of a grouped response?
  • What do the stored values mean?
  • In what order should categories appear in a plot or table?
  • Is this field part of a “select-all-that-apply” group or controlled by survey logic?

7.1 What the Codebook Contains

Variable List

The variable list is the structural reference for the dataset. It describes each delivered data element and helps analysts understand how variables are organized across household, person, day, trip, tour, vehicle, location, or other study tables.

The variable list includes:

  • variable name
  • table membership
  • delivered data type
  • description of the variable’s meaning, units, or derivation
  • survey question text, when available
  • survey logic that governs whether a respondent was asked the question or should have a value
  • checkbox or select-all-that-apply flags for multiple-response categorical variables

Value Labels

The value-label table is the categorical reference for the dataset. For coded categorical variables, it maps stored values to human-readable labels so analysts can reconstruct ordered factors, standard tabulations, and interpretable plots.

Depending on the study, value-label records may include:

  • table name, when labels vary by table
  • variable name
  • stored value or code
  • human-readable label
  • category order

7.2 Variable List

The variable list below is searchable and downloadable. When the delivered codebook stores table membership as separate binary columns (for example hh, person, and day), those flags are combined into a single table_membership field for display.

Codebook variable list.
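
A minimal sketch of how binary membership flags could be collapsed into one table_membership field. The hh, person, and day column names come from the text above; the comma separator, the `variable` key, and the example rows are assumptions for illustration.

```python
def combine_membership(row, table_columns=("hh", "person", "day")):
    """Join the names of all table columns flagged 1 into one string."""
    return ", ".join(t for t in table_columns if row.get(t) == 1)


# Toy codebook rows; the flag values are illustrative only.
codebook_rows = [
    {"variable": "num_trips", "hh": 1, "person": 1, "day": 1},
    {"variable": "income_detailed", "hh": 1, "person": 0, "day": 0},
]

for row in codebook_rows:
    row["table_membership"] = combine_membership(row)
    print(row["variable"], "->", row["table_membership"])
```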

7.3 Value Labels

The value-label table below lists the available value labels for categorical variables. Use it alongside the variable list to interpret coded values and preserve the intended category order in summaries, charts, and models.

Codebook value labels.
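
To illustrate how value labels and category order can be applied together, the sketch below builds an ordered code-to-label lookup and uses it to tabulate a coded variable. The record field names (`variable`, `value`, `label`, `order`) are hypothetical stand-ins for the delivered columns, and the data are toy values.

```python
from collections import Counter

# Toy value-label records (deliberately out of order).
value_labels = [
    {"variable": "residence_rent_own", "value": 997, "label": "Other", "order": 3},
    {"variable": "residence_rent_own", "value": 1,
     "label": "Own/buying (paying a mortgage)", "order": 1},
    {"variable": "residence_rent_own", "value": 2, "label": "Rent", "order": 2},
]


def ordered_labels(records, variable):
    """Return {stored value: label} in the codebook's category order."""
    rows = sorted(
        (r for r in records if r["variable"] == variable),
        key=lambda r: r["order"],
    )
    return {r["value"]: r["label"] for r in rows}  # dicts keep insertion order


labels = ordered_labels(value_labels, "residence_rent_own")
codes = [2, 1, 1]  # toy stored values
counts = Counter(codes)

# Categories with no observations still appear, in the intended order.
table = [(label, counts.get(value, 0)) for value, label in labels.items()]
print(table)
```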

8 Frequency Tables

This chapter provides frequency summaries for variables in each prepared data table. Use the table of contents on the left to jump directly to each dataset table section.

8.1 Household

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 2,481 13.69% 0 0.00%
1 Yes 15,641 86.31% 2,814,595 100.00%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595
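
The unweighted and weighted columns in these tables can be reproduced in outline as follows. This is a sketch under assumptions: `hh_weight` is a hypothetical name for the household weight, and the toy records simply mirror the pattern above in which incomplete records carry a weight of zero.

```python
from collections import defaultdict


def frequency_table(values, weights):
    """Return {value: (count, pct, weighted_count, weighted_pct)}."""
    counts = defaultdict(int)
    weighted = defaultdict(float)
    for value, weight in zip(values, weights):
        counts[value] += 1
        weighted[value] += weight
    n = sum(counts.values())
    w = sum(weighted.values())
    return {
        value: (
            counts[value],
            100.0 * counts[value] / n,
            weighted[value],
            100.0 * weighted[value] / w if w else 0.0,
        )
        for value in sorted(counts)
    }


# Toy records, not survey data.
is_complete = [1, 1, 0, 1]
hh_weight = [150.0, 200.0, 0.0, 50.0]

freq = frequency_table(is_complete, hh_weight)
for value, row in freq.items():
    print(value, row)
```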

num_trips

num_trips
Number of trips
Statistic
Unweighted
Weighted
Value Value
N 18,122.00 2,814,595.28
Min 0.00 0.00
P25 4.00 5.00
Median 10.00 10.00
Mean 25.83 28.96
P75 37.00 38.00
P95 101.00 120.00
Max 355.00 355.00
SD 35.13 40.52
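
For numeric variables, weighted statistics can be sketched as below. The quantile definition used here (the smallest value whose cumulative weight reaches the target share) is one common convention and an assumption; the guide does not state which definition produced the published percentiles, so small differences are possible.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight is at least p * total weight."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= p * total:
            return value
    return pairs[-1][0]


# Toy trip counts and weights, not survey data.
num_trips = [0, 4, 10, 37]
weights = [1, 1, 2, 4]

mean = weighted_mean(num_trips, weights)
median = weighted_quantile(num_trips, weights, 0.5)
print(mean, median)
```

Records with a weight of zero drop out of the weighted statistics, which is one way the weighted and unweighted maxima of a variable can differ.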

num_days_complete

num_days_complete
Number of complete days
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 complete days 2,477 13.67% 0 0.00%
1 1 complete day 10,482 57.84% 2,074,334 73.70%
2 2 complete days 69 0.38% 12,725 0.45%
3 3 complete days 79 0.44% 12,724 0.45%
4 4 complete days 121 0.67% 22,069 0.78%
5 5 complete days 269 1.48% 41,327 1.47%
6 6 complete days 813 4.49% 124,650 4.43%
7 7 complete days 3,812 21.04% 526,766 18.72%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

participation_group

participation_group
Participation group
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Signup survey completed via browserMove, Diary completed via browserMove 7,569 41.77% 1,426,249 50.67%
2 Signup survey completed via browserMove, Diary completed via call center 130 0.72% 23,856 0.85%
3 Signup survey completed via browserMove, Diary completed via rMove 3,298 18.20% 355,089 12.62%
4 Signup survey completed via call center, Diary completed via browserMove 91 0.50% 15,885 0.56%
5 Signup survey completed via call center, Diary completed via call center 425 2.35% 82,109 2.92%
6 Signup survey completed via call center, Diary completed via rMove 15 0.08% 1,999 0.07%
7 Signup survey completed via rMove, Diary completed via browserMove 1,335 7.37% 217,059 7.71%
8 Signup survey completed via rMove, Diary completed via call center 18 0.10% 3,451 0.12%
9 Signup survey completed via rMove, Diary completed via rMove 5,241 28.92% 688,898 24.48%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

sample_segment

sample_segment
Sample segment
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Berkshire General 108 0.60% 20,148 0.72%
2 Berkshire Hard-to-reach 107 0.59% 15,776 0.56%
3 Berkshire Rural 153 0.84% 22,595 0.80%
4 Boston Region General 3,747 20.68% 741,690 26.35%
5 Boston Region Hard-to-reach 3,389 18.70% 441,647 15.69%
6 Boston Region Rural 141 0.78% 20,827 0.74%
7 Boston Region Walk/Bike/Transit 1,123 6.20% 130,153 4.62%
8 Cape Cod General 516 2.85% 88,117 3.13%
9 Cape Cod Hard-to-reach 74 0.41% 8,852 0.31%
10 Cape Cod Rural 85 0.47% 11,660 0.41%
11 Central Massachusetts General 664 3.66% 124,515 4.42%
12 Central Massachusetts Hard-to-reach 565 3.12% 73,071 2.60%
13 Central Massachusetts Rural 272 1.50% 37,407 1.33%
14 Franklin General 54 0.30% 8,771 0.31%
15 Franklin Hard-to-reach 39 0.22% 3,017 0.11%
16 Franklin Rural 133 0.73% 19,912 0.71%
17 Martha’s Vineyard General 73 0.40% 6,387 0.23%
18 Martha’s Vineyard Rural 82 0.45% 4,089 0.15%
19 Merrimack Valley General 461 2.54% 78,234 2.78%
20 Merrimack Valley Hard-to-reach 349 1.93% 55,599 1.98%
21 Merrimack Valley Rural 72 0.40% 8,246 0.29%
22 Montachusett General 324 1.79% 55,578 1.97%
23 Montachusett Hard-to-reach 128 0.71% 16,844 0.60%
24 Montachusett Rural 216 1.19% 26,139 0.93%
25 Nantucket General 80 0.44% 6,525 0.23%
26 Nantucket Rural 27 0.15% 1,326 0.05%
27 Northern Middlesex General 438 2.42% 77,595 2.76%
28 Northern Middlesex Hard-to-reach 331 1.83% 37,193 1.32%
29 Northern Middlesex Rural 28 0.15% 2,792 0.10%
30 Old Colony General 583 3.22% 99,395 3.53%
31 Old Colony Hard-to-reach 262 1.45% 37,333 1.33%
32 Old Colony Rural 59 0.33% 9,823 0.35%
33 Pioneer Valley General 646 3.56% 107,815 3.83%
34 Pioneer Valley Hard-to-reach 792 4.37% 105,120 3.73%
35 Pioneer Valley Rural 265 1.46% 41,054 1.46%
36 Southeastern Massachusetts General 968 5.34% 173,990 6.18%
37 Southeastern Massachusetts Hard-to-reach 521 2.87% 64,372 2.29%
38 Southeastern Massachusetts Rural 247 1.36% 30,988 1.10%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

signup_platform

signup_platform
Signup platform
Value Label
Unweighted
Weighted
Count Percent Count Percent
browser 10,997 60.68% 1,805,194 64.14%
call 531 2.93% 99,993 3.55%
rmove 6,594 36.39% 909,408 32.31%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

diary_platform

diary_platform
Diary platform
Value Label
Unweighted
Weighted
Count Percent Count Percent
browser 8,995 49.64% 1,659,193 58.95%
call 573 3.16% 109,416 3.89%
rmove 8,554 47.20% 1,045,986 37.16%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_people

num_people
Number of household members
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 1 person 6,598 36.41% 814,090 28.92%
2 2 people 6,858 37.84% 949,530 33.74%
3 3 people 2,349 12.96% 476,856 16.94%
4 4 people 1,624 8.96% 356,382 12.66%
5 5 people 497 2.74% 150,374 5.34%
6 6 people 132 0.73% 45,412 1.61%
7 7 people 39 0.22% 12,731 0.45%
8 8 people 17 0.09% 6,913 0.25%
9 9 people 7 0.04% 1,474 0.05%
10 10 people 1 0.01% 834 0.03%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_surveyable

num_surveyable
Number of surveyable household members
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 1 surveyable person 7,490 41.33% 938,544 33.35%
2 2 surveyable persons 6,445 35.56% 907,685 32.25%
3 3 surveyable persons 2,100 11.59% 431,299 15.32%
4 4 surveyable persons 1,484 8.19% 340,390 12.09%
5 5 surveyable persons 444 2.45% 137,783 4.90%
6 6 surveyable persons 109 0.60% 40,588 1.44%
7 7 surveyable persons 34 0.19% 11,603 0.41%
8 8 surveyable persons 14 0.08% 4,858 0.17%
9 9 surveyable persons 1 0.01% 1,013 0.04%
10 10 surveyable persons 1 0.01% 834 0.03%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_participants

num_participants
Number of participants
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 1 participant 8,084 44.61% 1,071,806 38.08%
2 2 participants 8,442 46.58% 1,324,835 47.07%
3 3 participants 1,136 6.27% 286,880 10.19%
4 4 participants 355 1.96% 96,861 3.44%
5 5 participants 87 0.48% 27,090 0.96%
6 6 participants 15 0.08% 5,976 0.21%
7 7 participants 3 0.02% 1,146 0.04%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_adults

num_adults
Number of adults
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 1 adult 7,173 39.58% 940,374 33.41%
2 2 adults 8,870 48.95% 1,370,461 48.69%
3 3 adults 1,406 7.76% 338,107 12.01%
4 4 adults 492 2.71% 114,259 4.06%
5 5 adults 140 0.77% 40,669 1.44%
6 6 adults 31 0.17% 8,982 0.32%
7 7 adults 5 0.03% 936 0.03%
8 8 adults 3 0.02% 411 0.01%
9 9 adults 2 0.01% 397 0.01%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_kids

num_kids
Number of children
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 children 14,737 81.32% 2,057,879 73.11%
1 1 child 1,701 9.39% 341,852 12.15%
2 2 children 1,299 7.17% 291,568 10.36%
3 3 children 305 1.68% 90,658 3.22%
4 4 children 63 0.35% 24,796 0.88%
5 5 children 14 0.08% 6,895 0.24%
6 6 children 3 0.02% 949 0.03%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_workers

num_workers
Number of workers
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 workers 4,697 25.92% 646,321 22.96%
1 1 worker 6,909 38.12% 953,169 33.87%
2 2 workers 5,562 30.69% 934,944 33.22%
3 3 workers 714 3.94% 226,046 8.03%
4 4 workers 180 0.99% 48,507 1.72%
5 5 workers 47 0.26% 4,007 0.14%
6 6 workers 9 0.05% 542 0.02%
7 7 workers 2 0.01% 1,044 0.04%
8 8 workers 2 0.01% 16 0.00%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_students

num_students
Number of students
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 students 15,428 85.13% 2,253,685 80.07%
1 1 student 2,122 11.71% 428,818 15.24%
2 2 students 436 2.41% 103,021 3.66%
3 3 students 85 0.47% 17,159 0.61%
4 4 students 36 0.20% 7,837 0.28%
5 5 students 12 0.07% 3,266 0.12%
6 6 students 2 0.01% 810 0.03%
7 7 students 1 0.01% 0 0.00%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

num_vehicles

num_vehicles
Number of vehicles
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 (no vehicles in my household) 2,437 13.45% 325,071 11.55%
1 1 vehicle 7,984 44.06% 1,096,480 38.96%
2 2 vehicles 5,884 32.47% 1,002,864 35.63%
3 3 vehicles 1,338 7.38% 275,204 9.78%
4 4 vehicles 362 2.00% 84,670 3.01%
5 5 vehicles 85 0.47% 22,896 0.81%
6 6 vehicles 20 0.11% 5,610 0.20%
7 7 vehicles 6 0.03% 1,167 0.04%
8 8 or more vehicles 6 0.03% 635 0.02%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

income_detailed

income_detailed
Last year’s household income (detailed categories)
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Less than $15,000 1,135 6.26% 203,316 7.22%
2 $15,000-$24,999 954 5.26% 163,302 5.80%
3 $25,000-$34,999 846 4.67% 140,662 5.00%
4 $35,000-$49,999 1,237 6.83% 172,112 6.11%
5 $50,000-$74,999 2,204 12.16% 320,298 11.38%
6 $75,000-$99,999 2,242 12.37% 296,036 10.52%
7 $100,000-$149,999 2,904 16.02% 380,305 13.51%
8 $150,000-$199,999 1,861 10.27% 262,130 9.31%
9 $200,000-$249,999 1,044 5.76% 199,327 7.08%
10 $250,000 or more 1,391 7.68% 270,305 9.60%
999 Prefer not to answer 2,304 12.71% 406,802 14.45%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

income_followup

income_followup
Last year’s household income (broad categories)
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Under $25,000 63 3.00% 14,600 3.62%
2 $25,000-$49,999 81 3.86% 15,029 3.73%
3 $50,000-$74,999 59 2.81% 11,148 2.77%
4 $75,000-$99,999 77 3.67% 12,448 3.09%
5 $100,000-$199,999 126 6.01% 19,522 4.85%
6 $200,000 or more 140 6.68% 33,486 8.31%
999 Prefer not to answer 1,551 73.96% 296,535 73.62%
995 Missing Response 16,025 2,411,828
Total valid 2,097 100.00% 402,768 100.00%
Total missing 16,025 2,411,828
Total 18,122 2,814,595
Logic: if income_detailed = ‘Prefer not to answer’

income_broad

income_broad
Last year’s household income upcoded responses from income_detailed and income_followup
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Under $25,000 2,152 11.88% 381,218 13.54%
2 $25,000-$49,999 2,164 11.94% 327,803 11.65%
3 $50,000-$74,999 2,263 12.49% 331,446 11.78%
4 $75,000-$99,999 2,319 12.80% 308,484 10.96%
5 $100,000-$199,999 4,891 26.99% 661,957 23.52%
6 $200,000 or more 2,575 14.21% 503,118 17.88%
999 Prefer not to answer 1,758 9.70% 300,569 10.68%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595
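
The counts above are consistent with income_broad collapsing income_detailed into the broad brackets and filling in income_followup for households that declined the detailed question (for example, 1,135 + 954 + 63 = 2,152 households under $25,000). The sketch below shows that logic; the mapping is inferred from the bracket boundaries and is an assumption, not the delivered derivation code.

```python
# Detailed category -> broad category, inferred from bracket boundaries.
DETAILED_TO_BROAD = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 5, 8: 5, 9: 6, 10: 6}
PREFER_NOT_TO_ANSWER = 999
MISSING_RESPONSE = 995


def derive_income_broad(income_detailed, income_followup):
    """Combine the detailed and follow-up income questions (assumed logic)."""
    if income_detailed in DETAILED_TO_BROAD:
        return DETAILED_TO_BROAD[income_detailed]
    # Detailed answer was declined: fall back to the follow-up question.
    if income_followup in (None, MISSING_RESPONSE):
        return PREFER_NOT_TO_ANSWER
    return income_followup


# Toy households as (income_detailed, income_followup) pairs.
households = [(2, 995), (8, 995), (999, 3), (999, 999), (999, 995)]
income_broad = [derive_income_broad(d, f) for d, f in households]
print(income_broad)
```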

residence_rent_own

residence_rent_own
Current residence ownership
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Own/buying (paying a mortgage) 10,436 57.59% 1,697,353 60.31%
2 Rent 6,840 37.74% 960,106 34.11%
3 Housing provided by job or military 23 0.13% 4,155 0.15%
4 Provided by family, relative, or friend without payment or rent 185 1.02% 30,172 1.07%
997 Other 256 1.41% 39,121 1.39%
999 Prefer not to answer 382 2.11% 83,688 2.97%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

home_county

home_county
Home location– County
Value Label
Unweighted
Weighted
Count Percent Count Percent
25001 Barnstable County 652 3.60% 105,044 3.73%
25003 Berkshire County 367 2.03% 58,390 2.07%
25005 Bristol County 1,518 8.38% 230,388 8.19%
25007 Dukes County 148 0.82% 9,073 0.32%
25009 Essex County 1,633 9.01% 289,466 10.28%
25011 Franklin County 226 1.25% 31,307 1.11%
25013 Hampden County 1,178 6.50% 190,544 6.77%
25015 Hampshire County 523 2.89% 62,675 2.23%
25017 Middlesex County 4,498 24.82% 706,564 25.10%
25019 Nantucket County 106 0.58% 7,739 0.27%
25021 Norfolk County 1,549 8.55% 267,604 9.51%
25023 Plymouth County 1,148 6.33% 189,748 6.74%
25025 Suffolk County 2,464 13.60% 336,827 11.97%
25027 Worcester County 2,112 11.65% 329,227 11.70%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

residence_type

residence_type
Type of current residence
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Single-family house (detached house) 8,594 47.42% 1,459,613 51.86%
2 Single-family house attached to one or more houses (rowhouse or townhouse) 991 5.47% 185,848 6.60%
3 Building with 2-4 units (duplexes, triplexes, quads) 3,210 17.71% 462,111 16.42%
4 Building with 5-49 apartments/condos 3,088 17.04% 410,008 14.57%
5 Building with 50 or more apartments/condos 1,695 9.35% 211,258 7.51%
6 Senior or age-restricted apartments/condos 371 2.05% 53,676 1.91%
7 Manufactured home/mobile home/trailer 90 0.50% 15,918 0.57%
9 Dorm, group quarters, or institutional housing 37 0.20% 5,791 0.21%
997 Other (e.g., boat, RV, van) 46 0.25% 10,373 0.37%
Total valid 18,122 100.00% 2,814,595 100.00%
Total 18,122 2,814,595

8.2 Person

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 6,361 16.91% 0 0.00%
1 Yes 31,255 83.09% 6,759,612 100.00%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

num_trips

num_trips
Number of trips
Statistic
Unweighted
Weighted
Value Value
N 37,616.00 6,759,611.78
Min 0.00 0.00
P25 2.00 2.00
Median 4.00 4.00
Mean 12.44 13.01
P75 19.00 21.00
P95 49.00 49.00
Max 310.00 180.00
SD 17.24 17.32

num_days_complete

num_days_complete
Number of complete days
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 complete days 4,621 12.89% 0 0.00%
1 1 complete day 21,097 58.83% 4,958,389 73.35%
2 2 complete days 114 0.32% 15,867 0.23%
3 3 complete days 126 0.35% 16,716 0.25%
4 4 complete days 176 0.49% 29,619 0.44%
5 5 complete days 367 1.02% 62,778 0.93%
6 6 complete days 1,315 3.67% 235,969 3.49%
7 7 complete days 8,043 22.43% 1,440,275 21.31%
NA No value assigned 1,757 0
Total valid 35,859 100.00% 6,759,612 100.00%
Total missing 1,757 0
Total 37,616 6,759,612

hh_is_complete

hh_is_complete
Household day completion status
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 6,357 16.90% 0 0.00%
1 Yes 31,259 83.10% 6,759,612 100.00%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

is_participant

is_participant
Active participant (age 18+ and surveyable)
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 7,274 19.34% 1,343,373 19.87%
1 Yes 30,342 80.66% 5,416,239 80.13%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

num_bicycles

num_bicycles
Number of bicycles
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 bicycles 8,986 46.67% 1,606,548 44.13%
1 1 bicycle 3,489 18.12% 604,712 16.61%
2 2 bicycles 3,515 18.26% 661,238 18.17%
3 3 bicycles 1,401 7.28% 292,476 8.03%
4 4 bicycles 1,059 5.50% 257,860 7.08%
5 5 bicycles 381 1.98% 108,746 2.99%
6 6 bicycles 188 0.98% 50,683 1.39%
7 7 bicycles 84 0.44% 22,650 0.62%
8 8 or more bicycles 151 0.78% 35,251 0.97%
995 Missing Response 18,362 3,119,448
Total valid 19,254 100.00% 3,640,163 100.00%
Total missing 18,362 3,119,448
Total 37,616 6,759,612
Logic: if rMove or (rMove for Web and person 1)

bicycle_type

bicycle_type
Type of bicycle
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Standard bicycle Bicycle Type 1 9,919 96.6% 27,348 1,967,594 96.8% 4,725,996
Electric bicycle Bicycle Type 2 916 8.9% 27,348 191,151 9.4% 4,725,996
Other Bicycle Type 997 419 4.1% 27,348 85,606 4.2% 4,725,996
Logic: show if number of bicycles > 0
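
In select-all-that-apply tables such as this one, the percent appears to be computed among records with a non-missing answer to the group: 9,919 / (37,616 − 27,348) ≈ 96.6%. Because options are not mutually exclusive, the percents can sum to more than 100%. A minimal sketch, assuming each option is stored as a 0/1 flag with 995 for missing (the storage layout is an assumption):

```python
MISSING_RESPONSE = 995


def percent_selected(flags):
    """Share of valid (non-missing) records that selected the option."""
    valid = [f for f in flags if f != MISSING_RESPONSE]
    if not valid:
        return float("nan")
    return 100.0 * sum(1 for f in valid if f == 1) / len(valid)


# Toy flags for one option, not survey records.
standard_bicycle = [1, 1, 0, 995, 995, 1]
toy_pct = percent_selected(standard_bicycle)

# Check against the published counts for "Standard bicycle".
published_pct = 100.0 * 9919 / (37616 - 27348)
print(toy_pct, round(published_pct, 1))
```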

second_home_in_region

second_home_in_region
Second home in study region
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 515 23.26% 97,154 21.96%
1 Yes 1,699 76.74% 345,193 78.04%
995 Missing Response 35,402 6,317,265
Total valid 2,214 100.00% 442,347 100.00%
Total missing 35,402 6,317,265
Total 37,616 6,759,612
Logic: if has second_home

second_home_state

second_home_state
Second home location– State
Value Label
Unweighted
Weighted
Count Percent Count Percent
09 Connecticut 55 2.71% 10,214 2.47%
25 Massachusetts 1,699 83.74% 345,193 83.62%
33 New Hampshire 154 7.59% 33,186 8.04%
36 New York 32 1.58% 6,198 1.50%
44 Rhode Island 45 2.22% 7,064 1.71%
50 Vermont 44 2.17% 10,975 2.66%
NA No value assigned 35,587 6,346,782
Total valid 2,029 100.00% 412,830 100.00%
Total missing 35,587 6,346,782
Total 37,616 6,759,612
Logic: if has second_home

second_home_county

second_home_county
Second home location– County
Value Label
Unweighted
Weighted
Count Percent Count Percent
09001 Fairfield County 1 0.05% 0 0.00%
09003 Hartford County 20 0.99% 3,977 0.96%
09005 Litchfield County 4 0.20% 280 0.07%
09007 Middlesex County, Connecticut 1 0.05% 58 0.01%
09009 New Haven County 8 0.39% 854 0.21%
09011 New London County 9 0.44% 1,892 0.46%
09013 Tolland County 7 0.34% 1,733 0.42%
09015 Windham County, Connecticut 5 0.25% 1,422 0.34%
25001 Barnstable County 166 8.18% 26,475 6.41%
25003 Berkshire County 41 2.02% 6,306 1.53%
25005 Bristol County 116 5.72% 22,283 5.40%
25007 Dukes County 14 0.69% 2,343 0.57%
25009 Essex County 138 6.80% 23,709 5.74%
25011 Franklin County 28 1.38% 6,251 1.51%
25013 Hampden County 102 5.03% 24,943 6.04%
25015 Hampshire County 67 3.30% 12,053 2.92%
25017 Middlesex County 323 15.92% 62,086 15.04%
25019 Nantucket County 14 0.69% 4,798 1.16%
25021 Norfolk County 144 7.10% 34,017 8.24%
25023 Plymouth County 135 6.65% 31,466 7.62%
25025 Suffolk County 236 11.63% 50,078 12.13%
25027 Worcester County 175 8.62% 38,384 9.30%
33001 Belknap County 25 1.23% 4,867 1.18%
33003 Carroll County 27 1.33% 7,272 1.76%
33005 Cheshire County 7 0.34% 1,039 0.25%
33007 Coos County 1 0.05% 0 0.00%
33009 Grafton County 25 1.23% 4,913 1.19%
33011 Hillsborough County 22 1.08% 6,800 1.65%
33013 Merrimack County 5 0.25% 1,713 0.41%
33015 Rockingham County 30 1.48% 5,488 1.33%
33017 Strafford County 6 0.30% 655 0.16%
33019 Sullivan County, New Hampshire 6 0.30% 441 0.11%
36001 Albany County 1 0.05% 0 0.00%
36005 Bronx County 1 0.05% 524 0.13%
36029 Erie County 1 0.05% 0 0.00%
36033 Franklin County, New York 2 0.10% 391 0.09%
36035 Fulton County 1 0.05% 713 0.17%
36047 Kings County 3 0.15% 783 0.19%
36055 Monroe County 1 0.05% 83 0.02%
36061 New York County 6 0.30% 1,239 0.30%
36065 Oneida County 1 0.05% 175 0.04%
36067 Onondaga County 2 0.10% 140 0.03%
36079 Putnam County 1 0.05% 152 0.04%
36081 Queens County 1 0.05% 1,098 0.27%
36083 Rensselaer County 1 0.05% 21 0.01%
36091 Saratoga County 4 0.20% 158 0.04%
36103 Suffolk County, New York 3 0.15% 587 0.14%
36111 Ulster County 1 0.05% 0 0.00%
36115 Washington County, New York 2 0.10% 133 0.03%
44001 Bristol County, Rhode Island 3 0.15% 954 0.23%
44003 Kent County 6 0.30% 574 0.14%
44005 Newport County 3 0.15% 941 0.23%
44007 Providence County 18 0.89% 2,932 0.71%
44009 Washington County, Rhode Island 15 0.74% 1,663 0.40%
50001 Addison County 2 0.10% 32 0.01%
50003 Bennington County 2 0.10% 969 0.23%
50005 Caledonia County 3 0.15% 93 0.02%
50007 Chittenden County 2 0.10% 2,171 0.53%
50009 Essex County, Vermont 2 0.10% 71 0.02%
50015 Lamoille County 2 0.10% 0 0.00%
50019 Orleans County 2 0.10% 34 0.01%
50021 Rutland County 4 0.20% 785 0.19%
50023 Washington County, Vermont 7 0.34% 689 0.17%
50025 Windham County, Vermont 4 0.20% 527 0.13%
50027 Windsor County 14 0.69% 5,604 1.36%
NA No value assigned 35,587 6,346,782
Total valid 2,029 100.00% 412,830 100.00%
Total missing 35,587 6,346,782
Total 37,616 6,759,612
Logic: if has second_home
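
County values are five-digit FIPS codes in which the first two digits identify the state, so county codes can be cross-checked against the state variable. The sketch below assumes codes may arrive as integers after import, which would drop the leading zero in Connecticut codes such as 09003; the helper name is hypothetical.

```python
# State FIPS codes that appear in the second-home tables above.
STATE_NAMES = {
    "09": "Connecticut",
    "25": "Massachusetts",
    "33": "New Hampshire",
    "36": "New York",
    "44": "Rhode Island",
    "50": "Vermont",
}


def state_fips(county_fips):
    """Return the two-digit state FIPS from a county FIPS code.

    Pads to five digits first so integer-typed codes (e.g. 9003)
    recover their leading zero.
    """
    return str(county_fips).zfill(5)[:2]


# Toy values: one string code and one integer code.
codes = ["25017", 9003]
states = [STATE_NAMES[state_fips(c)] for c in codes]
print(states)
```

Reading these columns as strings at import time avoids the padding step altogether.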

is_proxy

is_proxy
Assigned proxy reporter
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 33,213 88.29% 5,617,136 83.10%
1 Yes 4,403 11.71% 1,142,476 16.90%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

has_proxy

has_proxy
Has a proxy
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 32,794 87.18% 5,416,239 80.13%
1 Yes 4,822 12.82% 1,343,373 19.87%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

has_phone

has_phone
Participant has phone
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 9,220 24.51% 1,779,711 26.33%
1 Yes 28,396 75.49% 4,979,900 73.67%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

phone_type

phone_type
Participant phone type
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 Does not have a smartphone 1,500 6.31% 315,388 7.17%
1 Has an Android phone 7,268 30.59% 1,348,539 30.66%
2 Has an Apple iPhone 14,534 61.17% 2,613,037 59.42%
3 Has other smartphone type 457 1.92% 120,951 2.75%
995 Missing Response 13,857 2,361,696
Total valid 23,759 100.00% 4,397,915 100.00%
Total missing 13,857 2,361,696
Total 37,616 6,759,612

relationship

relationship
Relationship to household person number 1
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 Self 18,122 48.18% 3,054,428 45.19%
1 Spouse, partner 8,846 23.52% 1,517,200 22.45%
2 Child, child in-law 6,962 18.51% 1,720,415 25.45%
3 Parent, parent in-law 1,134 3.01% 262,050 3.88%
4 Sibling, sibling in-law 479 1.27% 116,402 1.72%
5 Other relative (grandchild, cousin) 327 0.87% 89,117 1.32%
6 Nonrelative (friend, roommate, household help) 1,746 4.64% 0 0.00%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

age

age
Age of household member
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Age under 5 1,638 4.35% 343,059 5.08%
2 Age 5-15 3,310 8.80% 835,318 12.36%
3 Age 16-17 606 1.61% 164,996 2.44%
4 Age 18-24 2,554 6.79% 528,447 7.82%
5 Age 25-34 6,583 17.50% 926,633 13.71%
6 Age 35-44 5,991 15.93% 944,855 13.98%
7 Age 45-54 4,288 11.40% 899,169 13.30%
8 Age 55-64 5,052 13.43% 873,011 12.92%
9 Age 65-74 5,086 13.52% 856,797 12.68%
10 Age 75-84 2,167 5.76% 333,185 4.93%
11 Age 85 up 341 0.91% 54,143 0.80%
Total valid 37,616 100.00% 6,759,612 100.00%
Total 37,616 6,759,612

gender

gender
Gender
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Female 18,361 51.19% 3,272,412 48.41%
2 Male 15,991 44.58% 3,125,722 46.24%
4 Non-binary 316 0.88% 57,210 0.85%
997 Other/prefer to self-describe 84 0.23% 15,344 0.23%
999 Prefer not to answer 1,118 3.12% 288,924 4.27%
995 Missing Response 1,746 0
Total valid 35,870 100.00% 6,759,612 100.00%
Total missing 1,746 0
Total 37,616 6,759,612
Logic: if surveyable

race

race
Race
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
African American or Black Race 1 1,412 5.0% 9,629 415,324 7.7% 1,349,718
American Indian or Alaska Native Race 2 214 0.8% 9,629 90,755 1.7% 1,349,718
Asian Race 3 2,449 8.8% 9,629 456,179 8.4% 1,349,718
Native Hawaiian or other Pacific Islander Race 4 73 0.3% 9,629 23,812 0.4% 1,349,718
White Race 5 21,473 76.7% 9,629 3,763,627 69.6% 1,349,718
Other race Race 997 643 2.3% 9,629 293,069 5.4% 1,349,718
Prefer not to answer Race 999 2,494 8.9% 9,629 679,994 12.6% 1,349,718
Logic: if surveyable

ethnicity

ethnicity
Ethnicity
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Not of Hispanic, Latino, or Spanish origin Ethnicity 1 23,413 83.7% 9,629 4,134,795 76.4% 1,349,718
Mexican, Mexican American, Chicano Ethnicity 2 206 0.7% 9,629 57,367 1.1% 1,349,718
Puerto Rican Ethnicity 3 755 2.7% 9,629 217,328 4.0% 1,349,718
Cuban Ethnicity 4 85 0.3% 9,629 23,308 0.4% 1,349,718
Another Hispanic, Latino, or Spanish origin Ethnicity 997 1,012 3.6% 9,629 316,852 5.9% 1,349,718
Prefer not to answer Ethnicity 999 2,603 9.3% 9,629 685,496 12.7% 1,349,718
Logic: if surveyable

employment

employment
Employment status
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Employed full-time (paid) 16,274 49.82% 2,821,097 50.55%
2 Employed part-time (paid) 3,250 9.95% 814,786 14.60%
3 Self-employed 1,690 5.17% 120,124 2.15%
5 Not employed and not looking for work (e.g., retired, stay-at-home parent, student) 9,382 28.72% 1,441,689 25.83%
6 Unemployed and looking for work 1,440 4.41% 226,176 4.05%
7 Unpaid volunteer or intern 326 1.00% 107,909 1.93%
8 Employed, but not currently working (e.g., on leave, furloughed 100%) 306 0.94% 49,453 0.89%
995 Missing Response 4,948 1,178,377
Total valid 32,668 100.00% 5,581,235 100.00%
Total missing 4,948 1,178,377
Total 37,616 6,759,612
Logic: if 16 years or older

work_mode

work_mode
Mode of transportation to work
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 880 6.06% 146,847 4.60%
26 Shuttle or vanpool 53 0.36% 12,982 0.41%
100 Household vehicle (or motorcycle) 10,074 69.32% 2,441,582 76.49%
101 Other vehicle (e.g., friend’s car, rental, carshare, work car) 314 2.16% 87,597 2.74%
102 Bus 486 3.34% 83,604 2.62%
103 Bicycle or e-bicycle 455 3.13% 51,248 1.61%
104 Other 395 2.72% 105,879 3.32%
105 Rail (e.g., train, subway) 1,665 11.46% 201,122 6.30%
106 Uber/Lyft, taxi, or car service 186 1.28% 55,160 1.73%
107 Micromobility (e.g., scooter, moped, skateboard) 24 0.17% 6,062 0.19%
995 Missing Response 23,084 3,567,530
Total valid 14,532 100.00% 3,192,082 100.00%
Total missing 23,084 3,567,530
Total 37,616 6,759,612
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

job_type

job_type
Work location type
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Go to one work location ONLY (outside of home) 9,582 47.13% 1,992,111 51.56%
2 2,046 10.06% 378,600 9.80%
3 Work ONLY from home or remotely (telework, self-employed) 3,017 14.84% 596,569 15.44%
4 Drive/bike/travel for work (driver, sales, deliveries) 366 1.80% 36,115 0.93%
5 Telework some days and travel to a work location some days (work location may vary) 5,319 26.16% 860,521 22.27%
995 Missing Response 17,286 2,895,696
Total valid 20,330 100.00% 3,863,916 100.00%
Total missing 17,286 2,895,696
Total 37,616 6,759,612
Logic: if employed full/part/self/volunteer

num_jobs

num_jobs
Number of jobs
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 1 job 18,120 87.89% 3,444,005 88.01%
2 2 jobs 2,101 10.19% 403,761 10.32%
3 3 jobs 300 1.46% 49,643 1.27%
4 4 jobs 58 0.28% 7,652 0.20%
5 5 jobs 17 0.08% 3,559 0.09%
6 6 or more jobs 20 0.10% 4,750 0.12%
995 Missing Response 17,000 2,846,242
Total valid 20,616 100.00% 3,913,369 100.00%
Total missing 17,000 2,846,242
Total 37,616 6,759,612
Logic: if employed full/part/furloughed/self/volunteer

commute_subsidy

commute_subsidy
Commute benefits provided by employer
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Free (fully subsidized) transit passes or fares Commute Subsidy 1 936 5.3% 19,901 173,071 4.5% 2,893,742
Discounted (partially subsidized) transit passes or fares Commute Subsidy 2 2,008 11.3% 19,901 340,190 8.8% 2,893,742
Free (fully subsidized) parking at work Commute Subsidy 3 5,145 29.0% 19,901 1,124,870 29.1% 2,893,742
Discounted (partially subsidized) parking at work Commute Subsidy 4 907 5.1% 19,901 172,070 4.5% 2,893,742
Ability to work from home Commute Subsidy 5 3,928 22.2% 19,901 699,993 18.1% 2,893,742
Free/discount transit fare Commute Subsidy 6 317 1.8% 19,901 59,788 1.5% 2,893,742
Free/discount vanpool Commute Subsidy 7 106 0.6% 19,901 22,364 0.6% 2,893,742
Cash or incentives for carpooling, walking, or biking to work Commute Subsidy 8 296 1.7% 19,901 60,997 1.6% 2,893,742
Free/discount Uber, Lyft, or other smartphone-app ride service Commute Subsidy 9 135 0.8% 19,901 29,590 0.8% 2,893,742
Free/discount carshare membership/use (e.g., ZipCar) Commute Subsidy 10 113 0.6% 19,901 23,825 0.6% 2,893,742
Free/discount shuttle service Commute Subsidy 11 606 3.4% 19,901 112,178 2.9% 2,893,742
Free/discount bikeshare membership Commute Subsidy 12 514 2.9% 19,901 63,799 1.7% 2,893,742
Free/discount bicycle tune-up/maintenance Commute Subsidy 13 216 1.2% 19,901 26,190 0.7% 2,893,742
Stipend for working at home (e.g., internet, equipment) Commute Subsidy 14 576 3.3% 19,901 116,280 3.0% 2,893,742
None of the above Commute Subsidy 996 7,334 41.4% 19,901 1,709,321 44.2% 2,893,742
Don’t know Commute Subsidy 998 905 5.1% 19,901 235,141 6.1% 2,893,742
Logic: if employed full/part/furloughed/volunteer

commute_subsidy_use

commute_subsidy_use
Commute benefit used
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Free (fully subsidized) transit passes or fares Commute Subsidy Use 1 614 6.5% 28,140 102,876 5.4% 4,838,204
Discounted (partially subsidized) transit passes or fares Commute Subsidy Use 2 846 8.9% 28,140 114,474 6.0% 4,838,204
Free (fully subsidized) parking at work Commute Subsidy Use 3 4,601 48.6% 28,140 1,018,301 53.0% 4,838,204
Discounted (partially subsidized) parking at work Commute Subsidy Use 4 466 4.9% 28,140 106,960 5.6% 4,838,204
Ability to work from home Commute Subsidy Use 5 3,603 38.0% 28,140 644,165 33.5% 4,838,204
Free/discount transit fare Commute Subsidy Use 6 135 1.4% 28,140 24,966 1.3% 4,838,204
Free/discount vanpool Commute Subsidy Use 7 22 0.2% 28,140 5,056 0.3% 4,838,204
Cash or incentives for carpooling, walking, or biking to work Commute Subsidy Use 8 84 0.9% 28,140 15,132 0.8% 4,838,204
Free/discount Uber, Lyft, or other smartphone-app service Commute Subsidy Use 9 66 0.7% 28,140 16,487 0.9% 4,838,204
Free/discount carshare membership/use (e.g., ZipCar) Commute Subsidy Use 10 25 0.3% 28,140 2,793 0.1% 4,838,204
Free/discount shuttle service Commute Subsidy Use 11 191 2.0% 28,140 36,208 1.9% 4,838,204
Free or discount bikeshare membership Commute Subsidy Use 12 159 1.7% 28,140 19,845 1.0% 4,838,204
Free or discount bicycle tune-up/maintenance Commute Subsidy Use 13 77 0.8% 28,140 6,471 0.3% 4,838,204
Stipend for working at home (e.g., internet, equipment) Commute Subsidy Use 14 504 5.3% 28,140 103,935 5.4% 4,838,204
None of the above Commute Subsidy Use 996 943 10.0% 28,140 212,531 11.1% 4,838,204
Logic: if selected benefit in commute_subsidy
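
The Percent column in option-variable (select-all-that-apply) tables appears to use the respondents who were asked the question as its base, that is, total person records minus Missing. The sketch below reproduces the unweighted figure for free parking at work under that assumption, using the 37,616-person total reported in the value tables of this section. The function name is illustrative and not part of the delivered data.

```python
def option_percent(selected: int, missing: int, total: int) -> float:
    """Percent of asked respondents who selected an option."""
    asked = total - missing  # records that passed the skip logic
    return 100.0 * selected / asked


TOTAL_PERSONS = 37_616  # unweighted person total from the value tables

# Free (fully subsidized) parking at work, commute_subsidy_use option 3
pct = option_percent(selected=4_601, missing=28_140, total=TOTAL_PERSONS)
print(f"{pct:.1f}%")  # 48.6%
```

Because respondents can select several options, the percentages in an option-variable table are not expected to sum to 100%.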

work_in_region

work_in_region
Work in study region
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 550 3.69% 115,845 4.06%
1 Yes 14,351 96.31% 2,736,787 95.94%
995 Missing Response 22,715 3,906,980
Total valid 14,901 100.00% 2,852,632 100.00%
Total missing 22,715 3,906,980
Total 37,616 6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”

work_state

work_state
Work location – State
Value Label
Unweighted
Weighted
Count Percent Count Percent
09 Connecticut 112 0.75% 26,589 0.93%
25 Massachusetts 14,351 96.52% 2,736,787 96.11%
33 New Hampshire 159 1.07% 35,386 1.24%
36 New York 23 0.15% 3,624 0.13%
44 Rhode Island 210 1.41% 42,694 1.50%
50 Vermont 13 0.09% 2,498 0.09%
NA No value assigned 22,748 3,912,035
Total valid 14,868 100.00% 2,847,577 100.00%
Total missing 22,748 3,912,035
Total 37,616 6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”

work_county

work_county
Work location – County
Value Label
Unweighted
Weighted
Count Percent Count Percent
09003 Hartford County 77 0.52% 19,645 0.69%
09005 Litchfield County 5 0.03% 1,115 0.04%
09007 Middlesex County, Connecticut 1 0.01% 58 0.00%
09009 New Haven County 7 0.05% 778 0.03%
09011 New London County 7 0.05% 2,059 0.07%
09013 Tolland County 7 0.05% 815 0.03%
09015 Windham County, Connecticut 8 0.05% 2,119 0.07%
25001 Barnstable County 373 2.51% 89,589 3.15%
25003 Berkshire County 219 1.47% 40,513 1.42%
25005 Bristol County 756 5.08% 146,068 5.13%
25007 Dukes County 98 0.66% 5,750 0.20%
25009 Essex County 962 6.47% 228,695 8.03%
25011 Franklin County 131 0.88% 21,427 0.75%
25013 Hampden County 739 4.97% 149,235 5.24%
25015 Hampshire County 370 2.49% 67,519 2.37%
25017 Middlesex County 3,898 26.22% 737,491 25.90%
25019 Nantucket County 79 0.53% 7,275 0.26%
25021 Norfolk County 1,127 7.58% 252,836 8.88%
25023 Plymouth County 677 4.55% 153,925 5.41%
25025 Suffolk County 3,502 23.55% 556,554 19.54%
25027 Worcester County 1,420 9.55% 279,909 9.83%
33001 Belknap County 1 0.01% 22 0.00%
33003 Carroll County 2 0.01% 60 0.00%
33005 Cheshire County 5 0.03% 1,166 0.04%
33011 Hillsborough County 70 0.47% 16,901 0.59%
33013 Merrimack County 4 0.03% 504 0.02%
33015 Rockingham County 69 0.46% 14,523 0.51%
33017 Strafford County 7 0.05% 2,107 0.07%
33019 Sullivan County, New Hampshire 1 0.01% 104 0.00%
36001 Albany County 2 0.01% 77 0.00%
36021 Columbia County 1 0.01% 0 0.00%
36029 Erie County 1 0.01% 0 0.00%
36055 Monroe County 1 0.01% 83 0.00%
36059 Nassau County 1 0.01% 91 0.00%
36061 New York County 11 0.07% 2,348 0.08%
36065 Oneida County 1 0.01% 175 0.01%
36067 Onondaga County 2 0.01% 47 0.00%
36081 Queens County 1 0.01% 727 0.03%
36093 Schenectady County 1 0.01% 78 0.00%
36119 Westchester County 1 0.01% 0 0.00%
44001 Bristol County, Rhode Island 6 0.04% 1,011 0.04%
44003 Kent County 14 0.09% 4,782 0.17%
44005 Newport County 35 0.24% 9,986 0.35%
44007 Providence County 151 1.02% 26,481 0.93%
44009 Washington County, Rhode Island 4 0.03% 435 0.02%
50003 Bennington County 3 0.02% 76 0.00%
50007 Chittenden County 3 0.02% 1,106 0.04%
50025 Windham County, Vermont 6 0.04% 1,086 0.04%
50027 Windsor County 1 0.01% 230 0.01%
NA No value assigned 22,748 3,912,035
Total valid 14,868 100.00% 2,847,577 100.00%
Total missing 22,748 3,912,035
Total 37,616 6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”
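
The work_county values are five-digit county FIPS codes whose first two digits match the state FIPS codes listed under work_state. If a file is read with these columns parsed as integers, the leading zero in Connecticut codes (09) is lost. A minimal sketch for restoring it, assuming integer or string input; the helper names are hypothetical:

```python
def normalize_county_fips(value) -> str:
    """Return a five-character county FIPS string, restoring leading zeros."""
    return str(value).strip().zfill(5)


def state_from_county(county_fips) -> str:
    """The first two digits of a county FIPS code identify the state."""
    return normalize_county_fips(county_fips)[:2]


# 9003 is how Hartford County (09003) looks after being parsed as an integer.
print(normalize_county_fips(9003))  # 09003
print(state_from_county("25017"))   # 25 (Massachusetts)
```

Reading the location columns as text in the first place avoids the problem entirely.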

education

education
Highest level of education completed
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Less than high school 467 1.78% 154,014 2.86%
2 High school graduate/GED 2,554 9.75% 830,013 15.42%
3 Some college 2,891 11.04% 602,625 11.20%
4 Vocational/technical training 696 2.66% 206,493 3.84%
5 Associate degree 1,461 5.58% 280,729 5.22%
6 Bachelor’s degree 8,058 30.77% 1,455,989 27.05%
7 Graduate/post-graduate degree 9,233 35.25% 1,573,602 29.24%
999 Prefer not to answer 831 3.17% 278,949 5.18%
995 Missing Response 11,425 1,377,197
Total valid 26,191 100.00% 5,382,415 100.00%
Total missing 11,425 1,377,197
Total 37,616 6,759,612
Logic: if participant
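
In value tables such as education, code 995 (Missing Response) is excluded from "Total valid", while 999 (Prefer not to answer) is counted as a valid response. The sketch below applies the same convention to a toy list of records. The records are made up for illustration, and `person_weight` is an assumed field name rather than a confirmed column in the delivered tables.

```python
MISSING_CODES = {995}  # "Missing Response": excluded from valid totals


def valid_percent(records, variable, weight=None):
    """Percent distribution of `variable` over non-missing records.

    `records` is a list of dicts; `weight` names an optional weight field.
    """
    totals = {}
    for rec in records:
        code = rec[variable]
        if code in MISSING_CODES:
            continue  # not asked or no response, so not part of the base
        w = rec[weight] if weight else 1.0
        totals[code] = totals.get(code, 0.0) + w
    base = sum(totals.values())
    return {code: 100.0 * w / base for code, w in totals.items()}


# Toy data, not survey records; `person_weight` is an assumed field name.
people = [
    {"education": 6, "person_weight": 150.0},
    {"education": 7, "person_weight": 100.0},
    {"education": 999, "person_weight": 250.0},
    {"education": 995, "person_weight": 0.0},
]
unweighted = valid_percent(people, "education")
weighted = valid_percent(people, "education", weight="person_weight")
print(unweighted)
print(weighted)
```

Some variables in this section report "NA No value assigned" instead of 995, so an analysis script would likely need to treat both as missing.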

student

student
Student status and location
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 Full-time student, currently attending some or all classes in-person 2,156 6.60% 468,359 8.39%
1 Part-time student, currently attending some or all classes in-person 542 1.66% 115,454 2.07%
2 Not a student 29,196 89.37% 4,838,956 86.70%
3 Part-time student, ONLY online classes 529 1.62% 100,687 1.80%
4 Full-time student, ONLY online classes 245 0.75% 57,779 1.04%
995 Missing Response 4,948 1,178,377
Total valid 32,668 100.00% 5,581,235 100.00%
Total missing 4,948 1,178,377
Total 37,616 6,759,612
Logic: if surveyable

school_mode

school_mode
Mode of transportation to school
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 698 13.12% 194,883 12.78%
24 School bus 1,354 25.45% 394,970 25.90%
100 Household vehicle (or motorcycle) 2,368 44.51% 672,154 44.07%
101 Other vehicle (e.g., friend’s car, rental, work car) 114 2.14% 37,341 2.45%
102 Bus, shuttle, or vanpool 336 6.32% 94,756 6.21%
103 Bicycle or e-bicycle 141 2.65% 33,158 2.17%
104 Other 87 1.64% 27,895 1.83%
105 Rail (e.g., train, subway) 174 3.27% 51,546 3.38%
106 Uber/Lyft, taxi, or car service 36 0.68% 14,730 0.97%
107 Micromobility (e.g., scooter, moped, skateboard) 12 0.23% 3,789 0.25%
995 Missing Response 32,296 5,234,389
Total valid 5,320 100.00% 1,525,223 100.00%
Total missing 32,296 5,234,389
Total 37,616 6,759,612
Logic: if (adult student and school_freq is not “never”) or child who attends school or daycare

school_type

school_type
Type of school attended
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Cared for at home 580 7.36% 124,310 6.47%
2 Daycare outside home 615 7.81% 131,643 6.85%
3 Preschool 459 5.83% 93,171 4.85%
4 Home school 129 1.64% 33,651 1.75%
5 Elementary school (public, private, charter) 1,722 21.86% 443,334 23.08%
6 Middle school (public, private, charter) 865 10.98% 219,732 11.44%
7 High school (public, private, charter) 1,148 14.57% 298,245 15.53%
10 Vocational/technical school 121 1.54% 37,088 1.93%
11 2-year college 345 4.38% 77,811 4.05%
12 4-year college 635 8.06% 186,775 9.72%
13 Graduate or professional school 927 11.77% 206,693 10.76%
997 Other 331 4.20% 68,204 3.55%
995 Missing Response 29,739 4,838,956
Total valid 7,877 100.00% 1,920,656 100.00%
Total missing 29,739 4,838,956
Total 37,616 6,759,612
Logic: if age 0-15 or adult student

school_freq

school_freq
Frequency of travel to school
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 163 2.97% 60,188 3.80%
2 5 days a week 4,066 74.10% 1,151,075 72.62%
3 4 days a week 224 4.08% 65,095 4.11%
4 3 days a week 276 5.03% 71,328 4.50%
5 2 days a week 244 4.45% 67,612 4.27%
6 1 day a week 122 2.22% 39,936 2.52%
7 1-3 days a month 58 1.06% 16,753 1.06%
8 Less than monthly 78 1.42% 21,579 1.36%
996 Never 256 4.67% 91,539 5.77%
995 Missing Response 32,129 5,174,507
Total valid 5,487 100.00% 1,585,104 100.00%
Total missing 32,129 5,174,507
Total 37,616 6,759,612
Logic: if adult student or child who attends school or daycare

remote_class_freq

remote_class_freq
Frequency of remote schooling
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 92 1.39% 24,281 1.30%
2 5 days a week 285 4.29% 75,531 4.05%
3 4 days a week 67 1.01% 21,571 1.16%
4 3 days a week 146 2.20% 49,065 2.63%
5 2 days a week 213 3.21% 55,260 2.97%
6 1 day a week 278 4.19% 83,025 4.46%
7 1-3 days a month 92 1.39% 26,356 1.41%
8 Less than monthly 214 3.22% 56,474 3.03%
996 Never 5,249 79.10% 1,471,459 78.98%
995 Missing Response 30,980 4,896,589
Total valid 6,636 100.00% 1,863,022 100.00%
Total missing 30,980 4,896,589
Total 37,616 6,759,612
Logic: if (adult student and school_freq is not “6-7 days a week”) or (child who is not cared for at home or attending daycare, and school_freq is not “6-7 days a week”)

school_in_region

school_in_region
School in study region
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 186 2.88% 50,282 3.12%
1 Yes 6,281 97.12% 1,560,355 96.88%
995 Missing Response 31,149 5,148,975
Total valid 6,467 100.00% 1,610,637 100.00%
Total missing 31,149 5,148,975
Total 37,616 6,759,612
Logic: if attends school in person

school_state

school_state
School location – State
Value Label
Unweighted
Weighted
Count Percent Count Percent
09 Connecticut 31 0.48% 8,395 0.53%
25 Massachusetts 6,281 98.02% 1,560,355 97.98%
33 New Hampshire 26 0.41% 6,009 0.38%
36 New York 16 0.25% 5,642 0.35%
44 Rhode Island 45 0.70% 8,368 0.53%
50 Vermont 9 0.14% 3,747 0.24%
NA No value assigned 31,208 5,167,096
Total valid 6,408 100.00% 1,592,515 100.00%
Total missing 31,208 5,167,096
Total 37,616 6,759,612
Logic: if attends school in person

school_county

school_county
School location – County
Value Label
Unweighted
Weighted
Count Percent Count Percent
09001 Fairfield County 3 0.05% 1,158 0.07%
09003 Hartford County 10 0.16% 2,112 0.13%
09009 New Haven County 4 0.06% 277 0.02%
09013 Tolland County 9 0.14% 1,730 0.11%
09015 Windham County, Connecticut 5 0.08% 3,118 0.20%
25001 Barnstable County 110 1.72% 34,649 2.18%
25003 Berkshire County 69 1.08% 15,069 0.95%
25005 Bristol County 458 7.15% 109,482 6.87%
25007 Dukes County 18 0.28% 1,284 0.08%
25009 Essex County 583 9.10% 155,386 9.76%
25011 Franklin County 64 1.00% 12,777 0.80%
25013 Hampden County 446 6.96% 93,753 5.89%
25015 Hampshire County 232 3.62% 47,161 2.96%
25017 Middlesex County 1,584 24.72% 414,857 26.05%
25019 Nantucket County 27 0.42% 3,000 0.19%
25021 Norfolk County 554 8.65% 144,727 9.09%
25023 Plymouth County 402 6.27% 99,477 6.25%
25025 Suffolk County 869 13.56% 239,075 15.01%
25027 Worcester County 865 13.50% 189,658 11.91%
33001 Belknap County 1 0.02% 383 0.02%
33011 Hillsborough County 10 0.16% 2,318 0.15%
33015 Rockingham County 10 0.16% 1,937 0.12%
33017 Strafford County 5 0.08% 1,372 0.09%
36005 Bronx County 1 0.02% 28 0.00%
36021 Columbia County 4 0.06% 2,445 0.15%
36027 Dutchess County 1 0.02% 53 0.00%
36059 Nassau County 1 0.02% 59 0.00%
36061 New York County 3 0.05% 1,118 0.07%
36067 Onondaga County 2 0.03% 1,401 0.09%
36085 Richmond County 1 0.02% 19 0.00%
36103 Suffolk County, New York 1 0.02% 29 0.00%
36111 Ulster County 1 0.02% 338 0.02%
36119 Westchester County 1 0.02% 151 0.01%
44001 Bristol County, Rhode Island 5 0.08% 421 0.03%
44003 Kent County 3 0.05% 224 0.01%
44005 Newport County 4 0.06% 413 0.03%
44007 Providence County 25 0.39% 6,776 0.43%
44009 Washington County, Rhode Island 8 0.12% 534 0.03%
50003 Bennington County 1 0.02% 0 0.00%
50005 Caledonia County 1 0.02% 783 0.05%
50007 Chittenden County 6 0.09% 2,964 0.19%
50023 Washington County, Vermont 1 0.02% 0 0.00%
NA No value assigned 31,208 5,167,096
Total valid 6,408 100.00% 1,592,515 100.00%
Total missing 31,208 5,167,096
Total 37,616 6,759,612
Logic: if attends school in person

second_home

second_home
Regularly spends the night at a second home (e.g., another parent or grandparent’s house, partner or spouse’s home, or a vacation home)
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 Does not regularly spend night at second home 33,656 93.83% 6,317,265 93.46%
1 Regularly spends night at second home 2,214 6.17% 442,347 6.54%
995 Missing Response 1,746 0
Total valid 35,870 100.00% 6,759,612 100.00%
Total missing 1,746 0
Total 37,616 6,759,612

can_drive

can_drive
Household member drives
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No, does not drive 3,963 12.80% 795,399 14.25%
1 Yes, drives 26,991 87.20% 4,785,835 85.75%
995 Missing Response 6,662 1,178,377
Total valid 30,954 100.00% 5,581,235 100.00%
Total missing 6,662 1,178,377
Total 37,616 6,759,612
Logic: if surveyable and 16 or over

vehicle

vehicle
Vehicle driven most often
Value Label
Unweighted
Weighted
Count Percent Count Percent
6 Household vehicle 1 14,653 66.52% 2,712,106 60.32%
7 Household vehicle 2 5,575 25.31% 1,246,157 27.72%
8 Household vehicle 3 873 3.96% 265,603 5.91%
9 Household vehicle 4 143 0.65% 56,558 1.26%
10 Household vehicle 5 32 0.15% 12,614 0.28%
11 Household vehicle 6 7 0.03% 2,027 0.05%
12 Household vehicle 7 3 0.01% 229 0.01%
18 A carshare vehicle (e.g., ZipCar) 20 0.09% 7,385 0.16%
996 None (I do not drive a vehicle) 345 1.57% 100,287 2.23%
997 Other vehicle 376 1.71% 93,024 2.07%
995 Missing Response 15,589 2,263,622
Total valid 22,027 100.00% 4,495,989 100.00%
Total missing 15,589 2,263,622
Total 37,616 6,759,612
Logic: if household has vehicle and person drives

transit_freq

transit_freq
Frequency of transit trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 615 2.33% 113,016 2.09%
2 5 days a week 910 3.45% 150,156 2.78%
3 4 days a week 611 2.32% 104,343 1.93%
4 3 days a week 979 3.72% 165,283 3.06%
5 2 days a week 896 3.40% 173,234 3.20%
6 1 day a week 940 3.57% 158,319 2.93%
7 1-3 days a month 2,079 7.89% 392,880 7.27%
8 Less than monthly 7,408 28.11% 1,501,751 27.78%
9 Never 11,914 45.21% 2,646,305 48.96%
995 Missing Response 11,264 1,354,325
Total valid 26,352 100.00% 5,405,287 100.00%
Total missing 11,264 1,354,325
Total 37,616 6,759,612
Logic: if participant

tnc_freq

tnc_freq
Frequency of TNC trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 78 0.62% 26,977 1.11%
2 5 days a week 94 0.74% 26,143 1.08%
3 4 days a week 99 0.78% 28,341 1.17%
4 3 days a week 217 1.72% 45,579 1.88%
5 2 days a week 341 2.70% 77,206 3.19%
6 1 day a week 522 4.13% 96,383 3.98%
7 1-3 days a month 2,852 22.57% 516,289 21.31%
8 Less than monthly 8,433 66.74% 1,605,379 66.28%
995 Missing Response 24,980 4,337,317
Total valid 12,636 100.00% 2,422,295 100.00%
Total missing 24,980 4,337,317
Total 37,616 6,759,612
Logic: if uses smartphone-app ride services

bike_freq

bike_freq
Frequency of bike trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 371 1.42% 65,897 1.23%
2 5 days a week 339 1.30% 53,781 1.00%
3 4 days a week 340 1.30% 57,521 1.07%
4 3 days a week 614 2.35% 115,527 2.15%
5 2 days a week 632 2.42% 123,733 2.30%
6 1 day a week 711 2.72% 148,050 2.75%
7 1-3 days a month 1,807 6.92% 357,144 6.64%
8 Less than monthly 5,980 22.89% 1,247,989 23.21%
996 Never 15,336 58.69% 3,208,010 59.65%
995 Missing Response 11,486 1,381,960
Total valid 26,130 100.00% 5,377,652 100.00%
Total missing 11,486 1,381,960
Total 37,616 6,759,612
Logic: if participant
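
The frequency variables share codes 1 through 8, but "Never" is listed as 996 for bike_freq and as 9 in the transit_freq table earlier in this section, so it may be safest to accept both codes until the delivered data confirm which is used. The sketch below collapses the unweighted transit_freq counts into three groups; the grouping itself is an analyst choice, not a study definition.

```python
# Unweighted transit_freq counts keyed by code, copied from the transit_freq table.
transit_freq_counts = {
    1: 615, 2: 910, 3: 611, 4: 979, 5: 896, 6: 940,
    7: 2_079, 8: 7_408, 9: 11_914,
}

WEEKLY_CODES = range(1, 7)  # codes 1-6: "6-7 days a week" through "1 day a week"
NEVER_CODES = {9, 996}      # accept either Never code seen in this section


def collapse(counts):
    """Group frequency codes into weekly / less often / never."""
    out = {"at_least_weekly": 0, "less_than_weekly": 0, "never": 0}
    for code, n in counts.items():
        if code in WEEKLY_CODES:
            out["at_least_weekly"] += n
        elif code in NEVER_CODES:
            out["never"] += n
        else:
            out["less_than_weekly"] += n
    return out


groups = collapse(transit_freq_counts)
valid = sum(groups.values())
print(groups)
print(f"{100 * groups['at_least_weekly'] / valid:.1f}% ride transit at least weekly")
```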

vanpool_freq

vanpool_freq
Frequency of vanpool trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 5 2.73% 1,953 4.49%
2 5 days a week 14 7.65% 5,351 12.31%
3 4 days a week 5 2.73% 979 2.25%
4 3 days a week 15 8.20% 3,919 9.02%
5 2 days a week 9 4.92% 2,681 6.17%
6 1 day a week 14 7.65% 4,594 10.57%
7 1-3 days a month 26 14.21% 7,463 17.17%
8 Less than monthly 95 51.91% 16,518 38.01%
995 Missing Response 37,433 6,716,154
Total valid 183 100.00% 43,458 100.00%
Total missing 37,433 6,716,154
Total 37,616 6,759,612
Logic: if uses vanpool

bikeshare_freq

bikeshare_freq
Frequency of bike-share trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 41 2.89% 5,884 2.72%
2 5 days a week 34 2.39% 5,378 2.48%
3 4 days a week 35 2.46% 4,573 2.11%
4 3 days a week 61 4.29% 11,262 5.20%
5 2 days a week 71 5.00% 10,483 4.84%
6 1 day a week 81 5.70% 16,591 7.66%
7 1-3 days a month 270 19.00% 33,143 15.30%
8 Less than monthly 828 58.27% 129,339 59.70%
995 Missing Response 36,195 6,542,959
Total valid 1,421 100.00% 216,653 100.00%
Total missing 36,195 6,542,959
Total 37,616 6,759,612
Logic: if uses bikeshare

scootshare_freq

scootshare_freq
Frequency of scooter-share trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 1 0.29% 605 1.06%
2 5 days a week 1 0.29% 19 0.03%
3 4 days a week 1 0.29% 406 0.71%
4 3 days a week 4 1.16% 1,295 2.26%
5 2 days a week 3 0.87% 656 1.14%
6 1 day a week 3 0.87% 995 1.74%
7 1-3 days a month 16 4.62% 4,392 7.66%
8 Less than monthly 317 91.62% 48,979 85.41%
995 Missing Response 37,270 6,702,265
Total valid 346 100.00% 57,347 100.00%
Total missing 37,270 6,702,265
Total 37,616 6,759,612
Logic: if uses scooter share

walk_freq

walk_freq
Frequency of walk trips
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 7,774 29.50% 1,420,733 26.28%
2 5 days a week 3,229 12.25% 648,088 11.99%
3 4 days a week 1,989 7.55% 385,005 7.12%
4 3 days a week 2,767 10.50% 556,070 10.29%
5 2 days a week 1,993 7.56% 428,385 7.93%
6 1 day a week 1,408 5.34% 286,319 5.30%
7 1-3 days a month 1,620 6.15% 343,158 6.35%
8 Less than monthly 5,572 21.14% 1,337,530 24.74%
995 Missing Response 11,264 1,354,325
Total valid 26,352 100.00% 5,405,287 100.00%
Total missing 11,264 1,354,325
Total 37,616 6,759,612
Logic: if participant

micromobility_devices

micromobility_devices
Micromobility device used
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Scooter Micromobility Devices 1 451 2.3% 18,367 92,481 2.5% 3,119,926
Moped Micromobility Devices 2 77 0.4% 18,367 22,679 0.6% 3,119,926
Skateboard or rollerblades Micromobility Devices 3 395 2.1% 18,367 88,525 2.4% 3,119,926
None Micromobility Devices 996 18,233 94.7% 18,367 3,428,633 94.2% 3,119,926
Other Micromobility Devices 997 192 1.0% 18,367 38,477 1.1% 3,119,926
Logic: if rMove or (rMove for Web and person 1)

share

share
Share service used
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Uber, Lyft, or other smartphone-app ride service Share 2 12,636 48.0% 11,265 2,422,295 44.8% 1,354,482
Carshare (e.g., Zipcar) Share 3 540 2.0% 11,265 91,937 1.7% 1,354,482
Peer-to-peer car rental (e.g., Turo) Share 4 392 1.5% 11,265 79,819 1.5% 1,354,482
Bikeshare or bike rental service Share 5 1,421 5.4% 11,265 216,653 4.0% 1,354,482
Vanpool Share 6 183 0.7% 11,265 43,458 0.8% 1,354,482
Scooter share (e.g., Bird, Lime) Share 7 346 1.3% 11,265 57,347 1.1% 1,354,482
None of the above Share 996 13,370 50.7% 11,265 2,905,296 53.8% 1,354,482
Logic: if participant
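
Each share option acts as the skip-logic gate for a matching frequency variable (for example, tnc_freq is asked "if uses smartphone-app ride services"), so the unweighted Selected counts here should equal the "Total valid" counts in the corresponding *_freq tables. A small consistency check along those lines, using counts copied from the tables in this section; the short dictionary keys are illustrative labels only:

```python
# Unweighted "Selected" counts from the share table.
share_selected = {
    "tnc": 12_636, "carshare": 540, "peerrent": 392,
    "bikeshare": 1_421, "vanpool": 183, "scootshare": 346,
}

# Unweighted "Total valid" counts from the matching *_freq tables.
freq_total_valid = {
    "tnc": 12_636, "carshare": 540, "peerrent": 392,
    "bikeshare": 1_421, "vanpool": 183, "scootshare": 346,
}

# Collect any service whose frequency base differs from its share count.
mismatches = {
    key: (share_selected[key], freq_total_valid[key])
    for key in share_selected
    if share_selected[key] != freq_total_valid[key]
}
print(mismatches or "all frequency bases match the share selections")
```

The same pattern can be run against the delivered person table to confirm that skip logic was applied as documented.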

transit_pass

transit_pass
Ownership/type of transit pass
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 21,033 79.82% 4,466,017 82.62%
1 Yes 5,318 20.18% 939,236 17.38%
995 Missing Response 11,265 1,354,359
Total valid 26,351 100.00% 5,405,253 100.00%
Total missing 11,265 1,354,359
Total 37,616 6,759,612
Logic: if participant

disability

disability
Disability status
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 23,001 87.82% 4,660,243 86.58%
1 Yes 2,004 7.65% 399,484 7.42%
999 Prefer not to answer 1,186 4.53% 322,687 6.00%
995 Missing Response 11,425 1,377,197
Total valid 26,191 100.00% 5,382,415 100.00%
Total missing 11,425 1,377,197
Total 37,616 6,759,612
Logic: if participant

participate

participate
Willingness to participate in future studies
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 6,256 23.74% 1,509,453 27.93%
1 Yes 20,094 76.26% 3,895,535 72.07%
995 Missing Response 11,266 1,354,623
Total valid 26,350 100.00% 5,404,988 100.00%
Total missing 11,266 1,354,623
Total 37,616 6,759,612
Logic: if participant

barriers

barriers
Barrier to making trips
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Parking was too expensive Barriers 1 1,307 5.0% 11,266 287,038 5.3% 1,354,623
Vehicle was not working Barriers 2 533 2.0% 11,266 135,621 2.5% 1,354,623
Vehicle was not available Barriers 3 909 3.4% 11,266 226,251 4.2% 1,354,623
Bus was not running Barriers 4 639 2.4% 11,266 124,018 2.3% 1,354,623
No access to app-based rides (e.g., Uber, Lyft) Barriers 5 195 0.7% 11,266 44,792 0.8% 1,354,623
Inadequate bike parking Barriers 6 259 1.0% 11,266 39,444 0.7% 1,354,623
Concerns about safety Barriers 7 652 2.5% 11,266 133,189 2.5% 1,354,623
This has not happened in the past 7 days Barriers 8 19,410 73.7% 11,266 3,904,961 72.2% 1,354,623
Not enough biking paths/lanes Barriers 9 624 2.4% 11,266 97,005 1.8% 1,354,623
No connection to transit Barriers 10 1,384 5.3% 11,266 240,435 4.4% 1,354,623
Other Barriers 997 1,511 5.7% 11,266 281,936 5.2% 1,354,623
Prefer not to answer Barriers 999 1,392 5.3% 11,266 375,305 6.9% 1,354,623

bike_comfort_lane

bike_comfort_lane
Comfort level riding a bike on a major street with four lanes and a wide bike lane physically separated from traffic by a raised curb, planters, or parked cars
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 5,185 26.94% 999,917 27.47%
2 Somewhat comfortable 5,444 28.28% 1,017,879 27.97%
3 Somewhat uncomfortable 3,214 16.70% 593,735 16.31%
4 Very uncomfortable 5,406 28.08% 1,028,155 28.25%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_local

bike_comfort_local
Comfort level riding a bike on a quiet residential street with bicycle route markings, wide speed humps, and other things to discourage and slow down car traffic
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 9,824 51.04% 1,885,801 51.81%
2 Somewhat comfortable 4,671 24.27% 842,882 23.16%
3 Somewhat uncomfortable 1,747 9.08% 325,418 8.94%
4 Very uncomfortable 3,007 15.62% 585,585 16.09%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_major

bike_comfort_major
Comfort level riding a bike on a major street with four lanes and no bike lane
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 1,845 9.58% 402,348 11.05%
2 Somewhat comfortable 1,072 5.57% 225,779 6.20%
3 Somewhat uncomfortable 2,098 10.90% 387,134 10.64%
4 Very uncomfortable 14,234 73.95% 2,624,425 72.11%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_minor

bike_comfort_minor
Comfort level riding a bike on a minor street with two lanes and no bike lane
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 2,508 13.03% 519,592 14.28%
2 Somewhat comfortable 4,536 23.56% 839,196 23.06%
3 Somewhat uncomfortable 5,884 30.57% 1,047,824 28.79%
4 Very uncomfortable 6,321 32.84% 1,233,073 33.88%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_neighborhood

bike_comfort_neighborhood
Comfort level riding a bike on a quiet residential street
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 9,321 48.42% 1,802,397 49.52%
2 Somewhat comfortable 5,606 29.12% 1,025,269 28.17%
3 Somewhat uncomfortable 1,713 8.90% 298,738 8.21%
4 Very uncomfortable 2,609 13.55% 513,282 14.10%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_paths

bike_comfort_paths
Comfort level riding a bike on a path or trail separate from the street
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 12,031 62.50% 2,240,073 61.55%
2 Somewhat comfortable 3,197 16.61% 621,699 17.08%
3 Somewhat uncomfortable 1,022 5.31% 193,601 5.32%
4 Very uncomfortable 2,999 15.58% 584,312 16.05%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_street

bike_comfort_street
Comfort level riding a bike on a minor street with two lanes and a striped bike lane
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 4,057 21.08% 787,733 21.64%
2 Somewhat comfortable 6,532 33.93% 1,208,222 33.20%
3 Somewhat uncomfortable 4,413 22.93% 849,892 23.35%
4 Very uncomfortable 4,247 22.06% 793,840 21.81%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1

bike_comfort_striped

bike_comfort_striped
Comfort level riding a bike on a major street with four lanes and a striped bike lane
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Very comfortable 1,864 9.68% 397,197 10.91%
2 Somewhat comfortable 3,144 16.33% 624,084 17.15%
3 Somewhat uncomfortable 5,844 30.36% 1,070,340 29.41%
4 Very uncomfortable 8,397 43.62% 1,548,065 42.53%
NA No value assigned 18,367 3,119,926
Total valid 19,249 100.00% 3,639,686 100.00%
Total missing 18,367 3,119,926
Total 37,616 6,759,612
Logic: if rMove or bMove person 1
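
The eight bike_comfort_* variables use the same four-point scale and the same base of 19,249 unweighted respondents, which makes them easy to compare with a single summary such as the share answering "Very comfortable" or "Somewhat comfortable". The sketch below computes that share for four of the facility types using the unweighted counts above. Combining codes 1 and 2 is an analyst convention assumed here, not a measure defined by the study.

```python
# Unweighted counts for codes 1-4, copied from the tables above.
bike_comfort = {
    "bike_comfort_paths": [12_031, 3_197, 1_022, 2_999],
    "bike_comfort_lane": [5_185, 5_444, 3_214, 5_406],
    "bike_comfort_striped": [1_864, 3_144, 5_844, 8_397],
    "bike_comfort_major": [1_845, 1_072, 2_098, 14_234],
}


def share_comfortable(counts):
    """Percent answering 'Very' or 'Somewhat comfortable' (codes 1 and 2)."""
    return 100.0 * (counts[0] + counts[1]) / sum(counts)


# List facility types from most to least comfortable.
ranked = sorted(bike_comfort, key=lambda name: share_comfortable(bike_comfort[name]),
                reverse=True)
for name in ranked:
    print(f"{name}: {share_comfortable(bike_comfort[name]):.1f}%")
```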

bike_factors

bike_factors
Factor to increase biking frequency
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Safer physical riding conditions for bicyclists (e.g., bike paths separated from motor vehicles) Bike Factors 1 9,708 37.2% 11,488 1,809,828 33.7% 1,382,113
An expanded bike network with more routes between my origin and destination Bike Factors 2 6,592 25.2% 11,488 1,200,157 22.3% 1,382,113
Better knowledge of the best bike route to my destination Bike Factors 3 2,711 10.4% 11,488 526,131 9.8% 1,382,113
More public secure bike storage Bike Factors 4 3,525 13.5% 11,488 639,037 11.9% 1,382,113
More attractive routes (visually pleasing, improve non-traffic related safety) Bike Factors 5 4,041 15.5% 11,488 762,196 14.2% 1,382,113
Lower cost electric bikes or similar equipment (e.g., electric scooters) Bike Factors 6 2,492 9.5% 11,488 506,174 9.4% 1,382,113
Expanded bike share system Bike Factors 7 1,126 4.3% 11,488 199,890 3.7% 1,382,113
Better maintenance of existing bicycle infrastructure (e.g., clearing paths of debris and/or snow during the winter) Bike Factors 8 2,975 11.4% 11,488 520,146 9.7% 1,382,113
Don’t have access to a bike, but may in the future Bike Factors 9 2,450 9.4% 11,488 477,218 8.9% 1,382,113
Don’t have access to a bike and will not in the future Bike Factors 10 2,339 9.0% 11,488 474,398 8.8% 1,382,113
Other, specify Bike Factors 11 1,833 7.0% 11,488 341,433 6.3% 1,382,113
None of the above Bike Factors 12 9,385 35.9% 11,488 2,090,915 38.9% 1,382,113

bike_purpose

bike_purpose
Purpose of bicycle use in the past 30 days
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
To go to/from grocery/food shopping Bike Purpose 1 1,168 24.3% 32,805 187,440 20.3% 5,838,353
To go to/from other shopping (e.g., pharmacy) Bike Purpose 2 1,096 22.8% 32,805 166,794 18.1% 5,838,353
To go to/from medical appointment Bike Purpose 3 479 10.0% 32,805 73,470 8.0% 5,838,353
To visit friends or relatives Bike Purpose 4 1,068 22.2% 32,805 190,001 20.6% 5,838,353
To go to/from work Bike Purpose 5 1,018 21.2% 32,805 145,278 15.8% 5,838,353
For other work-related reason Bike Purpose 6 218 4.5% 32,805 36,932 4.0% 5,838,353
Other Bike Purpose 7 336 7.0% 32,805 54,701 5.9% 5,838,353
For exercise or recreation Bike Purpose 8 4,146 86.2% 32,805 798,119 86.6% 5,838,353
Logic: if bike_freq is neither “never” nor “less than monthly”

bike_safety

bike_safety
Safety concerns preventing bicycle use
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
No safe route to ride my bike Bike Safety 1 2,179 55.1% 33,658 409,734 58.2% 6,055,623
Biking routes don’t put enough separation between me and moving cars Bike Safety 2 2,194 55.4% 33,658 371,128 52.7% 6,055,623
Crossing intersections is too stressful Bike Safety 3 1,661 42.0% 33,658 287,887 40.9% 6,055,623
Concerns about distracted or impaired drivers Bike Safety 4 2,184 55.2% 33,658 386,395 54.9% 6,055,623
Speed of vehicle traffic is too high Bike Safety 5 1,807 45.7% 33,658 307,386 43.7% 6,055,623
Poor bike path conditions or no bike paths available Bike Safety 6 1,655 41.8% 33,658 288,539 41.0% 6,055,623
Poor or no lighting near bike paths Bike Safety 7 543 13.7% 33,658 108,948 15.5% 6,055,623
Other, specify Bike Safety 8 684 17.3% 33,658 118,720 16.9% 6,055,623
Logic: if why_no_bike = safety concern

bike_store

bike_store
Bicycle storage location
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Inside house/apartment (includes garage, porch, storage area) Bike Store 1 9,179 89.4% 27,348 1,830,334 90.0% 4,725,996
Bike rack Bike Store 2 348 3.4% 27,348 68,176 3.4% 4,725,996
Bike locker Bike Store 3 57 0.6% 27,348 17,239 0.8% 4,725,996
Secured bike room Bike Store 4 256 2.5% 27,348 38,807 1.9% 4,725,996
Locked to other object (e.g., post, tree) Bike Store 5 202 2.0% 27,348 38,210 1.9% 4,725,996
In a parking garage/lot Bike Store 6 258 2.5% 27,348 46,412 2.3% 4,725,996
Unlocked on-street Bike Store 7 54 0.5% 27,348 15,861 0.8% 4,725,996
Other Bike Store 997 395 3.8% 27,348 77,301 3.8% 4,725,996
Logic: if household has at least one bike

carshare_freq

carshare_freq
Carshare use frequency
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 7 1.30% 3,013 3.28%
2 5 days a week 11 2.04% 3,638 3.96%
3 4 days a week 7 1.30% 2,112 2.30%
4 3 days a week 9 1.67% 1,290 1.40%
5 2 days a week 13 2.41% 4,477 4.87%
6 1 day a week 16 2.96% 2,993 3.26%
7 1-3 days a month 93 17.22% 12,167 13.23%
8 Less than monthly 384 71.11% 62,247 67.71%
995 Missing Response 37,076 6,667,675
Total valid 540 100.00% 91,937 100.00%
Total missing 37,076 6,667,675
Total 37,616 6,759,612
Logic: if uses carshare

commute_days

commute_days
Day commuted to workplace last week
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Monday Commute Days 1 8,451 58.1% 23,081 1,900,053 59.6% 3,571,608
Tuesday Commute Days 2 9,896 68.1% 23,081 2,118,335 66.4% 3,571,608
Wednesday Commute Days 3 9,809 67.5% 23,081 2,114,884 66.3% 3,571,608
Thursday Commute Days 4 9,626 66.2% 23,081 2,046,202 64.2% 3,571,608
Friday Commute Days 5 7,829 53.9% 23,081 1,755,828 55.1% 3,571,608
Saturday Commute Days 6 1,874 12.9% 23,081 467,739 14.7% 3,571,608
Sunday Commute Days 7 1,371 9.4% 23,081 359,638 11.3% 3,571,608
None Commute Days 996 1,737 12.0% 23,081 407,236 12.8% 3,571,608
Logic: if employment = full/part/self/volunteer and job_type IS NOT “work only from home” or “drive/bike/travel for work”

ev_subsidies

ev_subsidies
Familiarity with rebates/subsidies for purchasing an electric vehicle
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Extremely familiar 902 5.51% 176,407 5.71%
2 Very familiar 1,425 8.70% 256,220 8.29%
3 Moderately familiar 3,350 20.46% 630,927 20.41%
4 Slightly familiar 4,092 25.00% 736,192 23.82%
5 Not at all familiar 6,602 40.33% 1,291,067 41.77%
995 Missing Response 21,245 3,668,798
Total valid 16,371 100.00% 3,090,813 100.00%
Total missing 21,245 3,668,798
Total 37,616 6,759,612
Logic: if rMove or (rMove for Web and person 1)

ev_typical_charge

ev_typical_charge
Electric vehicle charging location
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
At home Ev Typical Charge 1 1,094 85.3% 36,333 206,093 83.8% 6,513,799
At work Ev Typical Charge 2 223 17.4% 36,333 47,221 19.2% 6,513,799
At a commute location (e.g., Park and Ride lot, parking garage) Ev Typical Charge 3 121 9.4% 36,333 24,849 10.1% 6,513,799
At a shopping location (e.g., grocery store, shopping mall) Ev Typical Charge 4 355 27.7% 36,333 69,027 28.1% 6,513,799
At a public location (e.g., hospital, library, government building) Ev Typical Charge 5 265 20.7% 36,333 44,445 18.1% 6,513,799
At a hotel/inn Ev Typical Charge 6 46 3.6% 36,333 7,920 3.2% 6,513,799
Other Ev Typical Charge 997 134 10.4% 36,333 26,749 10.9% 6,513,799
Logic: if fuel type of primary vehicle driven is electric or PHEV

home_vehicle_park

home_vehicle_park
Typical household vehicle parking location
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Home driveway/garage 13,053 79.73% 2,494,556 80.71%
2 Parking lot/garage 1,821 11.12% 306,801 9.93%
3 On-street parking 1,497 9.14% 289,456 9.37%
995 Missing Response 21,245 3,668,798
Total valid 16,371 100.00% 3,090,813 100.00%
Total missing 21,245 3,668,798
Total 37,616 6,759,612
Logic: if household has vehicles and person drives; if rMove or (rMove for Web and person 1)

home_vehicle_park_pay

home_vehicle_park_pay
Pays to park vehicle at home
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 14,782 90.29% 2,837,575 91.81%
1 Yes 1,589 9.71% 253,239 8.19%
995 Missing Response 21,245 3,668,798
Total valid 16,371 100.00% 3,090,813 100.00%
Total missing 21,245 3,668,798
Total 37,616 6,759,612
Logic: if household has vehicles and person drives; if rMove or (rMove for Web and person 1)

home_vehicle_park_permit

home_vehicle_park_permit
Purchased a residential parking pass to park vehicle
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 924 61.72% 194,757 67.28%
1 Yes 573 38.28% 94,698 32.72%
995 Missing Response 36,119 6,470,156
Total valid 1,497 100.00% 289,456 100.00%
Total missing 36,119 6,470,156
Total 37,616 6,759,612
Logic: if household typically parks on-street parking; if rMove or (rMove for Web and person 1)

peerrent_freq

peerrent_freq
Peer-to-peer car rental use frequency
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 4 1.02% 1,316 1.65%
2 5 days a week 8 2.04% 1,984 2.49%
3 4 days a week 7 1.79% 3,064 3.84%
4 3 days a week 4 1.02% 1,593 2.00%
5 2 days a week 8 2.04% 3,969 4.97%
6 1 day a week 5 1.28% 852 1.07%
7 1-3 days a month 13 3.32% 2,005 2.51%
8 Less than monthly 343 87.50% 65,035 81.48%
995 Missing Response 37,224 6,679,793
Total valid 392 100.00% 79,819 100.00%
Total missing 37,224 6,679,793
Total 37,616 6,759,612
Logic: if uses peer-to-peer car rental

telework_days

telework_days
Days teleworked last week
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Monday Telework Days 1 3,336 22.9% 23,072 624,107 19.6% 3,567,862
Tuesday Telework Days 2 2,693 18.5% 23,072 532,572 16.7% 3,567,862
Wednesday Telework Days 3 2,693 18.5% 23,072 523,295 16.4% 3,567,862
Thursday Telework Days 4 2,850 19.6% 23,072 563,824 17.7% 3,567,862
Friday Telework Days 5 4,043 27.8% 23,072 750,186 23.5% 3,567,862
Saturday Telework Days 6 439 3.0% 23,072 93,830 2.9% 3,567,862
Sunday Telework Days 7 368 2.5% 23,072 71,640 2.2% 3,567,862
None Telework Days 996 8,721 60.0% 23,072 2,093,287 65.6% 3,567,862
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

telework_freq_pre_covid

telework_freq_pre_covid
Days worked from home before March 2020
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 6-7 days a week 527 2.85% 99,004 2.44%
2 5 days a week 1,968 10.65% 437,467 10.77%
3 4 days a week 437 2.36% 108,479 2.67%
4 3 days a week 537 2.91% 116,195 2.86%
5 2 days a week 692 3.74% 141,134 3.48%
6 1 day a week 804 4.35% 158,658 3.91%
7 1-3 days a month 898 4.86% 162,641 4.01%
8 Less than monthly 1,270 6.87% 221,872 5.46%
996 None 11,347 61.40% 2,615,184 64.40%
995 Missing Response 19,136 2,698,977
Total valid 18,480 100.00% 4,060,634 100.00%
Total missing 19,136 2,698,977
Total 37,616 6,759,612
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

transit_factors

transit_factors
Factors that would increase transit use
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Lower cost of transit or free transit pass Transit Factors 1 6,317 24.0% 11,264 1,250,738 23.1% 1,354,325
More reliable transit service Transit Factors 2 8,584 32.6% 11,264 1,577,841 29.2% 1,354,325
More frequent transit service Transit Factors 3 8,506 32.3% 11,264 1,548,646 28.7% 1,354,325
Faster arrival at my destination Transit Factors 4 7,419 28.2% 11,264 1,396,850 25.8% 1,354,325
Transit service provided during different times of the day/week Transit Factors 5 3,241 12.3% 11,264 619,205 11.5% 1,354,325
Transit stops closer to my home/work Transit Factors 6 7,859 29.8% 11,264 1,521,859 28.2% 1,354,325
Higher gas or parking prices Transit Factors 7 1,298 4.9% 11,264 262,769 4.9% 1,354,325
User-friendly transit mobile app Transit Factors 8 2,227 8.5% 11,264 421,477 7.8% 1,354,325
Safer environment in the vehicles Transit Factors 9 1,952 7.4% 11,264 390,700 7.2% 1,354,325
Safer environment at stops and stations Transit Factors 10 2,670 10.1% 11,264 512,366 9.5% 1,354,325
Other Transit Factors 11 1,157 4.4% 11,264 221,092 4.1% 1,354,325
None of the above Transit Factors 12 9,409 35.7% 11,264 2,096,162 38.8% 1,354,325

transit_purpose

transit_purpose
Purpose for using transit in the past 30 days
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
To go to/from grocery/food shopping Transit Purpose 1 1,949 27.7% 30,586 365,210 29.0% 5,502,381
To go to/from other shopping (e.g., pharmacy) Transit Purpose 2 2,215 31.5% 30,586 378,016 30.1% 5,502,381
To go to/from medical appointment Transit Purpose 3 2,072 29.5% 30,586 374,251 29.8% 5,502,381
To visit friends or relatives Transit Purpose 4 2,813 40.0% 30,586 472,747 37.6% 5,502,381
To go to/from work Transit Purpose 5 3,409 48.5% 30,586 542,391 43.1% 5,502,381
For other work-related reason Transit Purpose 6 788 11.2% 30,586 134,041 10.7% 5,502,381
Other Transit Purpose 7 1,616 23.0% 30,586 294,865 23.5% 5,502,381
Logic: if transit_freq is not never or less than monthly

walk_purpose

walk_purpose
Reason for walking in the past 30 days
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
To go to/from grocery/food shopping Walk Purpose 1 6,099 29.4% 16,838 1,075,270 26.4% 2,692,153
To go to/from other shopping (e.g., pharmacy) Walk Purpose 2 5,861 28.2% 16,838 1,002,666 24.7% 2,692,153
To go to/from medical appointment Walk Purpose 3 2,465 11.9% 16,838 443,452 10.9% 2,692,153
To visit friends or relatives Walk Purpose 4 4,215 20.3% 16,838 778,577 19.1% 2,692,153
To go to/from work Walk Purpose 5 3,240 15.6% 16,838 540,391 13.3% 2,692,153
For other work-related reason Walk Purpose 6 1,451 7.0% 16,838 281,178 6.9% 2,692,153
Other Walk Purpose 7 1,601 7.7% 16,838 309,967 7.6% 2,692,153
For exercise or recreation Walk Purpose 8 18,414 88.6% 16,838 3,524,723 86.7% 2,692,153
Logic: if walk_freq > less than monthly

why_no_bike

why_no_bike
Reasons for not using a bicycle in the past 30 days
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Do not have a personal bicycle or it was not working Why No Bike 1 10,472 49.1% 16,300 2,113,175 47.4% 2,303,613
Other member(s) of my household were using the bicycle Why No Bike 2 193 0.9% 16,300 52,293 1.2% 2,303,613
Monetary cost was too high Why No Bike 3 244 1.1% 16,300 50,627 1.1% 2,303,613
Travel time was too long Why No Bike 4 1,649 7.7% 16,300 326,123 7.3% 2,303,613
Disability Why No Bike 5 1,593 7.5% 16,300 317,736 7.1% 2,303,613
Safety concerns Why No Bike 6 3,958 18.6% 16,300 703,989 15.8% 2,303,613
Weather Why No Bike 7 3,110 14.6% 16,300 646,433 14.5% 2,303,613
Other Why No Bike 8 5,697 26.7% 16,300 1,279,565 28.7% 2,303,613
Logic: if bike_freq = never or less than monthly  

8.3 Day

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 37,817 28.18% 0 0.00%
1 Yes 96,370 71.82% 6,759,612 100.00%
Total valid 134,187 100.00% 6,759,612 100.00%
Total 134,187 6,759,612

num_trips

num_trips
Number of trips
Statistic
Unweighted
Weighted
Value Value
N 134,187.00 6,759,611.78
Min 0.00 0.00
P25 0.00 2.00
Median 2.00 3.00
Mean 3.49 3.56
P75 5.00 5.00
P95 10.00 9.00
Max 68.00 50.00
SD 3.61 2.94
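The weighted statistics differ from the unweighted ones because each day record contributes in proportion to its weight. The is_complete table above shows a weighted count of 0 for incomplete days, which is consistent with those records carrying zero weight. The sketch below shows one way to compute a weighted mean and a weighted quantile. The quantile definition (smallest value whose cumulative weight share reaches q) is an assumption, since this guide does not state which definition was used, and the toy values are hypothetical rather than survey data.

```python
def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total_weight = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total_weight:
            return value
    return pairs[-1][0]


# Hypothetical toy day records; the first has weight 0, like an incomplete day.
toy_trips = [0, 2, 3, 5, 10]
toy_weights = [0.0, 40.0, 70.0, 50.0, 10.0]

mean_trips = weighted_mean(toy_trips, toy_weights)
median_trips = weighted_quantile(toy_trips, toy_weights, 0.5)
```

A zero-weight record affects unweighted statistics but drops out of weighted ones, which is one reason the two P25 values above can differ.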

hh_is_complete

hh_is_complete
Household day completion status
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 37,801 28.17% 0 0.00%
1 Yes 96,386 71.83% 6,759,612 100.00%
Total valid 134,187 100.00% 6,759,612 100.00%
Total 134,187 6,759,612

is_participant

is_participant
Active participant (age 18+ and surveyable)
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 22,785 16.98% 1,343,373 19.87%
1 Yes 111,402 83.02% 5,416,239 80.13%
Total valid 134,187 100.00% 6,759,612 100.00%
Total 134,187 6,759,612

begin_day

begin_day
Where participant began their day
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Home 90,957 87.38% 6,330,983 93.66%
2 Someone else’s home 2,012 1.93% 77,276 1.14%
3 Work 863 0.83% 97,971 1.45%
4 Their other home (e.g., other parent, second home) 985 0.95% 91,212 1.35%
5 Traveling (e.g., red-eye flight) 104 0.10% 2,098 0.03%
7 Temporary lodging (e.g., hotel, vacation rental) 1,545 1.48% 54,957 0.81%
997 Other 7,626 7.33% 105,113 1.56%
995 Missing Response 30,095 0
Total valid 104,092 100.00% 6,759,612 100.00%
Total missing 30,095 0
Total 134,187 6,759,612

end_day

end_day
Where participant ended their day
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Home 88,910 85.93% 6,153,682 91.04%
2 Someone else’s home 2,219 2.14% 104,717 1.55%
3 Work 1,174 1.13% 157,317 2.33%
4 Their other home (e.g., other parent, second home) 1,022 0.99% 90,836 1.34%
5 Traveling (e.g., red-eye flight) 133 0.13% 3,092 0.05%
7 Temporary lodging (e.g., hotel, vacation rental) 1,683 1.63% 63,794 0.94%
997 Other 8,329 8.05% 186,174 2.75%
995 Missing Response 30,717 0
Total valid 103,470 100.00% 6,759,612 100.00%
Total missing 30,717 0
Total 134,187 6,759,612

school_daily

school_daily
Student traveled to school
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Yes 10,416 63.54% 942,003 80.68%
2 No 5,976 36.46% 225,552 19.32%
995 Missing Response 117,795 5,592,056
Total valid 16,392 100.00% 1,167,556 100.00%
Total missing 117,795 5,592,056
Total 134,187 6,759,612
Logic: if attends school in-person or daycare at least some of the time

attend_school

attend_school
Traveled to school
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Yes, person traveled to school on date Attend School 1 1,580 76.6% 132,125 368,213 71.5% 6,244,549
Yes, person traveled to another location for school (e.g., friend’s house, parent’s work) on date Attend School 2 32 1.6% 132,125 9,756 1.9% 6,244,549
No, person did not travel to school on date Attend School 3 440 21.3% 132,125 133,478 25.9% 6,244,549
Don’t know Attend School 998 9 0.4% 132,125 3,692 0.7% 6,244,549
Prefer not to answer Attend School 999 21 1.0% 132,125 5,077 1.0% 6,244,549
Logic: if person attends in-person school or daycare at least some of the time AND school was not selected as a purpose on travel day

attend_school_no

attend_school_no
Reason for not attending school
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Person was sick or quarantining Attend School No 1 126 28.6% 133,747 44,002 33.0% 6,626,133
Person attended class online from home Attend School No 2 22 5.0% 133,747 7,478 5.6% 6,626,133
Person attended class online from another location (e.g., friend’s) Attend School No 3 0 0.0% 133,747 0 0.0% 6,626,133
School was scheduled to be closed (e.g., vacation, holiday) Attend School No 4 121 27.5% 133,747 37,793 28.3% 6,626,133
School was closed or adjusted due to weather event (e.g., snow delay) Attend School No 5 15 3.4% 133,747 3,228 2.4% 6,626,133
Other Attend School No 997 142 32.3% 133,747 37,719 28.3% 6,626,133
Don’t know Attend School No 998 2 0.5% 133,747 221 0.2% 6,626,133
Prefer not to answer Attend School No 999 16 3.6% 133,747 4,202 3.1% 6,626,133
Logic: if did not attend school or daycare on travel day

telecommute_time

telecommute_time
Time spent teleworking on travel day (minutes, where 600 = 10+ hours)
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 0 minutes 36,963 60.41% 2,082,982 54.34%
15 15 minutes 173 0.28% 14,690 0.38%
30 30 minutes 452 0.74% 28,667 0.75%
45 45 minutes 273 0.45% 13,484 0.35%
60 1 hour 920 1.50% 63,342 1.65%
75 1 hour, 15 minutes 240 0.39% 15,339 0.40%
90 1 hour, 30 minutes 435 0.71% 32,634 0.85%
105 1 hour, 45 minutes 131 0.21% 8,525 0.22%
120 2 hours 885 1.45% 60,278 1.57%
135 2 hours, 15 minutes 190 0.31% 11,952 0.31%
150 2 hours, 30 minutes 297 0.49% 21,445 0.56%
165 2 hours, 45 minutes 112 0.18% 7,692 0.20%
180 3 hours 635 1.04% 40,453 1.06%
195 3 hours, 15 minutes 144 0.24% 8,487 0.22%
210 3 hours, 30 minutes 233 0.38% 17,737 0.46%
225 3 hours, 45 minutes 102 0.17% 8,045 0.21%
240 4 hours 843 1.38% 62,463 1.63%
255 4 hours, 15 minutes 213 0.35% 15,172 0.40%
270 4 hours, 30 minutes 263 0.43% 13,700 0.36%
285 4 hours, 45 minutes 126 0.21% 7,005 0.18%
300 5 hours 588 0.96% 42,538 1.11%
315 5 hours, 15 minutes 158 0.26% 9,271 0.24%
330 5 hours, 30 minutes 220 0.36% 16,733 0.44%
345 5 hours, 45 minutes 97 0.16% 8,668 0.23%
360 6 hours 715 1.17% 53,002 1.38%
375 6 hours, 15 minutes 148 0.24% 6,662 0.17%
390 6 hours, 30 minutes 340 0.56% 30,945 0.81%
405 6 hours, 45 minutes 136 0.22% 8,701 0.23%
420 7 hours 941 1.54% 66,569 1.74%
435 7 hours, 15 minutes 245 0.40% 20,007 0.52%
450 7 hours, 30 minutes 894 1.46% 63,777 1.66%
465 7 hours, 45 minutes 314 0.51% 22,621 0.59%
480 8 hours 6,435 10.52% 457,377 11.93%
495 8 hours, 15 minutes 927 1.52% 66,071 1.72%
510 8 hours, 30 minutes 1,440 2.35% 119,390 3.11%
525 8 hours, 45 minutes 364 0.59% 28,425 0.74%
540 9 hours 1,340 2.19% 102,783 2.68%
555 9 hours, 15 minutes 252 0.41% 19,765 0.52%
570 9 hours, 30 minutes 271 0.44% 25,250 0.66%
585 9 hours, 45 minutes 67 0.11% 4,377 0.11%
600 10+ hours 1,663 2.72% 125,863 3.28%
NA No value assigned 73,002 2,926,726
Total valid 61,185 100.00% 3,832,886 100.00%
Total missing 73,002 2,926,726
Total 134,187 6,759,612
Logic: if employment = full/part/self/volunteer
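The telecommute_time codes are minutes in 15-minute steps, top-coded at 600 (10+ hours). The sketch below converts a code to the label format used in the table above. `telecommute_label` is a hypothetical helper, not a delivered function.

```python
TOP_CODE = 600  # 600 means "10+ hours", not exactly 600 minutes


def telecommute_label(minutes):
    """Format a telecommute_time code as the label shown in the codebook."""
    if minutes >= TOP_CODE:
        return "10+ hours"
    hours, mins = divmod(minutes, 60)
    if hours == 0:
        return f"{mins} minutes"
    hour_part = "1 hour" if hours == 1 else f"{hours} hours"
    return hour_part if mins == 0 else f"{hour_part}, {mins} minutes"


labels = [telecommute_label(code) for code in (0, 75, 120, 585, 600)]
```

Because the top category is open-ended, a mean computed directly from these codes understates actual telework time for records coded 600.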

delivery

delivery
Type of delivery
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Take-out/prepared food delivered to home Delivery 2 3,719 4.7% 54,542 183,675 5.0% 3,080,918
Someone came to do work at home (e.g., babysitter, housecleaning, lawn) Delivery 3 2,161 2.7% 54,542 134,121 3.6% 3,080,918
Groceries delivered to home Delivery 4 1,770 2.2% 54,542 89,553 2.4% 3,080,918
Received packages at home (e.g., USPS, FedEx, UPS) Delivery 5 22,955 28.8% 54,542 1,263,596 34.3% 3,080,918
Received personal packages at work Delivery 6 284 0.4% 54,542 13,333 0.4% 3,080,918
Received packages at another location (e.g., Amazon Locker, package pick-up point) Delivery 7 1,162 1.5% 54,542 65,506 1.8% 3,080,918
Other item delivered to home (e.g., appliance) Delivery 8 359 0.5% 54,542 23,480 0.6% 3,080,918
None of the above Delivery 996 51,506 64.7% 54,542 2,188,102 59.5% 3,080,918
Logic: if rMove or (rMove for Web and person 1)

made_travel

made_travel
Made trips on travel day
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Yes, person made trips on day 34 14.05% 5,464 11.62%
2 No, person did not go anywhere or make trips on day 179 73.97% 37,029 78.78%
998 Don’t know 14 5.79% 3,056 6.50%
999 Prefer not to answer 15 6.20% 1,455 3.09%
995 Missing Response 133,945 6,712,609
Total valid 242 100.00% 47,003 100.00%
Total missing 133,945 6,712,609
Total 134,187 6,759,612
Logic: if using rMove and has zero trips for the day and did not say they went to school/daycare in attend_school and begin_day = end_day and begin_day is not other
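The made_travel skip logic combines several conditions, which explains the small number of valid responses (242 unweighted). The sketch below restates that logic as a filter over one day record. The keys `uses_rmove` and `reported_school_travel` are hypothetical flags introduced for illustration; `num_trips`, `begin_day`, and `end_day` are variables documented in this section.

```python
OTHER = 997  # begin_day / end_day code for "Other"


def asked_made_travel(day):
    """Return True if a day record meets the made_travel skip logic."""
    return bool(
        day["uses_rmove"]                      # hypothetical flag: rMove app participant
        and day["num_trips"] == 0              # zero trips recorded for the day
        and not day["reported_school_travel"]  # hypothetical flag derived from attend_school
        and day["begin_day"] == day["end_day"]
        and day["begin_day"] != OTHER
    )


# Hypothetical example records.
home_all_day = {"uses_rmove": True, "num_trips": 0,
                "reported_school_travel": False, "begin_day": 1, "end_day": 1}
moved_location = dict(home_all_day, end_day=2)

asked_home = asked_made_travel(home_all_day)
asked_moved = asked_made_travel(moved_location)
```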

no_travel

no_travel
Reason for no travel on date
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
I did make trips on date No Travel 1 1,825 11.8% 118,777 24,596 2.5% 5,784,720
Not scheduled to work/took day off No Travel 2 1,981 12.9% 118,777 131,043 13.4% 5,784,720
Worked at home for pay (e.g., telework) No Travel 3 3,292 21.4% 118,777 205,801 21.1% 5,784,720
Hung out around home No Travel 4 7,396 48.0% 118,777 419,892 43.1% 5,784,720
Scheduled school closure (e.g., holiday) No Travel 5 32 0.2% 118,777 6,978 0.7% 5,784,720
No available transportation (e.g., no car, no bus) No Travel 6 284 1.8% 118,777 16,633 1.7% 5,784,720
Sick or quarantining (self or others) No Travel 7 1,150 7.5% 118,777 105,358 10.8% 5,784,720
Waited for visitor/delivery (e.g., plumber) No Travel 8 359 2.3% 118,777 20,321 2.1% 5,784,720
Kids did online/remote/home school No Travel 9 104 0.7% 118,777 25,012 2.6% 5,784,720
Weather conditions (e.g., snowstorm) No Travel 11 646 4.2% 118,777 31,984 3.3% 5,784,720
Person may have made trips, but I don’t know when or where No Travel 12 9 0.1% 118,777 338 0.0% 5,784,720
Other reason No Travel 99 1,343 8.7% 118,777 145,476 14.9% 5,784,720
Logic: if made zero trips on day

congestion

congestion
Person adjusted travel time to account for congestion
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Yes 6,373 12.49% 580,074 18.11%
2 No 44,656 87.51% 2,623,013 81.89%
995 Missing Response 83,158 3,556,524
Total valid 51,029 100.00% 3,203,088 100.00%
Total missing 83,158 3,556,524
Total 134,187 6,759,612
Logic: if employment = full/part/self/volunteer and job_type IS NOT “work only from home” or “drive/bike/travel for work”

8.4 Vehicle

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 4,079 15.78% 0 0.00%
1 Yes 21,770 84.22% 4,427,881 100.00%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

make

make
Vehicle make
Value Label
Unweighted
Weighted
Count Percent Count Percent
AMC 2 0.01% 204 0.00%
Acura 327 1.27% 62,015 1.40%
Alfa Romeo 17 0.07% 2,144 0.05%
Audi 467 1.81% 75,830 1.71%
BMW 626 2.42% 114,004 2.57%
Bentley 7 0.03% 2,077 0.05%
Buick 200 0.77% 32,372 0.73%
Cadillac 126 0.49% 24,342 0.55%
Chevrolet 1,410 5.45% 250,217 5.65%
Chrysler 163 0.63% 33,167 0.75%
Dodge 321 1.24% 53,897 1.22%
Ferrari 2 0.01% 189 0.00%
Fiat 31 0.12% 4,617 0.10%
Ford 1,886 7.30% 330,921 7.47%
Freightliner 1 0.00% 321 0.01%
GMC 337 1.30% 53,338 1.20%
Genesis 14 0.05% 2,956 0.07%
Geo 2 0.01% 1,271 0.03%
Honda 3,610 13.97% 633,202 14.30%
Hummer 6 0.02% 1,662 0.04%
Hyundai 1,227 4.75% 194,923 4.40%
Infiniti 109 0.42% 20,850 0.47%
Isuzu 5 0.02% 409 0.01%
Jaguar 20 0.08% 2,099 0.05%
Jeep 1,029 3.98% 172,222 3.89%
Kia 714 2.76% 130,383 2.94%
Lafayette 1 0.00% 1,118 0.03%
Lamborghini 1 0.00% 136 0.00%
Land Rover 53 0.21% 6,965 0.16%
Lexus 497 1.92% 75,714 1.71%
Lincoln 94 0.36% 13,861 0.31%
Lotus 3 0.01% 855 0.02%
Lucid 4 0.02% 1,430 0.03%
MINI 135 0.52% 17,602 0.40%
Maserati 3 0.01% 570 0.01%
Mazda 838 3.24% 132,841 3.00%
Mercedes-Benz 376 1.45% 62,603 1.41%
Mercury 40 0.15% 5,776 0.13%
Mitsubishi 89 0.34% 17,457 0.39%
Nissan 1,308 5.06% 228,550 5.16%
Oldsmobile 4 0.02% 583 0.01%
Opel 1 0.00% 16 0.00%
Other 316 1.22% 60,823 1.37%
Plymouth 1 0.00% 43 0.00%
Polestar 5 0.02% 1,078 0.02%
Pontiac 48 0.19% 8,113 0.18%
Porsche 71 0.27% 9,179 0.21%
Ram 117 0.45% 20,969 0.47%
Rivian 12 0.05% 2,974 0.07%
Rolls Royce 1 0.00% 146 0.00%
Saab 33 0.13% 6,969 0.16%
Saturn 23 0.09% 2,820 0.06%
Scion 29 0.11% 6,239 0.14%
Subaru 2,405 9.30% 379,027 8.56%
Suzuki 23 0.09% 1,943 0.04%
Tesla 354 1.37% 64,440 1.46%
Toyota 5,114 19.78% 888,015 20.06%
Volkswagen 775 3.00% 140,159 3.17%
Volvo 405 1.57% 68,163 1.54%
smart 11 0.04% 1,074 0.02%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

year

year
Vehicle year
Value Label
Unweighted
Weighted
Count Percent Count Percent
1980 1980 or earlier 149 0.58% 29,593 0.67%
1981 1981 15 0.06% 2,339 0.05%
1982 1982 12 0.05% 2,961 0.07%
1983 1983 10 0.04% 1,887 0.04%
1984 1984 6 0.02% 961 0.02%
1985 1985 4 0.02% 423 0.01%
1986 1986 12 0.05% 1,268 0.03%
1987 1987 19 0.07% 2,509 0.06%
1988 1988 10 0.04% 1,995 0.05%
1989 1989 24 0.09% 3,761 0.08%
1990 1990 22 0.09% 2,922 0.07%
1991 1991 12 0.05% 2,532 0.06%
1992 1992 11 0.04% 1,113 0.03%
1993 1993 20 0.08% 3,975 0.09%
1994 1994 25 0.10% 3,554 0.08%
1995 1995 30 0.12% 4,998 0.11%
1996 1996 22 0.09% 5,035 0.11%
1997 1997 41 0.16% 6,251 0.14%
1998 1998 64 0.25% 8,929 0.20%
1999 1999 89 0.34% 14,061 0.32%
2000 2000 113 0.44% 18,252 0.41%
2001 2001 123 0.48% 18,651 0.42%
2002 2002 156 0.60% 30,941 0.70%
2003 2003 226 0.87% 40,845 0.92%
2004 2004 280 1.08% 49,504 1.12%
2005 2005 335 1.30% 54,163 1.22%
2006 2006 392 1.52% 66,604 1.50%
2007 2007 518 2.00% 92,623 2.09%
2008 2008 608 2.35% 101,414 2.29%
2009 2009 537 2.08% 95,956 2.17%
2010 2010 826 3.20% 128,194 2.90%
2011 2011 901 3.49% 151,340 3.42%
2012 2012 1,085 4.20% 195,852 4.42%
2013 2013 1,271 4.92% 217,805 4.92%
2014 2014 1,429 5.53% 243,425 5.50%
2015 2015 1,645 6.36% 284,814 6.43%
2016 2016 1,768 6.84% 307,108 6.94%
2017 2017 1,888 7.30% 316,643 7.15%
2018 2018 1,838 7.11% 317,770 7.18%
2019 2019 1,817 7.03% 313,589 7.08%
2020 2020 1,456 5.63% 242,033 5.47%
2021 2021 1,517 5.87% 270,814 6.12%
2022 2022 1,392 5.39% 226,936 5.13%
2023 2023 1,446 5.59% 253,377 5.72%
2024 2024 1,393 5.39% 242,014 5.47%
2025 2025 290 1.12% 45,873 1.04%
2026 2026 2 0.01% 273 0.01%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

fuel_type

fuel_type
Vehicle fuel type
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Gas 22,758 88.04% 3,919,833 88.53%
2 Hybrid (HEV) 1,587 6.14% 249,862 5.64%
3 Plug-in hybrid (PHEV) 429 1.66% 66,937 1.51%
4 Electric (EV) 838 3.24% 150,449 3.40%
5 Diesel 178 0.69% 29,326 0.66%
6 Flex fuel (FFV) 41 0.16% 8,128 0.18%
7 Other (e.g., natural gas, bio-diesel) 14 0.05% 1,917 0.04%
8 Fuel cell electric vehicle (FCEV) 4 0.02% 1,430 0.03%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

vehicle_ownership

vehicle_ownership
Vehicle ownership status
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Fully owned (not making payments) 18,754 72.55% 3,179,327 71.80%
2 Owned (making payments) 5,873 22.72% 1,026,912 23.19%
3 Leased 946 3.66% 157,106 3.55%
4 Employer provided 145 0.56% 34,203 0.77%
5 Unsure 86 0.33% 19,615 0.44%
997 Other 45 0.17% 10,718 0.24%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

transponder

transponder
Vehicle has a toll transponder
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 6,569 25.41% 1,189,817 26.87%
1 Yes 19,280 74.59% 3,238,064 73.13%
Total valid 25,849 100.00% 4,427,881 100.00%
Total 25,849 4,427,881

8.5 Location

8.6 Unlinked Trip

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 56,445 12.06% 0 0.00%
1 Yes 411,573 87.94% 30,078,667 100.00%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667

hh_is_complete

hh_is_complete
Household day completion status
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 56,403 12.05% 0 0.00%
1 Yes 411,615 87.95% 30,078,667 100.00%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667

day_is_complete

day_is_complete
Day survey completion status
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 57,329 12.25% 0 0.00%
1 Yes 410,689 87.75% 30,078,667 100.00%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667

o_state

o_state
Origin – State
Value Label
Unweighted
Weighted
Count Percent Count Percent
09 Connecticut 3,054 0.65% 121,909 0.41%
25 Massachusetts 441,684 94.37% 29,136,155 96.87%
33 New Hampshire 6,070 1.30% 248,801 0.83%
36 New York 2,681 0.57% 84,119 0.28%
44 Rhode Island 3,833 0.82% 171,068 0.57%
50 Vermont 1,225 0.26% 39,803 0.13%
None 9,471 2.02% 276,811 0.92%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667
Logic: if state borders MA
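The state values above are two-digit FIPS codes, so a code such as 09 loses its leading zero if the column is read as an integer. The sketch below shows one way to normalize and label the codes listed in this table. How the delivered files store the column is not stated here, so the integer-repair step is an assumption, and `state_name` is a hypothetical helper.

```python
STATE_FIPS = {
    "09": "Connecticut",
    "25": "Massachusetts",
    "33": "New Hampshire",
    "36": "New York",
    "44": "Rhode Island",
    "50": "Vermont",
}


def state_name(code):
    """Map a FIPS code (str or int) to a state name; None if not listed."""
    if code is None:
        return None
    key = str(code).strip().zfill(2)  # restores a dropped leading zero, e.g. 9 -> "09"
    return STATE_FIPS.get(key)


names = [state_name(c) for c in ("25", 9, "44", None)]
```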

d_state

d_state
Destination – State
Value Label
Unweighted
Weighted
Count Percent Count Percent
09 Connecticut 3,062 0.65% 117,353 0.39%
25 Massachusetts 441,295 94.29% 29,132,220 96.85%
33 New Hampshire 6,104 1.30% 238,671 0.79%
36 New York 2,719 0.58% 83,483 0.28%
44 Rhode Island 3,847 0.82% 165,287 0.55%
50 Vermont 1,233 0.26% 40,226 0.13%
None 9,758 2.08% 301,427 1.00%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667
Logic: if state borders MA

mode_1

mode_1
Trip mode 1
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 80,801 18.79% 3,846,508 13.44%
2 Standard bicycle (my household’s) 5,537 1.29% 265,539 0.93%
3 Borrowed bicycle (e.g., a friend’s) 49 0.01% 1,308 0.00%
4 Other rented bicycle 52 0.01% 832 0.00%
5 Other 2,261 0.53% 255,011 0.89%
6 Household vehicle 1 194,017 45.11% 13,099,797 45.76%
7 Household vehicle 2 68,231 15.86% 5,869,061 20.50%
8 Household vehicle 3 9,049 2.10% 1,041,681 3.64%
9 Household vehicle 4 1,690 0.39% 247,470 0.86%
10 Household vehicle 5 325 0.08% 47,077 0.16%
11 Household vehicle 6 24 0.01% 6,460 0.02%
12 Household vehicle 7 22 0.01% 378 0.00%
13 Household vehicle 8 2 0.00% 0 0.00%
16 Other vehicle in household 3,702 0.86% 366,635 1.28%
17 Rental car 2,325 0.54% 95,841 0.33%
18 Carshare service (e.g., Zipcar) 282 0.07% 14,341 0.05%
21 Vanpool 53 0.01% 7,007 0.02%
22 Other vehicle (not my household’s) 1,805 0.42% 96,634 0.34%
23 Local bus 6,011 1.40% 294,762 1.03%
24 School bus 7,335 1.71% 521,043 1.82%
25 Intercity bus (e.g., Greyhound) 41 0.01% 1,592 0.01%
26 Other private shuttle/bus (e.g., a hotel’s, an airport’s) 362 0.08% 11,969 0.04%
27 Medical transportation service 378 0.09% 57,801 0.20%
28 Other bus 138 0.03% 8,203 0.03%
30 Subway 7,669 1.78% 300,955 1.05%
31 Airplane/helicopter 679 0.16% 37,701 0.13%
33 Car from work 6,293 1.46% 422,522 1.48%
34 Friend/relative/colleague’s car 18,101 4.21% 912,685 3.19%
36 Regular taxi (e.g., Yellow Cab) 254 0.06% 40,321 0.14%
38 University/college shuttle/bus 269 0.06% 23,868 0.08%
39 Light rail/trolley 1,232 0.29% 52,316 0.18%
41 Intercity rail (e.g., Amtrak) 172 0.04% 7,563 0.03%
42 Other rail 54 0.01% 3,486 0.01%
43 Skateboard or rollerblade 72 0.02% 15,419 0.05%
44 Golf cart 237 0.06% 4,987 0.02%
45 ATV 16 0.00% 991 0.00%
47 Other motorcycle in household 221 0.05% 12,950 0.05%
49 Uber, Lyft, or other smartphone-app ride service 3,742 0.87% 285,261 1.00%
54 Other motorcycle (not my household’s) 37 0.01% 2,914 0.01%
55 Express/commuter bus 206 0.05% 13,997 0.05%
56 Other personal bicycle (e.g., cargo, tandem, etc.) 266 0.06% 10,524 0.04%
58 Commuter rail 1,884 0.44% 111,181 0.39%
59 Peer-to-peer car rental (e.g., Turo) 34 0.01% 355 0.00%
60 Other hired car service (e.g., black car, limo) 146 0.03% 18,870 0.07%
61 Rapid transit bus (BRT) 144 0.03% 5,938 0.02%
62 Employer-provided shuttle/bus 391 0.09% 29,514 0.10%
69 Bike-share - standard bicycle 616 0.14% 26,780 0.09%
70 Bike-share - electric bicycle 200 0.05% 6,195 0.02%
74 Segway 1 0.00% 67 0.00%
75 Other micromobility device 37 0.01% 502 0.00%
76 Carpool match (e.g., Waze Carpool) 63 0.01% 4,735 0.02%
77 Personal scooter or moped (not shared) 372 0.09% 18,940 0.07%
78 Other public ferry or water taxi 228 0.05% 11,557 0.04%
79 Vehicle ferry (took vehicle on board) 74 0.02% 12,938 0.05%
80 Other boat (e.g., kayak) 352 0.08% 3,151 0.01%
81 Snowmobile 5 0.00% 38 0.00%
82 Electric bicycle (my household’s) 1,517 0.35% 66,776 0.23%
83 Scooter-share (e.g., Bird, Lime) 5 0.00% 510 0.00%
200 Paratransit/Dial-A-Ride (e.g., The RIDE) 36 0.01% 3,960 0.01%
995 Missing Response 37,901 1,451,252
Total valid 430,117 100.00% 28,627,414 100.00%
Total missing 37,901 1,451,252
Total 468,018 30,078,667

mode_2

mode_2
Trip mode 2
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 7,949 49.07% 366,657 42.19%
2 Standard bicycle (my household’s) 139 0.86% 3,267 0.38%
3 Borrowed bicycle (e.g., a friend’s) 4 0.02% 466 0.05%
4 Other rented bicycle 6 0.04% 866 0.10%
5 Other 236 1.46% 8,228 0.95%
6 Household vehicle 1 1,072 6.62% 89,998 10.35%
7 Household vehicle 2 552 3.41% 41,279 4.75%
8 Household vehicle 3 33 0.20% 5,563 0.64%
9 Household vehicle 4 7 0.04% 67 0.01%
10 Household vehicle 5 5 0.03% 166 0.02%
16 Other vehicle in household 31 0.19% 6,850 0.79%
17 Rental car 57 0.35% 3,201 0.37%
18 Carshare service (e.g., Zipcar) 10 0.06% 2,802 0.32%
21 Vanpool 33 0.20% 2,046 0.24%
22 Other vehicle (not my household’s) 157 0.97% 15,035 1.73%
23 Local bus 446 2.75% 52,124 6.00%
24 School bus 204 1.26% 50,492 5.81%
25 Intercity bus (e.g., Greyhound) 31 0.19% 844 0.10%
26 Other private shuttle/bus (e.g., a hotel’s, an airport’s) 28 0.17% 5,839 0.67%
27 Medical transportation service 30 0.19% 2,136 0.25%
28 Other bus 131 0.81% 4,196 0.48%
30 Subway 2,169 13.39% 71,381 8.21%
31 Airplane/helicopter 33 0.20% 2,730 0.31%
33 Car from work 31 0.19% 10,675 1.23%
34 Friend/relative/colleague’s car 484 2.99% 26,065 3.00%
36 Regular taxi (e.g., Yellow Cab) 10 0.06% 4,013 0.46%
38 University/college shuttle/bus 116 0.72% 7,594 0.87%
39 Light rail/trolley 152 0.94% 4,364 0.50%
41 Intercity rail (e.g., Amtrak) 13 0.08% 447 0.05%
42 Other rail 74 0.46% 3,423 0.39%
43 Skateboard or rollerblade 4 0.02% 850 0.10%
45 ATV 1 0.01% 0 0.00%
47 Other motorcycle in household 14 0.09% 957 0.11%
49 Uber, Lyft, or other smartphone-app ride service 149 0.92% 6,467 0.74%
54 Other motorcycle (not my household’s) 2 0.01% 50 0.01%
55 Express/commuter bus 204 1.26% 8,937 1.03%
56 Other personal bicycle (e.g., cargo, tandem, etc.) 28 0.17% 182 0.02%
58 Commuter rail 1,091 6.73% 30,602 3.52%
59 Peer-to-peer car rental (e.g., Turo) 13 0.08% 2,201 0.25%
60 Other hired car service (e.g., black car, limo) 12 0.07% 617 0.07%
61 Rapid transit bus (BRT) 24 0.15% 2,449 0.28%
62 Employer-provided shuttle/bus 218 1.35% 12,054 1.39%
69 Bike-share - standard bicycle 63 0.39% 1,471 0.17%
70 Bike-share - electric bicycle 15 0.09% 111 0.01%
75 Other micromobility device 8 0.05% 29 0.00%
76 Carpool match (e.g., Waze Carpool) 7 0.04% 528 0.06%
77 Personal scooter or moped (not shared) 2 0.01% 7 0.00%
78 Other public ferry or water taxi 31 0.19% 1,954 0.22%
79 Vehicle ferry (took vehicle on board) 4 0.02% 0 0.00%
80 Other boat (e.g., kayak) 15 0.09% 166 0.02%
82 Electric bicycle (my household’s) 16 0.10% 260 0.03%
200 Paratransit/Dial-A-Ride (e.g., The RIDE) 36 0.22% 6,441 0.74%
995 Missing Response 451,818 29,209,522
Total valid 16,200 100.00% 869,145 100.00%
Total missing 451,818 29,209,522
Total 468,018 30,078,667

mode_3

mode_3
Trip mode 3
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 2,085 67.02% 88,986 56.75%
2 Standard bicycle (my household’s) 10 0.32% 35 0.02%
4 Other rented bicycle 2 0.06% 171 0.11%
5 Other 21 0.68% 499 0.32%
6 Household vehicle 1 128 4.11% 13,993 8.92%
7 Household vehicle 2 105 3.38% 8,823 5.63%
8 Household vehicle 3 6 0.19% 34 0.02%
16 Other vehicle in household 14 0.45% 1,794 1.14%
17 Rental car 3 0.10% 1,392 0.89%
18 Carshare service (e.g., Zipcar) 2 0.06% 855 0.55%
21 Vanpool 5 0.16% 68 0.04%
22 Other vehicle (not my household’s) 18 0.58% 1,183 0.75%
23 Local bus 32 1.03% 7,889 5.03%
25 Intercity bus (e.g., Greyhound) 8 0.26% 311 0.20%
26 Other private shuttle/bus (e.g., a hotel’s, an airport’s) 4 0.13% 1,455 0.93%
27 Medical transportation service 1 0.03% 76 0.05%
28 Other bus 14 0.45% 1,707 1.09%
30 Subway 82 2.64% 5,116 3.26%
31 Airplane/helicopter 14 0.45% 162 0.10%
33 Car from work 8 0.26% 225 0.14%
34 Friend/relative/colleague’s car 110 3.54% 4,450 2.84%
36 Regular taxi (e.g., Yellow Cab) 4 0.13% 53 0.03%
38 University/college shuttle/bus 37 1.19% 1,835 1.17%
39 Light rail/trolley 28 0.90% 436 0.28%
42 Other rail 13 0.42% 39 0.02%
45 ATV 1 0.03% 0 0.00%
47 Other motorcycle in household 4 0.13% 144 0.09%
49 Uber, Lyft, or other smartphone-app ride service 19 0.61% 2,968 1.89%
54 Other motorcycle (not my household’s) 1 0.03% 54 0.03%
55 Express/commuter bus 50 1.61% 2,058 1.31%
58 Commuter rail 139 4.47% 2,103 1.34%
59 Peer-to-peer car rental (e.g., Turo) 3 0.10% 819 0.52%
60 Other hired car service (e.g., black car, limo) 3 0.10% 1,993 1.27%
61 Rapid transit bus (BRT) 6 0.19% 2,249 1.43%
62 Employer-provided shuttle/bus 88 2.83% 1,780 1.14%
69 Bike-share - standard bicycle 2 0.06% 0 0.00%
70 Bike-share - electric bicycle 21 0.68% 217 0.14%
74 Segway 3 0.10% 360 0.23%
75 Other micromobility device 3 0.10% 7 0.00%
76 Carpool match (e.g., Waze Carpool) 6 0.19% 435 0.28%
78 Other public ferry or water taxi 6 0.19% 19 0.01%
83 Scooter-share (e.g., Bird, Lime) 1 0.03% 0 0.00%
200 Paratransit/Dial-A-Ride (e.g., The RIDE) 1 0.03% 14 0.01%
995 Missing Response 464,907 29,921,857
Total valid 3,111 100.00% 156,809 100.00%
Total missing 464,907 29,921,857
Total 468,018 30,078,667
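
The percent columns in these tables use the valid total as the denominator, so records coded 995 (Missing Response) are excluded. As a quick check, the sketch below reproduces the Walk row of mode_3 from the counts printed above. The object names are illustrative only and do not correspond to anything in the delivered data.

```r
# Counts copied from the mode_3 table above
walk_unweighted  <- 2085
valid_unweighted <- 3111     # "Total valid" (excludes code 995)
walk_weighted    <- 88986
valid_weighted   <- 156809   # weighted "Total valid"

# Percent = count / total valid * 100
pct_unweighted <- round(100 * walk_unweighted / valid_unweighted, 2)
pct_weighted   <- round(100 * walk_weighted / valid_weighted, 2)

print(c(unweighted = pct_unweighted, weighted = pct_weighted))
```

Both values match the published row (67.02% unweighted, 56.75% weighted).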

mode_4

mode_4
Trip mode 4
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walk (or jog/wheelchair) 299 69.53% 8,536 32.01%
4 Other rented bicycle 1 0.23% 0 0.00%
5 Other 1 0.23% 0 0.00%
6 Household vehicle 1 3 0.70% 2,037 7.64%
7 Household vehicle 2 13 3.02% 3,974 14.90%
8 Household vehicle 3 1 0.23% 1,502 5.63%
16 Other vehicle in household 2 0.47% 1,396 5.23%
21 Vanpool 1 0.23% 0 0.00%
22 Other vehicle (not my household’s) 1 0.23% 0 0.00%
23 Local bus 3 0.70% 3,066 11.50%
24 School bus 2 0.47% 93 0.35%
30 Subway 1 0.23% 26 0.10%
31 Airplane/helicopter 9 2.09% 40 0.15%
34 Friend/relative/colleague’s car 14 3.26% 547 2.05%
36 Regular taxi (e.g., Yellow Cab) 1 0.23% 1,284 4.81%
38 University/college shuttle/bus 7 1.63% 71 0.27%
39 Light rail/trolley 2 0.47% 37 0.14%
42 Other rail 1 0.23% 21 0.08%
43 Skateboard or rollerblade 7 1.63% 200 0.75%
49 Uber, Lyft, or other smartphone-app ride service 4 0.93% 1,317 4.94%
55 Express/commuter bus 3 0.70% 182 0.68%
56 Other personal bicycle (e.g., cargo, tandem, etc.) 1 0.23% 23 0.09%
58 Commuter rail 20 4.65% 832 3.12%
60 Other hired car service (e.g., black car, limo) 4 0.93% 15 0.06%
62 Employer-provided shuttle/bus 3 0.70% 22 0.08%
70 Bike-share - electric bicycle 1 0.23% 0 0.00%
76 Carpool match (e.g., Waze Carpool) 18 4.19% 1,034 3.88%
78 Other public ferry or water taxi 4 0.93% 53 0.20%
83 Scooter-share (e.g., Bird, Lime) 3 0.70% 360 1.35%
995 Missing Response 467,588 30,051,997
Total valid 430 100.00% 26,670 100.00%
Total missing 467,588 30,051,997
Total 468,018 30,078,667

transit_egress

transit_egress
Mode used to leave transit stop
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walked (or jogged/wheelchair) 34,864 76.05% 1,889,560 72.72%
2 Bicycle 466 1.02% 43,954 1.69%
3 Transferred to another bus 3,710 8.09% 229,396 8.83%
4 Micromobility (e.g., scooter, moped, skateboard) 119 0.26% 3,600 0.14%
5 Transferred to other transit (e.g., rail, air) 3,753 8.19% 211,727 8.15%
6 Uber/Lyft, taxi, or car service 159 0.35% 9,390 0.36%
7 Drove my own household’s vehicle (or motorcycle) 1,081 2.36% 59,567 2.29%
8 Drove another vehicle (or motorcycle) 152 0.33% 5,958 0.23%
9 Got picked up in my own household’s vehicle (or motorcycle) 453 0.99% 44,618 1.72%
10 Got picked up in another vehicle (or motorcycle) 341 0.74% 18,666 0.72%
997 Other 746 1.63% 82,064 3.16%
995 Missing Response 422,174 27,480,167
Total valid 45,844 100.00% 2,598,499 100.00%
Total missing 422,174 27,480,167
Total 468,018 30,078,667
Logic: if mode = bus or rail

transit_access

transit_access
Mode used to access transit stop
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Walked (or jogged/wheelchair) 34,764 75.83% 1,895,901 72.96%
2 Bicycle 551 1.20% 49,890 1.92%
3 Transferred from another bus 3,586 7.82% 225,998 8.70%
4 Micromobility (e.g., scooter, moped, skateboard) 121 0.26% 2,447 0.09%
5 Transferred from other transit (e.g., rail, air) 3,253 7.10% 130,975 5.04%
6 Uber/Lyft, taxi, or car service 187 0.41% 10,477 0.40%
7 Drove and parked my own household’s vehicle (or motorcycle) 1,413 3.08% 96,979 3.73%
8 Drove and parked another vehicle (or motorcycle) 195 0.43% 6,678 0.26%
9 Got dropped off in my own household’s vehicle (or motorcycle) 605 1.32% 71,714 2.76%
10 Got dropped off in another vehicle (or motorcycle) 451 0.98% 20,358 0.78%
997 Other 721 1.57% 87,082 3.35%
995 Missing Response 422,171 27,480,167
Total valid 45,847 100.00% 2,598,499 100.00%
Total missing 422,171 27,480,167
Total 468,018 30,078,667
Logic: if mode = bus or rail

ev_charge_station

ev_charge_station
Electric vehicle charging stations at stop
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 No 6,580 57.65% 390,097 53.46%
2 Yes, and I did NOT charge the vehicle here before my next trip 2,981 26.12% 211,432 28.98%
3 Yes, and I charged the vehicle here before my next trip 1,565 13.71% 105,143 14.41%
998 Don’t know 288 2.52% 22,967 3.15%
995 Missing Response 456,604 29,349,027
Total valid 11,414 100.00% 729,640 100.00%
Total missing 456,604 29,349,027
Total 468,018 30,078,667
Logic: if used household electric vehicle on trip

ev_charge_station_level

ev_charge_station_level
Charge station level
Option Variable
Unweighted
Weighted
Selected Percent Missing Selected Percent Missing
Level 1 (2-5 miles of range per 1 hour of charging) Ev Charge Station Level 1 978 21.5% 463,472 59,955 18.9% 29,762,091
Level 2 (10-20 miles of range per 1 hour of charging) Ev Charge Station Level 2 3,178 69.9% 463,472 217,500 68.7% 29,762,091
Level 3/DC Fast (60+ miles of range per 1 hour of charging) Ev Charge Station Level 3 276 6.1% 463,472 16,827 5.3% 29,762,091
Don’t know Ev Charge Station Level 998 226 5.0% 463,472 31,673 10.0% 29,762,091
Logic: if EV charge stations were at destination
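
Select-all-that-apply questions such as ev_charge_station_level are tabulated one option per row, and each percentage appears to be the number of records selecting that option divided by the number of records that were asked the question (total records minus missing). The sketch below checks this reading against the unweighted counts printed above; the object names are illustrative only.

```r
# Unweighted counts copied from the tables above
total_records   <- 468018   # "Total" row of the other tables in this section
missing_records <- 463472   # "Missing" column, identical for every option
answered        <- total_records - missing_records

# Records selecting each option
selected <- c(level_1 = 978, level_2 = 3178, level_3 = 276, dont_know = 226)

# Percent of answering records that selected each option
pct_selected <- round(100 * selected / answered, 1)
print(answered)
print(pct_selected)

# Options are not mutually exclusive, so the percentages need not sum to 100
print(sum(selected) > answered)
```

The 4,546 answering records equal the number of ev_charge_station records reporting a charging station at the stop (2,981 + 1,565), which is consistent with the skip logic stated for this variable.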

ev_charge_station_decision

ev_charge_station_decision
Electric vehicle charging stations influenced decision to stop here
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Agree 209 56.18% 5,894 36.38%
2 Neutral 77 20.70% 4,613 28.47%
3 Disagree 86 23.12% 5,697 35.16%
995 Missing Response 467,646 30,062,463
Total valid 372 100.00% 16,204 100.00%
Total missing 467,646 30,062,463
Total 468,018 30,078,667
Logic: if used EV charge station at destination and destination is not home/work/school location

o_purpose_category

o_purpose_category
Origin purpose category
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,100 0.88% 3,542 0.01%
1 Home 143,857 30.81% 10,302,583 34.75%
2 Work 28,035 6.00% 2,781,172 9.38%
3 Work related 23,911 5.12% 1,248,352 4.21%
4 School 14,890 3.19% 1,436,866 4.85%
5 School related 2,508 0.54% 160,014 0.54%
6 Escort 29,130 6.24% 2,668,397 9.00%
7 Shop 46,898 10.04% 2,790,710 9.41%
8 Meal 34,126 7.31% 1,510,383 5.10%
9 Social recreation 55,188 11.82% 2,248,138 7.58%
10 Errand 23,948 5.13% 1,749,146 5.90%
11 Change mode 38,956 8.34% 1,768,184 5.96%
12 Overnight 15,835 3.39% 546,346 1.84%
13 Other 5,512 1.18% 429,660 1.45%
995 Missing Response 1,124 435,173
Total valid 466,894 100.00% 29,643,494 100.00%
Total missing 1,124 435,173
Total 468,018 30,078,667

o_purpose_category_reported

o_purpose_category_reported
Reported Origin purpose category
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Home 94,081 24.06% 4,194,035 18.93%
2 Work 26,098 6.67% 2,562,067 11.56%
3 Work related 23,114 5.91% 1,323,044 5.97%
4 School 5,786 1.48% 1,021,250 4.61%
5 School related 743 0.19% 46,364 0.21%
6 Escort 30,972 7.92% 2,814,054 12.70%
7 Shop 43,951 11.24% 2,793,144 12.60%
8 Meal 32,443 8.30% 1,529,340 6.90%
9 Social recreation 53,574 13.70% 2,309,221 10.42%
10 Errand 22,669 5.80% 1,756,065 7.92%
11 Change mode 21,638 5.53% 582,843 2.63%
12 Overnight 14,968 3.83% 551,387 2.49%
13 Other 20,952 5.36% 677,175 3.06%
995 Missing Response 49,740 2,259,010
NA No value assigned 27,289 5,659,666
Total valid 390,989 100.00% 22,159,990 100.00%
Total missing 77,029 7,918,677
Total 468,018 30,078,667

o_purpose

o_purpose
Origin purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,100 0.88% 3,542 0.01%
1 Went home 143,493 30.73% 10,273,842 34.66%
10 Went to primary workplace 27,960 5.99% 2,759,190 9.31%
11 Went to work-related activity (e.g., meeting, delivery, worksite) 20,168 4.32% 1,045,138 3.53%
13 Volunteering 1,108 0.24% 69,623 0.23%
14 Other work-related 2,041 0.44% 92,839 0.31%
21 Attend K-12 school 14,311 3.07% 1,333,066 4.50%
22 Attend college/university 216 0.05% 30,375 0.10%
23 Attend other type of class (e.g., cooking class) 187 0.04% 49,633 0.17%
24 Attend other education-related activity (e.g., field trip) 2,367 0.51% 149,243 0.50%
25 Attend vocational education class 6 0.00% 2,581 0.01%
26 Attend daycare or preschool 122 0.03% 12,564 0.04%
30 Grocery shopping 22,057 4.72% 1,422,666 4.80%
31 Got gas 5,191 1.11% 284,795 0.96%
32 Other routine shopping (e.g., pharmacy) 17,607 3.77% 948,339 3.20%
33 Errand without appointment (e.g., post office) 10,359 2.22% 606,781 2.05%
34 Medical visit (e.g., doctor, dentist) 6,559 1.40% 625,642 2.11%
36 Shopping for major item (e.g., furniture, car) 1,120 0.24% 47,210 0.16%
37 Errand with appointment (e.g., haircut) 3,156 0.68% 187,653 0.63%
44 Other activity only (e.g., attend meeting, pick-up or drop-off item) 2,089 0.45% 169,112 0.57%
45 Pick someone up 11,171 2.39% 975,065 3.29%
46 Drop someone off 12,813 2.74% 1,162,883 3.92%
47 Accompany someone only (e.g., go along for the ride) 1,207 0.26% 186,865 0.63%
48 BOTH pick up AND drop off 1,162 0.25% 110,499 0.37%
50 Dined out, got coffee, or take-out 33,739 7.23% 1,492,947 5.04%
51 Exercise or recreation (e.g., gym, jog, bike, walk dog) 26,615 5.70% 1,213,552 4.09%
52 Social activity (e.g., visit friends/relatives) 9,516 2.04% 373,796 1.26%
53 Leisure/entertainment/cultural (e.g., cinema, museum, park) 11,167 2.39% 340,058 1.15%
54 Religious/civic/volunteer activity 2,735 0.59% 109,375 0.37%
56 Family activity (e.g., watch child’s game) 3,210 0.69% 109,401 0.37%
60 Changed or transferred mode (e.g., waited for bus or exited bus) 37,939 8.13% 1,740,310 5.87%
61 Other errand 599 0.13% 199,101 0.67%
62 Other leisure 162 0.03% 49,568 0.17%
99 Other reason 14,865 3.18% 921,839 3.11%
150 Went to another residence (e.g., someone else’s home, second home) 13,394 2.87% 487,580 1.64%
152 Went to temporary lodging (e.g., hotel, vacation rental) 2,382 0.51% 54,355 0.18%
995 Missing Response 1,125 437,638
Total valid 466,893 100.00% 29,641,028 100.00%
Total missing 1,125 437,638
Total 468,018 30,078,667

d_purpose_category

d_purpose_category
Destination purpose category
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,155 0.89% 2,715 0.01%
1 Home 143,208 30.60% 10,294,853 34.23%
2 Work 27,914 5.96% 2,869,092 9.54%
3 Work related 24,260 5.18% 1,276,062 4.24%
4 School 15,041 3.21% 1,396,923 4.64%
5 School related 2,612 0.56% 163,089 0.54%
6 Escort 29,844 6.38% 2,743,198 9.12%
7 Shop 47,233 10.09% 2,825,102 9.39%
8 Meal 34,457 7.36% 1,550,718 5.16%
9 Social recreation 55,836 11.93% 2,327,498 7.74%
10 Errand 24,226 5.18% 1,772,820 5.89%
11 Change mode 38,973 8.33% 1,774,445 5.90%
12 Overnight 16,926 3.62% 686,303 2.28%
13 Other 3,333 0.71% 395,846 1.32%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667

d_purpose

d_purpose
Destination purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,155 0.89% 2,715 0.01%
1 Went home 142,791 30.51% 10,261,722 34.12%
10 Went to primary workplace 27,837 5.95% 2,843,592 9.45%
11 Went to work-related activity (e.g., meeting, delivery, worksite) 20,496 4.38% 1,071,417 3.56%
13 Volunteering 1,114 0.24% 70,714 0.24%
14 Other work-related 2,045 0.44% 94,131 0.31%
21 Attend K-12 school 14,522 3.10% 1,305,384 4.34%
22 Attend college/university 186 0.04% 29,521 0.10%
23 Attend other type of class (e.g., cooking class) 192 0.04% 41,442 0.14%
24 Attend other education-related activity (e.g., field trip) 2,468 0.53% 153,143 0.51%
25 Attend vocational education class 5 0.00% 1,868 0.01%
26 Attend daycare or preschool 85 0.02% 11,090 0.04%
30 Grocery shopping 22,211 4.75% 1,441,348 4.79%
31 Got gas 5,216 1.11% 286,933 0.95%
32 Other routine shopping (e.g., pharmacy) 17,721 3.79% 957,118 3.18%
33 Errand without appointment (e.g., post office) 10,433 2.23% 607,594 2.02%
34 Medical visit (e.g., doctor, dentist) 6,658 1.42% 643,020 2.14%
36 Shopping for major item (e.g., furniture, car) 1,154 0.25% 49,969 0.17%
37 Errand with appointment (e.g., haircut) 3,202 0.68% 189,561 0.63%
44 Other activity only (e.g., attend meeting, pick-up or drop-off item) 2,119 0.45% 173,752 0.58%
45 Pick someone up 10,917 2.33% 971,495 3.23%
46 Drop someone off 13,661 2.92% 1,213,381 4.03%
47 Accompany someone only (e.g., go along for the ride) 1,258 0.27% 199,477 0.66%
48 BOTH pick up AND drop off 1,181 0.25% 118,874 0.40%
50 Dined out, got coffee, or take-out 34,064 7.28% 1,533,230 5.10%
51 Exercise or recreation (e.g., gym, jog, bike, walk dog) 26,826 5.73% 1,230,149 4.09%
52 Social activity (e.g., visit friends/relatives) 9,705 2.07% 395,686 1.32%
53 Leisure/entertainment/cultural (e.g., cinema, museum, park) 11,297 2.41% 361,574 1.20%
54 Religious/civic/volunteer activity 2,760 0.59% 112,379 0.37%
56 Family activity (e.g., watch child’s game) 3,271 0.70% 118,416 0.39%
60 Changed or transferred mode (e.g., waited for bus or exited bus) 37,943 8.11% 1,745,414 5.80%
61 Other errand 604 0.13% 200,856 0.67%
62 Other leisure 171 0.04% 52,952 0.18%
99 Other reason 12,893 2.75% 908,620 3.02%
150 Went to another residence (e.g., someone else’s home, second home) 14,214 3.04% 599,317 1.99%
152 Went to temporary lodging (e.g., hotel, vacation rental) 2,643 0.56% 80,814 0.27%
Total valid 468,018 100.00% 30,078,667 100.00%
Total 468,018 30,078,667

d_purpose_reported

d_purpose_reported
Reported destination purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Went home 114,799 27.45% 9,069,557 32.60%
10 Went to primary workplace 25,944 6.20% 2,679,546 9.63%
11 Went to work-related activity (e.g., meeting, delivery, worksite) 19,588 4.68% 1,151,720 4.14%
13 Volunteering 1,136 0.27% 77,973 0.28%
14 Other work-related 2,045 0.49% 102,658 0.37%
21 Attend K-12 school 2,130 0.51% 640,377 2.30%
22 Attend college/university 1,899 0.45% 195,703 0.70%
23 Attend other type of class (e.g., cooking class) 1,195 0.29% 82,308 0.30%
24 Attend other education-related activity (e.g., field trip) 625 0.15% 36,815 0.13%
25 Attend vocational education class 49 0.01% 17,711 0.06%
26 Attend daycare or preschool 484 0.12% 126,576 0.45%
30 Grocery shopping 21,280 5.09% 1,459,042 5.24%
31 Got gas 4,783 1.14% 275,631 0.99%
32 Other routine shopping (e.g., pharmacy) 16,232 3.88% 943,451 3.39%
33 Errand without appointment (e.g., post office) 9,783 2.34% 603,161 2.17%
34 Medical visit (e.g., doctor, dentist) 6,345 1.52% 645,755 2.32%
36 Shopping for major item (e.g., furniture, car) 1,003 0.24% 46,317 0.17%
37 Errand with appointment (e.g., haircut) 2,987 0.71% 189,394 0.68%
44 Other activity only (e.g., attend meeting, pick-up or drop-off item) 1,874 0.45% 173,469 0.62%
45 Pick someone up 11,085 2.65% 996,358 3.58%
46 Drop someone off 15,411 3.68% 1,295,856 4.66%
47 Accompany someone only (e.g., go along for the ride) 1,257 0.30% 206,387 0.74%
48 BOTH pick up AND drop off 1,178 0.28% 120,799 0.43%
50 Dined out, got coffee, or take-out 32,313 7.73% 1,559,848 5.61%
51 Exercise or recreation (e.g., gym, jog, bike, walk dog) 28,311 6.77% 1,347,052 4.84%
52 Social activity (e.g., visit friends/relatives) 8,455 2.02% 382,320 1.37%
53 Leisure/entertainment/cultural (e.g., cinema, museum, park) 10,148 2.43% 330,813 1.19%
54 Religious/civic/volunteer activity 2,608 0.62% 115,950 0.42%
56 Family activity (e.g., watch child’s game) 2,825 0.68% 107,775 0.39%
60 Changed or transferred mode (e.g., waited for bus or exited bus) 20,715 4.95% 553,550 1.99%
61 Other errand 4,917 1.18% 323,054 1.16%
62 Other leisure 1,775 0.42% 87,674 0.32%
99 Other reason 27,265 6.52% 1,192,208 4.29%
150 Went to another residence (e.g., someone else’s home, second home) 13,497 3.23% 604,034 2.17%
152 Went to temporary lodging (e.g., hotel, vacation rental) 2,337 0.56% 79,193 0.28%
995 Missing Response 49,740 2,258,632
Total valid 418,278 100.00% 27,820,035 100.00%
Total missing 49,740 2,258,632
Total 468,018 30,078,667

bike_park_loc

bike_park_loc
Bicycle parking location
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Inside house/apartment (includes garage, porch, storage area) 2,616 30.65% 127,424 32.75%
2 Bike rack 2,038 23.88% 93,766 24.10%
3 Bike locker 118 1.38% 3,410 0.88%
4 Secured bike room 421 4.93% 14,418 3.71%
5 Locked to other object (e.g., post, tree) 771 9.03% 37,603 9.67%
6 Bike-share designated docking station 845 9.90% 34,941 8.98%
7 Unlocked on-street 392 4.59% 7,241 1.86%
8 In a parking garage/lot 314 3.68% 13,693 3.52%
10 Carried it with me 683 8.00% 37,260 9.58%
997 Other 336 3.94% 19,285 4.96%
995 Missing Response 459,484 29,689,626
Total valid 8,534 100.00% 389,040 100.00%
Total missing 459,484 29,689,626
Total 468,018 30,078,667
Logic: if mode or transit_access or transit_egress = bicycle

scooter_park_location

scooter_park_location
Scooter parking location
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Inside house/apartment (includes garage, porch, storage area) 166 29.48% 9,146 24.84%
2 Bike/scooter rack 62 11.01% 1,495 4.06%
3 Locker for bikes/scooters 7 1.24% 140 0.38%
4 Secured room 28 4.97% 2,560 6.95%
5 Locked to other object (e.g., post, tree) 18 3.20% 95 0.26%
6 Scooter-share designated docking station 3 0.53% 86 0.23%
7 Unlocked on-street 17 3.02% 33 0.09%
8 In a parking garage/lot 33 5.86% 1,809 4.91%
10 Carried it with me 210 37.30% 21,150 57.45%
997 Other 19 3.37% 301 0.82%
995 Missing Response 467,455 30,041,851
Total valid 563 100.00% 36,816 100.00%
Total missing 467,455 30,041,851
Total 468,018 30,078,667
Logic: if mode or transit_access or transit_egress = micromobility

park_cost

park_cost
Amount paid to park
Statistic
Unweighted
Weighted
Value Value
N 21,337.00 950,830.20
Min 0.00 0.00
P25 0.00 0.00
Median 0.00 0.00
Mean 3.09 5.64
P75 0.00 4.00
P95 16.00 30.00
Max 800.00 800.00
SD 15.50 20.40
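
The unweighted and weighted columns can differ noticeably (a mean of 3.09 versus 5.64 here) because, in the weighted column, each record contributes in proportion to its weight. The following is a minimal sketch of that difference using invented values; the weight column name trip_weight is an assumption and should be replaced with the weight variable documented for the table you are using.

```r
# Toy data: values and weights are invented for illustration only
trips <- data.frame(
  park_cost   = c(0, 0, 0, 12, 30),
  trip_weight = c(50, 40, 60, 200, 150)   # hypothetical weight column
)

# Unweighted mean: every record counts equally
unweighted_mean <- mean(trips$park_cost)

# Weighted mean: records with larger weights count more
weighted_mean <- weighted.mean(trips$park_cost, w = trips$trip_weight)

print(c(unweighted = unweighted_mean, weighted = weighted_mean))
```

In real data, drop missing values and missing-response codes before summarizing a numeric variable.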

taxi_cost

taxi_cost
Amount paid for taxi
Statistic
Unweighted
Weighted
Value Value
N 3,124.00 267,658.93
Min 0.00 0.00
P25 11.00 11.00
Median 16.00 15.00
Mean 23.84 22.76
P75 25.00 27.00
P95 52.00 52.00
Max 800.00 800.00
SD 42.72 28.90

taxi_pay

taxi_pay
Knows amount paid for taxi
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Knows amount paid 3,124 87.17% 267,659 88.25%
2 Don’t know 460 12.83% 35,644 11.75%
995 Missing Response 464,434 29,775,364
Total valid 3,584 100.00% 303,302 100.00%
Total missing 464,434 29,775,364
Total 468,018 30,078,667
Logic: if taxi_type = I paid, employer paid, split/shared

taxi_type

taxi_type
Type of taxi used on trip
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 I paid the fare myself (no reimbursement) 2,789 62.25% 253,877 69.98%
2 Employer paid (I am reimbursed) 402 8.97% 31,308 8.63%
3 Split/shared fare with other(s) 394 8.79% 18,117 4.99%
4 Someone else paid 100% (all of fare) 748 16.70% 48,219 13.29%
5 Other 147 3.28% 11,252 3.10%
995 Missing Response 463,538 29,715,894
Total valid 4,480 100.00% 362,773 100.00%
Total missing 463,538 29,715,894
Total 468,018 30,078,667
Logic: if mode or transit_access or transit_egress = taxi

tnc_type

tnc_type
Shared smartphone-app ride service
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Pooled (e.g., UberPool, Lyft Shared) 175 4.34% 16,734 5.63%
2 Regular (e.g., UberX, UberXL, Lyft, LyftXL) 3,725 92.39% 262,700 88.45%
3 Premium (e.g., UberBlack, Lyft Lux) 58 1.44% 3,412 1.15%
998 Don’t know 74 1.84% 14,174 4.77%
995 Missing Response 463,986 29,781,646
Total valid 4,032 100.00% 297,021 100.00%
Total missing 463,986 29,781,646
Total 468,018 30,078,667
Logic: if mode_taxi = Uber/Lyft

transit_type

transit_type
Payment method for transit
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Free (no cost at all) 5,977 13.51% 428,445 19.73%
2 Used transit pass (any type) (e.g., LinkPass, CharlieCard, Senior CharlieCard, Transportation Access Pass (TAP), etc.) 28,983 65.52% 1,282,215 59.04%
3 Cash, credit card, or ticket(s) 8,947 20.23% 432,569 19.92%
4 Don’t know 145 0.33% 16,929 0.78%
6 Used a transfer from a previous transit trip 185 0.42% 11,506 0.53%
995 Missing Response 423,781 27,907,002
Total valid 44,237 100.00% 2,171,664 100.00%
Total missing 423,781 27,907,002
Total 468,018 30,078,667
Logic: if mode = bus (except school bus) or rail

vehicle_park_pay

vehicle_park_pay
Knows amount paid to park vehicle
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Knows amount paid 4,974 89.11% 366,183 92.65%
2 Don’t know 608 10.89% 29,060 7.35%
995 Missing Response 462,436 29,683,424
Total valid 5,582 100.00% 395,243 100.00%
Total missing 462,436 29,683,424
Total 468,018 30,078,667
Logic: if vehicle_park_type = Paid via cash, credit card, tickets or parking service

8.7 Linked Trip

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 53,283 12.70% 0 0.00%
1 Yes 366,186 87.30% 28,050,396 100.00%
Total valid 419,469 100.00% 28,050,396 100.00%
Total 419,469 28,050,396
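
In this table every incomplete record (is_complete = 0) has a weighted count of zero, so weighted totals already exclude incomplete records while unweighted counts do not. The sketch below illustrates the consequence with invented records; the weight column name linked_trip_weight is an assumption, not a documented variable name.

```r
# Toy linked-trip records: values are invented for illustration only
linked_trips <- data.frame(
  is_complete        = c(1, 1, 0, 1, 0),
  linked_trip_weight = c(120.5, 80.0, 0, 64.5, 0)   # hypothetical weight column
)

# Keep complete records only
complete_trips <- linked_trips[linked_trips$is_complete == 1, ]

# Unweighted counts change when incomplete records are dropped
n_all      <- nrow(linked_trips)
n_complete <- nrow(complete_trips)

# Weighted totals do not change, because incomplete records carry zero weight
weighted_all      <- sum(linked_trips$linked_trip_weight)
weighted_complete <- sum(complete_trips$linked_trip_weight)

print(c(n_all = n_all, n_complete = n_complete))
print(c(weighted_all = weighted_all, weighted_complete = weighted_complete))
```

Filtering on is_complete before computing unweighted sample sizes keeps reported counts consistent with the weighted results.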

o_purpose_category

o_purpose_category
Origin purpose category
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,094 0.98% 3,542 0.01%
1 Home 143,832 34.38% 10,299,144 37.29%
2 Work 27,818 6.65% 2,775,581 10.05%
3 Work related 23,509 5.62% 1,234,736 4.47%
4 School 14,796 3.54% 1,432,089 5.19%
5 School related 2,469 0.59% 158,005 0.57%
6 Escort 28,697 6.86% 2,646,418 9.58%
7 Shop 46,652 11.15% 2,784,853 10.08%
8 Meal 33,766 8.07% 1,498,014 5.42%
9 Social recreation 47,478 11.35% 2,059,517 7.46%
10 Errand 23,735 5.67% 1,743,895 6.31%
11 Change mode 412 0.10% 10,431 0.04%
12 Overnight 15,722 3.76% 543,199 1.97%
13 Other 5,368 1.28% 426,636 1.54%
995 Missing Response 1,121 434,336
Total valid 418,348 100.00% 27,616,061 100.00%
Total missing 1,121 434,336
Total 419,469 28,050,396

o_purpose

o_purpose
Origin purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,094 0.98% 3,542 0.01%
1 Went home 143,492 34.30% 10,273,824 37.21%
10 Went to primary workplace 27,743 6.63% 2,753,599 9.97%
11 Went to work-related activity (e.g., meeting, delivery, worksite) 19,814 4.74% 1,030,193 3.73%
13 Volunteering 1,098 0.26% 68,730 0.25%
14 Other work-related 1,984 0.47% 91,570 0.33%
21 Attend K-12 school 14,219 3.40% 1,328,676 4.81%
22 Attend college/university 214 0.05% 29,988 0.11%
23 Attend other type of class (e.g., cooking class) 187 0.04% 49,633 0.18%
24 Attend other education-related activity (e.g., field trip) 2,332 0.56% 147,296 0.53%
25 Attend vocational education class 6 0.00% 2,581 0.01%
26 Attend daycare or preschool 122 0.03% 12,564 0.05%
30 Grocery shopping 21,939 5.24% 1,419,595 5.14%
31 Got gas 5,168 1.24% 284,457 1.03%
32 Other routine shopping (e.g., pharmacy) 17,525 4.19% 946,287 3.43%
33 Errand without appointment (e.g., post office) 10,262 2.45% 604,634 2.19%
34 Medical visit (e.g., doctor, dentist) 6,538 1.56% 624,686 2.26%
36 Shopping for major item (e.g., furniture, car) 1,107 0.26% 47,017 0.17%
37 Errand with appointment (e.g., haircut) 3,139 0.75% 187,384 0.68%
44 Other activity only (e.g., attend meeting, pick-up or drop-off item) 2,068 0.49% 167,551 0.61%
45 Pick someone up 11,058 2.64% 971,192 3.52%
46 Drop someone off 12,568 3.00% 1,147,069 4.15%
47 Accompany someone only (e.g., go along for the ride) 1,187 0.28% 186,643 0.68%
48 BOTH pick up AND drop off 1,146 0.27% 110,146 0.40%
50 Dined out, got coffee, or take-out 33,379 7.98% 1,480,578 5.36%
51 Exercise or recreation (e.g., gym, jog, bike, walk dog) 19,393 4.64% 1,035,732 3.75%
52 Social activity (e.g., visit friends/relatives) 9,346 2.23% 369,901 1.34%
53 Leisure/entertainment/cultural (e.g., cinema, museum, park) 10,985 2.63% 336,780 1.22%
54 Religious/civic/volunteer activity 2,722 0.65% 109,241 0.40%
56 Family activity (e.g., watch child’s game) 3,150 0.75% 106,894 0.39%
60 Changed or transferred mode (e.g., waited for bus or exited bus) 377 0.09% 9,715 0.04%
61 Other errand 599 0.14% 199,101 0.72%
62 Other leisure 162 0.04% 49,568 0.18%
99 Other reason 13,560 3.24% 888,438 3.22%
150 Went to another residence (e.g., someone else’s home, second home) 13,312 3.18% 485,264 1.76%
152 Went to temporary lodging (e.g., hotel, vacation rental) 2,352 0.56% 53,523 0.19%
995 Missing Response 1,122 436,801
Total valid 418,347 100.00% 27,613,595 100.00%
Total missing 1,122 436,801
Total 419,469 28,050,396

d_purpose_category

d_purpose_category
Destination purpose category
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,149 0.99% 2,715 0.01%
1 Home 123,068 29.37% 9,357,341 33.38%
2 Work 27,697 6.61% 2,863,501 10.21%
3 Work related 23,831 5.69% 1,258,867 4.49%
4 School 14,947 3.57% 1,392,145 4.97%
5 School related 2,572 0.61% 161,063 0.57%
6 Escort 29,411 7.02% 2,721,219 9.71%
7 Shop 46,987 11.21% 2,819,246 10.06%
8 Meal 34,097 8.14% 1,538,349 5.49%
9 Social recreation 48,126 11.48% 2,138,877 7.63%
10 Errand 24,013 5.73% 1,767,570 6.31%
12 Overnight 16,813 4.01% 683,157 2.44%
13 Other 3,189 0.76% 392,822 1.40%
14 Loop trip 20,140 4.81% 937,512 3.34%
995 Missing Response 429 16,012
Total valid 419,040 100.00% 28,034,384 100.00%
Total missing 429 16,012
Total 419,469 28,050,396

d_purpose

d_purpose
Destination purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Not imputable 4,149 0.99% 2,715 0.01%
1 Went home 142,791 34.04% 10,261,722 36.58%
10 Went to primary workplace 27,620 6.58% 2,838,001 10.12%
11 Went to work-related activity (e.g., meeting, delivery, worksite) 20,142 4.80% 1,056,472 3.77%
13 Volunteering 1,104 0.26% 69,821 0.25%
14 Other work-related 1,988 0.47% 92,861 0.33%
21 Attend K-12 school 14,430 3.44% 1,300,993 4.64%
22 Attend college/university 184 0.04% 29,133 0.10%
23 Attend other type of class (e.g., cooking class) 192 0.05% 41,442 0.15%
24 Attend other education-related activity (e.g., field trip) 2,432 0.58% 151,178 0.54%
25 Attend vocational education class 5 0.00% 1,868 0.01%
26 Attend daycare or preschool 85 0.02% 11,090 0.04%
30 Grocery shopping 22,093 5.27% 1,438,277 5.13%
31 Got gas 5,193 1.24% 286,595 1.02%
32 Other routine shopping (e.g., pharmacy) 17,639 4.21% 955,067 3.40%
33 Errand without appointment (e.g., post office) 10,336 2.46% 605,447 2.16%
34 Medical visit (e.g., doctor, dentist) 6,637 1.58% 642,064 2.29%
36 Shopping for major item (e.g., furniture, car) 1,141 0.27% 49,776 0.18%
37 Errand with appointment (e.g., haircut) 3,185 0.76% 189,292 0.67%
44 Other activity only (e.g., attend meeting, pick-up or drop-off item) 2,098 0.50% 172,191 0.61%
45 Pick someone up 10,804 2.58% 967,622 3.45%
46 Drop someone off 13,416 3.20% 1,197,567 4.27%
47 Accompany someone only (e.g., go along for the ride) 1,238 0.30% 199,255 0.71%
48 BOTH pick up AND drop off 1,165 0.28% 118,521 0.42%
50 Dined out, got coffee, or take-out 33,704 8.03% 1,520,861 5.42%
51 Exercise or recreation (e.g., gym, jog, bike, walk dog) 19,604 4.67% 1,052,329 3.75%
52 Social activity (e.g., visit friends/relatives) 9,535 2.27% 391,791 1.40%
53 Leisure/entertainment/cultural (e.g., cinema, museum, park) 11,115 2.65% 358,295 1.28%
54 Religious/civic/volunteer activity 2,747 0.65% 112,245 0.40%
56 Family activity (e.g., watch child’s game) 3,211 0.77% 115,908 0.41%
60 Changed or transferred mode (e.g., waited for bus or exited bus) 377 0.09% 13,982 0.05%
61 Other errand 604 0.14% 200,856 0.72%
62 Other leisure 171 0.04% 52,952 0.19%
99 Other reason 11,589 2.76% 875,219 3.12%
150 Went to another residence (e.g., someone else’s home, second home) 14,132 3.37% 597,002 2.13%
152 Went to temporary lodging (e.g., hotel, vacation rental) 2,613 0.62% 79,982 0.29%
Total valid 419,469 100.00% 28,050,396 100.00%
Total 419,469 28,050,396

linked_trip_mode

linked_trip_mode
Linked trip mode
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Missing 13,644 3.25% 210,517 0.75%
1 School bus 7,533 1.80% 570,723 2.03%
11 Regional transit 2,951 0.70% 171,584 0.61%
12 Local transit 12,020 2.87% 635,369 2.27%
22 HOV 3+ persons 61,536 14.67% 3,804,399 13.56%
23 HOV 2 persons 92,892 22.15% 6,080,657 21.68%
24 SOV 148,073 35.30% 12,270,538 43.74%
25 Bike 6,853 1.63% 314,951 1.12%
26 Personal mobility 874 0.21% 37,895 0.14%
27 Shared 680 0.16% 27,348 0.10%
28 TNC 3,755 0.90% 328,760 1.17%
29 Walk 66,268 15.80% 3,392,171 12.09%
30 Long Distance Passenger 881 0.21% 49,803 0.18%
99 Other 1,509 0.36% 155,682 0.56%
Total valid 419,469 100.00% 28,050,396 100.00%
Total 419,469 28,050,396

joint_status

joint_status
Indicates whether tour is individual, partially joint, or fully joint
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Not joint 268,409 63.99% 18,211,766 64.93%
2 Partially joint 6,153 1.47% 198,957 0.71%
3 Fully joint 144,907 34.55% 9,639,673 34.37%
Total valid 419,469 100.00% 28,050,396 100.00%
Total 419,469 28,050,396

escort_category

escort_category
No escort, escorted drop-off, escorted pick-up, escorting drop-off, or escorting pick-up
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No escort 406,564 96.92% 26,256,025 93.60%
1 Escorted dropoff 3,567 0.85% 451,376 1.61%
2 Escorted pickup 3,304 0.79% 388,883 1.39%
3 Escorting dropoff 3,126 0.75% 511,424 1.82%
4 Escorting pickup 2,908 0.69% 442,689 1.58%
Total valid 419,469 100.00% 28,050,396 100.00%
Total 419,469 28,050,396

8.8 Tour

is_complete

is_complete
Record is complete
Value Label
Unweighted
Weighted
Count Percent Count Percent
0 No 20,851 13.02% 0 0.00%
1 Yes 139,240 86.98% 11,466,591 100.00%
Total valid 160,091 100.00% 11,466,591 100.00%
Total 160,091 11,466,591

joint_status

joint_status
Indicates whether tour is individual, partially joint, or fully joint
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Not joint 96,014 59.97% 6,744,141 58.82%
2 Partially joint 36,966 23.09% 2,845,148 24.81%
3 Fully joint 27,111 16.93% 1,877,303 16.37%
Total valid 160,091 100.00% 11,466,591 100.00%
Total 160,091 11,466,591

tour_category

tour_category
Tour category (mandatory, non-mandatory)
Value Label
Unweighted
Weighted
Count Percent Count Percent
1 Individual mandatory 43,714 27.31% 4,123,034 35.96%
2 Individual non-mandatory 86,335 53.93% 5,275,480 46.01%
3 At work subtour 2,931 1.83% 190,774 1.66%
4 Joint 27,111 16.93% 1,877,303 16.37%
Total valid 160,091 100.00% 11,466,591 100.00%
Total 160,091 11,466,591

tour_mode

tour_mode
Tour mode
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Missing 3,974 2.48% 62,085 0.54%
1 School bus 4,024 2.51% 337,732 2.95%
11 Regional transit 1,851 1.16% 112,487 0.98%
12 Local transit 6,743 4.21% 378,399 3.30%
22 HOV 3+ persons 27,491 17.17% 1,945,481 16.97%
23 HOV 2 persons 35,382 22.10% 2,555,322 22.28%
24 SOV 46,627 29.13% 4,171,222 36.38%
25 Bike 2,965 1.85% 134,493 1.17%
26 Personal mobility 234 0.15% 18,640 0.16%
27 Shared 289 0.18% 12,966 0.11%
28 TNC 1,215 0.76% 129,771 1.13%
29 Walk 28,186 17.61% 1,507,669 13.15%
30 Long Distance Passenger 734 0.46% 45,190 0.39%
99 Other 376 0.23% 55,134 0.48%
Total valid 160,091 100.00% 11,466,591 100.00%
Total 160,091 11,466,591

tour_purpose

tour_purpose
Tour purpose
Value Label
Unweighted
Weighted
Count Percent Count Percent
-1 Missing 1,331 0.83% 1,672 0.01%
2 Work 21,087 13.19% 2,459,087 21.47%
3 Work-related 11,117 6.95% 601,549 5.25%
4 School 13,928 8.71% 1,252,401 10.93%
5 School-related 2,240 1.40% 177,458 1.55%
6 Escort 14,327 8.96% 1,377,469 12.03%
7 Shop 20,209 12.64% 1,284,455 11.21%
8 Meal 11,980 7.49% 574,132 5.01%
9 Social/Recreation 23,896 14.95% 1,249,734 10.91%
10 Errand 12,291 7.69% 966,850 8.44%
12 Overnight 3,145 1.97% 161,080 1.41%
13 Other 807 0.50% 146,253 1.28%
14 Loop 23,499 14.70% 1,202,129 10.50%
995 Missing Response 234 12,323
Total valid 159,857 100.00% 11,454,269 100.00%
Total missing 234 12,323
Total 160,091 11,466,591

9 How to Use This Guide

This handbook provides practical, end-to-end guidance for analysts working with the Massachusetts Travel Study dataset. It shows how to start from the prepared study tables and codebook tables, then join, filter, weight, and analyze the data using reproducible examples. It is intended as the primary resource for descriptive analysis, common metrics, and design-aware inference with these data.

The examples in this guide are written in R, but the same principles apply in other statistical software such as Python, Stata, or SAS. The examples below assume the prepared tables and codebook tables are already loaded into two R objects named hts and codebook.

Use this handbook alongside the dataset overview in Section 6 and the codebook in Section 7. The dataset overview explains which tables are included, the codebook explains what variables mean, and the later sections of this handbook explain analytic units, weights, and common metrics.

10 Setup and Initial Exploration

10.1 System Requirements and Software

This guide focuses on using R for analysis. Many of the same ideas also apply in other software such as Python, Stata, or SAS.

To follow the examples in this guide, you will need:

  • R (tested with version 4.4.3)
  • An R development environment such as Positron or RStudio

The following packages are used throughout:

  • data.table for large table workflows
  • dplyr and tidyr for data manipulation
  • srvyr for survey-weighted analysis
  • ggplot2 and gt for figures and tables
  • stringr for string handling
  • lubridate for date/time processing

Install these packages if they are not already available in your environment.
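
One way to check which of these packages are missing before installing anything is sketched below. The helper name find_missing_packages is hypothetical, and the install call is left commented out so that nothing is downloaded unless you choose to run it.

```r
required_packages <- c(
  "data.table", "dplyr", "tidyr", "srvyr",
  "ggplot2", "gt", "stringr", "lubridate"
)

# Return the subset of `pkgs` that cannot be loaded in this R library
find_missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

missing_packages <- find_missing_packages(required_packages)

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)  # uncomment to install from CRAN
}
```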

suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)
  library(tidyr)
  library(srvyr)
  library(ggplot2)
  library(gt)
  library(stringr)
  library(lubridate)
})

10.2 Load Data

This code assumes you have manually unzipped the dataset to a local folder. Adjust the data_dir variable to point to your unzipped dataset location.

The list of .csv files should include:

hh.csv
person.csv
day.csv
vehicle.csv
trip_unlinked.csv
trip_linked.csv
tour.csv
location.csv

Additionally, you should have two .csv files for the codebook:

value_labels.csv
variable_list.csv

The code below reads all CSV files into a list-of-data.frames called hts (for household travel survey), plus a separate list codebook for the codebook tables. If your study also includes a standalone sample plan CSV, read that file separately rather than expecting it inside the delivered dataset ZIP.

We use data.table::fread() to read large CSV files efficiently. You can substitute read.csv() or another reader, but only if you handle long ID columns explicitly. Both base::read.csv() and readr::read_csv() may parse long numeric IDs as floating-point numbers, which can lose precision and produce duplicate IDs, particularly for linked trips. If you use those functions, read ID columns as character (colClasses in read.csv(), col_types in readr::read_csv()) or use the bit64 package to handle 64-bit integers.

# Folder where you manually unzipped the dataset
data_dir <- "data_cache"

csv_paths <- list.files(
  data_dir,
  pattern = "\\.csv$",
  full.names = TRUE,
  recursive = TRUE
)

object_names <- tolower(gsub("\\.csv$", "", basename(csv_paths)))
object_names <- make.names(object_names, unique = TRUE)

# Read all csvs
all_data <- setNames(
  lapply(csv_paths, data.table::fread),
  object_names
)

# Separate codebook tables
codebook <- list(
  value_labels = all_data$value_labels,
  variable_list = all_data$variable_list
)

# Separate core HTS tables
hts <- all_data[setdiff(names(all_data), c("value_labels", "variable_list"))]

# Optional: standalone weighting sample plan CSV (skipped if the file is absent)
sample_plan_path <- "path/to/sample_plan.csv"
if (file.exists(sample_plan_path)) {
  sample_plan <- data.table::fread(sample_plan_path)
}

rm(all_data)
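
To illustrate the earlier caution about long IDs, the sketch below writes a two-row CSV to a temporary file and reads it twice with base R. The trip_id values are hypothetical and were chosen only because they exceed 2^53, the limit beyond which doubles can no longer represent every integer. The IDs in the delivered files may be shorter.

```r
# Hypothetical IDs just above 2^53 (9007199254740992)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("trip_id,mode", "9007199254740992,walk", "9007199254740993,bike"),
  csv_path
)

# Default parsing stores the IDs as doubles, so the two values collapse
ids_num <- read.csv(csv_path)
print(anyDuplicated(ids_num$trip_id) > 0)

# Reading the ID column as character keeps both values distinct
ids_chr <- read.csv(csv_path, colClasses = c(trip_id = "character"))
print(anyDuplicated(ids_chr$trip_id) == 0)
```

With data.table::fread(), the colClasses argument serves the same purpose, for example colClasses = list(character = c("hh_id", "person_id")).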

For MassDOT, one of the first setup steps after loading the data should be defining the complete-household analytic universe. Most person-, day-, trip-, and vehicle-level analyses should be limited to households where hts$hh$is_complete == 1.

complete_hh_ids <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::pull(hh_id)

Use complete_hh_ids when you need to restrict lower-level tables to complete households.
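
A minimal sketch of that restriction follows, using small made-up tables in place of hts so that it runs on its own. The names hts_toy, complete_ids, and hts_complete are hypothetical. Tables without an hh_id column are passed through unchanged, which is an assumption about how you may want to treat them.

```r
# Toy stand-in for hts; the real tables have many more columns
hts_toy <- list(
  hh = data.frame(
    hh_id = c("A", "B", "C"),
    is_complete = c(1L, 0L, 1L),
    stringsAsFactors = FALSE
  ),
  person = data.frame(
    person_id = c("A1", "A2", "B1", "C1"),
    hh_id = c("A", "A", "B", "C"),
    stringsAsFactors = FALSE
  ),
  location = data.frame(loc_id = 1:2)  # no hh_id column
)

complete_ids <- hts_toy$hh$hh_id[hts_toy$hh$is_complete == 1]

# Keep only rows from complete households wherever hh_id is present
hts_complete <- lapply(hts_toy, function(tbl) {
  if (!"hh_id" %in% names(tbl)) {
    return(tbl)
  }
  tbl[tbl$hh_id %in% complete_ids, , drop = FALSE]
})

print(vapply(hts_complete, nrow, integer(1)))
```

Applied to the real data, the same pattern would use hts and complete_hh_ids in place of the toy objects.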

10.3 Inspect Tables

Once the files are loaded, inspect the tables before starting analysis.

Get List of Tables

Start by listing the prepared tables that are available in hts.

table_names <- data.frame(
  table = names(hts),
  stringsAsFactors = FALSE
)

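Table 32 confirms which tables are loaded for analysis. The listing also includes the codebook tables value_labels and variable_list; if you moved them into codebook during loading, names(hts) will show only the eight HTS tables.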

gt::gt(table_names) %>%
  gt::tab_header(title = "Loaded HTS Tables")
Loaded HTS Tables
table
hh
person
day
vehicle
location
trip_unlinked
trip_linked
tour
value_labels
variable_list
Table 32: Loaded HTS tables.

Glimpse Data

Each table includes a mix of identifiers, survey variables, and often one or more weight columns. A quick glance at the person table is usually a good starting point.

dplyr::glimpse(hts$person)
#> Rows: 37,616
#> Columns: 269
#> $ person_id                 <chr> "2400008901", "2400008902", "2400012201", "2…
#> $ person_num                <int> 1, 2, 1, 1, 2, 3, 4, 1, 1, 2, 1, 1, 2, 1, 1,…
#> $ hh_id                     <chr> "24000089", "24000089", "24000122", "2400014…
#> $ surveyable                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_participant            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_proxy                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ has_proxy                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ has_phone                 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ phone_type                <int> 1, 1, 2, 995, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, …
#> $ hh_is_complete            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_complete               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ num_days_complete         <int> 1, 1, 7, 7, 1, 1, 1, 1, 1, 1, 1, 7, 7, 7, 6,…
#> $ num_trips                 <int> 2, 5, 39, 45, 2, 4, 2, 0, 4, 3, 3, 36, 13, 1…
#> $ relationship              <int> 0, 1, 0, 0, 2, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0,…
#> $ age                       <int> 8, 9, 5, 9, 5, 4, 8, 7, 9, 9, 6, 10, 9, 10, …
#> $ gender                    <int> 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2,…
#> $ race_other                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ ethnicity_other           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ employment                <int> 5, 5, 5, 1, 1, 2, 1, 1, 5, 2, 1, 5, 5, 5, 1,…
#> $ work_mode                 <int> 995, 995, 995, 104, 100, 100, 1, 100, 995, 1…
#> $ job_type                  <int> 995, 995, 995, 5, 1, 5, 1, 1, 995, 2, 1, 995…
#> $ num_jobs                  <int> 995, 995, 995, 1, 1, 2, 1, 1, 995, 1, 2, 995…
#> $ work_lon                  <dbl> NA, NA, NA, -71.05214, -71.79986, -72.67345,…
#> $ work_lat                  <dbl> NA, NA, NA, 42.35606, 42.26745, 41.76257, 42…
#> $ work_in_region            <int> 995, 995, 995, 1, 1, 0, 1, 1, 995, 995, 1, 9…
#> $ work_state                <chr> NA, NA, NA, "25", "25", "09", "25", "25", NA…
#> $ work_county               <chr> NA, NA, NA, "25025", "25027", "09003", "2502…
#> $ work_bg_2010              <chr> NA, NA, NA, "250250701018", "250277317001", …
#> $ work_bg_2020              <chr> NA, NA, NA, "250250701042", "250277317002", …
#> $ work_puma_2012            <chr> NA, NA, NA, "03302", "00300", "00302", "0030…
#> $ work_puma_2022            <chr> NA, NA, NA, "00802", "00505", "20201", "0050…
#> $ education                 <int> 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 6, 2, 3, 6,…
#> $ student                   <int> 2, 2, 0, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
#> $ school_mode               <int> 995, 995, 1, 995, 995, 995, 995, 995, 995, 9…
#> $ school_type               <int> 995, 995, 13, 995, 995, 13, 995, 995, 995, 9…
#> $ school_freq               <int> 995, 995, 4, 995, 995, 995, 995, 995, 995, 9…
#> $ remote_class_freq         <int> 995, 995, 996, 995, 995, 2, 995, 995, 995, 9…
#> $ school_in_region          <int> 995, 995, 1, 995, 995, 995, 995, 995, 995, 9…
#> $ school_state              <chr> NA, NA, "25", NA, NA, NA, NA, NA, NA, NA, NA…
#> $ school_county             <chr> NA, NA, "25017", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_puma_2012          <chr> NA, NA, "03400", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_puma_2022          <chr> NA, NA, "00613", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_bg_2010            <chr> NA, NA, "250173736002", NA, NA, NA, NA, NA, …
#> $ school_bg_2020            <chr> NA, NA, "250173736002", NA, NA, NA, NA, NA, …
#> $ school_lon                <dbl> NA, NA, -71.16924, NA, NA, NA, NA, NA, NA, N…
#> $ school_lat                <dbl> NA, NA, 42.33609, NA, NA, NA, NA, NA, NA, NA…
#> $ second_home               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ second_home_in_region     <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ second_home_state         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_county        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_puma_2012     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_puma_2022     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_bg_2010       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_bg_2020       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_lon           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_lat           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ can_drive                 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ vehicle                   <int> 6, 7, 995, 6, 6, 9, 7, 8, 6, 7, 6, 6, 6, 6, …
#> $ transit_freq              <int> 4, 8, 5, 4, 9, 9, 9, 9, 8, 9, 8, 9, 9, 9, 8,…
#> $ tnc_freq                  <int> 7, 8, 8, 8, 995, 995, 995, 995, 995, 995, 8,…
#> $ bike_freq                 <int> 8, 996, 8, 996, 8, 996, 8, 996, 996, 996, 8,…
#> $ vanpool_freq              <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bikeshare_freq            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ scootshare_freq           <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ walk_freq                 <int> 4, 5, 1, 2, 5, 5, 1, 1, 2, 2, 1, 1, 8, 8, 4,…
#> $ transit_pass              <int> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ disability                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,…
#> $ participate               <int> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1,…
#> $ barriers_1                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ barriers_10               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_2                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_3                <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_4                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_5                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_6                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_7                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_8                <int> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1,…
#> $ barriers_9                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_997              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_999              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_other            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bicycle_other             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bicycle_type_1            <int> 1, 995, 1, 995, 995, 995, 995, 1, 1, 995, 99…
#> $ bicycle_type_2            <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bicycle_type_997          <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_comfort_lane         <int> 3, NA, 2, 1, NA, NA, NA, 2, 4, NA, 2, 4, 4, …
#> $ bike_comfort_local        <int> 1, NA, 1, 2, NA, NA, NA, 1, 1, NA, 2, 3, 4, …
#> $ bike_comfort_major        <int> 4, NA, 4, 1, NA, NA, NA, 4, 4, NA, 4, 4, 4, …
#> $ bike_comfort_minor        <int> 3, NA, 4, 1, NA, NA, NA, 2, 3, NA, 3, 4, 4, …
#> $ bike_comfort_neighborhood <int> 2, NA, 2, 2, NA, NA, NA, 1, 1, NA, 1, 3, 4, …
#> $ bike_comfort_paths        <int> 1, NA, 1, 2, NA, NA, NA, 1, 1, NA, 1, 1, 4, …
#> $ bike_comfort_street       <int> 2, NA, 3, 2, NA, NA, NA, 1, 2, NA, 2, 4, 4, …
#> $ bike_comfort_striped      <int> 3, NA, 3, 1, NA, NA, NA, 3, 4, NA, 3, 4, 4, …
#> $ bike_factors_1            <int> 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_10           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_11           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_12           <int> 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1,…
#> $ bike_factors_2            <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_3            <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_4            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_5            <int> 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_6            <int> 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_7            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_8            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_9            <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bike_purpose_1            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_2            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_3            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_4            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_5            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_6            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_7            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_8            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bike_safety_1             <int> 0, 995, 1, 995, 995, 995, 995, 995, 1, 0, 0,…
#> $ bike_safety_2             <int> 1, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_3             <int> 0, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_4             <int> 0, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_5             <int> 1, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_6             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_7             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_8             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 1, 0,…
#> $ bike_safety_other         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "At our …
#> $ bike_store_1              <int> 1, 995, 1, 995, 995, 995, 995, 1, 0, 995, 99…
#> $ bike_store_2              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_3              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_4              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_5              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_6              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_7              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_997            <int> 0, 995, 0, 995, 995, 995, 995, 0, 1, 995, 99…
#> $ carshare_freq             <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ commute_days_1            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_2            <int> 995, 995, 995, 1, 1, 0, 1, 0, 995, 0, 1, 995…
#> $ commute_days_3            <int> 995, 995, 995, 1, 1, 0, 1, 0, 995, 0, 1, 995…
#> $ commute_days_4            <int> 995, 995, 995, 1, 1, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_5            <int> 995, 995, 995, 0, 1, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_6            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_days_7            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_days_996          <int> 995, 995, 995, 0, 0, 1, 0, 1, 995, 1, 0, 995…
#> $ commute_subsidy_1         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_10        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_11        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_12        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_13        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_14        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_2         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_3         <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ commute_subsidy_4         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_5         <int> 995, 995, 995, 1, 0, 0, 0, 1, 995, 0, 0, 995…
#> $ commute_subsidy_6         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_7         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_8         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_9         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_996       <int> 995, 995, 995, 0, 1, 1, 0, 0, 995, 1, 0, 995…
#> $ commute_subsidy_998       <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_use_1     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_10    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_11    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_12    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_13    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_14    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_2     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_3     <int> 995, 995, 995, 0, 995, 995, 1, 1, 995, 995, …
#> $ commute_subsidy_use_4     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_5     <int> 995, 995, 995, 1, 995, 995, 0, 1, 995, 995, …
#> $ commute_subsidy_use_6     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_7     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_8     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_9     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_996   <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ ethnicity_1               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ ethnicity_2               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_3               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_4               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_997             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_999             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ev_subsidies              <int> 4, 995, 995, 5, 995, 995, 995, 5, 5, 995, 2,…
#> $ ev_typical_charge_1       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_2       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_3       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_4       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_5       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_6       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_997     <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ home_vehicle_park         <int> 1, 995, 995, 3, 995, 995, 995, 1, 1, 995, 1,…
#> $ home_vehicle_park_pay     <int> 0, 995, 995, 0, 995, 995, 995, 0, 0, 995, 0,…
#> $ home_vehicle_park_permit  <int> 995, 995, 995, 1, 995, 995, 995, 995, 995, 9…
#> $ micromobility_devices_1   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_2   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_3   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_996 <int> 1, 995, 1, 1, 995, 995, 995, 1, 1, 995, 1, 1…
#> $ micromobility_devices_997 <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ num_bicycles              <int> 1, 995, 1, 0, 995, 995, 995, 2, 2, 995, 0, 0…
#> $ peerrent_freq             <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ race_1                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_2                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_3                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ race_4                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_5                    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,…
#> $ race_997                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_999                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_2                   <int> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,…
#> $ share_3                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_4                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_5                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ share_6                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_7                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_996                 <int> 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,…
#> $ telework_days_1           <int> 995, 995, 995, 1, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_2           <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_3           <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_4           <int> 995, 995, 995, 0, 0, 0, 0, 1, 995, 0, 0, 995…
#> $ telework_days_5           <int> 995, 995, 995, 1, 0, 0, 1, 0, 995, 0, 0, 995…
#> $ telework_days_6           <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ telework_days_7           <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ telework_days_996         <int> 995, 995, 995, 0, 1, 1, 0, 0, 995, 1, 1, 995…
#> $ telework_freq_pre_covid   <int> 995, 995, 3, 8, 996, 8, 8, 7, 995, 996, 996,…
#> $ transit_factors_1         <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,…
#> $ transit_factors_10        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ transit_factors_11        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_12        <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0,…
#> $ transit_factors_2         <int> 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,…
#> $ transit_factors_3         <int> 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,…
#> $ transit_factors_4         <int> 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_5         <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_6         <int> 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,…
#> $ transit_factors_7         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,…
#> $ transit_factors_8         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_9         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_other     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ transit_purpose_1         <int> 0, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_2         <int> 1, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_3         <int> 0, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_4         <int> 0, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_5         <int> 0, 995, 0, 1, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_6         <int> 0, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_7         <int> 1, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_other     <chr> "cultural events", NA, NA, NA, NA, NA, NA, N…
#> $ walk_purpose_1            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_2            <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_3            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_4            <int> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_5            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 995, 995…
#> $ walk_purpose_6            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_7            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_8            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 995, 995…
#> $ walk_purpose_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ why_no_bike_1             <int> 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,…
#> $ why_no_bike_2             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_3             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ why_no_bike_4             <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_5             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
#> $ why_no_bike_6             <int> 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,…
#> $ why_no_bike_7             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_8             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,…
#> $ person_type               <int> 4, 4, 3, 1, 1, 3, 1, 1, 4, 2, 1, 5, 4, 5, 1,…
#> $ person_weight             <dbl> 157.42583, 157.42583, 103.34663, 188.21553, …
#> $ race_imputed              <chr> "white", "white", "white", "white", "white",…
#> $ ethnicity_imputed         <chr> "not_hispanic", "not_hispanic", "not_hispani…
#> $ gender_imputed            <chr> "female", "male", "female", "female", "male"…
#> $ person_weight_tue         <dbl> 187.88768, 187.88768, 59.87428, 458.44733, 2…
#> $ person_weight_fri         <dbl> NA, NA, 75.95139, 655.71500, NA, NA, NA, NA,…
#> $ person_weight_mon         <dbl> NA, NA, 65.09989, 546.06550, NA, NA, NA, NA,…
#> $ person_weight_sat         <dbl> NA, NA, 76.29594, 651.46690, NA, NA, NA, NA,…
#> $ person_weight_sun         <dbl> NA, NA, 74.33908, 640.43646, NA, NA, NA, NA,…
#> $ person_weight_thu         <dbl> NA, NA, 64.49532, 489.71295, NA, NA, NA, NA,…
#> $ person_weight_wed         <dbl> NA, NA, 58.57788, 463.14780, NA, NA, NA, NA,…
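
Most columns in this output are integer codes, so it helps to attach labels from the codebook before interpreting them. The sketch below is an illustration on toy data: it assumes value_labels has columns named variable, value, and label, which is a guess at the layout and should be checked against codebook$value_labels. The three tour_purpose codes are copied from the codebook table in Section 7.

```r
# Assumed layout of the value label table (column names are hypothetical)
value_labels <- data.frame(
  variable = "tour_purpose",
  value = c(2L, 4L, 995L),
  label = c("Work", "School", "Missing Response"),
  stringsAsFactors = FALSE
)

# Toy records standing in for hts$tour
tour <- data.frame(
  tour_id = 1:4,
  tour_purpose = c(2L, 995L, 4L, 2L)
)

# Look up the label for each coded value of one variable
labels_tp <- value_labels[value_labels$variable == "tour_purpose", ]
tour$tour_purpose_label <- labels_tp$label[
  match(tour$tour_purpose, labels_tp$value)
]

print(tour)
```

Codes with no matching label come back as NA, which makes gaps in the lookup easy to spot.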

View Table Dimensions

Table sizes reflect the hierarchical structure of the dataset: households contain people, people contain travel days, and days may contain zero or more trips.

# Build one row per table with its row and column counts
table_dimensions <- data.frame(
  table = names(hts),
  rows = vapply(hts, nrow, integer(1)),
  columns = vapply(hts, ncol, integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

Use Table 33 to confirm that the row counts follow the expected household-to-person-to-day-to-trip hierarchy.

gt::gt(table_dimensions) %>%
  gt::fmt_number(
    columns = c(rows, columns),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    rows = "Rows",
    columns = "Columns"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )
Table Rows Columns
hh 18,122 52
person 37,616 269
day 134,187 65
vehicle 25,849 19
location 8,607,225 9
trip_unlinked 468,018 139
trip_linked 419,469 58
tour 160,091 57
value_labels 2,422 3
variable_list 567 19
Table 33: HTS table dimensions.
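
Row counts alone do not show that each lower-level record links to a parent record. One possible check is sketched below on toy data. The helper count_orphans is hypothetical, and the sketch assumes child tables carry the parent key (hh_id on the person table, person_id on the day table). The day_id column is invented for the example.

```r
# Toy tables standing in for hts$hh, hts$person, and hts$day
hh <- data.frame(hh_id = c("A", "B"), stringsAsFactors = FALSE)
person <- data.frame(
  person_id = c("A1", "A2", "B1"),
  hh_id = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
day <- data.frame(
  day_id = c("A1-1", "B1-1", "X9-1"),
  person_id = c("A1", "B1", "X9"),  # "X9" has no matching person
  stringsAsFactors = FALSE
)

# Count child rows whose key value is absent from the parent table
count_orphans <- function(child, parent, key) {
  sum(!child[[key]] %in% parent[[key]])
}

orphans <- c(
  person_in_hh = count_orphans(person, hh, "hh_id"),
  day_in_person = count_orphans(day, person, "person_id")
)
print(orphans)
```

A nonzero count flags records to investigate before joining tables.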

For MassDOT, it is also useful to compare delivered row counts with the complete-household subset before calculating any substantive estimate.

# Count rows from complete households; tables without hh_id return NA
complete_household_table_dimensions <- data.frame(
  table = names(hts),
  complete_household_rows = vapply(
    hts,
    function(tbl) {
      if (!"hh_id" %in% names(tbl)) {
        return(NA_integer_)
      }
      sum(tbl$hh_id %in% complete_hh_ids)
    },
    integer(1)
  ),
  row.names = NULL,
  stringsAsFactors = FALSE
)

Table 34 shows the number of rows that belong to complete households in each table with a household identifier.

gt::gt(complete_household_table_dimensions) %>%
  gt::fmt_number(
    columns = complete_household_rows,
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    complete_household_rows = "Complete-Household Rows"
  )
Table Complete-Household Rows
hh 15,641
person 31,255
day 96,370
vehicle 21,770
location NA
trip_unlinked 411,573
trip_linked 366,186
tour 139,240
value_labels NA
variable_list NA
Table 34: Rows belonging to complete households.
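
To put the two sets of counts side by side, the sketch below computes the share of delivered rows that belong to complete households. The counts are copied from Table 33 and Table 34, and the data frame name retention is illustrative. Tables reported as NA in Table 34 are omitted.

```r
retention <- data.frame(
  table = c("hh", "person", "day", "vehicle",
            "trip_unlinked", "trip_linked", "tour"),
  delivered_rows = c(18122, 37616, 134187, 25849,
                     468018, 419469, 160091),
  complete_household_rows = c(15641, 31255, 96370, 21770,
                              411573, 366186, 139240),
  stringsAsFactors = FALSE
)

# Percent of delivered rows retained after the complete-household filter
retention$pct_retained <- round(
  100 * retention$complete_household_rows / retention$delivered_rows,
  1
)

print(retention)
```

The share retained differs by table, so report the analytic universe alongside any estimate.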

View Sample Records

Before summarizing a table, it is often useful to preview a few records and confirm that the key fields look as expected.

person_preview <- head(hts$person)

Table 35 shows the first few person records from the prepared data.

gt::gt(person_preview) %>%
  gt::tab_header(title = "Sample Person Records")
Sample Person Records
person_id person_num hh_id surveyable is_participant is_proxy has_proxy has_phone phone_type hh_is_complete is_complete num_days_complete num_trips relationship age gender race_other ethnicity_other employment work_mode job_type num_jobs work_lon work_lat work_in_region work_state work_county work_bg_2010 work_bg_2020 work_puma_2012 work_puma_2022 education student school_mode school_type school_freq remote_class_freq school_in_region school_state school_county school_puma_2012 school_puma_2022 school_bg_2010 school_bg_2020 school_lon school_lat second_home second_home_in_region second_home_state second_home_county second_home_puma_2012 second_home_puma_2022 second_home_bg_2010 second_home_bg_2020 second_home_lon second_home_lat can_drive vehicle transit_freq tnc_freq bike_freq vanpool_freq bikeshare_freq scootshare_freq walk_freq transit_pass disability participate barriers_1 barriers_10 barriers_2 barriers_3 barriers_4 barriers_5 barriers_6 barriers_7 barriers_8 barriers_9 barriers_997 barriers_999 barriers_other bicycle_other bicycle_type_1 bicycle_type_2 bicycle_type_997 bike_comfort_lane bike_comfort_local bike_comfort_major bike_comfort_minor bike_comfort_neighborhood bike_comfort_paths bike_comfort_street bike_comfort_striped bike_factors_1 bike_factors_10 bike_factors_11 bike_factors_12 bike_factors_2 bike_factors_3 bike_factors_4 bike_factors_5 bike_factors_6 bike_factors_7 bike_factors_8 bike_factors_9 bike_factors_other bike_purpose_1 bike_purpose_2 bike_purpose_3 bike_purpose_4 bike_purpose_5 bike_purpose_6 bike_purpose_7 bike_purpose_8 bike_purpose_other bike_safety_1 bike_safety_2 bike_safety_3 bike_safety_4 bike_safety_5 bike_safety_6 bike_safety_7 bike_safety_8 bike_safety_other bike_store_1 bike_store_2 bike_store_3 bike_store_4 bike_store_5 bike_store_6 bike_store_7 bike_store_997 carshare_freq commute_days_1 commute_days_2 commute_days_3 commute_days_4 commute_days_5 commute_days_6 commute_days_7 commute_days_996 commute_subsidy_1 
commute_subsidy_10 commute_subsidy_11 commute_subsidy_12 commute_subsidy_13 commute_subsidy_14 commute_subsidy_2 commute_subsidy_3 commute_subsidy_4 commute_subsidy_5 commute_subsidy_6 commute_subsidy_7 commute_subsidy_8 commute_subsidy_9 commute_subsidy_996 commute_subsidy_998 commute_subsidy_use_1 commute_subsidy_use_10 commute_subsidy_use_11 commute_subsidy_use_12 commute_subsidy_use_13 commute_subsidy_use_14 commute_subsidy_use_2 commute_subsidy_use_3 commute_subsidy_use_4 commute_subsidy_use_5 commute_subsidy_use_6 commute_subsidy_use_7 commute_subsidy_use_8 commute_subsidy_use_9 commute_subsidy_use_996 ethnicity_1 ethnicity_2 ethnicity_3 ethnicity_4 ethnicity_997 ethnicity_999 ev_subsidies ev_typical_charge_1 ev_typical_charge_2 ev_typical_charge_3 ev_typical_charge_4 ev_typical_charge_5 ev_typical_charge_6 ev_typical_charge_997 home_vehicle_park home_vehicle_park_pay home_vehicle_park_permit micromobility_devices_1 micromobility_devices_2 micromobility_devices_3 micromobility_devices_996 micromobility_devices_997 num_bicycles peerrent_freq race_1 race_2 race_3 race_4 race_5 race_997 race_999 share_2 share_3 share_4 share_5 share_6 share_7 share_996 telework_days_1 telework_days_2 telework_days_3 telework_days_4 telework_days_5 telework_days_6 telework_days_7 telework_days_996 telework_freq_pre_covid transit_factors_1 transit_factors_10 transit_factors_11 transit_factors_12 transit_factors_2 transit_factors_3 transit_factors_4 transit_factors_5 transit_factors_6 transit_factors_7 transit_factors_8 transit_factors_9 transit_factors_other transit_purpose_1 transit_purpose_2 transit_purpose_3 transit_purpose_4 transit_purpose_5 transit_purpose_6 transit_purpose_7 transit_purpose_other walk_purpose_1 walk_purpose_2 walk_purpose_3 walk_purpose_4 walk_purpose_5 walk_purpose_6 walk_purpose_7 walk_purpose_8 walk_purpose_other why_no_bike_1 why_no_bike_2 why_no_bike_3 why_no_bike_4 why_no_bike_5 why_no_bike_6 why_no_bike_7 why_no_bike_8 person_type person_weight 
race_imputed ethnicity_imputed gender_imputed person_weight_tue person_weight_fri person_weight_mon person_weight_sat person_weight_sun person_weight_thu person_weight_wed
2400008901 1 24000089 1 1 0 0 1 1 1 1 1 2 0 8 1 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 4 7 8 995 995 995 4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 1 0 0 3 1 4 3 2 1 2 3 1 0 0 0 1 1 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 0 1 0 0 1 0 0 0 NA 1 0 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 4 995 995 995 995 995 995 995 1 0 995 0 0 0 1 0 1 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 0 0 0 0 1 1 0 1 1 0 0 0 NA 0 1 0 0 0 0 1 cultural events 0 1 0 1 0 0 0 1 NA 0 0 0 0 0 1 0 0 4 157.42583 white not_hispanic female 187.88768 NA NA NA NA NA NA
2400008902 2 24000089 1 1 0 0 1 1 1 1 1 5 1 9 2 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 7 8 8 996 995 995 995 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 0 0 0 1 0 0 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 0 0 0 0 1 1 1 1 1 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 1 0 0 0 1 NA 0 0 0 0 0 0 0 1 4 157.42583 white not_hispanic male 187.88768 NA NA NA NA NA NA
2400012201 1 24000122 1 1 0 0 1 2 1 1 7 39 0 5 1 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 0 1 13 4 996 1 25 25017 03400 00613 250173736002 250173736002 -71.16924 42.33609 0 995 NA NA NA NA NA NA NA NA 0 995 5 8 8 995 995 995 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 1 0 0 2 1 4 4 2 1 3 3 1 0 0 0 1 0 0 0 0 1 0 0 NA 995 995 995 995 995 995 995 995 NA 1 1 1 1 0 0 0 0 NA 1 0 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 0 0 0 1 0 1 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 3 1 0 0 0 1 0 1 0 0 0 0 0 NA 1 1 0 1 0 0 0 NA 1 1 0 1 1 0 0 1 NA 0 0 0 1 0 1 0 0 3 103.34663 white not_hispanic female 59.87428 75.95139 65.09989 76.29594 74.33908 64.49532 58.57788
2400014001 1 24000140 1 1 0 0 1 995 1 1 7 45 0 9 1 NA NA 1 104 5 1 -71.05214 42.35606 1 25 25025 250250701018 250250701042 03302 00802 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 4 8 996 995 995 995 2 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 1 2 1 1 2 2 2 1 0 0 0 1 0 0 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 5 995 995 995 995 995 995 995 3 0 1 0 0 0 1 0 0 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 0 NA 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 1 188.21553 white not_hispanic female 458.44733 655.71500 546.06550 651.46690 640.43646 489.71295 463.14780
2400015802 2 24000158 1 1 0 0 1 2 1 1 1 2 2 5 2 NA NA 1 100 1 1 -71.79986 42.26745 1 25 25027 250277317001 250277317002 00300 00505 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 9 995 8 995 995 995 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 1 0 0 0 0 0 1 0 1 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 996 0 0 0 0 0 0 1 0 0 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 1 50.69754 white not_hispanic male 216.88728 NA NA NA NA NA NA
2400015803 3 24000158 1 1 0 0 1 2 1 1 1 4 2 4 2 NA NA 2 100 5 2 -72.67345 41.76257 0 09 09003 090035021002 090035021001 00302 20201 6 3 995 13 995 2 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 9 9 995 996 995 995 995 5 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 1 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 8 0 0 0 0 0 0 0 0 1 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 3 50.69754 white not_hispanic male 216.88728 NA NA NA NA NA NA
Table 35: Sample person records.

Inspect Weight Columns

Weights are typically included in the household, person, day, trip, and tour tables when weighted estimates are supported.

weight_summaries <- data.frame(
  table = character(),
  weight_column = character(),
  min = numeric(),
  median = numeric(),
  mean = numeric(),
  max = numeric(),
  zero_count = integer(),
  stringsAsFactors = FALSE
)

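for (table_name in names(hts)) {
  # Match only columns whose names end in "_weight"; day-of-week variants
  # such as person_weight_tue are not picked up by this pattern.
  weight_columns <- grep("_weight$", names(hts[[table_name]]), value = TRUE)

  # Skip tables that have no matching weight column.
  if (length(weight_columns) == 0L) {
    next
  }

  # Summarize the first matching weight column in each table.
  weight_col <- weight_columns[[1]]
  weight_vec <- hts[[table_name]][[weight_col]]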

  weight_summaries <- rbind(
    weight_summaries,
    data.frame(
      table = table_name,
      weight_column = weight_col,
      min = min(weight_vec, na.rm = TRUE),
      median = stats::median(weight_vec, na.rm = TRUE),
      mean = mean(weight_vec, na.rm = TRUE),
      max = max(weight_vec, na.rm = TRUE),
      zero_count = sum(weight_vec == 0, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  )
}

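Table 36 is a quick way to check whether the main weight columns are present and populated. In this delivery the household weights are all positive, while the person, day, trip, and tour weight columns include zero values. The vehicle table carries hh_weight rather than a weight of its own.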

gt::gt(weight_summaries) %>%
  gt::fmt_number(
    columns = c(min, median, mean, max),
    decimals = 3
  ) %>%
  gt::fmt_number(
    columns = zero_count,
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    weight_column = "Weight Column",
    min = "Min",
    median = "Median",
    mean = "Mean",
    max = "Max",
    zero_count = "Zero Count"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )
Table          Weight Column        Min      Median    Mean      Max         Zero Count
hh             hh_weight            15.780   106.787   180.980   1,129.933   0
person         person_weight        0.000    112.787   217.239   4,230.993   1,556
day            day_weight           0.000    30.002    108.071   3,570.328   13,520
vehicle        hh_weight            15.780   116.151   204.342   1,129.933   0
trip_unlinked  trip_weight          0.000    28.591    117.434   5,604.895   56,013
trip_linked    linked_trip_weight   0.000    29.896    124.146   5,604.895   51,963
tour           tour_weight          0.000    30.100    128.477   5,373.668   21,243
Table 36: Weight summaries by table.
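
The summary loop keeps only the first column whose name ends in _weight, so day-of-week variants such as person_weight_tue are not covered by Table 36. The sketch below is one way to list every weight-like column and count missing and zero values separately. It runs on a toy stand-in for hts (hts_toy is hypothetical and its values are invented for illustration), and the regular expression is an assumption about how the variant columns are named.

```r
# Toy stand-in for the hts list; column names follow the person and trip
# tables shown in this guide, values are invented for illustration.
hts_toy <- list(
  person = data.frame(
    person_id = c(1, 2),
    person_weight = c(157.4, 0),
    person_weight_tue = c(187.9, NA)
  ),
  trip_unlinked = data.frame(
    trip_id = c(11, 12),
    trip_weight = c(247.1, 0),
    trip_weight_tue = c(316.7, NA)
  )
)

# Match the main weight column and any suffixed variants (e.g., _tue).
all_weight_columns <- lapply(
  hts_toy,
  function(tbl) grep("_weight(_[a-z]+)?$", names(tbl), value = TRUE)
)

# Count missing and zero values separately for each weight column.
weight_checks <- do.call(rbind, lapply(names(all_weight_columns), function(nm) {
  cols <- all_weight_columns[[nm]]
  if (length(cols) == 0L) {
    return(NULL)  # table has no weight-like columns
  }
  data.frame(
    table = nm,
    weight_column = cols,
    na_count = vapply(cols, function(col) sum(is.na(hts_toy[[nm]][[col]])), integer(1)),
    zero_count = vapply(cols, function(col) sum(hts_toy[[nm]][[col]] == 0, na.rm = TRUE), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}))

print(weight_checks)
```

Separating NA from zero matters because the sample person records show NA in several day-of-week weight columns while the main weight is populated.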

The codebook object can also be inspected immediately after loading.

variable_list_preview <- head(codebook$variable_list)
value_labels_preview <- head(codebook$value_labels)

Review Table 37 and Table 38 before analysis so you can confirm variable definitions and labeled response values.

gt::gt(variable_list_preview) %>%
  gt::tab_header(title = "Variable List Preview")
Variable List Preview
order source variable is_checkbox hh person day vehicle location unlinked_trip linked_trip tour logic description data_type write_to_export exclude_from_frequencies category exclude
1 pipeline hh_id 0 1 1 1 1 0 1 1 1 NA Household ID integer TRUE TRUE NA NA
2 pipeline is_complete 0 1 1 1 1 0 1 1 1 NA Record is complete integer/categorical TRUE FALSE NA NA
3 pipeline num_trips 0 1 1 1 0 0 0 0 0 NA Number of trips integer TRUE FALSE NA NA
4 pipeline num_days_complete 0 1 1 0 0 0 0 0 0 NA Number of complete days integer/categorical TRUE FALSE NA NA
5 pipeline first_travel_date 0 1 0 0 0 0 1 0 0 NA First travel date date TRUE TRUE NA NA
6 pipeline last_travel_date 0 1 0 0 0 0 1 0 0 NA Last travel date date TRUE TRUE NA NA
Table 37: Variable list preview.
gt::gt(value_labels_preview) %>%
  gt::tab_header(title = "Value Labels Preview")
Value Labels Preview
variable value label
added_trip 0 No
added_trip 1 Yes
age 1 Age under 5
age 10 Age 75-84
age 11 Age 85 up
age 2 Age 5-15
Table 38: Value label preview.
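
The value labels in Table 38 sort as text (1, 10, 11, 2), so it is safest to compare codes as character when attaching labels. The sketch below shows one possible lookup. The helper label_codes() is hypothetical, and the toy label table copies only the age rows visible in the preview.

```r
# Value labels copied from the preview in Table 38.
value_labels_toy <- data.frame(
  variable = c("age", "age", "age", "age"),
  value = c("1", "10", "11", "2"),
  label = c("Age under 5", "Age 75-84", "Age 85 up", "Age 5-15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up labels for one coded variable.
# Both sides are compared as character so that 2 matches "2".
label_codes <- function(codes, variable_name, value_labels) {
  labels_for_var <- value_labels[value_labels$variable == variable_name, ]
  idx <- match(as.character(codes), as.character(labels_for_var$value))
  labels_for_var$label[idx]
}

age_codes <- c(2, 10, 99)  # 99 is not in the toy label table
age_labels <- label_codes(age_codes, "age", value_labels_toy)
print(age_labels)
```

Codes without a matching label come back as NA, which makes unlabeled values easy to spot before they reach a summary table.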

11 Data Structure and Joins

The dataset is organized as a set of related tables. Each table represents a different unit of observation, such as households, people, travel days, vehicles, trips, locations, or tours. Most analyses require using more than one table, so it is important to understand how records are linked before joining tables. The examples below use the same dplyr join pattern used throughout the analyst handbook.

For MassDOT, the main linkage pattern is:

  • Households are the primary sampling unit and are identified by hh_id.
  • Persons are nested within households and linked by hh_id.
  • Each person can have multiple travel days; days are identified by day_id and linked to persons by person_id.
  • Each day can record zero or more trips; trips are linked to the day and person tables by day_id and person_id.
  • Trips can be analyzed at the linked or unlinked level depending on the research question.
  • Location records provide point-level context along trips and are linked to trips by trip_id.
  • Tours summarize sequences of linked trips that begin and end at the same anchor location.
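
The nesting described in this list implies that every child record should have a parent: each person's hh_id should exist in the household table, each day's person_id in the person table, and so on. A minimal base R sketch of that check follows. The toy tables are hypothetical, with IDs patterned on the sample records and one deliberately orphaned row.

```r
# Toy tables; IDs follow the pattern in the sample records, values invented.
hh_toy <- data.frame(hh_id = c(24000089, 24000122))
person_toy <- data.frame(
  person_id = c(2400008901, 2400008902, 2400012201, 2400099901),
  hh_id = c(24000089, 24000089, 24000122, 24000999)
)

# Child records whose parent key is missing from the parent table.
orphan_persons <- person_toy[!person_toy$hh_id %in% hh_toy$hh_id, ]
print(orphan_persons)
```

With dplyr, the same check can be written as dplyr::anti_join(person_toy, hh_toy, by = "hh_id"). An empty result indicates that the link holds for every row.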

Figure 9 repeats the dataset overview figure so the table hierarchy is visible while working through joins.

Diagram showing how prepared study tables relate to each other.
Figure 9: Data linkages across prepared tables.

The first step in any join workflow is to inspect the identifier columns that connect the tables.

id_columns_summary <- data.frame(
  table = character(),
  id_columns = character(),
  stringsAsFactors = FALSE
)

for (table_name in names(hts)) {
  id_columns <- grep("_id$", names(hts[[table_name]]), value = TRUE)

  if (length(id_columns) == 0L) {
    next
  }

  id_columns_summary <- rbind(
    id_columns_summary,
    data.frame(
      table = table_name,
      id_columns = paste(id_columns, collapse = ", "),
      stringsAsFactors = FALSE
    )
  )
}

Use Table 39 to confirm which keys are available before you start joining tables.

gt::gt(id_columns_summary) %>%
  gt::cols_label(
    table = "Table",
    id_columns = "Identifier Columns"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )
Table Identifier Columns
hh hh_id
person person_id, hh_id
day day_id, person_id, hh_id
vehicle hh_id, vehicle_id
location trip_id
trip_unlinked trip_id, day_id, hh_id, person_id, linked_trip_id, joint_trip_id, tour_id
trip_linked linked_trip_id, hh_id, person_id, day_id, joint_trip_id, tour_id
tour tour_id, hh_id, person_id, day_id, out_chauffeur_id, inb_chauffeur_id, out_chauffeur_tour_id, inb_chauffeur_tour_id, parent_tour_id, joint_tour_id
Table 39: Identifier columns by table.
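
Knowing which identifier columns exist is only part of the picture; a join key is safe on the lookup side only if it identifies rows uniquely. The sketch below is one way to test that. The helper is_unique_key() is hypothetical, and the toy vehicle table is invented for illustration.

```r
# Hypothetical helper: TRUE when the key columns uniquely identify rows.
is_unique_key <- function(tbl, key_columns) {
  anyDuplicated(tbl[, key_columns, drop = FALSE]) == 0L
}

# Toy vehicle table; values invented for illustration.
vehicle_toy <- data.frame(
  hh_id = c(24000089, 24000089, 24000122),
  vehicle_num = c(1, 2, 1),
  vehicle_id = c(2400008901, 2400008902, 2400012201)
)

vehicle_id_is_key <- is_unique_key(vehicle_toy, "vehicle_id")
compound_is_key <- is_unique_key(vehicle_toy, c("hh_id", "vehicle_num"))
hh_id_is_key <- is_unique_key(vehicle_toy, "hh_id")

print(c(vehicle_id_is_key, compound_is_key, hh_id_is_key))
```

In the toy table hh_id alone is not a key for vehicles because one household owns two vehicles, so a join to the vehicle table on hh_id alone would multiply rows.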

11.1 Common Join Patterns

The most common joins follow the hierarchy from households to lower-level records.

For MassDOT, many substantive analyses should also carry the household completion rule through the join process. A common pattern is to join hh$is_complete or filter lower-level tables with hh_id %in% complete_hh_ids before summarizing.
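
The join examples in this section rely on a complete_hh_ids vector that is not created here. One way it could be derived is sketched below, assuming that is_complete == 1 in the household table marks a complete household. The toy tables and the _toy object names are hypothetical.

```r
# Toy household and day tables; values invented for illustration.
hh_toy <- data.frame(
  hh_id = c(24000089, 24000122, 24000140),
  is_complete = c(1, 0, 1)
)
day_toy <- data.frame(
  day_id = c(240000890101, 240001220101, 240001400101),
  hh_id = c(24000089, 24000122, 24000140)
)

# IDs of households flagged complete in the household table.
complete_hh_ids_toy <- hh_toy$hh_id[hh_toy$is_complete == 1]

# Carry the household completion rule down to a lower-level table.
day_complete_toy <- day_toy[day_toy$hh_id %in% complete_hh_ids_toy, ]
print(day_complete_toy)
```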

For example, to join household characteristics to people, first select the household fields that belong in the person-level analysis file.

household_join_fields <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::mutate(hh_is_complete = is_complete)

if ("income_detailed" %in% names(hts$hh)) {
  household_join_fields <- household_join_fields %>%
    dplyr::mutate(household_income = income_detailed)
}

if ("num_vehicles" %in% names(hts$hh)) {
  household_join_fields <- household_join_fields %>%
    dplyr::mutate(household_vehicles = num_vehicles)
}

household_join_fields <- household_join_fields %>%
  dplyr::select(
    hh_id,
    hh_is_complete,
    dplyr::any_of(c("household_income", "household_vehicles"))
  )

person_with_household <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    household_join_fields,
    by = "hh_id"
  )

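Table 40 shows the person table after household variables have been joined in. The person table already carries hh_is_complete, so the joined output holds the person-table copy as hh_is_complete.x and the household-lookup copy as hh_is_complete.y.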

gt::gt(head(person_with_household)) %>%
  gt::tab_header(title = "Person Records Joined to Household Variables")
Person Records Joined to Household Variables
person_id person_num hh_id surveyable is_participant is_proxy has_proxy has_phone phone_type hh_is_complete.x is_complete num_days_complete num_trips relationship age gender race_other ethnicity_other employment work_mode job_type num_jobs work_lon work_lat work_in_region work_state work_county work_bg_2010 work_bg_2020 work_puma_2012 work_puma_2022 education student school_mode school_type school_freq remote_class_freq school_in_region school_state school_county school_puma_2012 school_puma_2022 school_bg_2010 school_bg_2020 school_lon school_lat second_home second_home_in_region second_home_state second_home_county second_home_puma_2012 second_home_puma_2022 second_home_bg_2010 second_home_bg_2020 second_home_lon second_home_lat can_drive vehicle transit_freq tnc_freq bike_freq vanpool_freq bikeshare_freq scootshare_freq walk_freq transit_pass disability participate barriers_1 barriers_10 barriers_2 barriers_3 barriers_4 barriers_5 barriers_6 barriers_7 barriers_8 barriers_9 barriers_997 barriers_999 barriers_other bicycle_other bicycle_type_1 bicycle_type_2 bicycle_type_997 bike_comfort_lane bike_comfort_local bike_comfort_major bike_comfort_minor bike_comfort_neighborhood bike_comfort_paths bike_comfort_street bike_comfort_striped bike_factors_1 bike_factors_10 bike_factors_11 bike_factors_12 bike_factors_2 bike_factors_3 bike_factors_4 bike_factors_5 bike_factors_6 bike_factors_7 bike_factors_8 bike_factors_9 bike_factors_other bike_purpose_1 bike_purpose_2 bike_purpose_3 bike_purpose_4 bike_purpose_5 bike_purpose_6 bike_purpose_7 bike_purpose_8 bike_purpose_other bike_safety_1 bike_safety_2 bike_safety_3 bike_safety_4 bike_safety_5 bike_safety_6 bike_safety_7 bike_safety_8 bike_safety_other bike_store_1 bike_store_2 bike_store_3 bike_store_4 bike_store_5 bike_store_6 bike_store_7 bike_store_997 carshare_freq commute_days_1 commute_days_2 commute_days_3 commute_days_4 commute_days_5 commute_days_6 commute_days_7 commute_days_996 commute_subsidy_1 
commute_subsidy_10 commute_subsidy_11 commute_subsidy_12 commute_subsidy_13 commute_subsidy_14 commute_subsidy_2 commute_subsidy_3 commute_subsidy_4 commute_subsidy_5 commute_subsidy_6 commute_subsidy_7 commute_subsidy_8 commute_subsidy_9 commute_subsidy_996 commute_subsidy_998 commute_subsidy_use_1 commute_subsidy_use_10 commute_subsidy_use_11 commute_subsidy_use_12 commute_subsidy_use_13 commute_subsidy_use_14 commute_subsidy_use_2 commute_subsidy_use_3 commute_subsidy_use_4 commute_subsidy_use_5 commute_subsidy_use_6 commute_subsidy_use_7 commute_subsidy_use_8 commute_subsidy_use_9 commute_subsidy_use_996 ethnicity_1 ethnicity_2 ethnicity_3 ethnicity_4 ethnicity_997 ethnicity_999 ev_subsidies ev_typical_charge_1 ev_typical_charge_2 ev_typical_charge_3 ev_typical_charge_4 ev_typical_charge_5 ev_typical_charge_6 ev_typical_charge_997 home_vehicle_park home_vehicle_park_pay home_vehicle_park_permit micromobility_devices_1 micromobility_devices_2 micromobility_devices_3 micromobility_devices_996 micromobility_devices_997 num_bicycles peerrent_freq race_1 race_2 race_3 race_4 race_5 race_997 race_999 share_2 share_3 share_4 share_5 share_6 share_7 share_996 telework_days_1 telework_days_2 telework_days_3 telework_days_4 telework_days_5 telework_days_6 telework_days_7 telework_days_996 telework_freq_pre_covid transit_factors_1 transit_factors_10 transit_factors_11 transit_factors_12 transit_factors_2 transit_factors_3 transit_factors_4 transit_factors_5 transit_factors_6 transit_factors_7 transit_factors_8 transit_factors_9 transit_factors_other transit_purpose_1 transit_purpose_2 transit_purpose_3 transit_purpose_4 transit_purpose_5 transit_purpose_6 transit_purpose_7 transit_purpose_other walk_purpose_1 walk_purpose_2 walk_purpose_3 walk_purpose_4 walk_purpose_5 walk_purpose_6 walk_purpose_7 walk_purpose_8 walk_purpose_other why_no_bike_1 why_no_bike_2 why_no_bike_3 why_no_bike_4 why_no_bike_5 why_no_bike_6 why_no_bike_7 why_no_bike_8 person_type person_weight 
race_imputed ethnicity_imputed gender_imputed person_weight_tue person_weight_fri person_weight_mon person_weight_sat person_weight_sun person_weight_thu person_weight_wed hh_is_complete.y household_income household_vehicles
2400008901 1 24000089 1 1 0 0 1 1 1 1 1 2 0 8 1 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 4 7 8 995 995 995 4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 1 0 0 3 1 4 3 2 1 2 3 1 0 0 0 1 1 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 0 1 0 0 1 0 0 0 NA 1 0 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 4 995 995 995 995 995 995 995 1 0 995 0 0 0 1 0 1 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 0 0 0 0 1 1 0 1 1 0 0 0 NA 0 1 0 0 0 0 1 cultural events 0 1 0 1 0 0 0 1 NA 0 0 0 0 0 1 0 0 4 157.42583 white not_hispanic female 187.88768 NA NA NA NA NA NA 1 999 2
2400008902 2 24000089 1 1 0 0 1 1 1 1 1 5 1 9 2 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 7 8 8 996 995 995 995 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 0 0 0 1 0 0 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 0 0 0 0 1 1 1 1 1 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 1 0 0 0 1 NA 0 0 0 0 0 0 0 1 4 157.42583 white not_hispanic male 187.88768 NA NA NA NA NA NA 1 999 2
2400012201 1 24000122 1 1 0 0 1 2 1 1 7 39 0 5 1 NA NA 5 995 995 995 NA NA 995 NA NA NA NA NA NA 7 0 1 13 4 996 1 25 25017 03400 00613 250173736002 250173736002 -71.16924 42.33609 0 995 NA NA NA NA NA NA NA NA 0 995 5 8 8 995 995 995 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 1 0 0 2 1 4 4 2 1 3 3 1 0 0 0 1 0 0 0 0 1 0 0 NA 995 995 995 995 995 995 995 995 NA 1 1 1 1 0 0 0 0 NA 1 0 0 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 0 0 0 1 0 1 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 995 995 995 995 995 995 995 995 3 1 0 0 0 1 0 1 0 0 0 0 0 NA 1 1 0 1 0 0 0 NA 1 1 0 1 1 0 0 1 NA 0 0 0 1 0 1 0 0 3 103.34663 white not_hispanic female 59.87428 75.95139 65.09989 76.29594 74.33908 64.49532 58.57788 1 4 0
2400014001 1 24000140 1 1 0 0 1 995 1 1 7 45 0 9 1 NA NA 1 104 5 1 -71.05214 42.35606 1 25 25025 250250701018 250250701042 03302 00802 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 4 8 996 995 995 995 2 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 1 2 1 1 2 2 2 1 0 0 0 1 0 0 0 0 0 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 5 995 995 995 995 995 995 995 3 0 1 0 0 0 1 0 0 995 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 0 NA 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 1 188.21553 white not_hispanic female 458.44733 655.71500 546.06550 651.46690 640.43646 489.71295 463.14780 1 9 1
2400015802 2 24000158 1 1 0 0 1 2 1 1 1 2 2 5 2 NA NA 1 100 1 1 -71.79986 42.26745 1 25 25027 250277317001 250277317002 00300 00505 7 2 995 995 995 995 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 6 9 995 8 995 995 995 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 1 0 0 0 0 0 1 0 1 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 996 0 0 0 0 0 0 1 0 0 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 1 50.69754 white not_hispanic male 216.88728 NA NA NA NA NA NA 1 999 4
2400015803 3 24000158 1 1 0 0 1 2 1 1 1 4 2 4 2 NA NA 2 100 5 2 -72.67345 41.76257 0 09 09003 090035021002 090035021001 00302 20201 6 3 995 13 995 2 995 NA NA NA NA NA NA NA NA 0 995 NA NA NA NA NA NA NA NA 1 9 9 995 996 995 995 995 5 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 NA NA 995 995 995 NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 1 0 0 0 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 NA 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 1 0 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 8 0 0 0 0 0 0 0 0 1 0 0 0 NA 995 995 995 995 995 995 995 NA 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 3 50.69754 white not_hispanic male 216.88728 NA NA NA NA NA NA 1 999 4
Table 40: Person records with household fields.
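
One way to avoid suffixed duplicates such as hh_is_complete.x and hh_is_complete.y is to trim the lookup to the join key plus the columns the left table lacks. The base R sketch below uses merge(all.x = TRUE) as a stand-in for a left join. The toy tables are hypothetical and their values invented.

```r
# Toy person table and household lookup; values invented for illustration.
person_toy <- data.frame(
  person_id = c(2400008901, 2400008902),
  hh_id = c(24000089, 24000089),
  hh_is_complete = c(1, 1)
)
household_lookup_toy <- data.frame(
  hh_id = 24000089,
  hh_is_complete = 1,
  household_vehicles = 2
)

# Keep the key plus only those lookup columns the person table lacks.
new_columns <- setdiff(names(household_lookup_toy), names(person_toy))
lookup_trimmed <- household_lookup_toy[, c("hh_id", new_columns), drop = FALSE]

# merge(all.x = TRUE) is the base R counterpart of a left join.
person_with_household_toy <- merge(
  person_toy,
  lookup_trimmed,
  by = "hh_id",
  all.x = TRUE
)
print(names(person_with_household_toy))
```

Note that merge() sorts the result by the key, whereas dplyr::left_join() keeps the row order of the left table.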

To join person characteristics to trips, build a person-level lookup first and then join it to the trip table.

person_join_fields <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    person_id,
    hh_id,
    person_age = age,
    person_gender = gender,
    person_employment = employment
  )

trip_with_person <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    person_join_fields,
    by = "person_id"
  )

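Table 41 shows the trip table after person-level fields have been added. Because both tables carry hh_id and the join key is person_id alone, hh_id appears twice in the output, as hh_id.x and hh_id.y.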

gt::gt(head(trip_with_person)) %>%
  gt::tab_header(title = "Trip Records Joined to Person Variables")
Trip Records Joined to Person Variables
trip_id day_id trip_num hh_id.x first_travel_date last_travel_date person_id travel_date travel_dow day_num hh_day_complete hh_is_complete day_is_complete trip_survey_complete depart_time depart_date depart_dow depart_hour depart_minute depart_seconds arrive_time arrive_date arrive_dow arrive_hour arrive_minute arrive_second distance_meters distance_miles duration_seconds duration_minutes dwell_mins speed_mph speed_flag o_in_region o_state o_county o_puma_2012 o_puma_2022 o_bg_2010 o_bg_2020 o_lon o_lat d_in_region d_state d_county d_puma_2012 d_puma_2022 d_bg_2010 d_bg_2020 d_lon d_lat mode_type mode_1 mode_2 mode_3 mode_4 mode_other_specify transit_egress transit_access num_travelers num_hh_travelers num_non_hh_travelers hh_member_1 hh_member_2 hh_member_3 hh_member_4 hh_member_5 hh_member_6 hh_member_7 hh_member_8 driver ev_charge_station ev_charge_station_decision o_purpose_category_reported o_purpose_category o_purpose_reported o_purpose d_purpose_category_reported d_purpose_reported d_purpose_category d_purpose d_purpose_other park_location park_type bike_park_loc scooter_park_location park_cost taxi_cost taxi_pay taxi_type tnc_type transit_type user_merged user_split user_deleted added_type copied_from_proxy unlinked_split split_loop days_first_trip days_last_trip is_transit_leg linked_trip_id is_transit is_access is_egress has_access has_egress transit_quality_flag has_synthetic_access has_synthetic_egress added_trip person_num ev_charge_station_level_1 ev_charge_station_level_2 ev_charge_station_level_3 ev_charge_station_level_998 other_bicycle vehicle_park_pay distance_beeline joint_trip_num joint_trip_id corrected_hh_members imputed_record_type imputed_host_trip imputed_joint_trip home_distance linked_trip_num tour_num tour_id trip_weight trip_weight_tue trip_weight_fri trip_weight_mon trip_weight_sat trip_weight_sun trip_weight_thu trip_weight_wed is_complete hh_id.y person_age person_gender person_employment
2400008901001 240000890101 1 24000089 2024-06-11 2024-06-11 2400008901 2024-06-11 2 1 1 1 1 1 2024-06-11 18:20:00 2024-06-11 2 14 20 0 2024-06-11 18:27:00 2024-06-11 2 14 27 0 4710 2.9266656 420 7 95 25.085705 0 1 25 25009 00703 00706 250092174003 250092174023 -70.87968 42.54132 1 25 25009 00703 00706 250092176004 250092176012 -70.85492 42.57236 8 6 995 995 995 None 995 995 1 1 0 1 0 0 0 0 0 0 0 1 995 995 NA 1 NA 1 9 54 9 54 None 1 995 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 1 0 0 2400008901010101 0 995 995 995 995 None 995 995 1 1 995 995 995 995 None 995 4007.6875 -1 -1 0 0 -1 0 2.9798525 1 1 24000089010101 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 8 1 5
2400008901002 240000890101 2 24000089 2024-06-11 2024-06-11 2400008901 2024-06-11 2 1 1 1 1 1 2024-06-11 20:02:00 2024-06-11 2 16 2 0 2024-06-11 20:15:00 2024-06-11 2 16 15 0 4730 2.9390930 780 13 645 13.565045 0 1 25 25009 00703 00706 250092176004 250092176012 -70.85492 42.57236 1 25 25009 00703 00706 250092174003 250092174023 -70.87968 42.54132 8 6 995 995 995 None 995 995 1 1 0 1 0 0 0 0 0 0 0 1 995 995 9 9 54 54 1 1 1 1 None 1 995 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 0 1 0 2400008901010102 0 995 995 995 995 None 995 995 1 1 995 995 995 995 None 995 4007.6875 -1 -1 0 0 -1 0 0.0000000 2 1 24000089010101 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 8 1 5
2400008902001 240000890201 1 24000089 2024-06-11 2024-06-11 2400008902 2024-06-11 2 1 1 1 1 1 2024-06-11 12:00:00 2024-06-11 2 8 0 0 2024-06-11 12:05:00 2024-06-11 2 8 5 0 666 0.4138342 300 5 75 4.966011 0 1 25 25009 00703 00706 250092174003 250092174023 -70.87968 42.54132 1 25 25009 00703 00706 250092174004 250092174022 -70.88607 42.54144 8 7 995 995 995 None 995 995 1 1 0 0 1 0 0 0 0 0 0 1 995 995 NA 1 NA 1 8 50 8 50 None 4 1 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 1 0 0 2400008902010101 0 995 995 995 995 None 995 995 1 2 995 995 995 995 None 995 524.2716 -1 -1 0 0 -1 0 0.7113409 1 1 24000089020101 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 9 2 5
2400008902002 240000890201 2 24000089 2024-06-11 2024-06-11 2400008902 2024-06-11 2 1 1 1 1 1 2024-06-11 13:20:00 2024-06-11 2 9 20 0 2024-06-11 13:25:00 2024-06-11 2 9 25 0 1025 0.6369071 300 5 15 7.642885 0 1 25 25009 00703 00706 250092174004 250092174022 -70.88607 42.54144 1 25 25009 00703 00706 250092174002 250092174013 -70.88079 42.54626 8 7 995 995 995 None 995 995 1 1 0 0 1 0 0 0 0 0 0 1 995 995 8 8 50 50 10 33 10 33 None 4 1 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 0 0 0 2400008902010102 0 995 995 995 995 None 995 995 1 2 995 995 995 995 None 995 689.5092 -1 -1 0 0 -1 0 0.2184301 2 1 24000089020101 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 9 2 5
2400008902003 240000890201 3 24000089 2024-06-11 2024-06-11 2400008902 2024-06-11 2 1 1 1 1 1 2024-06-11 13:40:00 2024-06-11 2 9 40 0 2024-06-11 13:45:00 2024-06-11 2 9 45 0 1084 0.6735680 300 5 95 8.082817 0 1 25 25009 00703 00706 250092174002 250092174013 -70.88079 42.54626 1 25 25009 00703 00706 250092174003 250092174023 -70.87968 42.54132 8 7 995 995 995 None 995 995 1 1 0 0 1 0 0 0 0 0 0 1 995 995 10 10 33 33 1 1 1 1 None 1 995 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 0 0 0 2400008902010103 0 995 995 995 995 None 995 995 1 2 995 995 995 995 None 995 557.4029 -1 -1 0 0 -1 0 0.0000000 3 1 24000089020101 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 9 2 5
2400008902004 240000890201 4 24000089 2024-06-11 2024-06-11 2400008902 2024-06-11 2 1 1 1 1 1 2024-06-11 15:20:00 2024-06-11 2 11 20 0 2024-06-11 15:30:00 2024-06-11 2 11 30 0 1782 1.1072862 600 10 35 6.643717 0 1 25 25009 00703 00706 250092174003 250092174023 -70.87968 42.54132 1 25 25009 00703 00706 250092173003 250092173003 -70.87901 42.55242 8 7 995 995 995 None 995 995 1 1 0 0 1 0 0 0 0 0 0 1 995 995 1 1 1 1 10 37 10 37 None 4 1 995 995 NA NA 995 995 995 995 995 995 0 NA 0 0 0 0 0 0 2400008902010201 0 995 995 995 995 None 995 995 1 2 995 995 995 995 None 995 1236.8675 -1 -1 0 0 -1 0 0.4115587 1 2 24000089020102 247.1357 316.6946 NA NA NA NA NA NA 1 24000089 9 2 5
Table 41: Trip records with person fields.
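
Joining on every identifier the two tables share keeps a single hh_id column. The base R sketch below contrasts a one-key join with a two-key join on toy tables (hypothetical, values invented). With dplyr the corresponding change would be by = c("hh_id", "person_id").

```r
# Toy trip table and person lookup; values invented for illustration.
trip_toy <- data.frame(
  trip_id = c(2400008901001, 2400008901002, 2400008902001),
  hh_id = c(24000089, 24000089, 24000089),
  person_id = c(2400008901, 2400008901, 2400008902)
)
person_lookup_toy <- data.frame(
  person_id = c(2400008901, 2400008902),
  hh_id = c(24000089, 24000089),
  person_age = c(8, 9)
)

# Joining on person_id alone duplicates the shared hh_id column.
joined_single_key <- merge(
  trip_toy, person_lookup_toy,
  by = "person_id", all.x = TRUE
)

# Joining on both shared identifiers keeps one hh_id column.
joined_both_keys <- merge(
  trip_toy, person_lookup_toy,
  by = c("hh_id", "person_id"), all.x = TRUE
)

print(names(joined_single_key))
print(names(joined_both_keys))
```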

If the dataset includes separate linked and unlinked trip tables, choose the trip table that matches the analysis question before joining. Linked trips are usually the better starting point for origin-destination, purpose, and whole-trip analyses. Unlinked trips are more appropriate when the question depends on leg detail, transfer behavior, or segment-level mode information.
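
The relationship between the two trip tables can be explored through linked_trip_id, which appears on the unlinked trip table. The sketch below rolls toy unlinked legs up to one row per linked trip. It illustrates the grouping only; the delivered trip_linked table may apply additional rules (for example, when assigning a main mode), and the toy values are invented.

```r
# Toy unlinked trips: three legs form linked trip 101, one leg forms 102.
trip_unlinked_toy <- data.frame(
  trip_id = c(1, 2, 3, 4),
  linked_trip_id = c(101, 101, 101, 102),
  distance_miles = c(0.3, 5.2, 0.4, 2.9)
)

# Add a counter so that summing it yields the number of legs.
trip_unlinked_toy$n_legs <- 1L

# One row per linked trip: number of legs and total distance.
linked_summary <- aggregate(
  cbind(n_legs, distance_miles) ~ linked_trip_id,
  data = trip_unlinked_toy,
  FUN = sum
)
print(linked_summary)
```

A count of legs per linked trip is also a quick way to see how much transfer or access/egress detail would be lost by working only at the linked level.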

If tours are included, they summarize groups of trips into larger travel patterns. Tour joins should be approached cautiously because the same household, person, day, or trip can appear in multiple downstream analytic datasets depending on the question being asked.

11.2 Join Cautions

Always choose the analytic unit before joining tables. A join can change the number of rows in an analysis dataset if the relationship between tables is one-to-many or many-to-many.

For example:

  • Joining household data to person data creates one row per person, not one row per household.
  • Joining person data to trip data creates one row per trip, not one row per person.
  • Joining trip data to location data can duplicate trip records whenever more than one location point is associated with a trip.
  • Joining lower-level tables back to higher-level tables can change the interpretation of counts and rates.
  • Shared travel fields can create the appearance of duplicated movements because one physical trip may be represented on multiple person-trip records.
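
The effect of a one-to-many join on weighted totals can be seen with a small example. In the base R sketch below (toy tables, invented weights), summing hh_weight over person rows counts a three-person household three times; restoring one row per household gives the intended total.

```r
# Toy tables: household 1 has three persons, household 2 has one.
hh_toy <- data.frame(hh_id = c(1, 2), hh_weight = c(100, 50))
person_toy <- data.frame(
  person_id = c(11, 12, 13, 21),
  hh_id = c(1, 1, 1, 2)
)

# Left join household weights onto persons (one row per person).
person_hh_toy <- merge(person_toy, hh_toy, by = "hh_id", all.x = TRUE)

# Summing hh_weight over person rows counts household 1 three times.
weighted_households_wrong <- sum(person_hh_toy$hh_weight)

# De-duplicate to one row per household before summing household weights.
one_row_per_hh <- person_hh_toy[!duplicated(person_hh_toy$hh_id), ]
weighted_households_right <- sum(one_row_per_hh$hh_weight)

print(c(wrong = weighted_households_wrong, right = weighted_households_right))
```

The same reasoning applies to any weight carried down the hierarchy: use the weight that matches the analytic unit of the rows being summed.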

Before calculating summaries, check that the resulting row count still matches the intended analytic unit.

trip_row_count_before <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  nrow()

person_join_fields <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(person_id, hh_id)

trip_with_person <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    person_join_fields,
    by = "person_id"
  )

trip_row_count_after <- nrow(trip_with_person)

row_count_check <- data.frame(
  metric = c("Before join", "After join", "Difference"),
  n = c(
    trip_row_count_before,
    trip_row_count_after,
    trip_row_count_after - trip_row_count_before
  ),
  stringsAsFactors = FALSE
)

Table 42 helps confirm that the join preserved the trip-level analytic unit.

gt::gt(row_count_check) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    metric = "Metric",
    n = "Rows"
  )
Metric Rows
Before join 411,573
After join 411,573
Difference 0
Table 42: Trip row counts before and after the join.

If the row count changes unexpectedly, review the join keys and confirm that the joined table has the intended unit of observation. A row-count check is often the quickest way to catch a mistaken join before it affects a summary table or chart.
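
The row-count check can also be built into the join itself so that a mistaken join fails immediately. The sketch below wraps a base R left join in a hypothetical helper, checked_left_join(), and demonstrates it on toy tables with and without a duplicated lookup key.

```r
# Hypothetical helper: left join that stops if the row count changes.
checked_left_join <- function(x, y, by) {
  joined <- merge(x, y, by = by, all.x = TRUE)
  if (nrow(joined) != nrow(x)) {
    stop(
      "Join changed the row count from ", nrow(x), " to ", nrow(joined),
      "; check for duplicate keys in the lookup table."
    )
  }
  joined
}

# Toy trip table and two person lookups; values invented for illustration.
trip_toy <- data.frame(trip_id = c(1, 2, 3), person_id = c(11, 11, 12))
person_ok <- data.frame(person_id = c(11, 12), person_age = c(8, 9))
person_dup <- data.frame(person_id = c(11, 11, 12), person_age = c(8, 8, 9))

# A unique lookup key preserves the trip-level row count.
ok_result <- checked_left_join(trip_toy, person_ok, by = "person_id")

# The duplicated lookup key triggers the error; tryCatch captures the message.
dup_message <- tryCatch(
  checked_left_join(trip_toy, person_dup, by = "person_id"),
  error = function(e) conditionMessage(e)
)
print(dup_message)
```

In dplyr 1.1.0 and later, dplyr::left_join() accepts a relationship argument (for example, relationship = "many-to-one") that raises a comparable error when the lookup side contains duplicate keys.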

For MassDOT, it is often worth checking both row preservation and universe preservation: after the join, confirm that the analysis file still contains only records from complete households unless the question intentionally includes incomplete households.
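
A universe check of this kind can be as short as a set difference. The sketch below uses toy data with hypothetical _toy names; one household ID is deliberately outside the complete set.

```r
# Toy inputs: the complete-household set and a joined analysis file.
complete_hh_ids_toy <- c(24000089, 24000140)
analysis_toy <- data.frame(
  trip_id = c(1, 2, 3),
  hh_id = c(24000089, 24000140, 24000122)
)

# Household IDs in the analysis file that are not in the complete set.
unexpected_hh_ids <- setdiff(unique(analysis_toy$hh_id), complete_hh_ids_toy)
universe_preserved <- length(unexpected_hh_ids) == 0L

print(unexpected_hh_ids)
print(universe_preserved)
```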

11.3 Joining Trip and Vehicle Tables

When the trip table selected for this guide preserves household vehicle numbering in its detailed mode fields, link trips to vehicles in three steps: reshape those mode fields to long format, extract the household vehicle number from the mode label, and then join to the vehicle table on hh_id and vehicle_num.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(
    grepl("^mode_[0-9]+$", variable),
    variable %in% trip_vehicle_mode_columns
  ) %>%
  dplyr::transmute(
    mode_num = variable,
    mode_value = as.character(value),
    mode_label = label
  )

vehicle_trips_long <- hts$trip_unlinked %>%
  dplyr::select(
    hh_id,
    person_id,
    day_id,
    trip_id,
    dplyr::all_of(trip_vehicle_mode_columns)
  ) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(trip_vehicle_mode_columns),
    names_to = "mode_num",
    values_to = "mode_value"
  ) %>%
  dplyr::mutate(mode_value = as.character(mode_value)) %>%
  dplyr::left_join(
    mode_value_labels,
    by = c("mode_num", "mode_value")
  ) %>%
  dplyr::mutate(
    vehicle_num = ifelse(
      grepl("Household vehicle [0-9]+", mode_label),
      as.integer(stringr::str_extract(mode_label, "[0-9]+")),
      NA_integer_
    )
  ) %>%
  dplyr::left_join(
    hts$vehicle %>%
      dplyr::select(hh_id, vehicle_num, vehicle_id),
    by = c("hh_id", "vehicle_num")
  ) %>%
  dplyr::filter(!is.na(vehicle_id))

Table 43 shows the trip and vehicle records after the long-format reshape and join steps are complete.

gt::gt(head(vehicle_trips_long)) %>%
  gt::tab_header(title = "Trip-to-Vehicle Linkage Preview")
Trip-to-Vehicle Linkage Preview
hh_id person_id day_id trip_id mode_num mode_value mode_label vehicle_num vehicle_id
24000089 2400008901 240000890101 2400008901001 mode_1 6 Household vehicle 1 1 2400008901
24000089 2400008901 240000890101 2400008901002 mode_1 6 Household vehicle 1 1 2400008901
24000089 2400008902 240000890201 2400008902001 mode_1 7 Household vehicle 2 2 2400008902
24000089 2400008902 240000890201 2400008902002 mode_1 7 Household vehicle 2 2 2400008902
24000089 2400008902 240000890201 2400008902003 mode_1 7 Household vehicle 2 2 2400008902
24000089 2400008902 240000890201 2400008902004 mode_1 7 Household vehicle 2 2 2400008902
Table 43: Trip-to-vehicle linkage preview.

12 Choosing the Right Analytic Unit

Section 6 describes the structure of households, persons, days, trips, linked trips, tours, and vehicles. This section shifts from structure to practice: how do you choose the correct analytic unit for the question you want to answer? In household travel survey (HTS) analysis, starting from the wrong table is often a more consequential mistake than a weighting error, because it changes what each row, count, and rate means. These examples assume the prepared tables are already available in hts.

Choosing the analytic unit is the first design decision in any analysis. The correct unit aligns with three things:

  1. Who or what is being measured (a household, a person, a person-day, a trip, a tour, or a vehicle)
  2. What the variable conceptually describes (a household attribute, a person characteristic, a daily behavior, a movement, or a chain of movements)
  3. The level at which the sampling weights represent the population

Table 44 connects each analytic unit to its best use cases, without repeating definitions already given in the Dataset Overview.

Analytic Unit   Starting Table   Typical Use
Household       hh               Household characteristics and household-level summaries
Person          person           Demographics, employment, student status, and person-level summaries
Person-day      day              Daily behavior, zero-trip days, deliveries, and trip-rate denominators
Linked trip     trip_linked      Origin-destination, purpose, and whole-trip summaries
Unlinked trip   trip_unlinked    Leg-level mode detail, transfers, and segment-level summaries
Tour            tour             Tour-pattern analysis across linked travel chains
Vehicle         vehicle          Household fleet summaries and vehicle characteristics
Table 44: Prepared tables and common uses by analytic unit.

12.1 Household-Level Analyses

Use households as the analytic unit when the phenomenon is shared or decided collectively: income, vehicle fleet, home location, delivery behavior, household makeup, or whether a household has zero vehicles. Even if a household variable is influenced by individual people, the household is still the right level because sampling occurred at the household level.

Example question: “What is the average household income in the study area?”

12.2 Person-Level Analyses

Analyses about people, such as demographics, employment status, student status, or attitudinal questions, belong at the person level. The person weight indicates how many people in the population each respondent represents. Use day or trip tables only when the metric you want to measure exists at those levels.

Example question: “What percentage of people in the study area are employed?”

12.3 Day-Level Analyses

Use person-days when studying daily behavior: trip rates, telework frequency, deliveries, or analyses that depend on people who made zero trips. The day table keeps all sampled days in scope, not only days with trips.

Example question: “What is the average number of trips per person-day?”
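A minimal sketch of why zero-trip days belong in the denominator is shown below. The toy values and the num_trips column name are assumptions for illustration; day_weight follows the weight naming used in this guide.

```r
# Toy person-day table; values and the num_trips column are hypothetical.
day <- data.frame(
  day_id = 1:3,
  num_trips = c(0, 3, 4),
  day_weight = c(100, 200, 100)
)

# Weighted trips per person-day with all sampled days in the denominator.
trips_per_day_all <- sum(day$day_weight * day$num_trips) / sum(day$day_weight)

# The same calculation restricted to days with at least one trip.
has_trips <- day$num_trips > 0
trips_per_day_travelers <-
  sum(day$day_weight[has_trips] * day$num_trips[has_trips]) /
  sum(day$day_weight[has_trips])
```

The second figure answers a different question (trips per day among people who traveled), so it should not be reported as the overall trip rate.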

12.4 Trip-Level Analyses

Most movement-based analyses start with trips. Linked trips are usually the better starting point for origin-destination, purpose, and whole-trip summaries. Unlinked trips are appropriate for leg-level mode detail, transfer behavior, or segment-specific metrics.

Example question: “What is the average trip distance for work trips?”

12.5 Tour-Level Analyses

Use tours when the analysis focuses on full activity patterns or concepts aligned with activity-based modeling: stop-making, work subtours, escorting, home-based versus non-home-based travel, or mode hierarchy across a chain of trips.

Example question: “What percentage of tours include a stop at a school?”

12.6 Vehicle-Level Analyses

The vehicle table is the correct unit for vehicle fleet summaries, EV prevalence, fuel-type distributions, household fleet size, or daily mileage when paired with trip data. Vehicles belong to households, so vehicle analyses usually rely on household weights.

Example question: “What is the average daily mileage for electric vehicles?”

13 Working with Variables

13.1 Categorical Response Data

The majority of data collected in an HTS are categorical variables, where respondents select from a predefined list of options. These appear as:

  1. Single-response categorical variables (SRCVs) where respondents select one option from a predefined list
  2. Multiple-response categorical variables (MRCVs) where respondents can select multiple options
  3. Grouped categorical variables, sometimes called “question batteries,” stored as sets of related indicator columns
  4. Count variables with top-coding where the highest category is open ended (e.g., “5 or more vehicles”)

General Considerations

When working with categorical response data, keep the following best practices in mind:

  • Start with the codebook. Confirm variable definitions, valid values, table membership, and skip logic there before opening the questionnaire for extra context.
  • Do not mistake missing for no. Only recode missing to zero when the respondent was logically not asked the question. For example, if a respondent was not asked about how long they teleworked on a given day because they are not employed, then it is appropriate to recode missing telework duration to zero. However, if a respondent was asked about telework duration but did not answer, then the missing value should be retained as missing rather than recoded to zero.
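The telework example above can be sketched as follows. The column names is_employed and telework_minutes are hypothetical and chosen only to illustrate the rule; confirm the actual variable names and skip logic in the codebook.

```r
# Toy person-day records; column names and values are hypothetical.
day <- data.frame(
  day_id = 1:4,
  is_employed = c(TRUE, TRUE, FALSE, FALSE),
  telework_minutes = c(120, NA, NA, NA)
)

# Recode missing to zero only when the respondent was logically not asked
# (not employed). An employed respondent who skipped the question stays NA.
not_asked <- !day$is_employed & is.na(day$telework_minutes)
day$telework_minutes_clean <- ifelse(not_asked, 0, day$telework_minutes)
```

Here the second record remains missing because the respondent was eligible for the question but did not answer it.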

Single-Response Categorical Variables (SRCVs)

Single-response categorical variables are variables where respondents select one option from a predefined list. Examples include gender, employment status, or broad income category.

For example, the household income variable income_broad can be labeled using the codebook before it is summarized.

income_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "income_broad") %>%
  dplyr::transmute(
    income_code = value,
    income_key = as.character(value),
    income_broad_label = label
  )

hh_income <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::mutate(income_key = as.character(income_broad)) %>%
  dplyr::left_join(
    income_value_labels %>%
      dplyr::select(income_key, income_broad_label),
    by = "income_key"
  ) %>%
  dplyr::mutate(
    income_broad_label = factor(
      income_broad_label,
      levels = income_value_labels$income_broad_label
    )
  )

hh_income_counts <- hh_income %>%
  dplyr::group_by(income_broad_label) %>%
  dplyr::summarize(n = dplyr::n(), .groups = "drop")

After joining the value labels, group the labeled variable to produce the counts used in the final table. Table 45 shows the resulting counts of households by broad income category.

gt::gt(hh_income_counts) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    income_broad_label = "Household Income",
    n = "Households"
  ) %>%
  gt::tab_header(title = "Household Counts by Income Category")
Household Counts by Income Category
Household Income Households
Under $25,000 1,855
$25,000-$49,999 1,892
$50,000-$74,999 1,963
$75,000-$99,999 2,028
$100,000-$199,999 4,210
$200,000 or more 2,170
Prefer not to answer 1,523
Table 45: Household counts by income category.

Multiple-Response Categorical Variables (MRCVs)

Multiple-response variables are often delivered as groups of checkbox columns. When checkbox-style variables are present, reshaping them to long format is usually the clearest way to count selections and label the results.

delivery_checkbox_regex <- "^delivery_"

delivery_variable_list <- codebook$variable_list %>%
  dplyr::filter(
    day == 1,
    is_checkbox == 1,
    stringr::str_detect(variable, delivery_checkbox_regex)
  ) %>%
  dplyr::select(variable, description)

delivery_none_of_above <- "delivery_996"

delivery_variables <- delivery_variable_list %>%
  dplyr::filter(variable != delivery_none_of_above) %>%
  dplyr::pull(variable)

delivery_descriptions <- delivery_variable_list %>%
  dplyr::filter(variable %in% delivery_variables)

delivery_long <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(day_id, dplyr::all_of(delivery_variables)) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(delivery_variables),
    names_to = "variable",
    values_to = "selected"
  ) %>%
  dplyr::filter(selected == 1) %>%
  dplyr::left_join(
    delivery_descriptions,
    by = "variable"
  ) %>%
  dplyr::count(description, name = "n_days") %>%
  dplyr::arrange(dplyr::desc(n_days))

In the MassDOT data, these delivery questions are stored on the day table as checkbox columns named delivery_*. This example excludes delivery_996, the codebook field for “None of the above,” so the table covers only the positive delivery types.

Table 46 makes it easier to review how often each delivery type was selected across reported person-days.

gt::gt(delivery_long) %>%
  gt::fmt_number(columns = n_days, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    description = "Delivery Variable",
    n_days = "Days"
  ) %>%
  gt::tab_header(title = "Delivery Checkboxes Selected Across Days")
Delivery Checkboxes Selected Across Days
Delivery Variable Days
Type of delivery: Received packages at home (e.g., USPS, FedEx, UPS) 20,759
Type of delivery: Take-out/prepared food delivered to home 3,351
Type of delivery: Someone came to do work at home (e.g., babysitter, housecleaning, lawn) 1,995
Type of delivery: Groceries delivered to home 1,588
Type of delivery: Received packages at another location (e.g., Amazon Locker, package pick-up point) 1,012
Type of delivery: Other item delivered to home (e.g., appliance) 316
Type of delivery: Received personal packages at work 248
Table 46: Delivery checkbox counts.
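Counts of selections are not shares of person-days. The sketch below, in base R with toy data, shows one way to compute shares against the number of days and to check consistency with the “None of the above” field. The columns delivery_1 and delivery_2 are hypothetical placeholders; only delivery_996 is taken from this guide.

```r
# Toy day table; delivery_1 and delivery_2 are hypothetical checkbox columns.
day <- data.frame(
  day_id = 1:5,
  delivery_1 = c(1, 0, 1, 0, 1),
  delivery_2 = c(1, 0, 0, 0, 1),
  delivery_996 = c(0, 1, 0, 1, 0)
)
positive_cols <- c("delivery_1", "delivery_2")

# Selections per checkbox, and each one as a share of all person-days.
n_selected <- colSums(day[positive_cols] == 1, na.rm = TRUE)
share_of_days <- n_selected / nrow(day)

# Share of person-days with at least one positive delivery type.
any_positive <- rowSums(day[positive_cols] == 1, na.rm = TRUE) > 0
share_any_delivery <- mean(any_positive)

# Consistency check: "None of the above" should not co-occur with a
# positive selection on the same person-day.
conflict <- any_positive & day$delivery_996 == 1
```

Because a respondent can select several options, the per-checkbox shares can sum to more than the share of days with any delivery.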

Missing Categorical Data

Missing categorical data should be handled explicitly rather than silently dropped. The codebook labels are often the quickest way to confirm whether a value represents a valid response category, an inapplicable record, or a nonresponse code.

gender_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "gender") %>%
  dplyr::transmute(
    gender_key = as.character(value),
    label
  )

gender_counts <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::count(gender, name = "n") %>%
  dplyr::mutate(gender_key = as.character(gender)) %>%
  dplyr::left_join(
    gender_value_labels,
    by = "gender_key"
  ) %>%
  dplyr::select(-gender_key) %>%
  dplyr::arrange(gender)

Table 47 helps distinguish valid response codes from missing or special-case values.

gt::gt(gender_counts) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    gender = "Gender Code",
    n = "Persons",
    label = "Gender Label"
  ) %>%
  gt::tab_header(title = "Gender Counts with Codebook Labels")
Gender Counts with Codebook Labels
Gender Code Persons Gender Label
1 15,253 Female
2 13,165 Male
4 273 Non-binary
995 1,552 Missing Response
997 71 Other/prefer to self-describe
999 941 Prefer not to answer
Table 47: Gender counts with labels.
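One way to handle these codes explicitly is to report shares both with and without the nonresponse categories. The sketch below re-enters the unweighted counts from Table 47. Treating codes 995 and 999 as nonresponse, and 997 as a valid response, is an assumption made for illustration and should be confirmed against the codebook.

```r
# Unweighted counts copied from Table 47.
gender_counts <- data.frame(
  gender = c(1, 2, 4, 995, 997, 999),
  n = c(15253, 13165, 273, 1552, 71, 941)
)

# Assumption: 995 (Missing Response) and 999 (Prefer not to answer) are
# nonresponse codes; 997 (Other/prefer to self-describe) is a valid answer.
nonresponse_codes <- c(995, 999)
is_nonresponse <- gender_counts$gender %in% nonresponse_codes

# Share of all persons, keeping nonresponse as its own category.
gender_counts$share_all <- gender_counts$n / sum(gender_counts$n)

# Share among valid responses only; nonresponse rows are left as NA.
n_valid <- sum(gender_counts$n[!is_nonresponse])
gender_counts$share_valid <- ifelse(
  is_nonresponse,
  NA,
  gender_counts$n / n_valid
)
```

Whichever denominator is used, state it in the table note so readers can tell whether nonresponse was included.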

Count Variables with Top-Coding

Variables such as num_vehicles are often stored as integer-coded categories rather than unconstrained numeric counts. Treat them as categories unless the codebook clearly indicates that they are true numeric measures.

vehicle_count_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "num_vehicles") %>%
  dplyr::transmute(
    vehicle_count_code = value,
    vehicle_count_key = as.character(value),
    num_vehicles_label = label
  )

vehicle_count_distribution <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::transmute(vehicle_count_code = num_vehicles) %>%
  dplyr::count(vehicle_count_code, name = "n_households") %>%
  dplyr::mutate(vehicle_count_key = as.character(vehicle_count_code)) %>%
  dplyr::left_join(
    vehicle_count_labels %>%
      dplyr::select(vehicle_count_key, num_vehicles_label),
    by = "vehicle_count_key"
  ) %>%
  dplyr::select(-vehicle_count_key) %>%
  dplyr::arrange(vehicle_count_code)

Table 48 treats the coded vehicle-count variable as a set of labeled categories rather than a continuous measure.

gt::gt(vehicle_count_distribution) %>%
  gt::fmt_number(columns = n_households, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    vehicle_count_code = "Vehicles",
    n_households = "Households",
    num_vehicles_label = "Vehicle Count Label"
  ) %>%
  gt::tab_header(title = "Household Vehicle Count Distribution")
Household Vehicle Count Distribution
Vehicles Households Vehicle Count Label
0 2,161 0 (no vehicles in my household)
1 7,122 1 vehicle
2 4,925 2 vehicles
3 1,066 3 vehicles
4 274 4 vehicles
5 68 5 vehicles
6 16 6 vehicles
7 4 7 vehicles
8 5 8 or more vehicles
Table 48: Household vehicle count categories.
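The sketch below illustrates two common ways to work with a top-coded count, using the unweighted counts from Table 48. The “3 or more” grouping is an illustrative choice, not a study convention.

```r
# Unweighted household counts copied from Table 48; code 8 means "8 or more".
vehicle_counts <- data.frame(
  vehicle_count_code = 0:8,
  n_households = c(2161, 7122, 4925, 1066, 274, 68, 16, 4, 5)
)

# Option 1: collapse the sparse upper categories into one reporting group.
vehicle_counts$vehicle_group <- ifelse(
  vehicle_counts$vehicle_count_code >= 3,
  "3 or more",
  as.character(vehicle_counts$vehicle_count_code)
)
grouped_counts <- aggregate(
  n_households ~ vehicle_group,
  data = vehicle_counts,
  FUN = sum
)

# Option 2: if a mean is required, treat it as a lower bound, because
# households in the top category are counted as exactly 8 vehicles.
mean_vehicles_lower_bound <-
  sum(vehicle_counts$vehicle_count_code * vehicle_counts$n_households) /
  sum(vehicle_counts$n_households)
```

With so few households in the top category here, the bound is likely close to the true unweighted mean, but that will not hold for every top-coded variable.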

13.2 Numeric Variables

The HTS dataset contains several numeric variables, such as trip distances, durations, and speeds. These variables can include extreme values or outliers that affect analysis results, so it is good practice to inspect definitions and distributions before calculating summaries.

For most analytic examples in this chapter, the code filters to complete households first. If your goal is delivered-data quality assurance rather than population analysis, you may choose a broader universe.

Consult the Codebook First

Before analyzing any numeric variable, verify its meaning and units from the codebook. Many errors stem from assuming what a variable represents rather than confirming it.

distance_variables <- codebook$variable_list %>%
  dplyr::filter(
    stringr::str_detect(variable, "distance"),
    data_type %in% c("integer", "numeric")
  ) %>%
  dplyr::select(variable, description)

Review Table 49 before choosing which distance field to summarize.

gt::gt(distance_variables) %>%
  gt::tab_header(title = "Distance Variables in the Codebook")
Distance Variables in the Codebook
variable description
distance_meters Distance (meters)
distance_beeline Beeline distance (meters)
distance_miles Distance (miles)
home_distance Trip distance from home (miles)
Table 49: Distance variables in the codebook.

Inspect the Data

Before calculating any metric:

  • check for missing values
  • generate a quick summary()
  • inspect the distribution with a histogram or boxplot
  • review the minimum and maximum for plausibility

Start with a direct summary of the trip-distance field to understand its central tendency and range.

summary(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(distance_miles)
)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
#>     0.000     0.680     2.225     8.577     6.004 12348.275     11747

Then visualize the distribution in Figure 10 so you can see the shape of the variable and the upper tail.

trip_distance_plot_data <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(distance_miles), distance_miles > 0)

trip_distance_histogram <- ggplot2::ggplot(
  trip_distance_plot_data,
  ggplot2::aes(x = distance_miles)
) +
  ggplot2::geom_histogram(bins = 40, fill = "#17384e", color = "white") +
  ggplot2::scale_x_log10(
    labels = scales::label_number(big.mark = ",")
  ) +
  ggplot2::labs(
    title = "Distribution of Trip Distance",
    subtitle = "Trip-distance histogram with a log-scaled distance axis",
    x = "Distance (miles, log scale)",
    y = "Trips"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    panel.grid.minor = ggplot2::element_blank()
  )

trip_distance_histogram
Figure 10: Trip distance distribution.

Handle Outliers

Outlier handling depends on the study context and the variable being analyzed. In many cases, the first step is simply to identify the upper tail before deciding whether trimming, filtering, or a different summary statistic is appropriate.

trip_distance_quantiles <- stats::quantile(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(distance_miles),
  probs = c(0.5, 0.9, 0.95, 0.99),
  na.rm = TRUE
)

trip_distance_summary <- data.frame(
  statistic = names(trip_distance_quantiles),
  miles = as.numeric(trip_distance_quantiles),
  stringsAsFactors = FALSE
)

# Keep trips above the 99th percentile, looked up by name rather than position.
long_trips <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(distance_miles > trip_distance_quantiles[["99%"]]) %>%
  dplyr::transmute(
    trip_id,
    household_id = hh_id,
    person_id,
    distance_miles,
    duration_minutes
  )

Start by reviewing Table 50 to locate the upper tail of the distance distribution.

gt::gt(trip_distance_summary) %>%
  gt::fmt_number(columns = miles, decimals = 2) %>%
  gt::cols_label(
    statistic = "Statistic",
    miles = "Miles"
  ) %>%
  gt::tab_header(title = "Trip Distance Quantiles")
Trip Distance Quantiles
Statistic Miles
50% 2.22
90% 15.01
95% 24.69
99% 69.37
Table 50: Trip distance quantiles.

Then inspect Table 51 for a small sample of trips above the 99th percentile to see which records deserve follow-up.

gt::gt(dplyr::slice_head(long_trips, n = 10)) %>%
  gt::fmt_number(
    columns = c(distance_miles, duration_minutes),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Trips Above the 99th Percentile of Distance")
Trips Above the 99th Percentile of Distance
trip_id household_id person_id distance_miles duration_minutes
2400322201024 24003222 2400322201 87.77 216.00
2400322201026 24003222 2400322201 90.21 81.00
2400322202009 24003222 2400322202 90.21 81.00
2400346501001 24003465 2400346501 106.80 140.00
2400361001075 24003610 2400361001 73.16 79.00
2400453801032 24004538 2400453801 74.66 89.00
2400478201047 24004782 2400478201 1,189.56 243.00
2400478201048 24004782 2400478201 84.40 83.00
2400478201053 24004782 2400478201 1,129.97 0.00
2400478202036 24004782 2400478202 1,189.56 243.00
Table 51: Trips above the 99th percentile.
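A simple way to prioritize follow-up is to compare distance with duration. The sketch below re-enters three rows from Table 51 and flags records for review. The 200 mph threshold is a hypothetical value chosen for illustration; an appropriate threshold depends on the mode, since long-distance air travel can legitimately exceed it.

```r
# Three records copied from Table 51.
long_trips <- data.frame(
  trip_id = c("2400322201024", "2400478201047", "2400478201053"),
  distance_miles = c(87.77, 1189.56, 1129.97),
  duration_minutes = c(216, 243, 0)
)

# Hypothetical review threshold; not a study rule.
max_plausible_mph <- 200

# Implied speed from distance and duration (Inf when duration is zero).
long_trips$implied_mph <-
  long_trips$distance_miles / (long_trips$duration_minutes / 60)

# Flag records for manual review rather than dropping them outright.
long_trips$review_reason <- ifelse(
  long_trips$duration_minutes <= 0,
  "zero duration",
  ifelse(
    long_trips$implied_mph > max_plausible_mph,
    "implied speed above threshold",
    NA_character_
  )
)
```

A flag is a prompt to inspect the record alongside its mode and purpose, not evidence that the trip is invalid.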

Example: Trip Speeds by Mode

Speed summaries are a good example of why both labels and outlier checks matter.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_code = value,
    mode_key = as.character(value),
    mode_label = label
  )

speed_by_mode <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(speed_mph), speed_mph >= 0) %>%
  dplyr::mutate(mode_key = as.character(mode_type)) %>%
  dplyr::left_join(
    mode_value_labels %>%
      dplyr::select(mode_key, mode_label),
    by = "mode_key"
  ) %>%
  dplyr::group_by(mode_label) %>%
  dplyr::summarize(
    mean_speed_mph = mean(speed_mph, na.rm = TRUE),
    median_speed_mph = stats::median(speed_mph, na.rm = TRUE),
    n_trips = dplyr::n(),
    .groups = "drop"
  ) %>%
  dplyr::arrange(dplyr::desc(mean_speed_mph))

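Table 52 combines codebook labels with simple speed summaries so that mode-level differences are easy to review. Note that the mean speeds are dominated by implausible extreme values (for example, a mean of 92,224.8 mph for Long Distance Passenger against a median of 260.0 mph), so the medians are the more reliable comparison until outliers have been handled.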

gt::gt(speed_by_mode) %>%
  gt::fmt_number(
    columns = c(mean_speed_mph, median_speed_mph),
    decimals = 1
  ) %>%
  gt::fmt_number(columns = n_trips, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    mode_label = "Mode",
    mean_speed_mph = "Mean Speed (mph)",
    median_speed_mph = "Median Speed (mph)",
    n_trips = "Trips"
  ) %>%
  gt::tab_header(title = "Trip Speeds by Mode")
Trip Speeds by Mode
Mode Mean Speed (mph) Median Speed (mph) Trips
Long Distance Passenger 92,224.8 260.0 926
Shuttle / Vanpool 2,335.7 9.6 939
Car share 1,325.1 13.5 372
Missing Response 694.8 11.9 15,716
Ferry 649.5 12.3 265
Walk 385.4 2.5 84,494
Other 175.1 9.7 2,739
Car 130.8 19.7 264,558
TNC 111.9 13.5 3,337
Bike share 101.6 6.1 785
Transit 97.0 8.9 17,443
Bike 66.2 8.4 6,602
Taxi 40.7 12.6 325
School bus 11.6 7.9 1,319
Scooter share 6.2 6.2 6
Table 52: Trip speeds by mode.
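The gap between mean and median in Table 52 can be reproduced with a few toy values. In the sketch below, the speeds and the per-mode caps in speed_caps are hypothetical; they illustrate the screening pattern, not recommended thresholds.

```r
# Toy trip speeds; one walk record carries an implausible value.
trips <- data.frame(
  mode_label = c("Walk", "Walk", "Walk", "Walk", "Car", "Car", "Car"),
  speed_mph = c(2.5, 3.0, 3.1, 90000, 18, 22, 25)
)

# Hypothetical plausibility caps by mode.
speed_caps <- data.frame(
  mode_label = c("Walk", "Car"),
  max_mph = c(15, 100)
)

trips <- merge(trips, speed_caps, by = "mode_label", all.x = TRUE)
trips$plausible <- trips$speed_mph <= trips$max_mph

# Raw mean, median, and mean after screening, by mode.
raw_mean <- aggregate(speed_mph ~ mode_label, data = trips, FUN = mean)
raw_median <- aggregate(speed_mph ~ mode_label, data = trips, FUN = median)
screened_mean <- aggregate(
  speed_mph ~ mode_label,
  data = trips[trips$plausible, ],
  FUN = mean
)
```

A single extreme record moves the walk mean by several orders of magnitude while the median barely changes, which is the same pattern visible in Table 52.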

13.3 Date and Time Data

Trip departure and arrival times are stored as a set of split date and time components rather than a single datetime object. This structure avoids timezone conversion issues when importing data across different software environments and makes it straightforward to work with time-of-day components (e.g., filtering by hour) without parsing a full timestamp.

Time Variables on the Trip Table

The following fields together define when each trip departed and arrived:

Variable         Type                Description
travel_date      Date (YYYY-MM-DD)   Diary-day date carried on the trip record.
depart_date      Date (YYYY-MM-DD)   Calendar date of departure.
depart_hour      Integer (0-23)      Hour of departure in 24-hour time, local to the study area.
depart_minute    Integer (0-59)      Minute of departure.
depart_seconds   Integer (0-59)      Second of departure.
arrive_date      Date (YYYY-MM-DD)   Calendar date of arrival. Will differ from depart_date for trips crossing midnight.
arrive_hour      Integer (0-23)      Hour of arrival in 24-hour time.
arrive_minute    Integer (0-59)      Minute of arrival.
arrive_second    Integer (0-59)      Second of arrival.

Timezone. When reconstructing timestamps in this guide, use the study timezone from settings.yml, which is America/New_York for this MassDOT delivery.

Travel day boundary. In the prepared MassDOT trip table, trips departing before 3:00 AM are attached to the previous diary day. For those records, travel_date reflects the diary day while depart_date reflects the wall-clock calendar date.

Cross-midnight trips. For trips that depart before midnight and arrive after midnight, arrive_date will be one calendar day later than depart_date. When reconstructing durations, always use the full datetime (date + time) rather than differencing hours alone.
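The travel day boundary can be illustrated with a short sketch. It derives a diary-day date from depart_date and depart_hour using the 3:00 AM rule described above. The toy dates are hypothetical, and the delivered travel_date field remains the authoritative value; a derived date like this is mainly useful as a consistency check.

```r
# Toy departures on the same calendar date; values are hypothetical.
trip <- data.frame(
  depart_date = as.Date(c("2024-10-16", "2024-10-16", "2024-10-16")),
  depart_hour = c(1L, 3L, 14L)
)

# Trips departing before 3:00 AM are attached to the previous diary day.
travel_day_start_hour <- 3L
before_boundary <- trip$depart_hour < travel_day_start_hour
trip$derived_travel_date <- trip$depart_date - as.integer(before_boundary)
```

Only the 1:00 AM departure moves to the previous diary day; a departure at exactly 3:00 AM stays on its calendar date.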

Reconstructing Timestamps in R

When you need a full POSIXct timestamp, for example to calculate duration, filter by time window, or plot a time series, recombine the split fields explicitly using lubridate:

trip_with_datetime <- hts$trip_unlinked %>%
  dplyr::mutate(
    depart_datetime = lubridate::ymd_hms(
      paste(
        depart_date,
        sprintf("%02d:%02d:%02d", depart_hour, depart_minute, depart_seconds)
      ),
      tz = study_timezone
    ),
    arrive_datetime = lubridate::ymd_hms(
      paste(
        arrive_date,
        sprintf("%02d:%02d:%02d", arrive_hour, arrive_minute, arrive_second)
      ),
      tz = study_timezone
    )
  )

The preview below shows the reconstructed departure and arrival timestamps for the first few trip records.

gt::gt(
  trip_with_datetime %>%
    dplyr::select(trip_id, depart_datetime, arrive_datetime) %>%
    utils::head()
) %>%
  gt::tab_header(title = "Reconstructed Trip Timestamps")
Table 53: Reconstructed trip timestamps.

Common Pitfalls

  • depart_hour alone is usually sufficient for peak-period analysis. When a period boundary matters for trips near the top of the hour (e.g., a 7:58 AM departure), use the full datetime instead.
  • Differencing arrive_hour - depart_hour will produce incorrect (negative) durations for cross-midnight trips. Always difference full datetimes or use the pre-calculated duration_minutes field instead.
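The cross-midnight pitfall can be seen in a small base R sketch. The two toy trips and the to_datetime() helper are hypothetical; the field names follow the trip table described above, and the timezone is the America/New_York study timezone.

```r
# Toy trips: the first crosses midnight, the second does not.
trip <- data.frame(
  depart_date = c("2024-10-15", "2024-10-15"),
  depart_hour = c(23L, 7L),
  depart_minute = c(40L, 58L),
  depart_seconds = c(0L, 0L),
  arrive_date = c("2024-10-16", "2024-10-15"),
  arrive_hour = c(0L, 8L),
  arrive_minute = c(15L, 20L),
  arrive_second = c(0L, 0L)
)

# Hypothetical helper that recombines the split fields into POSIXct.
to_datetime <- function(date, hour, minute, second, tz) {
  as.POSIXct(
    sprintf("%s %02d:%02d:%02d", date, hour, minute, second),
    format = "%Y-%m-%d %H:%M:%S",
    tz = tz
  )
}

study_timezone <- "America/New_York"
depart_datetime <- with(
  trip,
  to_datetime(depart_date, depart_hour, depart_minute, depart_seconds, study_timezone)
)
arrive_datetime <- with(
  trip,
  to_datetime(arrive_date, arrive_hour, arrive_minute, arrive_second, study_timezone)
)

# Correct: difference the full datetimes.
duration_full <- as.numeric(difftime(arrive_datetime, depart_datetime, units = "mins"))

# Incorrect: differencing hours alone goes negative across midnight.
hour_difference <- trip$arrive_hour - trip$depart_hour
```

The full-datetime durations are 35 and 22 minutes, while the hour difference for the cross-midnight trip is -23.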

14 Weights and Inference

14.1 Getting Started

Weights exist so that the sample can represent the target population. In household travel surveys, analysts usually work with household, person, day, trip, and sometimes tour weights. The correct weight depends on the final analytic unit, not just the first table that was opened.

For analyses that compare travel behavior across specific weekdays, use the day-of-week workflow described in Section 16. The standard household, person, day, and trip weights remain the default for overall average-day estimates.

For the MassDOT data, most weighted estimates should also be restricted to complete households. The examples below define that universe from hts$hh$is_complete and then carry it into lower-level files through hh_id.

Choosing the Right Weight

Begin by matching each analytic unit to the starting table and weight column used in the prepared data.

weight_lookup <- data.frame(
  analytic_unit = character(),
  starting_table = character(),
  weight_variable = character(),
  stringsAsFactors = FALSE
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Household",
    starting_table = "hh",
    weight_variable = "hh_weight"
  )
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Person",
    starting_table = "person",
    weight_variable = "person_weight"
  )
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Person-day",
    starting_table = "day",
    weight_variable = "day_weight"
  )
)

if ("trip_linked" %in% names(hts)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Linked trip",
      starting_table = "trip_linked",
      weight_variable = "trip_weight"
    )
  )
}

if ("trip_unlinked" %in% names(hts)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Unlinked trip",
      starting_table = "trip_unlinked",
      weight_variable = "trip_weight"
    )
  )
}

if ("tour" %in% names(hts) && "tour_weight" %in% names(hts$tour)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Tour",
      starting_table = "tour",
      weight_variable = "tour_weight"
    )
  )
}

Use Table 54 to confirm the correct weight before building any weighted estimate.

gt::gt(weight_lookup) %>%
  gt::cols_label(
    analytic_unit = "Analytic Unit",
    starting_table = "Starting Table",
    weight_variable = "Weight Variable"
  ) %>%
  gt::tab_header(title = "Recommended Weight by Analytic Unit")
Recommended Weight by Analytic Unit
Analytic Unit Starting Table Weight Variable
Household hh hh_weight
Person person person_weight
Person-day day day_weight
Linked trip trip_linked trip_weight
Unlinked trip trip_unlinked trip_weight
Tour tour tour_weight
Table 54: Recommended weights by analytic unit.

Calculating Simple Weighted Estimates

Before moving to design-aware inference, it is useful to confirm the weighted numerator and denominator directly.

zero_vehicle_share <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::summarize(
    weighted_zero_vehicle_households = sum(
      hh_weight * (num_vehicles == 0),
      na.rm = TRUE
    ),
    weighted_households = sum(hh_weight, na.rm = TRUE),
    share_zero_vehicle = weighted_zero_vehicle_households / weighted_households
  )

Table 55 shows the numerator, denominator, and resulting share in one table.

gt::gt(zero_vehicle_share) %>%
  gt::fmt_number(
    columns = c(weighted_zero_vehicle_households, weighted_households),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::fmt_percent(columns = share_zero_vehicle, decimals = 1) %>%
  gt::tab_header(title = "Simple Weighted Share of Zero-Vehicle Households")
Simple Weighted Share of Zero-Vehicle Households
weighted_zero_vehicle_households weighted_households share_zero_vehicle
325,071 2,814,595 11.5%
Table 55: Weighted zero-vehicle household share.
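When the indicator variable can be missing, it is worth confirming that the numerator and denominator cover the same records. The sketch below uses a hypothetical weighted_share() helper and toy values to show how dropping NA only from the numerator changes the result.

```r
# Toy household table with one missing vehicle count; values are hypothetical.
hh <- data.frame(
  num_vehicles = c(0, 1, 2, NA),
  hh_weight = c(100, 200, 100, 100)
)

# Hypothetical helper: restrict numerator and denominator to the same
# records, namely those with a known flag and a known weight.
weighted_share <- function(flag, weight) {
  keep <- !is.na(flag) & !is.na(weight)
  sum(weight[keep] * flag[keep]) / sum(weight[keep])
}

share_aligned <- weighted_share(hh$num_vehicles == 0, hh$hh_weight)

# For comparison: NA removed from the numerator only, so the household with
# a missing count still contributes weight to the denominator.
share_unaligned <-
  sum(hh$hh_weight * (hh$num_vehicles == 0), na.rm = TRUE) /
  sum(hh$hh_weight, na.rm = TRUE)
```

The two versions agree whenever the indicator has no missing values, so a quick count of missing values shows whether the distinction matters for a given variable.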

14.2 Survey-Aware Methods for Inference

Simple weighted proportions are often enough for descriptive summaries, but they are not enough when you need valid standard errors, confidence intervals, or hypothesis tests. Those cases require a survey design object that respects clustering, stratification, and weights.

When Do You Need Survey-Aware Methods?

Use survey-aware methods when:

  • reporting standard errors or confidence intervals
  • comparing estimates across groups
  • fitting regression models
  • working with small subgroups where design effects matter
  • estimating totals, means, or proportions for publication or external reporting
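As a rough illustration of why design effects matter for small subgroups, the sketch below computes Kish's approximate effective sample size from the weights alone. This approximation reflects only unequal weighting, not clustering or stratification, so it is a screening aid rather than a substitute for a survey design object. The function name and toy weights are hypothetical.

```r
# Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights).
kish_effective_n <- function(weight) {
  weight <- weight[!is.na(weight) & weight > 0]
  sum(weight)^2 / sum(weight^2)
}

# Toy weights: equal weights versus one dominant weight.
equal_weights <- c(1, 1, 1, 1)
unequal_weights <- c(1, 1, 1, 5)

n_eff_equal <- kish_effective_n(equal_weights)
n_eff_unequal <- kish_effective_n(unequal_weights)

# Approximate design effect due to weighting: nominal n over effective n.
deff_unequal <- length(unequal_weights) / n_eff_unequal
```

With equal weights the effective sample size equals the number of records; the more variable the weights in a subgroup, the fewer effective observations support its estimates.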

Specifying the Survey Design

In the Massachusetts Travel Study, the household is the primary sampling unit (PSU; see Section 2.1.2). Even when analyzing person-, day-, or trip-level records, observations remain clustered within households.

The examples below use:

  • hh_id as the PSU
  • the weight that matches the analytic unit
  • sample_segment as the design strata

Start by defining a household-level survey design object with the fields needed for the analysis.

hh_design <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::transmute(
    hh_id,
    sample_segment,
    analysis_weight = hh_weight,
    vehicle_count = num_vehicles
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  )

For trip- or day-level analysis, join the design fields needed for clustering, stratification, and weights before building the survey object.

trip_design <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  dplyr::mutate(analysis_weight = trip_weight) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  )

Set the lonely-PSU handling before running summaries that request standard errors or confidence intervals.

# The survey package, which srvyr builds on, reads this option when a
# stratum contains a single PSU.
options(survey.lonely.psu = "adjust")

Using the Survey Design for Weighted Estimates

Once the design object is defined, use survey_mean(), survey_total(), or survey_prop() instead of manually calculating standard errors.

hh_vehicle_summary <- hh_design %>%
  dplyr::group_by(vehicle_count) %>%
  dplyr::summarize(
    weighted_households = srvyr::survey_total(vartype = c("se", "ci")),
    weighted_share = srvyr::survey_prop(vartype = c("se", "ci")),
    .groups = "drop"
  )

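Table 56 shows the weighted totals and shares with their uncertainty measures. For the sparsest categories (7 and 8 vehicles), the confidence interval for the weighted total extends below zero, which signals that too few households fall in those cells for the interval to be reliable; interpret those rows with caution or combine them with adjacent categories.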

gt::gt(hh_vehicle_summary) %>%
  gt::fmt_number(
    columns = c(weighted_households, weighted_households_se, weighted_households_low, weighted_households_upp),
    decimals = 1,
    sep_mark = ","
  ) %>%
  gt::fmt_percent(
    columns = c(weighted_share, weighted_share_se, weighted_share_low, weighted_share_upp),
    decimals = 1
  ) %>%
  gt::tab_header(title = "Weighted Household Vehicle Summary")
Weighted Household Vehicle Summary
vehicle_count weighted_households weighted_households_se weighted_households_low weighted_households_upp weighted_share weighted_share_se weighted_share_low weighted_share_upp
0 325,070.5 10,284.6 304,911.6 345,229.5 11.5% 0.4% 10.9% 12.3%
1 1,096,480.0 17,548.7 1,062,082.5 1,130,877.4 39.0% 0.6% 37.8% 40.1%
2 1,002,864.0 20,037.9 963,587.3 1,042,140.6 35.6% 0.6% 34.5% 36.8%
3 275,203.7 12,361.9 250,972.9 299,434.5 9.8% 0.4% 9.0% 10.6%
4 84,670.2 7,340.9 70,281.2 99,059.3 3.0% 0.3% 2.5% 3.6%
5 22,895.8 3,708.6 15,626.5 30,165.1 0.8% 0.1% 0.6% 1.1%
6 5,609.9 2,114.4 1,465.5 9,754.3 0.2% 0.1% 0.1% 0.4%
7 1,166.8 848.5 −496.3 2,829.8 0.0% 0.0% 0.0% 0.2%
8 634.5 343.3 −38.5 1,307.5 0.0% 0.0% 0.0% 0.1%
Table 56: Weighted household vehicle summary.

Filtering Data vs. Filtering the Survey Design

Filtering records before defining the survey design is not always the same as defining the survey design first and then subsetting it. The difference matters most when a subgroup should remain part of the original design rather than being treated as a new standalone sample.

For example, an adult-only trip analysis should define adulthood from the person table and then carry that flag into the trip-level survey design.

adult_age_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "age") %>%
  dplyr::transmute(
    age_key = as.character(value),
    age_label = sub("^Age\\s+", "", label)
  ) %>%
  dplyr::mutate(
    age_label = ifelse(age_label == "85 up", "85 or older", age_label)
  )

adult_trip_design <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::transmute(
        person_id,
        age_key = as.character(age)
      ) %>%
      dplyr::left_join(
        adult_age_labels,
        by = "age_key"
      ),
    by = "person_id"
  ) %>%
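  dplyr::mutate(
    is_adult = age_label %in% c(
      "18-24",
      "25-34",
      "35-44",
      "45-54",
      "55-64",
      "65-74",
      "75-84",
      "85 or older"
    )
  ) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(trip_weight),
    trip_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = trip_weight
  ) %>%
  # Subset after the design is defined so households without adult trips
  # still count as sampled units in variance estimation.
  dplyr::filter(is_adult)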
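
The distinction between the two approaches can be seen on a toy dataset. The sketch below is illustrative only: the records, weights, and the is_adult flag are made up, and two households contain no adult trips. Both approaches return the same weighted mean, but the standard errors can differ because the design-first approach still treats households without adult trips as sampled units.

```r
library(dplyr)
library(srvyr)

# Hypothetical trip records: households 4 and 8 contain no adult trips.
toy_trips <- data.frame(
  hh_id          = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8),
  sample_segment = rep(c("A", "B"), each = 8),
  is_adult       = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
                     TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  trip_weight    = c(10, 12, 8, 9, 11, 10, 7, 9, 13, 12, 9, 8, 10, 11, 12, 9),
  distance_miles = c(2, 1, 5, 4, 3, 1, 2, 2, 6, 7, 3, 1, 8, 4, 2, 1)
)

# Approach 1: drop non-adult trips first, then define the design.
filter_first <- toy_trips %>%
  dplyr::filter(is_adult) %>%
  srvyr::as_survey_design(ids = hh_id, strata = sample_segment, weights = trip_weight) %>%
  dplyr::summarize(mean_dist = srvyr::survey_mean(distance_miles, vartype = "se"))

# Approach 2: define the design on all trips, then subset the design.
design_first <- toy_trips %>%
  srvyr::as_survey_design(ids = hh_id, strata = sample_segment, weights = trip_weight) %>%
  dplyr::filter(is_adult) %>%
  dplyr::summarize(mean_dist = srvyr::survey_mean(distance_miles, vartype = "se"))

comparison <- dplyr::bind_rows(
  list("filter first" = filter_first, "design first" = design_first),
  .id = "approach"
)
print(comparison)
```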

Calculating Estimate Reliability (RSE)

One simple reliability check is the relative standard error (RSE): the standard error divided by the estimate. A larger RSE means the estimate is less precise relative to its own size.

trip_mode_rse <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  dplyr::mutate(mode_type = as.character(mode_type)) %>%
  dplyr::left_join(
    codebook$value_labels %>%
      dplyr::filter(variable == "mode") %>%
      dplyr::transmute(
        analysis_mode_key = as.character(value),
        analysis_mode = label
      ),
    by = c("mode_type" = "analysis_mode_key")
  ) %>%
  dplyr::mutate(
    analysis_weight = trip_weight
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    !is.na(analysis_mode),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  ) %>%
  dplyr::group_by(analysis_mode) %>%
  dplyr::summarize(
    share = srvyr::survey_prop(vartype = "se"),
    .groups = "drop"
  ) %>%
  dplyr::mutate(
    rse = ifelse(share > 0, share_se / share, NA_real_)
  )

Table 57 reports each estimate beside its standard error and relative standard error.

gt::gt(trip_mode_rse) %>%
  gt::fmt_percent(columns = c(share, share_se, rse), decimals = 1) %>%
  gt::cols_label(
    analysis_mode = "Trip Mode",
    share = "Share",
    share_se = "Std. Error",
    rse = "RSE"
  ) %>%
  gt::tab_header(title = "Relative Standard Errors for Trip Mode Shares")
Relative Standard Errors for Trip Mode Shares
Trip Mode Share Std. Error RSE
Bike 1.2% 0.1% 8.7%
Bike share 0.1% 0.0% 31.7%
Car 73.3% 0.5% 0.7%
Car share 0.1% 0.0% 40.3%
Ferry 0.1% 0.0% 26.8%
Long Distance Passenger 0.2% 0.0% 12.7%
Missing Response 2.3% 0.1% 4.8%
Other 0.9% 0.1% 9.4%
School bus 1.3% 0.1% 7.1%
Scooter share 0.0% 0.0% 49.7%
Shuttle / Vanpool 0.3% 0.0% 15.0%
TNC 0.9% 0.1% 10.4%
Taxi 0.2% 0.0% 19.0%
Transit 2.8% 0.1% 3.9%
Walk 16.4% 0.4% 2.4%
Table 57: Relative standard errors by trip mode.
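
An RSE column is easier to act on when it is paired with reporting tiers. The sketch below uses hypothetical shares and standard errors, and the 15% and 30% cutoffs are illustrative assumptions rather than thresholds set by this study; choose cutoffs that suit the reporting context.

```r
# Hypothetical shares and standard errors (not values from the study).
estimates <- data.frame(
  category = c("Common mode", "Less common mode", "Rare mode"),
  share    = c(0.700, 0.030, 0.001),
  share_se = c(0.005, 0.006, 0.0005)
)

# RSE = standard error / estimate; undefined when the estimate is zero.
estimates$rse <- ifelse(
  estimates$share > 0,
  estimates$share_se / estimates$share,
  NA_real_
)

# Illustrative cutoffs only (assumed for this sketch).
estimates$reliability <- cut(
  estimates$rse,
  breaks = c(0, 0.15, 0.30, Inf),
  labels = c("more reliable", "use with caution", "unreliable"),
  right = FALSE
)

print(estimates)
```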

Working with Small Sample Sizes

When estimates are unstable:

  1. Broaden the reporting domain.
  2. Collapse sparse categories where it is substantively reasonable.
  3. Keep zero-valued days or households in the denominator when they are part of the analytic universe.
  4. Consider model-based approaches rather than repeated subgroup slicing.
  5. Report uncertainty clearly instead of presenting small-cell estimates as precise.
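
As one way to apply the second step, the sketch below top-codes sparse vehicle-count categories into a single "4+" group before computing weighted shares. The records and weights are hypothetical, and the cutoff of four vehicles is an assumption chosen for illustration.

```r
# Hypothetical household records with weights.
toy_hh <- data.frame(
  num_vehicles = c(0, 1, 1, 2, 2, 3, 4, 5, 7),
  hh_weight    = c(150, 200, 180, 210, 190, 120, 60, 30, 10)
)

# Top-code sparse upper categories into a single "4+" group.
toy_hh$vehicle_group <- ifelse(
  toy_hh$num_vehicles >= 4,
  "4+",
  as.character(toy_hh$num_vehicles)
)

# Weighted households and weighted shares by collapsed group.
weighted_households <- tapply(toy_hh$hh_weight, toy_hh$vehicle_group, sum)
weighted_share <- weighted_households / sum(weighted_households)

print(round(weighted_share, 3))
```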

15 Common Travel Metrics

15.1 Mode Share

What Is Mode Share?

Mode share describes the proportion of trips made by each travel mode out of all trips in the dataset. It is one of the most commonly reported metrics in travel demand analysis, used to understand how people get around, evaluate transportation investments, set policy targets, and calibrate travel models. For MassDOT, mode share estimates from this survey provide a baseline picture of travel behavior across the Commonwealth.

Example: Mode Share by Trip Geography

In the Massachusetts Travel Study dataset, trips that stay entirely within the study region are much shorter on average than trips with at least one end outside the region. Mixing these two populations in a single mode-share calculation can obscure differences that matter for regional planning.

The trip table includes two binary flags to separate these populations:

Trip Type o_in_region d_in_region
Fully within region 1 1
Leaving region 1 0
Entering region 0 1
Fully outside region 0 0

Fully outside-region trips are uncommon. For most regional planning applications, fully within-region trips are the primary analytic population, with cross-boundary trips treated separately.
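
When the cross-boundary trips need to be kept apart, the two flags can be mapped to all four trip types in the table above. This is a minimal sketch on made-up records; the trip_type column name is an assumption introduced here for illustration.

```r
library(dplyr)

# Hypothetical trips covering each flag combination, plus one with a missing flag.
toy_trips <- data.frame(
  trip_id     = 1:5,
  o_in_region = c(1, 1, 0, 0, NA),
  d_in_region = c(1, 0, 1, 0, 1)
)

toy_trips <- toy_trips %>%
  dplyr::mutate(
    trip_type = dplyr::case_when(
      # Leave trips with a missing flag unclassified.
      is.na(o_in_region) | is.na(d_in_region) ~ NA_character_,
      o_in_region == 1 & d_in_region == 1 ~ "Fully within region",
      o_in_region == 1 & d_in_region == 0 ~ "Leaving region",
      o_in_region == 0 & d_in_region == 1 ~ "Entering region",
      o_in_region == 0 & d_in_region == 0 ~ "Fully outside region"
    )
  )

print(toy_trips)
```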

Start by pulling mode labels from the codebook to use in the final output. In the Massachusetts Travel Study dataset, the trip column is mode_type, but the corresponding value labels are stored under mode in the codebook.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_key  = as.character(value),
    mode_label = label
  )

Next, classify each trip by its geographic pattern and join the mode labels.

trips_classified <- hts[[default_trip_table_name]] %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(o_in_region), !is.na(d_in_region)) %>%
  dplyr::mutate(
    trip_geography = dplyr::case_when(
      o_in_region == 1 & d_in_region == 1 ~ "Within region",
      TRUE                                 ~ "At least one end outside region"
    ),
    mode_key = as.character(mode_type)
  ) %>%
  dplyr::left_join(mode_value_labels, by = "mode_key") %>%
  dplyr::filter(!is.na(mode_label), !is.na(trip_weight), trip_weight > 0)

Then calculate weighted trip counts and mode shares within each geographic group.

mode_share_by_region <- trips_classified %>%
  dplyr::group_by(trip_geography, mode_label) %>%
  dplyr::summarize(
    wtd_trips = sum(trip_weight, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::group_by(trip_geography) %>%
  dplyr::mutate(mode_share = wtd_trips / sum(wtd_trips)) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(trip_geography, dplyr::desc(mode_share))

Table 58 shows how mode choice differs between trips that stay within the region and those that cross its boundary.

gt::gt(mode_share_by_region) %>%
  gt::fmt_percent(columns = mode_share, decimals = 1) %>%
  gt::fmt_number(columns = wtd_trips, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    trip_geography = "Trip Geography",
    mode_label     = "Mode",
    wtd_trips      = "Weighted Trips",
    mode_share     = "Mode Share"
  ) %>%
  gt::tab_header(title = "Mode Share by Trip Geography") %>%
  gt::opt_row_striping()
Mode Share by Trip Geography
Trip Geography Mode Weighted Trips Mode Share
At least one end outside region Car 1,044,672 77.5%
At least one end outside region Walk 149,343 11.1%
At least one end outside region Long Distance Passenger 48,373 3.6%
At least one end outside region TNC 27,346 2.0%
At least one end outside region Transit 25,895 1.9%
At least one end outside region Other 14,578 1.1%
At least one end outside region Missing Response 12,135 0.9%
At least one end outside region Shuttle / Vanpool 7,982 0.6%
At least one end outside region Taxi 7,355 0.5%
At least one end outside region Bike 5,215 0.4%
At least one end outside region School bus 3,558 0.3%
At least one end outside region Ferry 741 0.1%
At least one end outside region Car share 363 0.0%
At least one end outside region Scooter share 31 0.0%
At least one end outside region Bike share 4 0.0%
Within region Car 21,007,329 73.1%
Within region Walk 4,769,101 16.6%
Within region Transit 822,747 2.9%
Within region Missing Response 685,980 2.4%
Within region School bus 394,266 1.4%
Within region Bike 341,726 1.2%
Within region Other 256,593 0.9%
Within region TNC 252,902 0.9%
Within region Shuttle / Vanpool 70,780 0.2%
Within region Taxi 46,813 0.2%
Within region Bike share 30,984 0.1%
Within region Car share 24,759 0.1%
Within region Ferry 22,638 0.1%
Within region Long Distance Passenger 3,005 0.0%
Within region Scooter share 1,452 0.0%
Table 58: Weighted mode share by trip geography.

Notes

  • NA flags indicate missing or unmatched coordinates; these records are excluded here and should be excluded from any analysis using o_in_region or d_in_region.
  • o_state / d_state provide finer detail for characterizing cross-border travel when you need more than a simple in-region / out-of-region split.
  • Out-of-region trips are a smaller share of the sample, so their mode share estimates carry more uncertainty. Check reliability using the RSE approach in Section 14 before reporting.
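
To attach standard errors and an RSE to mode shares within each trip geography, the same grouping can be run through a survey design. The sketch below is an assumption-laden illustration on made-up records (two segments with three households each); with two grouping variables, survey_prop() returns the share of the last grouping variable within the first.

```r
library(dplyr)
library(srvyr)

# Hypothetical classified trips: two segments, three households each.
toy_trips <- data.frame(
  hh_id          = rep(1:6, each = 4),
  sample_segment = rep(c("A", "B"), each = 12),
  trip_geography = rep(c("Within region", "Within region",
                         "At least one end outside region",
                         "At least one end outside region"), times = 6),
  mode_label     = c("Car", "Walk", "Car", "Car",
                     "Car", "Car", "Car", "Walk",
                     "Walk", "Car", "Car", "Car",
                     "Car", "Walk", "Walk", "Car",
                     "Car", "Car", "Car", "Car",
                     "Walk", "Walk", "Car", "Walk"),
  trip_weight    = rep(c(10, 12, 8, 9, 11, 7), each = 4)
)

mode_share_se <- toy_trips %>%
  srvyr::as_survey_design(ids = hh_id, strata = sample_segment, weights = trip_weight) %>%
  dplyr::group_by(trip_geography, mode_label) %>%
  dplyr::summarize(
    share = srvyr::survey_prop(vartype = "se"),
    .groups = "drop"
  ) %>%
  # RSE = standard error / estimate, as in Section 14.
  dplyr::mutate(rse = ifelse(share > 0, share_se / share, NA_real_))

print(mode_share_se)
```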

15.2 Trip Rates

Understanding trip rates requires aligning the unit of analysis with the survey’s hierarchical structure and weighting design. This section introduces the recommended analytic units for trips and person-days, outlines how to calculate weighted trip rates correctly, and highlights key pitfalls to avoid.

For analyses that compare trip-making across specific weekdays, use the day-of-week workflow described in Section 16. The standard day and trip weights remain the default for overall average-day trip-rate estimates.

For MassDOT, these examples use complete households as the default analytic universe.

To calculate a weighted trip rate, divide the weighted count of trips by the weighted count of person-days. This approach keeps both travelers and non-travelers represented correctly.

When both linked and unlinked trip tables are available, linked trips are usually the better numerator for whole-trip rates. Unlinked trips are more appropriate when the analysis is explicitly about trip legs or segments.

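Start by calculating the weighted numerator and denominator directly. This example uses the unlinked trip table, so the result is a rate of unlinked trips per person-day.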

weighted_trip_rate <- sum(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(trip_weight),
  na.rm = TRUE
) / sum(
  hts$day %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(day_weight),
  na.rm = TRUE
)

Table 59 reports the resulting weighted trip rate.

gt::gt(data.frame(weighted_trip_rate = weighted_trip_rate)) %>%
  gt::fmt_number(columns = weighted_trip_rate, decimals = 2) %>%
  gt::tab_header(title = "Weighted Trip Rate")
Weighted Trip Rate
weighted_trip_rate
4.45
Table 59: Weighted trip rate.

Why the Denominator (Household, Person, Day) Weights Matter

Trip rates depend on both the number of trips recorded and the number of diary days those trips came from. Without day weights, respondents who provide more usable diary days can exert disproportionate influence, and zero-trip days can drop out of the denominator.

Why Trip Weights Matter

Trip weights expand recorded trips to population-level trip totals. A correct trip-rate calculation therefore uses:

  • trip weights in the numerator
  • household-, person-, or day-level weights in the denominator, depending on the metric

Why Zero-Travel Days Matter

Even after correcting for nonresponse and trip underreporting, people who did not travel on a given day remain part of the analytic universe. Excluding zero-trip days overstates trip rates because the denominator omits valid person-days with no travel.
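
A small worked example makes the size of the bias concrete. The person-days and weights below are hypothetical; two of the five days have no travel.

```r
# Hypothetical person-days with equal weights; two days have no trips.
toy_days <- data.frame(
  day_id     = 1:5,
  day_weight = c(100, 100, 100, 100, 100),
  wtd_trips  = c(400, 600, 0, 500, 0)  # weighted trips made on each day
)

# Correct: every valid person-day stays in the denominator.
rate_all_days <- sum(toy_days$wtd_trips) / sum(toy_days$day_weight)

# Biased: dropping zero-trip days shrinks the denominator.
travel_days <- toy_days[toy_days$wtd_trips > 0, ]
rate_travel_days_only <- sum(travel_days$wtd_trips) / sum(travel_days$day_weight)

print(c(all_days = rate_all_days, travel_days_only = rate_travel_days_only))
```

In this toy case the rate rises from 3 to 5 trips per person-day when zero-trip days are dropped, even though no additional travel occurred.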

Constructing a Person-Day Trip Rate Dataset

A typical workflow begins by aggregating trips to the day level, joining that summary back to the day table, and filling in zeros for days without travel.

weighted_trips <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    weighted_trips = sum(trip_weight, na.rm = TRUE),
    .groups = "drop"
  )

day_trip_rates <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    weighted_trips,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    weighted_trips = ifelse(is.na(weighted_trips), 0, weighted_trips),
    weighted_trips_per_day = weighted_trips / day_weight
  )

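Table 60 shows the day-level dataset after weighted trips have been joined back to the person-day denominator. Days without a standard day weight (day_weight is NA) return an NA rate, so exclude them before averaging.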

gt::gt(dplyr::slice_head(day_trip_rates, n = 10)) %>%
  gt::fmt_number(
    columns = c(day_weight, weighted_trips, weighted_trips_per_day),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Day-Level Weighted Trip Rates")
Day-Level Weighted Trip Rates
day_id person_id travel_date day_num travel_dow person_num surveyable is_participant hh_id travel_day hh_day_complete num_complete_trip_surveys num_trips is_complete hh_is_complete proxy_complete begin_day end_day school_daily telecommute_time made_travel num_reasons_no_travel attend_school_1 attend_school_2 attend_school_3 attend_school_998 attend_school_999 attend_school_no_1 attend_school_no_2 attend_school_no_4 attend_school_no_5 attend_school_no_997 attend_school_no_998 attend_school_no_999 congestion delivery_2 delivery_3 delivery_4 delivery_5 delivery_6 delivery_7 delivery_8 delivery_996 no_travel_1 no_travel_11 no_travel_12 no_travel_2 no_travel_3 no_travel_4 no_travel_5 no_travel_6 no_travel_7 no_travel_8 no_travel_9 no_travel_99 attend_school_no_3 daily_activity_pattern day_weight day_weight_tue day_weight_fri day_weight_mon day_weight_sat day_weight_sun day_weight_thu day_weight_wed weighted_trips weighted_trips_per_day
240000890101 2400008901 2024-06-11 1 2 1 1 1 24000089 1 1 2 2 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 1 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 2 157.43 187.88768 NA NA NA NA NA NA 494.27 3.14
240000890201 2400008902 2024-06-11 1 2 2 1 1 24000089 1 1 5 5 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 2 157.43 187.88768 NA NA NA NA NA NA 1,235.68 7.85
240001220101 2400012201 2024-06-11 1 2 1 1 1 24000122 1 1 5 7 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 1 25.84 59.87428 NA NA NA NA NA NA 180.86 7.00
240001220102 2400012201 2024-06-12 2 3 1 1 1 24000122 1 1 5 5 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 1 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 1 25.84 NA NA NA NA NA NA 58.57788 129.18 5.00
240001220103 2400012201 2024-06-13 3 4 1 1 1 24000122 1 1 5 5 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 2 25.84 NA NA NA NA NA 64.49532 NA 129.18 5.00
240001220104 2400012201 2024-06-14 4 5 1 1 1 24000122 1 1 2 2 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 3 NA NA 75.95139 NA NA NA NA NA 0.00 NA
240001220105 2400012201 2024-06-15 5 6 1 1 1 24000122 1 1 8 10 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 1 0 0 0 0 995 995 995 995 995 995 995 995 995 995 995 995 995 2 NA NA NA NA 76.29594 NA NA NA 0.00 NA
240001220106 2400012201 2024-06-16 6 7 1 1 1 24000122 1 1 6 6 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 2 NA NA NA NA NA 74.33908 NA NA 0.00 NA
240001220107 2400012201 2024-06-17 7 1 1 1 1 24000122 1 1 4 4 1 1 995 1 1 995 NA 995 0 995 995 995 995 995 995 995 995 995 995 995 995 995 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 2 25.84 NA NA 65.09989 NA NA NA NA 103.35 4.00
240001400101 2400014001 2024-06-14 1 5 1 1 1 24000140 1 1 2 2 1 1 995 1 1 995 390 995 0 995 995 995 995 995 995 995 995 995 995 995 995 2 0 0 0 0 0 0 0 1 995 995 995 995 995 995 995 995 995 995 995 995 995 2 NA NA 655.71500 NA NA NA NA NA 0.00 NA
Table 60: Day-level weighted trip rates.
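
The day-level rates connect back to the overall trip rate: averaging them with day_weight as the weight reproduces weighted trips divided by weighted person-days. The sketch below checks this on hypothetical rows and shows that an unweighted average of the day rates is a different quantity.

```r
# Hypothetical day-level rows; the last day has no standard day weight.
toy_day_rates <- data.frame(
  day_weight     = c(150, 25, 25, NA),
  weighted_trips = c(450, 175, 0, 0)
)
toy_day_rates$weighted_trips_per_day <-
  toy_day_rates$weighted_trips / toy_day_rates$day_weight

# Keep only days that carry a day weight.
usable <- toy_day_rates[!is.na(toy_day_rates$day_weight), ]

# Weighting each day's rate by its day weight ...
mean_of_rates <- weighted.mean(usable$weighted_trips_per_day, usable$day_weight)

# ... matches weighted trips over weighted person-days.
ratio_of_sums <- sum(usable$weighted_trips) / sum(usable$day_weight)

# An unweighted mean of the day rates answers a different question.
unweighted_mean <- mean(usable$weighted_trips_per_day)

print(c(
  mean_of_rates = mean_of_rates,
  ratio_of_sums = ratio_of_sums,
  unweighted_mean = unweighted_mean
))
```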

15.3 Person-Miles Traveled (PMT) and Vehicle-Miles Traveled (VMT)

Analysis of person-miles and vehicle-miles traveled proceeds similarly to the analysis of trip rates, with some additional considerations for occupancy and drive-mode identification.

Calculating PMT

Because the trip table is a person-trip table, total person-miles traveled can be calculated by summing the product of trip distance and trip weight.

total_pmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::summarize(
    total_pmt = sum(
      distance_miles * trip_weight,
      na.rm = TRUE
    )
  )

Table 61 reports the weighted person-miles represented by the trip table.

gt::gt(total_pmt) %>%
  gt::fmt_number(columns = total_pmt, decimals = 2) %>%
  gt::tab_header(title = "Total PMT")
Total PMT
total_pmt
294,281,801.73
Table 61: Total weighted PMT.
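
Two follow-up checks are often useful with a PMT total: how many weighted trips were silently dropped by na.rm = TRUE because distance was missing, and what the total looks like per person-day. The sketch below uses hypothetical trips and days; dividing by the sum of day weights mirrors the trip-rate denominator in Section 15.2 and is an assumption about how a per-day PMT rate would be defined.

```r
# Hypothetical trips; the last trip has no recorded distance.
toy_trips <- data.frame(
  day_id         = c(1, 1, 2, 3),
  distance_miles = c(5, 3, 10, NA),
  trip_weight    = c(100, 100, 50, 80)
)

# Hypothetical person-days, including one day (4) with no trips.
toy_days <- data.frame(
  day_id     = 1:4,
  day_weight = c(100, 50, 80, 70)
)

# Total PMT: trips with missing distance contribute nothing.
total_pmt <- sum(toy_trips$distance_miles * toy_trips$trip_weight, na.rm = TRUE)

# Weighted trips excluded because distance is missing.
wtd_trips_missing_distance <- sum(
  toy_trips$trip_weight[is.na(toy_trips$distance_miles)]
)

# PMT per person-day, keeping zero-trip days in the denominator.
pmt_per_person_day <- total_pmt / sum(toy_days$day_weight)

print(c(
  total_pmt = total_pmt,
  wtd_trips_missing_distance = wtd_trips_missing_distance,
  pmt_per_person_day = pmt_per_person_day
))
```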

Calculating VMT

Vehicle-miles traveled require an occupancy adjustment so that a shared vehicle trip is not counted once per occupant. In the prepared MassDOT files, num_travelers is the usual starting point for that adjustment.

trip_mode_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_code = value,
    mode_key = as.character(value),
    mode_label = label
  )

total_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::mutate(
    mode_key = as.character(mode_type),
    occupancy = num_travelers
  ) %>%
  dplyr::left_join(
    trip_mode_labels %>%
      dplyr::select(mode_key, mode_label),
    by = "mode_key"
  ) %>%
  dplyr::filter(
    !is.na(occupancy),
    occupancy > 0,
    occupancy != 995,
    stringr::str_detect(
      mode_label,
      "Drive|Car|SOV|Hov|Motorcycle"
    )
  ) %>%
  dplyr::mutate(vmt = distance_miles / occupancy) %>%
  dplyr::summarize(
    total_vmt = sum(vmt * trip_weight, na.rm = TRUE)
  )

Table 62 reports the weighted VMT after filtering to drive-mode records and adjusting each trip by occupancy.

gt::gt(total_vmt) %>%
  gt::fmt_number(columns = total_vmt, decimals = 2) %>%
  gt::tab_header(title = "Total VMT")
Total VMT
total_vmt
151,301,985.08
Table 62: Total weighted VMT.
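
The occupancy adjustment is easiest to see with a few records. In the hypothetical example below, two household members report the same 10-mile car ride and a third record is a solo 6-mile drive; all values, including the equal weights, are made up for illustration.

```r
# Hypothetical car trips. Trips 1 and 2 are the same 10-mile ride reported by
# two travelers; trip 3 is a solo 6-mile drive.
toy_car_trips <- data.frame(
  trip_id        = 1:3,
  distance_miles = c(10, 10, 6),
  num_travelers  = c(2, 2, 1),
  trip_weight    = c(50, 50, 50)
)

# Each person-trip carries its share of the vehicle's miles.
toy_car_trips$vmt <- toy_car_trips$distance_miles / toy_car_trips$num_travelers

total_pmt <- sum(toy_car_trips$distance_miles * toy_car_trips$trip_weight)
total_vmt <- sum(toy_car_trips$vmt * toy_car_trips$trip_weight)

print(c(total_pmt = total_pmt, total_vmt = total_vmt))
```

Here PMT is 1,300 while VMT is 800, which equals the weight of 50 applied to the 16 miles actually driven by the two vehicles.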

Disaggregating PMT and VMT by Population Subgroups

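To disaggregate PMT or VMT by population subgroup, aggregate the trip data to the day level first and then join the resulting day-level totals back to the day or household table. The example below applies only the occupancy adjustment; to measure VMT rather than occupancy-adjusted miles across all modes, apply the same drive-mode filter used in the total VMT calculation before aggregating.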

day_trip_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::mutate(occupancy = num_travelers) %>%
  dplyr::filter(!is.na(occupancy), occupancy > 0, occupancy != 995) %>%
  dplyr::mutate(vmt = distance_miles / occupancy) %>%
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    total_wtd_vmt_on_day = sum(
      vmt * trip_weight,
      na.rm = TRUE
    ),
    .groups = "drop"
  )

day_trip_vmt <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    day_id,
    day_weight,
    household_id = hh_id
  ) %>%
  dplyr::left_join(
    day_trip_vmt,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    total_wtd_vmt_on_day = ifelse(is.na(total_wtd_vmt_on_day), 0, total_wtd_vmt_on_day),
    wtd_vmt_per_day = total_wtd_vmt_on_day / day_weight
  )

Table 63 shows the day-level file that can be joined to other denominator tables for subgroup analysis.

gt::gt(dplyr::slice_head(day_trip_vmt, n = 10)) %>%
  gt::fmt_number(
    columns = c(day_weight, total_wtd_vmt_on_day, wtd_vmt_per_day),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Day-Level VMT Summary")
Day-Level VMT Summary
day_id day_weight household_id total_wtd_vmt_on_day wtd_vmt_per_day
240000890101 157.43 24000089 1,449.64 9.21
240000890201 157.43 24000089 973.44 6.18
240001220101 25.84 24000122 182.02 7.05
240001220102 25.84 24000122 90.33 3.50
240001220103 25.84 24000122 95.57 3.70
240001220104 NA 24000122 0.00 NA
240001220105 NA 24000122 0.00 NA
240001220106 NA 24000122 0.00 NA
240001220107 25.84 24000122 125.95 4.87
240001400101 NA 24000140 0.00 NA
Table 63: Day-level weighted VMT summary.
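
Once the day-level file exists, a subgroup rate is the weighted VMT in the subgroup divided by its weighted person-days. The sketch below is a minimal illustration on made-up rows; the has_vehicle grouping and the toy household table are assumptions, and days without a day weight are dropped before summarizing.

```r
library(dplyr)

# Hypothetical day-level VMT file (same columns as the summary above).
toy_day_vmt <- data.frame(
  day_id               = 1:6,
  household_id         = c(1, 1, 2, 2, 3, 3),
  day_weight           = c(100, 100, 50, 50, 80, NA),
  total_wtd_vmt_on_day = c(900, 0, 300, 200, 1600, 0)
)

# Hypothetical household attribute used to define the subgroup.
toy_hh <- data.frame(
  hh_id        = 1:3,
  num_vehicles = c(1, 0, 2)
)

vmt_by_vehicle_ownership <- toy_day_vmt %>%
  dplyr::filter(!is.na(day_weight)) %>%
  dplyr::left_join(toy_hh, by = c("household_id" = "hh_id")) %>%
  dplyr::mutate(has_vehicle = num_vehicles > 0) %>%
  dplyr::group_by(has_vehicle) %>%
  dplyr::summarize(
    wtd_person_days = sum(day_weight),
    wtd_vmt = sum(total_wtd_vmt_on_day),
    vmt_per_person_day = wtd_vmt / wtd_person_days,
    .groups = "drop"
  )

print(vmt_by_vehicle_ownership)
```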

15.4 Identifying Work Days and Telework Status

The MassDOT data do not include a separate day-level work_time field, but they do include a day-level telework duration field. In the codebook, telecommute_time is defined as time spent teleworking on the travel day, so for this study it can be used directly as a diary-day telework measure rather than as a prior-week proxy.

To identify work days, join the day table to person employment status and then flag days with at least one work-purpose trip. For MassDOT, d_purpose_category == 2 is Work and d_purpose_category == 3 is Work related, so it is usually best to use code 2 for travel to a work location and keep Work related separate unless the analysis specifically needs broader job-related travel.

The preparation steps below build the worker-day file in a linear sequence so each analytic decision stays visible. This example focuses on respondents coded as employed full-time, employed part-time, or self-employed (employment values 1, 2, and 3).

Start by selecting the day-level telework field and joining employment status from the person table.

worker_day_status <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(day_id, hh_id, person_id, telecommute_time) %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::select(person_id, employment),
    by = "person_id"
  )

Then restrict the file to workers and add day-level work-trip flags from the default trip table.

worker_day_status <- worker_day_status %>%
  dplyr::filter(employment %in% c(1, 2, 3)) %>%
  dplyr::left_join(
    hts[[default_trip_table_name]] %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::group_by(day_id) %>%
      dplyr::summarize(
        has_work_trip = any(d_purpose_category == 2, na.rm = TRUE),
        has_work_related_trip = any(d_purpose_category == 3, na.rm = TRUE),
        .groups = "drop"
      ),
    by = "day_id"
  )

Next, derive the telework and work-day flags used in the classification table.

worker_day_status <- worker_day_status %>%
  dplyr::mutate(
    has_work_trip = ifelse(is.na(has_work_trip), FALSE, has_work_trip),
    has_work_related_trip = ifelse(is.na(has_work_related_trip), FALSE, has_work_related_trip),
    telework_min = telecommute_time,
    teleworked_any = dplyr::if_else(
      is.na(telework_min),
      NA,
      telework_min > 0
    ),
    telework_flag = dplyr::case_when(
      is.na(teleworked_any) ~ "Missing telework response",
      teleworked_any ~ "telecommute_time > 0",
      TRUE ~ "telecommute_time == 0"
    ),
    work_trip_flag = dplyr::if_else(
      has_work_trip,
      "Work trip present",
      "No work trip"
    ),
    work_day_type = dplyr::case_when(
      is.na(teleworked_any) & has_work_trip ~ "Missing telework response / work trip present",
      is.na(teleworked_any) & !has_work_trip ~ "Missing telework response / no work trip",
      teleworked_any & !has_work_trip ~ "Telework only",
      teleworked_any & has_work_trip ~ "Hybrid",
      !teleworked_any & has_work_trip ~ "In-person only",
      !teleworked_any & !has_work_trip ~ "Non-work day"
    )
  )

Finally, collapse the worker-day records into the cross-tab used in the handbook table.

worker_day_telework_crosstab <- worker_day_status %>%
  dplyr::count(telework_flag, work_trip_flag, name = "n_worker_days") %>%
  tidyr::pivot_wider(
    names_from = work_trip_flag,
    values_from = n_worker_days,
    values_fill = 0
  ) %>%
  dplyr::mutate(
    total = `No work trip` + `Work trip present`
  )

Table 64 shows the observed cross-tab for complete-household worker days in the prepared MassDOT data. These are unweighted record counts, so use them to understand coding patterns rather than as population estimates.

gt::gt(worker_day_telework_crosstab) %>%
  gt::fmt_number(
    columns = c(`No work trip`, `Work trip present`, total),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    telework_flag = "Telework Status",
    `No work trip` = "No Work Trip",
    `Work trip present` = "Work Trip Present",
    total = "Total"
  ) %>%
  gt::tab_header(title = "Worker-Day Telework and Work-Trip Cross-Tab")
Worker-Day Telework and Work-Trip Cross-Tab
Telework Status No Work Trip Work Trip Present Total
Missing telework response 1,088 199 1,287
telecommute_time == 0 20,225 13,171 33,396
telecommute_time > 0 16,434 5,690 22,124
Table 64: Telework minutes and work-trip presence across complete-household worker days.

This cross-tab maps cleanly to the common diary-day categories:

  • Telework only: telecommute_time > 0 and no work trip
  • Hybrid: telecommute_time > 0 and a work trip is present
  • In-person only: telecommute_time == 0 and a work trip is present
  • Non-work day: telecommute_time == 0 and no work trip

Keep missing telework responses separate rather than forcing them into 0. In the prepared MassDOT files, missing telecommute_time values are already stored as NA.
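
To move from unweighted record counts to a weighted distribution of diary-day categories, the same classification can be summarized with day weights. The sketch below uses made-up worker days; using day_weight for this purpose is an assumption, and missing telework responses are kept as their own category.

```r
library(dplyr)

# Hypothetical worker days; day 5 has a missing telework response.
toy_worker_days <- data.frame(
  day_id           = 1:6,
  day_weight       = c(100, 80, 120, 60, 90, 50),
  telecommute_time = c(0, 480, 120, 0, NA, 0),
  has_work_trip    = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

work_day_shares <- toy_worker_days %>%
  dplyr::mutate(
    # Stays NA when telework minutes are missing.
    teleworked_any = telecommute_time > 0,
    work_day_type = dplyr::case_when(
      is.na(teleworked_any) ~ "Missing telework response",
      teleworked_any & has_work_trip ~ "Hybrid",
      teleworked_any & !has_work_trip ~ "Telework only",
      !teleworked_any & has_work_trip ~ "In-person only",
      TRUE ~ "Non-work day"
    )
  ) %>%
  dplyr::group_by(work_day_type) %>%
  dplyr::summarize(wtd_days = sum(day_weight), .groups = "drop") %>%
  dplyr::mutate(share = wtd_days / sum(wtd_days))

print(work_day_shares)
```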

16 Day-of-Week Analysis

Use the alternate day-of-week weights when the question is explicitly about differences across Monday through Sunday. The standard weights remain the default for overall household, person, and average-day reporting; the alternate day-of-week weights are for day-specific person-day and trip analysis.

16.1 Trip Rates by Day of Week

The key pattern is:

  • reshape the weekday-specific day weights to long form
  • reshape the matching weekday-specific trip weights to long form
  • aggregate weighted trips to the day level
  • join those weighted trips back to the weighted person-day denominator
  • estimate weekday means with a survey design that uses the weekday-specific day weights
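
The reshape step in the pattern above can be sketched on a toy table. Each diary day carries a non-missing weight in only the column that matches its own day, so pivoting to long form and dropping missing weights leaves one row per day. The toy lookup table and values below are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical day records with day-specific weight columns.
toy_days <- data.frame(
  day_id         = c(101, 102, 103),
  day_weight_mon = c(65.1, NA, NA),
  day_weight_tue = c(NA, 59.9, NA),
  day_weight_sat = c(NA, NA, 76.3)
)

# Hypothetical lookup from weight column to day label.
toy_lookup <- data.frame(
  weight_column = c("day_weight_mon", "day_weight_tue", "day_weight_sat"),
  weekday       = c("Monday", "Tuesday", "Saturday")
)

toy_day_long <- toy_days %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(toy_lookup$weight_column),
    names_to = "weight_column",
    values_to = "day_weight_dow"
  ) %>%
  dplyr::left_join(toy_lookup, by = "weight_column") %>%
  # Keep only the row whose weight applies to that diary day.
  dplyr::filter(!is.na(day_weight_dow), day_weight_dow > 0)

print(toy_day_long)
```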

Use the prep step below to build the weekday-specific person-day analysis file before calculating the final estimates.

day_weight_lookup <- day_of_week_day_weight_columns %>%
  dplyr::select(weekday, weight_column)

trip_weight_lookup <- day_of_week_trip_weight_columns %>%
  dplyr::select(weekday, weight_column)

day_long <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(day_weight_lookup$weight_column),
    names_to = "weight_column",
    values_to = "day_weight_dow"
  ) %>%
  dplyr::left_join(
    day_weight_lookup,
    by = "weight_column"
  ) %>%
  dplyr::mutate(
    weekday = factor(weekday, levels = day_of_week_weekday_order)
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(day_weight_dow),
    day_weight_dow > 0
  )

trip_long <- hts[[day_of_week_trip_table_name]] %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(trip_weight_lookup$weight_column),
    names_to = "weight_column",
    values_to = "trip_weight_dow"
  ) %>%
  dplyr::left_join(
    trip_weight_lookup,
    by = "weight_column"
  ) %>%
  dplyr::mutate(
    weekday = factor(weekday, levels = day_of_week_weekday_order)
  ) %>%
  dplyr::filter(
    !is.na(trip_weight_dow),
    trip_weight_dow > 0
  )

weighted_trips_by_day <- trip_long %>%
  dplyr::group_by(day_id, weekday) %>%
  dplyr::summarize(
    weighted_trips = sum(trip_weight_dow, na.rm = TRUE),
    .groups = "drop"
  )

day_trip_rates_dow <- day_long %>%
  dplyr::left_join(
    weighted_trips_by_day,
    by = c("day_id", "weekday")
  ) %>%
  dplyr::mutate(
    weighted_trips = ifelse(is.na(weighted_trips), 0, weighted_trips),
    wtd_trips_on_day = weighted_trips / day_weight_dow
  )

trip_rate_by_weekday <- day_trip_rates_dow %>%
  dplyr::filter(!is.na(wtd_trips_on_day)) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight_dow
  ) %>%
  dplyr::group_by(weekday) %>%
  dplyr::summarize(
    trip_rate = srvyr::survey_mean(wtd_trips_on_day, vartype = "ci"),
    .groups = "drop"
  )

Table 65 shows the weekday-specific trip-rate estimates after the long-format weights and day-level totals have been assembled.

gt::gt(trip_rate_by_weekday) %>%
  gt::fmt_number(
    columns = c(trip_rate, trip_rate_low, trip_rate_upp),
    decimals = 2
  ) %>%
  gt::cols_label(
    weekday = "Weekday",
    trip_rate = "Trip Rate",
    trip_rate_low = "CI Low",
    trip_rate_upp = "CI High"
  ) %>%
  gt::tab_header(title = "Trip Rates by Day of Week")
Trip Rates by Day of Week
Weekday Trip Rate CI Low CI High
Monday 4.54 4.40 4.68
Tuesday 4.70 4.57 4.82
Wednesday 4.76 4.64 4.89
Thursday 4.74 4.62 4.87
Friday 4.98 4.82 5.13
Saturday 4.91 4.74 5.09
Sunday 4.07 3.92 4.23
Table 65: Trip rates by day of week.

This workflow keeps zero-trip days in the denominator, which is critical for valid person-day trip rates.

It also keeps the day-of-week estimates inside the complete-household analytic universe used for most MassDOT reporting.

16.2 Telework Rates by Day of Week

Use the same weekday-specific person-day design for telework participation. In the prepared MassDOT files, missing telecommute_time values are already stored as NA, and positive minutes indicate that some telework occurred on that day.

telework_rate_by_weekday <- day_long %>%
  dplyr::mutate(
    telework_min = telecommute_time,
    teleworked_any = dplyr::if_else(
      is.na(telework_min),
      NA,
      telework_min > 0
    )
  ) %>%
  dplyr::filter(!is.na(teleworked_any)) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight_dow
  ) %>%
  dplyr::group_by(weekday) %>%
  dplyr::summarize(
    telework_rate = srvyr::survey_mean(teleworked_any, vartype = "ci"),
    .groups = "drop"
  )

Table 66 reports the weekday-specific telework participation rates from the same day-level survey design.

gt::gt(telework_rate_by_weekday) %>%
  gt::fmt_percent(
    columns = c(telework_rate, telework_rate_low, telework_rate_upp),
    decimals = 1
  ) %>%
  gt::cols_label(
    weekday = "Weekday",
    telework_rate = "Telework Rate",
    telework_rate_low = "CI Low",
    telework_rate_upp = "CI High"
  ) %>%
  gt::tab_header(title = "Telework Rates by Day of Week")
Telework Rates by Day of Week
Weekday Telework Rate CI Low CI High
Monday 43.5% 41.4% 45.6%
Tuesday 44.1% 42.2% 46.1%
Wednesday 43.9% 42.1% 45.8%
Thursday 43.2% 41.3% 45.1%
Friday 45.4% 43.1% 47.6%
Saturday 14.5% 12.7% 16.4%
Sunday 12.9% 11.1% 14.6%
Table 66: Telework rates by day of week.

When the goal is a single overall estimate for the study area, return to the standard average-day workflow in Section 14 and Section 15. Use the alternate day-of-week weights (day_weight_dow) only when the day itself is part of the analytic question.

17 Advanced Analysis

17.1 From Description to Inference: Using Weighted Models

Simple weighted proportions, with accompanying standard errors or confidence intervals, are an excellent first tool for describing population patterns. However, there are many situations where weighted proportions alone are not sufficient for reliable inference. When subgroup sample sizes are small or design effects are large, analysts should use weighted multivariate models rather than relying solely on repeated subgroup tabulations.

Weighted models keep the full sample intact, improve statistical precision, and allow analysts to estimate the unique contribution of each factor while holding others constant. This approach avoids the instability that arises from slicing the data into many small subpopulations.

Note: Using Survey Weights in Regression Models

Most analysts will work in R, Stata, SPSS, or SAS. Each platform provides dedicated tools for fitting regression models that correctly incorporate survey weights, clustering, and stratification. Across platforms, the key principle is the same: define the survey design once, then fit models using functions that respect the sampling structure to obtain valid, population-representative inferences.

Does Telework Reduce VMT?

One common example is a survey-weighted regression that estimates daily VMT as a function of telework status while controlling for household and person characteristics.

For MassDOT, the model example below begins from complete households so the day-level outcome and predictors reflect the same household-complete analytic universe used elsewhere in the guide.

Start by aggregating trip-level VMT to the diary-day level so the outcome matches the day-level telework measure.

day_trip_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
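  # Treat num_travelers as vehicle occupancy; drop missing, zero, and the 995 code
  # (995 is also treated as missing elsewhere in this guide)
  dplyr::mutate(occupancy = num_travelers) %>%
  dplyr::filter(!is.na(occupancy), occupancy > 0, occupancy != 995) %>%
  # Split trip distance across travelers to approximate each person's share of VMT
  dplyr::mutate(vmt = distance_miles / occupancy) %>%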
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    total_wtd_vmt_on_day = sum(
      vmt * trip_weight,
      na.rm = TRUE
    ),
    .groups = "drop"
  )

day_trip_vmt <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    day_id,
    day_weight
  ) %>%
  dplyr::left_join(
    day_trip_vmt,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    total_wtd_vmt_on_day = ifelse(is.na(total_wtd_vmt_on_day), 0, total_wtd_vmt_on_day),
    wtd_vmt_per_day = total_wtd_vmt_on_day / day_weight
  )
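
The two steps above can be sketched on toy data in base R. The trip and day records below are hypothetical, and the toy trip weights are set equal to the day weights purely for readability; in the delivered data the two weights may differ. The point is only that a day with no trips survives the join with a VMT of zero, and that dividing the weighted total by the day weight returns a per-day figure.

```r
# Hypothetical trips: day 3 has no trips at all
toy_trips <- data.frame(
  day_id         = c(1, 1, 2),
  distance_miles = c(10, 6, 20),
  occupancy      = c(1, 2, 4),
  trip_weight    = c(50, 50, 80)
)
toy_days <- data.frame(day_id = 1:3, day_weight = c(50, 80, 40))

# Per-person share of each trip's distance, weighted, then summed by day
toy_trips$wtd_vmt <- (toy_trips$distance_miles / toy_trips$occupancy) *
  toy_trips$trip_weight
day_totals <- aggregate(wtd_vmt ~ day_id, data = toy_trips, FUN = sum)

# Left join onto ALL days so zero-trip days are kept, then fill NA with 0
out <- merge(toy_days, day_totals, by = "day_id", all.x = TRUE)
out$wtd_vmt[is.na(out$wtd_vmt)] <- 0

# Normalize by the day weight to get VMT per diary day
out$vmt_per_day <- out$wtd_vmt / out$day_weight
print(out)
```

Starting the join from the day table rather than the trip table is what keeps day 3 in the result.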

Next, assemble the model dataset by joining the day-level outcome to the household and person predictors used in the regression.

model_data <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(person_id, hh_id, day_id, day_weight, telecommute_time) %>%
  dplyr::left_join(
    day_trip_vmt %>%
      dplyr::select(day_id, wtd_vmt_per_day),
    by = "day_id"
  ) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::transmute(
        hh_id,
        sample_segment,
        num_vehicles,
        income_broad,
        income_key = as.character(income_broad)
      ),
    by = "hh_id"
  ) %>%
  dplyr::left_join(
    income_value_labels,
    by = "income_key"
  ) %>%
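  # NOTE: only person_id is selected here, so this join adds no columns;
  # extend the select() with person-level predictors when the model needs them
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::select(person_id),
    by = "person_id"
  ) %>%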
  dplyr::filter(day_weight > 0) %>%
  dplyr::mutate(
    telework_min = telecommute_time,
    telework_group = dplyr::case_when(
      is.na(telework_min) ~ "Missing",
      telework_min == 0 ~ "0 min",
      telework_min <= 120 ~ "1-120 min",
      telework_min <= 240 ~ "121-240 min",
      telework_min > 240 ~ "240+ min"
    ),
    telework_group = factor(
      telework_group,
      levels = c("0 min", "1-120 min", "121-240 min", "240+ min", "Missing")
    ),
    num_vehicles_group = factor(
      dplyr::case_when(
        num_vehicles %in% c(995, 999) ~ "Missing",
        num_vehicles >= 4 ~ "4+",
        TRUE ~ as.character(num_vehicles)
      ),
      levels = c("0", "1", "2", "3", "4+", "Missing")
    ),
    income_broad_label = factor(
      income_broad_label,
      levels = income_value_labels$income_broad_label
    )
  )

model_data <- model_data %>%
  dplyr::mutate(
    telework_group = droplevels(telework_group),
    num_vehicles_group = droplevels(num_vehicles_group),
    income_broad_label = droplevels(income_broad_label)
  )
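
The factor level order set above determines which category acts as the regression baseline. The following base R sketch uses hypothetical telework minutes and the same cut points to show two details: a day with exactly 240 minutes lands in "121-240 min", and the first level ("0 min") receives no dummy column.

```r
# Hypothetical telework minutes for six diary days
telework_min <- c(0, 45, 240, 300, NA, 0)

# Same cut points as the case_when() above, written with nested ifelse()
telework_group <- ifelse(
  is.na(telework_min), "Missing",
  ifelse(telework_min == 0, "0 min",
  ifelse(telework_min <= 120, "1-120 min",
  ifelse(telework_min <= 240, "121-240 min", "240+ min")))
)
telework_group <- factor(
  telework_group,
  levels = c("0 min", "1-120 min", "121-240 min", "240+ min", "Missing")
)

# Exactly 240 minutes is assigned to "121-240 min" because the test is <=
print(table(telework_group))

# The first level is the reference category, so it has no column of its own
dummy_cols <- colnames(model.matrix(~ telework_group))
print(dummy_cols)
```

This is why a fitted model reports one coefficient per non-reference level, each read as a difference from the "0 min" group.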

If the analytic question also depends on age, a useful extension is to collapse the delivered age categories into broader groups before fitting the regression.

age_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "age") %>%
  dplyr::transmute(
    age_key = as.character(value),
    age_label = sub("^Age\\s+", "", label)
  ) %>%
  dplyr::mutate(
    age_label = ifelse(age_label == "85 up", "85 or older", age_label)
  )

model_data <- model_data %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::transmute(
        person_id,
        age_key = as.character(age)
      ) %>%
      dplyr::left_join(
        age_value_labels,
        by = "age_key"
      ),
    by = "person_id"
  ) %>%
  dplyr::mutate(
    age_group = dplyr::case_when(
      age_label %in% c("18-24", "25-34") ~ "18-34",
      age_label %in% c("35-44", "45-54") ~ "35-54",
      age_label %in% c("55-64", "65-74", "75-84", "85 or older") ~ "55+",
      TRUE ~ "Missing"
    ),
    age_group = factor(age_group, levels = c("18-34", "35-54", "55+", "Missing"))
  ) %>%
  dplyr::mutate(age_group = droplevels(age_group))

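# Extended formula with age_group; stored under its own name so the base
# formula defined in the next step does not overwrite it
vmt_model_formula_age <- wtd_vmt_per_day ~ telework_group + num_vehicles_group + income_broad_label + age_group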

Finally, define the survey design, fit the weighted model, and tidy the coefficients for display.

vmt_design <- model_data %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight
  )

vmt_model_formula <- wtd_vmt_per_day ~ telework_group + num_vehicles_group + income_broad_label

vmt_model <- survey::svyglm(
  vmt_model_formula,
  design = vmt_design
)

model_tbl <- broom::tidy(vmt_model, conf.int = TRUE) %>%
  dplyr::mutate(
    term_clean = dplyr::case_when(
      term == "(Intercept)" ~ "Intercept",
      stringr::str_detect(term, "^telework_group") ~ stringr::str_replace(term, "telework_group", "Telework: "),
      stringr::str_detect(term, "^num_vehicles_group") ~ stringr::str_replace(term, "num_vehicles_group", "Vehicles: "),
      stringr::str_detect(term, "^income_broad_label") ~ stringr::str_replace(term, "income_broad_label", "Income: "),
      TRUE ~ term
    ),
    stars = dplyr::case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    )
  ) %>%
  dplyr::select(term_clean, estimate, std.error, statistic, p.value, stars, conf.low, conf.high)

Table 67 presents the base weighted model with standard errors and confidence intervals.

gt::gt(model_tbl) %>%
  gt::fmt_number(
    columns = c(estimate, std.error, statistic, conf.low, conf.high),
    decimals = 2
  ) %>%
  gt::fmt_number(columns = p.value, decimals = 3) %>%
  gt::cols_label(
    term_clean = "Term",
    estimate = "Estimate",
    std.error = "Std. Error",
    statistic = "t-value",
    p.value = "p-value",
    stars = "",
    conf.low = "CI Low",
    conf.high = "CI High"
  ) %>%
  gt::tab_header(
    title = "Base Survey-Weighted Linear Model of Daily VMT",
    subtitle = "Outcome: Weighted Vehicle-Miles Traveled per Diary Day"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )
Base Survey-Weighted Linear Model of Daily VMT
Outcome: Weighted Vehicle-Miles Traveled per Diary Day
Term Estimate Std. Error t-value p-value CI Low CI High
Intercept 25.45 5.58 4.56 0.000 *** 14.51 36.40
Telework: 1-120 min 2.98 7.18 0.41 0.679 −11.11 17.06
Telework: 121-240 min −0.13 6.43 −0.02 0.983 −12.74 12.47
Telework: 240+ min −17.63 5.75 −3.07 0.002 ** −28.89 −6.37
Telework: Missing −22.98 6.27 −3.66 0.000 *** −35.27 −10.69
Vehicles: 1 6.08 3.35 1.82 0.069 −0.48 12.65
Vehicles: 2 17.76 5.69 3.12 0.002 ** 6.61 28.90
Vehicles: 3 17.77 9.62 1.85 0.065 −1.10 36.64
Vehicles: 4+ 21.04 9.44 2.23 0.026 * 2.54 39.54
Income: $25,000-$49,999 11.98 9.45 1.27 0.205 −6.54 30.51
Income: $50,000-$74,999 3.48 3.84 0.91 0.364 −4.04 11.01
Income: $75,000-$99,999 −0.31 3.04 −0.10 0.920 −6.27 5.66
Income: $100,000-$199,999 13.84 7.55 1.83 0.067 −0.96 28.65
Income: $200,000 or more 2.76 4.12 0.67 0.503 −5.31 10.83
Income: Prefer not to answer 1.52 4.53 0.34 0.737 −7.36 10.40
Table 67: Base survey-weighted daily VMT model.
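
As a reading aid, the coefficients in Table 67 can be combined by simple addition. The sketch below copies three rounded estimates from the table. It assumes the reference categories implied by the factor level order in the code ("0 min" of telework and 0 vehicles) and leaves income at whichever level is its reference, which the table does not show. The object names are illustrative.

```r
# Estimates copied from Table 67 (rounded to two decimals as published)
coefs <- c(
  intercept        = 25.45,
  telework_240plus = -17.63,
  vehicles_2       = 17.76
)

# Baseline profile: "0 min" telework, 0 vehicles, reference income level
pred_baseline <- coefs[["intercept"]]

# Same income level, but 2 vehicles and more than 240 telework minutes
pred_profile <- sum(coefs)

# Model-based contrast: "240+ min" versus "0 min", other terms held fixed
diff_telework <- coefs[["telework_240plus"]]

print(round(c(
  baseline = pred_baseline,
  profile  = pred_profile,
  contrast = diff_telework
), 2))
```

These are point predictions only; the wide confidence intervals in Table 67 apply to each coefficient and should temper any interpretation of the summed values.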

Why Use Weighted Models?

Weighted models become especially useful when analysts need to compare groups, adjust for multiple factors at once, or stabilize estimates for small subgroups. They do not replace descriptive tables, but they provide a more reliable route to inference when the question extends beyond simple description.