Massachusetts Travel Study Data User Guide

Author

Resource Systems Group, Inc.

Published

May 27, 2026

A reusable reference for study design, survey documentation, delivered data structure, weighting, codebook metadata, and analyst workflows.

Choose one of the four sections below for a guided path through the guide.

Study Design: Start with the study overview, sample design, and survey instrument to understand who was sampled, how the survey was fielded, and what participants were asked.
Data Processing: Review how survey records were processed, cleaned, organized, and weighted so you can understand how the delivered files were constructed.
Data Reference: Use the dataset overview, codebook, and frequency tables to inspect table structure, variable definitions, value labels, and observed distributions.
Using the Data: Move into setup, joins, analytic units, variables, weights, and worked examples that show how to use the delivered data in analysis.

1 Overview

The Massachusetts Department of Transportation (MassDOT) contracted with RSG to conduct the 2024-2025 Massachusetts Travel Study (MTS), a statewide survey designed to collect demographically and geographically representative travel behavior data from 15,140 households across the Commonwealth. The survey exceeded this target and collected data from 18,122 households. Since the last Massachusetts Household Travel Survey was conducted in 2011, the transportation landscape, inclusive of infrastructure, services, and travel behaviors, has changed substantially. MassDOT conducted the MTS to gain a better understanding of changing travel patterns and mode choice, as well as to inform planning efforts and tools, including the statewide travel demand model.

1.1 Study Geography

The study sampled 0.6% of Massachusetts households and was geographically and demographically representative of the Commonwealth’s population.

To support adequate sample across the state, the survey team used Massachusetts’ 13 sample geographies for stratification. These geographies align with the Commonwealth’s MPO and regional planning areas and are summarized in Table 1.

Sample Geography	MPO / Regional Planning Body	Regional Description	Sample-Frame Households
Berkshire	Berkshire Regional Planning Commission	Berkshire County and surrounding western Massachusetts communities.	56,078
Boston Region	Boston Region MPO / MAPC	Greater Boston core and inner suburban communities.	1,315,052
Cape Cod	Cape Cod Commission	Barnstable County and the Cape Cod region.	99,969
Central Massachusetts	Central Massachusetts Regional Planning Commission	Worcester-area communities in central Massachusetts.	229,416
Franklin	Franklin Regional Council of Governments	Franklin County and nearby western Massachusetts communities.	31,234
Martha's Vineyard	Martha's Vineyard Commission	The island region of Martha's Vineyard.	6,899
Merrimack Valley	Merrimack Valley Planning Commission	Northeastern Massachusetts communities centered on the Merrimack Valley.	137,029
Montachusett	Montachusett Regional Planning Commission	North-central Massachusetts communities in the Montachusett region.	98,602
Nantucket	Nantucket Planning and Economic Development Commission	The island region of Nantucket.	4,659
Northern Middlesex	Northern Middlesex Council of Governments	Lowell-area and nearby communities in northern Middlesex County.	113,727
Old Colony	Old Colony Planning Council	South Shore and Plymouth County communities in the Old Colony region.	142,628
Pioneer Valley	Pioneer Valley Planning Commission	Connecticut River Valley communities in western Massachusetts.	244,794
Southeastern Massachusetts	Southeastern Regional Planning and Economic Development District	South Coast and southeastern Massachusetts communities.	260,908
Household totals come from the MassDOT sampling plan and reflect the ABS sample frame, not achieved completes.

Table 1: Massachusetts sample geographies used for statewide stratification.

These geographies are shown in Figure 1 below.

Section 2 provides additional details about the sample design.

Figure 1: Map of the Survey Region, Showing Sample Geographies by Block Group

1.2 Study Timeline

Data collection for the 2024-2025 Massachusetts Travel Study started in May 2024 and continued through June 2025 over three fielding periods, detailed in Table 2.

SURVEY TASK	TIMELINE
Survey Design (Sample planning, survey website programming, invitation development)	January 2024 - April 2024
Data Collection - Spring 2024 (Sending invitations, data monitoring, and adjustments)	May 2024 - July 2024
Data Collection - Fall 2024 (Sending invitations, data monitoring, and adjustments)	September 2024 - November 2024
Data Collection - Winter/Spring 2025 (Sending invitations, data monitoring, and adjustments)	January 2025 - June 2025
Data Preparation (Data cleaning and weighting, finalizing dashboard, final reporting)	June 2025 - June 2026

Table 2: Survey task timeline

Table 3 displays the number of completed households by year and month of first travel date. Travel data collection was limited in the summer, when school was out of session, and in the winter during peak holiday travel periods.

Complete Households by Month and Year of First Travel Date
Month	2024	2025
January	0	1,390
February	0	276
March	0	473
April	0	1,980
May	1,246	522
June	2,461	0
July	0	0
August	0	0
September	2,719	0
October	4,188	0
November	386	0
December	0	0
Incomplete households are excluded from this summary.

Table 3: Completed Households by First Travel Date

1.3 Data Collection

Survey data were collected through a mixed-mode design that combined:

Smartphone-based travel diary (rMove®): Participants recorded travel via smartphone app in real time for up to seven consecutive days.
Web-based travel diary (rMove for Web): Participants reported travel via a web survey on one assigned weekday.
Call center interviews: Participants reported travel by phone to a call center for one assigned weekday, and their responses were recorded in the web-based travel diary (rMove for Web).

Each household first completed a recruit survey describing household composition, demographics, and vehicles, followed by a travel diary describing all trips made during the assigned day(s).

Table 4 shows a count of completed households by survey mode.

Completed Households by Survey Mode
Survey Mode	Completed Households	Percentage
Web-based Diary (rMove for Web)	8,660	55.4%
Smartphone App (rMove app)	6,410	41.0%
Call Center Interview (rMove for Web)	571	3.7%
Total	15,641	-

Table 4: Completed Households by Survey Mode

Section 3 provides further details about the survey instrument and question content.

2 Sample Design

The Massachusetts Travel Study used a probability-based, geographically stratified sample of households across Massachusetts.

The primary sampling method was probability-based address-based sampling (ABS). Massachusetts was stratified by key demographic features at the census block group level, and within each stratum, randomly selected households were invited to join the study by mail.

2.1 Sampling Framework

Sampling Frame

The survey used a United States Postal Service (USPS) address-based sampling (ABS) frame that includes all Massachusetts residential addresses, excluding group quarters such as dormitories, prisons, or assisted living facilities. Each sampled address represented a single household eligible for recruitment. The ABS frame provided complete statewide coverage and supported stratification by geography, land-use density, and socioeconomic characteristics.

Primary Sampling Unit

The primary sampling unit was the household, selected through random sampling from the ABS frame. All household members reported person-level and trip-level details, but only one member (the “primary respondent”) completed the recruit survey on behalf of the household.

Although the primary sampling unit was the household, the data also represent the behavior of individual persons. Participants who reported through the smartphone app provided data across multiple days, yielding several days of travel and daily activity data per person.

Surveyable Population

Not all household members were surveyable. Only persons related to Person 1 (the primary respondent) were considered surveyable. Non-surveyable members (e.g., guests, visitors, or unrelated roommates) did not have trip or day completion requirements and were excluded from household-level completeness determinations.

Non-surveyable members can be identified in the person table using the surveyable variable. These members do not contribute to the household’s completeness status. They count towards the total number of household members but are not weighted (see Section 5.4.2.2).

2.2 Target Completions

The study’s goal was 15,140 completed household surveys statewide, distributed across MPO areas. The study achieved 18,122 completed households, exceeding the statewide target while maintaining proportionality across MPO geographies.

Sampling by Season and Wave

Sampling was conducted in three main fielding periods: Spring 2024, Fall 2024, and Winter/Spring 2025. Each wave included households from across the study geography, with adjustments to invitation pacing and oversampling emphasis based on observed response patterns and representativeness in earlier periods. This adaptive approach helped the final sample maintain geographic balance and demographic coverage across the state.

2.3 Stratification and Oversampling

The sample design combined statewide address-based sampling with geographic stratification and targeted oversampling. Within each MPO geography, sampled block groups were assigned to one of four mutually exclusive strata so the study could improve representation of groups that are typically harder to recruit while also increasing sample for key policy questions.

Sample strata

General population. Block groups that did not qualify for any of the targeted oversampling strata.
Rural population. Block groups that did not qualify for other oversampling strata and had fewer than 150 people per square kilometer.
Hard-to-reach oversample. Block groups with at least 30% of households earning less than $25,000 per year, at least 60% of households identified as Hispanic and/or BIPOC, or at least 15% of households speaking limited English.
Walk/Bike/Transit oversample. Block groups within the Boston Region MPO with transit access density classified as CBD or Dense Urban.

The sample plan identified 270 block groups that qualified for both the hard-to-reach and walk/bike/transit strata. Those block groups were assigned to the hard-to-reach stratum, which the plan anticipated would have the lower response rate. That overlap rule made the final strata mutually exclusive for sample management and weighting. Table 5 summarizes the resulting block groups, households, and adults in each sample stratum.

Sample Stratum	Number of BGs	Total Households	Total Adults	Adults per Household
Walk/Bike/Transit	327	183,508	354,282	1.9
Hard-to-reach	1,330	670,542	1,337,712	2.0
Rural	475	250,685	517,967	2.0
General	2,923	1,636,260	3,380,998	2.0
Total	5,055	2,740,995	5,590,959	2.0

Table 5: Survey region households and adults by sample stratum.
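The stratum definitions and the overlap rule described above can be sketched in code. This is a minimal illustration, not the sampling plan's actual implementation: the `BlockGroup` field names are hypothetical, shares are assumed to be stored as proportions between 0 and 1, and the order of the checks simply encodes the rule that hard-to-reach takes precedence over walk/bike/transit.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    # All field names are hypothetical; the sample plan may name them differently.
    pct_hh_income_under_25k: float    # share of households earning < $25,000 (0-1)
    pct_hh_hispanic_or_bipoc: float   # share of households identified as Hispanic and/or BIPOC (0-1)
    pct_hh_limited_english: float     # share of households speaking limited English (0-1)
    in_boston_region_mpo: bool
    transit_access_density: str       # e.g. "CBD", "Dense Urban", or another class
    people_per_sq_km: float


def is_hard_to_reach(bg: BlockGroup) -> bool:
    # Any one of the three thresholds qualifies the block group.
    return (
        bg.pct_hh_income_under_25k >= 0.30
        or bg.pct_hh_hispanic_or_bipoc >= 0.60
        or bg.pct_hh_limited_english >= 0.15
    )


def is_walk_bike_transit(bg: BlockGroup) -> bool:
    return bg.in_boston_region_mpo and bg.transit_access_density in {"CBD", "Dense Urban"}


def assign_stratum(bg: BlockGroup) -> str:
    # Hard-to-reach is checked first, so block groups that qualify for both
    # oversamples land in the hard-to-reach stratum.
    if is_hard_to_reach(bg):
        return "Hard-to-reach"
    if is_walk_bike_transit(bg):
        return "Walk/Bike/Transit"
    if bg.people_per_sq_km < 150:
        return "Rural"
    return "General"


overlap = BlockGroup(0.35, 0.10, 0.02, True, "CBD", 5000.0)
print(assign_stratum(overlap))
```

Because each block group receives exactly one label, the four strata are mutually exclusive by construction.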

The hard-to-reach oversample increased representation for lower-income, BIPOC, and limited-English block groups, while the walk/bike/transit oversample increased representation for dense Boston-area block groups where multimodal travel was expected to be more common. Together, these design choices improved analytic coverage without changing the fact that the final weighted dataset represents the statewide household population. Table 6 summarizes the reference households, invitations sent, and invitation rates used across geographies and sample strata.

Geography	Sample Stratum	Invitations Sent	Reference Households	Invitation Rate
Berkshire	General	3,755	22,471	16.7%
Berkshire	Hard-to-reach	4,826	10,343	46.7%
Berkshire	Rural	6,793	23,264	29.2%
Boston Region	General	146,945	754,832	19.5%
Boston Region	Hard-to-reach	171,751	355,304	48.3%
Boston Region	Rural	5,870	21,408	27.4%
Boston Region	Walk/Bike/Transit	28,061	183,508	15.3%
Cape Cod	General	31,328	79,927	39.2%
Cape Cod	Hard-to-reach	4,927	7,617	64.7%
Cape Cod	Rural	4,499	12,425	36.2%
Central Massachusetts	General	26,344	133,520	19.7%
Central Massachusetts	Hard-to-reach	35,015	57,589	60.8%
Central Massachusetts	Rural	11,583	38,307	30.2%
Franklin	General	1,477	10,124	14.6%
Franklin	Hard-to-reach	1,230	4,258	28.9%
Franklin	Rural	4,395	16,852	26.1%
Martha's Vineyard	General	3,559	3,128	113.8%
Martha's Vineyard	Rural	3,678	3,771	97.5%
Merrimack Valley	General	22,319	86,361	25.8%
Merrimack Valley	Hard-to-reach	40,407	42,244	95.7%
Merrimack Valley	Rural	4,504	8,424	53.5%
Montachusett	General	14,498	53,897	26.9%
Montachusett	Hard-to-reach	7,321	12,125	60.4%
Montachusett	Rural	11,911	32,580	36.6%
Nantucket	General	5,474	3,032	180.5%
Nantucket	Rural	2,856	1,627	175.5%
Northern Middlesex	General	22,001	86,555	25.4%
Northern Middlesex	Hard-to-reach	22,886	23,834	96.0%
Northern Middlesex	Rural	1,356	3,338	40.6%
Old Colony	General	37,417	103,891	36.0%
Old Colony	Hard-to-reach	28,394	31,685	89.6%
Old Colony	Rural	3,182	7,052	45.1%
Pioneer Valley	General	22,925	129,544	17.7%
Pioneer Valley	Hard-to-reach	48,752	71,373	68.3%
Pioneer Valley	Rural	10,740	43,877	24.5%
Southeastern Massachusetts	General	50,296	168,978	29.8%
Southeastern Massachusetts	Hard-to-reach	36,651	54,170	67.7%
Southeastern Massachusetts	Rural	13,365	37,760	35.4%

Table 6: Reference households, invitations sent, and invitation rates by geography and sample stratum.

Across all waves, the study sent 903,291 invitations. Relative to the reference household counts in each sample segment, the statewide invitation rate was 23.7% in the general stratum, 60.0% in the hard-to-reach stratum, 33.8% in the rural stratum, and 15.3% in the walk/bike/transit stratum. These realized invitation rates show that the field effort emphasized hard-to-reach households statewide while also maintaining targeted coverage of rural areas and dense multimodal areas in the Boston Region.

In a small number of segments, the cumulative invitation rate exceeded 100%. This reflects repeated fielding across waves relative to the segment’s reference household count and should be interpreted as the total invitation effort rather than as unique-household coverage of the sampling frame.
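As a worked illustration of how the invitation rates in Table 6 relate to the underlying counts, the short sketch below recomputes the rate for three segments copied from the table. The function name is illustrative only. It assumes the rate is simply cumulative invitations divided by reference households, which is consistent with the published values.

```python
def invitation_rate(invitations_sent: int, reference_households: int) -> float:
    """Cumulative invitations as a percentage of reference households."""
    return 100.0 * invitations_sent / reference_households


# (geography, stratum, invitations sent, reference households) from Table 6
segments = [
    ("Berkshire", "General", 3_755, 22_471),
    ("Nantucket", "General", 5_474, 3_032),
    ("Boston Region", "Walk/Bike/Transit", 28_061, 183_508),
]

for geography, stratum, sent, households in segments:
    rate = invitation_rate(sent, households)
    note = " (repeated fielding across waves)" if rate > 100 else ""
    print(f"{geography} / {stratum}: {rate:.1f}%{note}")
```

The Nantucket segment shows how a rate above 100% arises: more invitations were mailed across waves than there are reference households in the segment.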

Recruitment Channels

Households were recruited primarily through mailed invitation letters that directed sampled addresses to the study website and provided their survey access information. Follow-up reminder postcards were sent to nonresponding households, and the project team also used website support, email, and phone follow-up where appropriate to help households complete the study.

Participation Modes

Eligible households could participate via:

rMove smartphone app (seven-day diary),
rMove for Web (one-day diary), or
Call-center interview (one-day diary).

Mode assignment depended on household technology access and preference; all modes followed identical survey logic and data validation.

Incentives

The study used different incentive amounts by participation mode and sample stratum. rMove incentives were paid per adult; online and call-center incentives were paid per household.

General population — rMove: $25 per adult.
General population — web or call center: $15 per household.
Hard-to-reach population — rMove: $35 per adult.
Hard-to-reach population — web or call center: $25 per household.

Monitoring and Response Tracking

RSG maintained a real-time survey monitoring dashboard accessible to the Massachusetts Travel Study team throughout data collection.

The dashboard provided:

Response rates by segment and demographic subgroup,
Comparison to American Community Survey (ACS) benchmarks, and
Progress toward study targets

This tool supported geographic balance and demographic representativeness through adaptive field management.

Analyst Tip: Interpreting Response Rates

Differences in observed response rates across MPOs or demographic strata reflect design priorities, not data quality. The weighting process adjusts for these differences, so analysts should rely on weighted data for representativeness.

2.4 Representativeness and Nonresponse

Post-survey comparisons to ACS and model control totals indicated that the achieved sample closely reflected the statewide household population by:

Income group,
Household size,
Vehicle availability, and
Land-use context.

Residual differences were addressed through weighting adjustments (see Section 5).

3 Survey Instrument

The Massachusetts Travel Study collected detailed information about households, people, vehicles, and daily travel through a unified instrument designed for use across multiple reporting platforms. Each mode implemented the same core survey logic, ensuring results are directly comparable across participants and modes.

3.1 Survey Modes and Language Support

The survey instrument was administered through three participation modes:

Smartphone-based travel diary (rMove): Participants recorded travel via smartphone app in real time for up to seven consecutive days.
Web-based travel diary (rMove for Web): Participants reported travel via a web survey on one assigned weekday.
Call center interviews: Participants reported travel by phone to a call center for one assigned weekday, and their responses were recorded in the web-based travel diary (rMove for Web).

The survey instrument was available in English and Spanish. The call center also supported participation in Portuguese, Chinese, Haitian Creole, Vietnamese, and Russian.

3.2 Recruit Survey

The recruit survey established household eligibility and collected baseline information to assign travel days and tailor diary prompts. Key modules included:

Household composition and member roster
Demographics (age, gender, race/ethnicity, income, employment, student status)
Housing characteristics (type, tenure, vehicles available)
Technology access and preferred reporting mode

The recruit survey was completed by the primary respondent (Person 1) on behalf of all household members. The primary respondent was also responsible for ensuring that all eligible household members completed their assigned travel diary.

3.3 Travel Diary

The travel diary collected information about travel made on the assigned reporting day or days. In the smartphone app diary, participants reviewed passively collected travel and completed prompted trip surveys. In rMove for Web and the call center diaries, respondents reported travel directly through the prompted diary instrument.

Across modes, the travel diary collected:

Day-begin and day-end location confirmation
Trip destinations, purposes, travel modes, and timing
Access, transfer, and egress details for transit trips
Companion and escort activity (where applicable)
Reasons for no travel on the assigned day (where applicable)

Adults could report their own travel, while proxy reporting was used for children (under 18 years) and other eligible household members.

Analyst Tip: Multi-Day Travel Data

Only participants who used the rMove smartphone app recorded travel for multiple days, including weekends. The standard weights represent Monday, Tuesday, Wednesday, and Thursday travel. For analyses that compare travel across Monday through Sunday, use the alternate day-of-week weights described in the weighting chapter and analyst handbook.

3.4 Daily Surveys

The daily survey collected additional context about each reporting day, including:

Deliveries and pickups (e-commerce activity)
Telecommuting activity
Attitudinal questions
School attendance and activities

Some questions were repeated across all travel days; others (e.g., attitudinal questions) were asked only once. Household-level questions were asked of the primary respondent, while person-level questions were directed to each individual respondent.

When respondents completed the travel diary via browser or call center, “daily” questions were consolidated into a single survey following the travel diary.

3.5 Travel Date Assignment

Households were assigned one of the study’s weighted travel weekdays (Monday, Tuesday, Wednesday, and Thursday) during the study period. Households participating via rMove were assigned a seven-day reporting period beginning with the assigned start date. Households participating via web or call center reported travel for one assigned day and completed the survey after that travel date.

3.6 Questionnaire

The survey instrument covered the standard household travel survey modules needed for household, person, vehicle, travel day, trip, location, and tour delivery tables. It also included modules that were especially important for MassDOT’s analysis needs, including deliveries, telecommuting, school and work travel context, and household roster detail.

Question wording and skip logic were aligned across smartphone, web, and call center participation so that the delivered analysis variables remain comparable across modes. Table 7 summarizes the major topic areas covered by the instrument.

Topic Area	What the Instrument Collected
Household	Household composition, vehicles, income, home context, and respondent assignment
Person	Demographics, employment, student status, technology access, and proxy reporting
Travel day	Assigned travel date, begin/end-of-day location, no-travel confirmation, and daily context
Trip	Destinations, purposes, modes, timing, transfers, and related trip details
School and work context	Commuting, telework, school attendance, and related routine travel context
Special topics	Deliveries and pickups, attitudinal questions, and study-specific follow-up items

Table 7: Major survey instrument topic areas.

4 Data Processing

4.1 Overview

This section describes the procedures used to transform raw household travel survey data – collected from participants’ smartphones and survey responses – into clean, analysis-ready tables. The process was designed to preserve the integrity of participants’ reported travel while correcting errors, filling gaps, and enriching the data with geographic and analytical variables that support modeling and planning applications.

Data processing occurred in four phases:

Automated Processing – Raw survey records were copied into a structured working environment, trips were routed on the road network, and an automated classifier flagged trips that required human attention.
Analyst Review – Trained data analysts reviewed flagged trips using a web-based interface, correcting errors in trip start and end points, splitting trips that contained unreported stops, joining trip fragments that were incorrectly separated, and removing invalid trips.
Post-Review Processing – After analyst review, the data underwent a second round of automated processing that cleaned remaining issues, assigned geographic identifiers, imputed trip purposes where necessary, and performed a second pass of transit trip unlinking.
TICTOC Processing – The processed unlinked trip data underwent additional treatment through TICTOC (Trip Imputation, Coordination, and Tour Organization Compiler), which prepared household travel survey data for travel forecasting by imputing selected missing trips, coordinating joint household travel, organizing unlinked trips into linked trips and tours, and adding model-facing attributes to the household, person, day, trip, linked trip, and tour outputs.

Each phase is described in detail in the sections that follow.

4.2 Automated Processing

When a participant completed a travel day, the smartphone application transmitted a set of raw records to the survey platform. These records included household and person characteristics from the recruitment survey, GPS traces from the phone’s location sensors, and the participant’s own descriptions of their trips – where they went, how they traveled, and why.

Automated processing began by copying these raw records into a working environment where they could be modified without affecting the original data. Several operations were then performed in sequence.

Data Completion and Household Disposition

The system evaluated whether each participant had provided sufficient data for their assigned travel days. Households that had completed all assigned travel periods were marked as complete. Households with insufficient data – such as those that uninstalled the app before their travel period ended – were flagged and could be excluded from further processing depending on the study’s sample requirements.

Trip Classification

After routing, an automated classifier examined each trip to determine whether it required analyst review. The classifier evaluated a set of rules based on trip characteristics – for example, whether the trip had an unusually high speed for its reported mode, whether it appeared to duplicate another trip, or whether its start and end times overlapped with other trips by the same person.

Trips that passed all checks were considered clean and did not require review. Trips that failed one or more checks were flagged for analyst attention and assigned to the review queue.
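A rule-based classifier of this kind can be sketched as follows. This is an illustration under stated assumptions, not the production classifier: the `Trip` fields, the speed ceilings in `MAX_SPEED_KMH`, and the flag names are all hypothetical, and only two of the example rules (implausible speed for the reported mode, and overlapping times for the same person) are shown. The duplicate-trip check is omitted.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical speed ceilings in km/h; the actual thresholds are not given in this guide.
MAX_SPEED_KMH = {"walk": 10.0, "bike": 40.0, "drive": 160.0}


@dataclass
class Trip:
    trip_id: int
    person_id: int
    mode: str
    depart: datetime
    arrive: datetime
    distance_m: float


def speed_kmh(trip: Trip) -> float:
    hours = (trip.arrive - trip.depart).total_seconds() / 3600
    # Zero or negative duration is treated as infinitely fast so it is always flagged.
    return float("inf") if hours <= 0 else (trip.distance_m / 1000) / hours


def flag_trips(trips: list[Trip]) -> dict[int, set[str]]:
    """Return {trip_id: reasons} for trips that fail at least one check."""
    flags: dict[int, set[str]] = {}

    # Rule 1: speed is implausibly high for the reported mode.
    for trip in trips:
        limit = MAX_SPEED_KMH.get(trip.mode)
        if limit is not None and speed_kmh(trip) > limit:
            flags.setdefault(trip.trip_id, set()).add("speed_too_high_for_mode")

    # Rule 2: a trip departs before the same person's previous trip has arrived.
    by_person: dict[int, list[Trip]] = {}
    for trip in trips:
        by_person.setdefault(trip.person_id, []).append(trip)
    for person_trips in by_person.values():
        ordered = sorted(person_trips, key=lambda t: t.depart)
        for earlier, later in zip(ordered, ordered[1:]):
            if later.depart < earlier.arrive:
                flags.setdefault(earlier.trip_id, set()).add("overlaps_another_trip")
                flags.setdefault(later.trip_id, set()).add("overlaps_another_trip")
    return flags
```

Trips absent from the returned dictionary passed both checks and, in this sketch, would skip the review queue.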

Transit Unlinking

Trips reported by participants as a single transit journey were automatically separated into their component segments: the walk, bike, or drive to the transit stop (the access leg); the ride on the transit vehicle itself; and the walk, bike, or drive from the alighting stop to the final destination (the egress leg). This separation was based on routing data that identified where the mode of travel changed.

To do this, the system used the Google Routes API (Google API) to identify the most likely path between the trip’s origin and destination, then classified each segment of that path as walk, bike, drive, or transit based on the routing profile. This step produced a separate trip record for each segment of a transit journey. Only rMove-recorded trips were subject to transit unlinking; manually added trips and trips recorded through the online survey instrument were not processed through this step. The Post-Review Processing step described later performs a second round of transit unlinking that applies to all trip types, including those not processed through this initial automated unlinking.
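The core of unlinking, splitting a routed path wherever the mode changes, can be sketched without calling any routing service. The example assumes the route has already been reduced to an ordered list of steps, each a dictionary with hypothetical `mode`, `origin`, and `destination` keys; this structure is an assumption for illustration and is not the Google Routes API response format. The `transfer` label for non-transit segments between two transit rides is also an assumption.

```python
from itertools import groupby


def unlink_by_mode(steps):
    """Collapse consecutive route steps that share a mode into one segment."""
    segments = []
    for mode, run in groupby(steps, key=lambda step: step["mode"]):
        run = list(run)
        segments.append(
            {"mode": mode, "origin": run[0]["origin"], "destination": run[-1]["destination"]}
        )
    return segments


def label_legs(segments):
    """Tag each segment as access, transit, transfer, or egress."""
    transit_positions = [i for i, seg in enumerate(segments) if seg["mode"] == "transit"]
    labeled = []
    for i, seg in enumerate(segments):
        if not transit_positions:
            leg = "non_transit"          # journey contains no transit ride
        elif seg["mode"] == "transit":
            leg = "transit"
        elif i < transit_positions[0]:
            leg = "access"               # before the first transit ride
        elif i > transit_positions[-1]:
            leg = "egress"               # after the last transit ride
        else:
            leg = "transfer"             # between two transit rides (assumed label)
        labeled.append({**seg, "leg": leg})
    return labeled


steps = [
    {"mode": "walk", "origin": "home", "destination": "corner"},
    {"mode": "walk", "origin": "corner", "destination": "stop A"},
    {"mode": "transit", "origin": "stop A", "destination": "stop B"},
    {"mode": "walk", "origin": "stop B", "destination": "office"},
]
for leg in label_legs(unlink_by_mode(steps)):
    print(leg)
```

Each labeled segment would then become its own trip record with its own origin and destination.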

4.3 Analyst Review

After automated processing, trained analysts reviewed all flagged trips using a web-based editing tool that displayed each trip on a map alongside its GPS trace and survey responses. The goal of analyst review was to help the final trip table accurately reflect the travel that actually occurred, correcting errors that automated processing could not resolve.

Analysts performed four primary types of edits:

Dropping invalid trips. Some GPS traces were recorded as trips by the application when no actual travel occurred – for example, when a phone drifted in a parking garage or when a brief walk to the mailbox was detected. Analysts removed these records.
Joining trip fragments. Occasionally a single trip was recorded as two or more separate trips due to GPS signal loss (for example, when a participant entered a tunnel or a large building). Analysts merged these fragments back into the trip they represented.
Splitting trips with unreported stops. When a participant made an intermediate stop during a trip – such as stopping for coffee on the way to work – but the application recorded it as a single trip, analysts split it into two trips with the correct stop location and times.
Reviewing transit trips. Analysts verified that transit trips had been correctly separated into access, transit, and egress segments, and adjusted the segment boundaries if the automated unlinking produced incorrect results.

After analyst review, households with no remaining flagged trips were marked as ready for post-review processing.

4.4 Post-Review Processing

Post-review processing transformed the analyst-reviewed trip records into the final tables that make up the delivered dataset. This phase involved extensive cleaning, quality checks, and enrichment of the data. Steps included table construction, trip cleaning, location processing, distance derivation, geographic enrichment, and purpose assignment and imputation. Each of these steps is described in detail below.

Table Construction

The survey platform stores data in a “normalized” (long-format) database structure optimized for data collection. Post-review processing reshaped these records into the “denormalized” (wide-format), analysis-ready tables familiar to data users: household, person, day, trip, and vehicle. During this step, variables were renamed to standard conventions and identifiers were standardized across tables.
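The long-to-wide reshaping described above can be illustrated with a few lines of standard-library Python. The row layout (`hh_id`, `variable`, `value`) and the rename map are hypothetical stand-ins; the platform's actual schema and naming conventions are not documented here.

```python
def long_to_wide(long_rows, id_field="hh_id", rename=None):
    """Pivot one-row-per-answer records into one-row-per-entity records."""
    rename = rename or {}
    wide = {}
    for row in long_rows:
        entity_id = row[id_field]
        record = wide.setdefault(entity_id, {id_field: entity_id})
        # Apply the standard variable name if one is defined, else keep the raw name.
        column = rename.get(row["variable"], row["variable"])
        record[column] = row["value"]
    return list(wide.values())


# Hypothetical long-format rows: one row per household per question.
long_rows = [
    {"hh_id": 101, "variable": "q_vehicles", "value": 2},
    {"hh_id": 101, "variable": "q_hh_size", "value": 3},
    {"hh_id": 102, "variable": "q_vehicles", "value": 0},
    {"hh_id": 102, "variable": "q_hh_size", "value": 1},
]
rename = {"q_vehicles": "num_vehicles", "q_hh_size": "num_people"}

for household in long_to_wide(long_rows, rename=rename):
    print(household)
```

The result has one record per household with one column per variable, which is the shape of the delivered tables.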

Post-Review Trip Cleaning

Post-review cleaning addressed a range of data quality issues that remained after analyst review. The cleaning process proceeded through several sub-steps:

Missing coordinates. A small number of trips may have missing origin or destination coordinates. Where possible, these were filled from nearby GPS points in the trip’s trace.
Travel period enforcement. Trips that fell outside the participant’s assigned travel period were removed. The travel day boundary was set at 3:00 AM rather than midnight, so a trip departing at 1:00 AM was assigned to the previous calendar day’s travel.
Zero or negative duration. Trips with nonsensical durations were removed.
Transit segment processing. Transit trips were separated into their component legs using the routing data produced during automated processing. Each segment received its own origin, destination, departure time, and arrival time.
Spatial gap cleaning. When the destination of one trip and the origin of the next trip were far apart but the trips themselves appeared to be near-duplicates (similar origins and destinations, similar times), one of the duplicate trips was removed. This addressed situations where overlapping device recordings or delayed survey submissions produced redundant trip records.
Overlapping trip resolution. Trips whose time windows overlapped were resolved through an iterative set of rules that favored completed surveys over incomplete ones, longer trips over shorter ones, and trips with reasonable speeds over those with extreme speeds.
TNC trip correction. In some cases, analysts split ride-hailing trips (such as Uber or Lyft) into separate segments during review. Because a ride-hailing trip is a single journey from the passenger’s perspective, these segments were automatically merged back together.
Loop trip splitting. A loop trip is one where the participant departs from and returns to the same location – for example, a jog around the neighborhood or a drive to run multiple errands that ends back at home. When the GPS trace revealed a clear outbound and return path, the loop was split at the point farthest from the origin. This produced two trips: one outbound and one return. This is important for modeling because the outbound and return portions of a loop trip often serve different purposes or pass through different areas.
Dwell time calculation. The time spent at each destination (the interval between arriving at one location and departing for the next trip) was calculated and stored as dwell_mins and dwell_time_hr.
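Two of the cleaning steps above, the 3:00 AM travel day boundary and the dwell time calculation, translate directly into date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and treating a departure at exactly 3:00 AM as the start of the new travel day is an assumption.

```python
from datetime import date, datetime, timedelta

DAY_START_HOUR = 3  # travel days run from 3:00 AM to 2:59 AM, per the guide


def travel_date(depart: datetime) -> date:
    """Calendar date of the travel day a departure belongs to."""
    # Shifting back three hours moves early-morning departures onto the previous date.
    return (depart - timedelta(hours=DAY_START_HOUR)).date()


def dwell_minutes(arrive: datetime, next_depart: datetime) -> float:
    """Minutes spent at a destination before the next trip begins."""
    return (next_depart - arrive).total_seconds() / 60


late_night = datetime(2024, 10, 16, 1, 0)        # 1:00 AM on October 16
print(travel_date(late_night))                    # belongs to the October 15 travel day

arrive_work = datetime(2024, 10, 15, 8, 30)
leave_work = datetime(2024, 10, 15, 17, 0)
mins = dwell_minutes(arrive_work, leave_work)
print(mins, mins / 60)                            # analogous to dwell_mins and dwell_time_hr
```

Analysts who rebuild daily summaries from trip timestamps should apply the same shift, otherwise late-night trips will be attributed to the wrong travel day.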

Proxy and Copied Trips

In a household travel survey, not every household member carries a smartphone or directly reports their own travel. Young children, for example, typically do not have their own devices. Instead, an adult household member, called a proxy, reports travel on the child’s behalf. In most cases, this means the child was traveling with the adult; the child’s trip record was therefore created as a copy of the adult’s trip.

These copied trips were created during automated processing (before analyst review) and were preserved through all subsequent processing steps. A copied trip has the same GPS trace, origin, destination, departure time, arrival time, and distance as the trip it was copied from – only the person identifier differs. The flag copied_from_proxy identifies these records.

Analysts should be aware that copied trips will produce identical geometries and travel times for multiple household members. This is expected and correct: it reflects the fact that those individuals were traveling together. Proxy-copied trips are distinct from TICTOC joint trip imputations, which are created later in processing based on a different set of rules (see Section 4.5).

Location Processing and Distance Derivation

Raw GPS traces can contain hundreds of individual location points for each trip. Location processing cleaned these traces and prepared them for distance and duration calculations.

The location-processing step included:

Removing erroneous points, including points flagged as untrustworthy by rMove.
Eliminating duplicate points from the GPS trace.
Imputing start and end points where the GPS trace did not perfectly align with the trip’s reported departure and arrival.

After cleaning, the trace data was used to recalculate distance and duration measures for GPS-tracked trips.

distance_m was calculated as the sum of straight-line distances between consecutive points along the cleaned GPS trace. This trace-based distance represents the approximate path the traveler followed. Units: meters.
distance_beeline_m was calculated as the direct straight-line distance from origin to destination. Unlike distance_m, which depends on the available trip geometry, distance_beeline_m is calculated consistently from the trip origin and destination and is retained for comparison and quality assurance. Units: meters.
distance_miles is derived from distance_m by converting meters to miles (1 mile = 1,609.34 meters). Units: miles.
duration_s was recalculated from the cleaned GPS timestamps. Units: seconds.
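
The three distance fields can be sketched as follows. This is an illustrative example under stated assumptions: the trace is a list of (latitude, longitude) points, point-to-point distances use the haversine formula with a mean Earth radius of 6,371,000 m, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000   # assumed mean Earth radius
METERS_PER_MILE = 1_609.34   # conversion factor stated in this guide


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def trace_distance_m(trace):
    """Sum of distances between consecutive points along the cleaned trace."""
    return sum(haversine_m(a, b) for a, b in zip(trace, trace[1:]))


def beeline_distance_m(trace):
    """Direct distance from the first point (origin) to the last (destination)."""
    return haversine_m(trace[0], trace[-1])


# Hypothetical L-shaped trace: north, then east.
trace = [(42.3600, -71.0600), (42.3650, -71.0600), (42.3650, -71.0500)]
distance_m = trace_distance_m(trace)
distance_beeline_m = beeline_distance_m(trace)
distance_miles = distance_m / METERS_PER_MILE
```

Because the trace follows the path actually traveled, `distance_m` is at least as large as `distance_beeline_m` for any trace.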

For trips without a usable GPS trace, including participant-added trips, analyst-added trips, and trips collected through the online survey instrument, distance_m could not be calculated from observed trace points. These trips were processed separately using origin-destination network routing, described below.

Origin-Destination Routing for Non-GPS Trips

Origin-destination routing was used for trips that had only a reported origin and destination and no full GPS trace. This included manually added trips and trips recorded through the online survey instrument.

To estimate a realistic path distance for these trips, origins and destinations were routed through the Open Source Routing Machine (OSRM), a routing engine built on OpenStreetMap data.

The routing process used:

Origin and destination coordinates as the required inputs.
Mode-specific routing profiles for automobile, bicycle, and pedestrian travel.
Shortest feasible network paths between each trip’s endpoints.

The routing step produced distance_m, a network-based path distance.

For GPS-tracked trips, distance_m values were derived from the cleaned GPS trace by summing point-to-point distances along the observed path. For trips without GPS traces, OSRM origin-destination routing provided an analogous path-based distance rather than a simple straight-line distance.

As a result, the delivered distance_m field represents the best available path-distance estimate for each trip:

GPS-tracked trips: observed trace distance.
Trips with only origin and destination coordinates: routed network distance.

The separate distance_beeline_m field provides a consistent straight-line origin-to-destination distance for comparison and quality assurance.

%%{init: {"theme":"base","flowchart":{"htmlLabels":true,"curve":"basis","nodeSpacing":36,"rankSpacing":52},"themeVariables":{"fontFamily":"Bai Jamjuree, Arial, sans-serif"}}}%%
flowchart TB

    subgraph ALL["<b>All trips</b>"]
        direction LR

        subgraph GPS["<b>GPS-Tracked Trips</b>"]
            direction TB
            TRACE("<span style='font-size:1.02em; font-weight:700;'>Cleaned GPS trace</span><br/><span style='font-size:0.88em; font-weight:400;'>(cleaned location points)</span>"):::gpsNode
            SUMHAV("<span style='font-size:1.02em; font-weight:700;'>Summed trace distance</span><br/><span style='font-size:0.88em; font-weight:400;'>point-to-point haversine distances</span>"):::gpsNode
            TRACE --> SUMHAV
        end

        BEELINE("<span style='font-size:1.02em; font-weight:700;'>distance_beeline_m</span><br/><span style='font-size:0.88em; font-weight:400;'>= haversine(origin,<br/>destination)</span>"):::beelineNode

        subgraph NONGPS["<b>Non-GPS Trips</b>"]
            direction TB
            OD("<span style='font-size:1.02em; font-weight:700;'>Origin + Destination</span><br/><span style='font-size:0.88em; font-weight:400;'>coordinates only</span>"):::manualNode
            OSRM("<span style='font-size:1.02em; font-weight:700;'>OSRM shortest path</span><br/><span style='font-size:0.88em; font-weight:400;'>network route</span>"):::manualNode
            OD --> OSRM
        end
    end

    DM("<span style='font-size:1.05em; font-weight:700;'>distance_m</span><br/><span style='font-size:0.89em; font-weight:400;'>path-distance estimate</span>"):::outputNode
    MILES("<span style='font-size:1.03em; font-weight:700;'>distance_miles</span><br/><span style='font-size:0.89em; font-weight:400;'>= distance_m / 1,609.34</span>"):::outputNode
    SPEED("<span style='font-size:1.03em; font-weight:700;'>speed_mph</span><br/><span style='font-size:0.89em; font-weight:400;'>= distance_miles /<br/>(duration_s / 3,600)</span>"):::outputNode

    SUMHAV --> DM
    OSRM --> DM
    DM --> MILES
    MILES --> SPEED

    classDef gpsNode fill:#FFF7E8,stroke:#F4A300,color:#232323,font-family:Bai Jamjuree,stroke-width:2.6px,font-weight:bold,fill-opacity:0.98
    classDef manualNode fill:#FFF0EB,stroke:#E94B2E,color:#232323,font-family:Bai Jamjuree,stroke-width:2.6px,font-weight:bold,fill-opacity:0.98
    classDef beelineNode fill:#F6F6F6,stroke:#9CA3AF,color:#232323,font-family:Bai Jamjuree,stroke-width:2.4px,font-weight:bold,fill-opacity:0.98
    classDef outputNode fill:#F7F7F7,stroke:#9CA3AF,color:#232323,font-family:Bai Jamjuree,stroke-width:2.4px,font-weight:bold,fill-opacity:0.98

    style ALL fill:#FCFCFB,stroke:#A8A29E,color:#232323,fill-opacity:0.42,stroke-width:2px
    style GPS fill:#FFF7E8,stroke:#F4A300,color:#232323,fill-opacity:0.24,stroke-width:2.2px
    style NONGPS fill:#FFF0EB,stroke:#E94B2E,color:#232323,fill-opacity:0.24,stroke-width:2.2px

    linkStyle 0 stroke:#F4A300,stroke-width:4px,stroke-linecap:round
    linkStyle 1 stroke:#E94B2E,stroke-width:4px,stroke-linecap:round
    linkStyle 2 stroke:#F4A300,stroke-width:4px,stroke-linecap:round
    linkStyle 3 stroke:#E94B2E,stroke-width:4px,stroke-linecap:round
    linkStyle 4 stroke:#6B7280,stroke-width:4px,stroke-linecap:round
    linkStyle 5 stroke:#6B7280,stroke-width:4px,stroke-linecap:round

Figure 2: Trip distance derivation for GPS-tracked and non-GPS trips.

Table 8 summarizes the distance source used for each major trip type in the delivered data.

Trip Type	`distance_m` Source	`distance_beeline_m`	Notes
GPS-tracked trips	Cleaned GPS trace	Haversine O-D	Observed path distance from cleaned trace points
Manually added trips	OSRM origin-destination network route	Haversine O-D	Only origin and destination were available; no trace
Online survey trips	OSRM origin-destination network route	Haversine O-D	Only origin and destination were available; no trace
Split loop legs	Cleaned GPS trace, where trace geometry was available	Haversine O-D per leg	Each leg was processed independently
Unlinked transit legs	Cleaned GPS trace, where trace geometry was available	Haversine O-D per leg	May have been sparse if portions of the trip occurred underground
Synthetic access/egress	NA	NA	Zero-distance placeholder
`distance_m` was a path-distance measure: cleaned trace distance for GPS-tracked trips and OSRM origin-destination network distance for trips without usable trace geometry. `distance_beeline_m` was calculated consistently as the direct origin-to-destination distance and was provided for comparison and quality assurance.

Table 8: Distance derivation by trip type.

Geographic Enrichment

Trip origins and destinations, home locations, and habitual work and school locations were assigned to U.S. Census geographic units through spatial point-in-polygon joins. Table 9 summarizes the identifiers added during this step.

Variable Pattern	Geography	Applied To
*_bg_2020	Census Block Group (2020)	Home, work, school, trip O/D
*_puma_2022	Public Use Microdata Area (2022)	Home, work, school, trip O/D
*_county	County (derived from block group)	Home, work, school, trip O/D
*_state	State (derived from block group)	Home, work, school, trip O/D
o_in_region, d_in_region	Study region boundary	Trip O/D

Table 9: Geographic variables added during spatial enrichment.

These identifiers enable geographic analysis at multiple levels without requiring users to perform their own spatial operations.
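
For readers unfamiliar with point-in-polygon joins, the idea can be sketched with a plain ray-casting test. The polygons, identifiers, and function names below are hypothetical and use abstract coordinates; production spatial joins would normally rely on a GIS library and real Census boundary files.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) falls inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges that a horizontal ray from the point would cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_geography(x, y, polygons):
    """Return the id of the first polygon containing the point, else None."""
    for geo_id, ring in polygons.items():
        if point_in_polygon(x, y, ring):
            return geo_id
    return None


# Two adjacent unit squares standing in for block groups (hypothetical ids).
polygons = {
    "bg_a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "bg_b": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
home_bg = assign_geography(0.5, 0.5, polygons)
work_bg = assign_geography(1.5, 0.5, polygons)
outside = assign_geography(3.0, 3.0, polygons)
```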

Purpose Assignment and Imputation

Each trip in the dataset has a destination purpose (d_purpose) describing why the traveler went to that location, for example, going to work, shopping, or returning home. Respondents report the destination purpose in the trip survey. The origin purpose (o_purpose) is generally derived from the previous trip’s destination purpose, reflecting the activity the traveler was engaged in before departing.

Purpose assignment involves several steps:

Purpose cleaning. Purposes on split loop trips were corrected: the return leg was assigned the purpose of the location the traveler was returning to (typically the same as the purpose two trips prior). Unlinked transit segments were assigned a purpose of “change mode.”
Purpose categorization. Detailed purpose codes from the survey were grouped into broader purpose categories (e.g., “work,” “school,” “shopping,” “social/recreation”) to support aggregate analysis.
Open-ended purpose classification. When a participant selected “other” as the trip purpose and provided a free-text description, a language model could assign that description to one of the dataset’s standard purpose categories. This step reduced the number of uncategorized “other” trips while preserving a consistent set of purpose categories for analysis. If no suitable standard category could be identified, the trip remained classified as “other.”
Habitual location matching. Trip endpoints were compared to known home, work, and school locations. If a trip’s destination fell within 100 meters of the participant’s home, it was classified as a home location; similar thresholds applied to work and school. When a trip was reported as “work” but the destination was far from any known work location, it was reclassified as “work-related” to distinguish between commute trips and trips to secondary work sites.
Purpose imputation. A rules-based algorithm identified trips whose reported purposes appeared inconsistent with their locations or with the surrounding trip sequence and corrected them where appropriate. For example, a trip ending at the participant’s home but reported as “shopping” would be reclassified. This processing could include location-based corrections, derived values for analyst-split trips, and broader imputations when reported purposes were missing or implausible. The imputation algorithm iterated across related trips to resolve chains of dependencies.
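
The habitual location check and the home-related correction can be sketched as below. This is a simplified illustration, not the delivered algorithm: it covers only the home case, assumes the corrected purpose is "home", and uses hypothetical function names. The 100-meter threshold is the one stated above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius
HOME_BUFFER_M = 100         # home threshold stated in this guide


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def correct_purpose(reported_purpose, destination, home, buffer_m=HOME_BUFFER_M):
    """Return (purpose, outcome) after comparing the destination with home.

    Assumption: a non-home purpose reported at the home location is
    reclassified to "home" and labeled "Location-corrected".
    """
    at_home = haversine_m(destination, home) <= buffer_m
    if at_home and reported_purpose != "home":
        return "home", "Location-corrected"
    return reported_purpose, "Reported"


home = (42.3600, -71.0600)
near_home = correct_purpose("shopping", (42.3604, -71.0600), home)  # about 44 m away
far_away = correct_purpose("shopping", (42.3700, -71.0600), home)   # about 1.1 km away
```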

Table 10 summarizes the main purpose-assignment and imputation outcomes reflected in the delivered trip data.

Label	Description
Reported	Purpose as reported by participant, unchanged
Location-corrected	Reported purpose conflicted with proximity to a habitual location (e.g., reported 'work' but location is home)
AI-classified	Participant selected 'other' and provided text; where automated coding was applied, a language model assigned the text to a standard purpose category
Split loop	Return leg of a split loop trip; purpose set to match the location being returned to
Algorithm-imputed	Purpose assigned by the iterative imputation algorithm based on location type, dwell time, and trip sequence
Linked transit	Purpose set to 'change mode' during transit trip linking
Incomplete survey	Trip survey was not completed; purpose defaulted to 'other'
Browser/proxy	Trip was not processed through imputation (browser-move or non-participant copy)
These categories describe how trip purposes may remain as reported or be modified during processing.

Table 10: Purpose assignment and imputation outcomes.

Delivered purpose columns. After processing, the trip table includes both the originally reported purpose fields and the final delivered purpose fields for origins and destinations. The final detailed-purpose columns and final purpose-category columns are paired outputs of the same processing pipeline: they are delivered together and intended to remain consistent with one another. The reported fields preserve the pre-imputation values for comparison. Table 11 summarizes those delivered purpose fields.

Column	Content
d_purpose / o_purpose	Final imputed detailed purpose code. Use these columns when detailed-purpose distinctions are needed.
d_purpose_category / o_purpose_category	Grouped category paired with the final imputed purpose. Derived from the same imputation as `*_purpose`, not from a separate downstream recode.
d_purpose_reported / o_purpose_reported	Originally reported detailed purpose before reclassification and imputation. Provided for comparison and quality assurance.
d_purpose_category_reported / o_purpose_category_reported	Grouped category corresponding to the originally reported purpose.
Use the final `_purpose` or `_purpose_category` columns for analysis, depending on the level of detail needed. These final columns are designed to stay in sync; the `_reported` columns preserve the pre-imputation values.

Table 11: Delivered purpose columns on the trip table.

In most cases, the final and reported purposes are identical. They differ only when processing reclassified or imputed purpose values. The final detailed-purpose and purpose-category fields are intended to agree with each other, though open-ended “other” purposes may require additional analyst caution.

Origin purpose on first trips. Because o_purpose is derived from the previous trip’s destination purpose, the first trip of each person’s travel period has no preceding trip to draw from. In the post-review processed data, o_purpose for first trips will be missing (NA).

During TICTOC processing, origin purposes were recalculated after trip imputation. TICTOC set o_purpose to the previous trip’s d_purpose only when the trip’s origin was spatially consistent with the previous trip’s destination (i.e., within a configurable distance buffer). First trips of the day and trips with a spatial gap from the previous destination retained their existing o_purpose value and were not overwritten. Analysts filtering or tabulating on o_purpose should account for these missing values on first trips.
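
The origin-purpose rule described above can be sketched as follows. The example assumes one person's trips for a single day, ordered by departure, with a precomputed `origin_gap_m` (a hypothetical field holding the distance between a trip's origin and the previous trip's destination). The buffer value in the example is a placeholder, not the configured project value.

```python
def recalc_o_purpose(trips, buffer_m):
    """Set o_purpose from the previous trip's d_purpose when spatially consistent.

    First trips and trips whose origin is farther than buffer_m from the
    previous destination keep whatever o_purpose they already had.
    """
    previous = None
    for trip in trips:
        gap = trip.get("origin_gap_m")
        if previous is not None and gap is not None and gap <= buffer_m:
            trip["o_purpose"] = previous["d_purpose"]
        previous = trip
    return trips


day = recalc_o_purpose(
    [
        {"o_purpose": None, "d_purpose": "work", "origin_gap_m": None},       # first trip
        {"o_purpose": None, "d_purpose": "shopping", "origin_gap_m": 20.0},   # consistent
        {"o_purpose": "other", "d_purpose": "home", "origin_gap_m": 5000.0},  # spatial gap
    ],
    buffer_m=150,  # placeholder buffer for illustration only
)
```

When tabulating, filtering out trips where `o_purpose` is missing keeps first-trip records from being counted under an undefined origin purpose.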

Mode Type Assignment

The survey asked respondents to select all modes used on each trip from a checkbox list. Respondents could select as many modes as applied. The first four selections are preserved in the delivered unlinked trip table as mode_1, mode_2, mode_3, and mode_4. These columns are unordered: mode_1 is simply the first-reported mode, not a primary or dominant one. For most analyses, including mode share, use mode_type rather than the mode_n columns directly.

mode_type is derived by applying a priority hierarchy across all populated mode_n columns on each trip. When a respondent selected more than one mode, the mode with the highest priority value wins. For example, if a respondent selected both walk and transit, the trip is assigned mode_type = 13 (Transit), because transit outranks walk in the hierarchy. In the rare cases where a respondent selected more than four modes, the mode_type assignment may not correspond to the first four mode_n values.

mode_priority records the numeric priority of the winning mode, and is useful for confirming which mode was selected when a trip has multiple mode_n values populated.
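
A minimal sketch of the hierarchy follows, using a handful of rows from the Table 12 crosswalk. It assumes the mode_type code itself serves as the priority (higher wins) and that 995 marks a missing response; the function name is hypothetical.

```python
MISSING = 995

# Subset of the Table 12 crosswalk: detailed mode value -> mode_type.
MODE_TYPE = {
    1: 1,    # Walk
    2: 2,    # Standard bicycle
    49: 6,   # Uber, Lyft, or other smartphone-app ride service
    6: 8,    # Household vehicle 1
    23: 13,  # Local bus
    30: 13,  # Subway
}


def assign_mode_type(mode_values):
    """Return the highest-priority mode_type among the populated mode_n values."""
    types = [
        MODE_TYPE[value]
        for value in mode_values
        if value is not None and value != MISSING and value in MODE_TYPE
    ]
    return max(types) if types else None


walk_and_bus = assign_mode_type([1, 23, MISSING, MISSING])
walk_only = assign_mode_type([1, MISSING, MISSING, MISSING])
nothing_reported = assign_mode_type([MISSING] * 4)
```

Here a trip reporting walk and local bus resolves to 13 (Transit), while a walk-only trip resolves to 1 (Walk).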

Table 12 shows the full crosswalk of detailed survey mode codes to mode_type groups, in priority order.

`mode_type`	Detailed Mode Value	Detailed Mode
Walk
1	1	Walk (or jog/wheelchair)
1	43	Skateboard or rollerblade
Bike
2	2	Standard bicycle (my household's)
2	3	Borrowed bicycle (e.g., a friend's)
2	4	Other rented bicycle
2	56	Other personal bicycle (e.g., cargo, tandem, etc.)
2	82	Electric bicycle (my household's)
2	103	Bicycle or e-bicycle
2	107	Micromobility (e.g., scooter, moped, skateboard)
Bike Share
3	69	Bike-share - standard bicycle
3	70	Bike-share - electric bicycle
Scooter Share
4	73	Moped-share (e.g., Scoot)
4	74	Segway
4	83	Scooter-share (e.g., Bird, Lime)
Taxi
5	36	Regular taxi (e.g., Yellow Cab)
5	60	Other hired car service (e.g., black car, limo)
TNC
6	49	Uber, Lyft, or other smartphone-app ride service
6	106	Uber/Lyft, taxi or car service
Other
7	5	Other
7	27	Paratransit/Dial-A-Ride (e.g., The RIDE)
7	44	Golf cart
7	45	ATV
7	75	Other
7	77	Personal scooter or moped (not shared)
7	80	Other boat (e.g., kayak)
7	81	Snowmobile
7	104	Other
Car
8	6	Household vehicle 1
8	7	Household vehicle 2
8	8	Household vehicle 3
8	9	Household vehicle 4
8	10	Household vehicle 5
8	11	Household vehicle 6
8	12	Household vehicle 7
8	13	Household vehicle 8
8	14	Household vehicle 9
8	15	Household vehicle 10
8	16	Other vehicle in household
8	17	Rental car
8	22	Other vehicle (not my household's)
8	33	Car from work
8	34	Friend/relative/colleague's car
8	47	Other motorcycle in household
8	54	Other motorcycle (not my household's)
8	68	Cable car or streetcar
8	100	Household vehicle (or motorcycle)
8	101	Other vehicle (e.g., friend's car, rental, carshare, work car)
Car Share
9	18	Carshare service (e.g., Zipcar)
9	59	Peer-to-peer car rental (e.g., Turo)
9	76	Carpool match (e.g., Waze Carpool)
School Bus
10	24	School bus
Shuttle Vanpool
11	21	Vanpool
11	26	Other private shuttle/bus (e.g., a hotel's, an airport's)
11	38	University/college shuttle/bus
11	62	Employer-provided shuttle/bus
Ferry
12	78	Other public ferry or water taxi
12	79	Vehicle ferry (took vehicle on board)
Transit
13	23	Local bus
13	28	Other bus
13	30	Subway
13	39	Light rail
13	42	Other rail
13	55	Express/commuter bus
13	58	Commuter rail
13	61	Rapid transit bus (BRT)
13	102	Bus, shuttle, or vanpool
13	105	Rail (e.g., train, subway)
Ld Passenger
14	25	Intercity bus (e.g., Greyhound)
14	31	Airplane/helicopter
14	41	Intercity rail (e.g., Amtrak)
Higher mode_type values take priority when multiple modes are reported on a single trip. mode_type on unlinked transit access and egress legs is assigned by the routing engine rather than from the participant's original survey response.

Table 12: Detailed mode_n to mode_type crosswalk, in priority order.

Secondary Transit Trip Unlinking

Initial pre-review processing used the Google Routes API to split rMove-recorded transit trips into their component legs. To split manually added, analyst-added, and online survey trips into access, transit, and egress legs, a secondary unlinking process was applied to the unlinked trip table after analyst review. This unlinking step identified transit trips without user-recorded or Google-derived leg splits and applied a set of rules to create “synthetic” access and egress legs where needed. This ensured that all transit trips had a consistent structure for downstream processing, even if the original survey response did not capture the full set of legs.

The unlinking algorithm applied a set of rules based on:

Whether consecutive trips had a short dwell time between them (consistent with a transfer rather than a true stop)
Whether the mode sequence suggested access/transit/egress
Whether the destination purpose was “change mode” (indicating a transfer point)

Unlinking returned flags that identify each trip’s role in the linked journey: is_access, is_egress, is_transit_leg, and is_primary_leg (the highest-priority mode segment).

For transit trips that were missing an access or egress segment – for example, because the walk to the bus stop was too short to be detected – a synthetic zero-distance leg was created as a placeholder to maintain a consistent data structure. These synthetic legs were flagged with transit_quality_flag values of “SA” (synthetic access) or “SE” (synthetic egress).
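
The placeholder logic can be sketched as below. The record layout and function names are assumptions made for illustration; only the flag names and the "SA" and "SE" codes come from this guide.

```python
def placeholder_leg(flag, **role):
    """Build a zero-distance synthetic leg carrying the given role flag."""
    leg = {"is_access": 0, "is_egress": 0, "is_transit_leg": 0,
           "distance_m": 0.0, "transit_quality_flag": flag}
    leg.update(role)
    return leg


def add_synthetic_legs(legs):
    """Ensure a transit journey starts with an access leg and ends with an egress leg."""
    out = list(legs)
    if not any(leg["is_access"] for leg in out):
        out.insert(0, placeholder_leg("SA", is_access=1))
    if not any(leg["is_egress"] for leg in out):
        out.append(placeholder_leg("SE", is_egress=1))
    return out


# Hypothetical journey where only the bus leg was recorded.
bus_only = [{"is_access": 0, "is_egress": 0, "is_transit_leg": 1,
             "distance_m": 4200.0, "transit_quality_flag": None}]
journey = add_synthetic_legs(bus_only)
```

Analysts summing distance over legs can include the synthetic rows safely, since they contribute zero distance; filtering on `transit_quality_flag` removes them when only observed legs are wanted.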

Mode codes on unlinked transit legs. When a transit trip was separated into access, transit, and egress segments, the mode on each segment was reassigned to reflect the actual travel mode of that segment (e.g., walk for an access leg, bus for the transit leg) based on the routing engine’s classification. As part of this reassignment, the original survey-reported mode codes (mode_1 through mode_N) on affected segments were cleared and set to a missing/not-applicable value (995). Analysts querying the detailed mode columns on transit access or egress legs will therefore find 995 rather than a survey-reported mode code; the mode_type column contains the correct grouped mode for each segment.

Derived Travel Variables

After all cleaning and linking was complete, the following derived variables were calculated:

distance_miles – trip distance converted from meters to miles
speed_mph – derived from distance and duration
duration_minutes – duration in minutes
depart_hour, depart_minute, arrive_hour, arrive_minute – time components for modeling software
travel_dow – day of week
mode_type – grouped mode category (e.g., car, transit, walk, bike). See Section 4.4.8.
mode_priority – the highest-priority mode used on the trip
speed_flag – indicator for trips with speeds exceeding plausible thresholds for their mode
teleport – indicator for spatial discontinuities between consecutive trips (destination of one trip is far from origin of the next)
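
Several of these derived variables follow directly from distance, duration, and departure time. The sketch below assumes `depart_time` and `arrive_time` are datetime values (hypothetical field names) and that day of week is coded 1 = Monday through 7 = Sunday, which may differ from the delivered coding; consult the codebook for the actual value labels.

```python
from datetime import datetime

METERS_PER_MILE = 1_609.34  # conversion factor stated in this guide


def derive_travel_variables(trip):
    """Compute distance, speed, duration, and time-component fields for one trip."""
    depart, arrive = trip["depart_time"], trip["arrive_time"]
    duration_s = trip["duration_s"]
    distance_miles = trip["distance_m"] / METERS_PER_MILE
    return {
        "distance_miles": distance_miles,
        "duration_minutes": duration_s / 60,
        # Speed is undefined for zero-duration trips.
        "speed_mph": distance_miles / (duration_s / 3600) if duration_s > 0 else None,
        "depart_hour": depart.hour,
        "depart_minute": depart.minute,
        "arrive_hour": arrive.hour,
        "arrive_minute": arrive.minute,
        "travel_dow": depart.isoweekday(),  # assumed coding: 1 = Monday
    }


# Hypothetical trip: about 10 miles in 30 minutes on a Monday morning.
derived = derive_travel_variables({
    "distance_m": 16_093.4,
    "duration_s": 1_800,
    "depart_time": datetime(2025, 3, 3, 8, 15),
    "arrive_time": datetime(2025, 3, 3, 8, 45),
})
```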

Driver status. The driver variable indicates whether the trip-maker was the driver or a passenger for automobile trips. During processing, this variable was adjusted in two cases. First, if a person reported as “driver” was under the minimum driving age or did not hold a driver’s license, they were reclassified as a passenger. Second, if an automobile trip included exactly one licensed household member, that person was imputed as the driver regardless of their original response. These corrections maintained consistency between the driver variable and the person table’s age and license fields. Analysts performing auto occupancy or driver/passenger analyses should be aware that some driver values reflect these corrections rather than the original survey response.
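
The two driver corrections can be sketched as follows. The record layout, the function name, and the minimum driving age passed in the example are assumptions for illustration; the guide does not state the threshold used in processing.

```python
def correct_driver(travelers, min_driving_age):
    """Apply the two driver corrections to the household members on one auto trip."""
    corrected = [dict(t) for t in travelers]  # leave the input untouched

    # Case 1: a reported driver who is under age or unlicensed becomes a passenger.
    for t in corrected:
        ineligible = t["age"] < min_driving_age or not t["has_license"]
        if t["driver"] == "driver" and ineligible:
            t["driver"] = "passenger"

    # Case 2: exactly one licensed member on the trip is imputed as the driver.
    licensed = [t for t in corrected if t["has_license"]]
    if len(licensed) == 1:
        licensed[0]["driver"] = "driver"
    return corrected


# Hypothetical trip: the child was recorded as driver, the parent as passenger.
reported = [
    {"person": "parent", "age": 41, "has_license": True, "driver": "passenger"},
    {"person": "child", "age": 9, "has_license": False, "driver": "driver"},
]
corrected = correct_driver(reported, min_driving_age=16)  # 16 is illustrative only
```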

Quality Assurance

A comprehensive suite of automated quality checks was applied to the final tables, producing an HTML diagnostic report and a CSV file of test results. Checks included verification of referential integrity across tables (every trip belongs to a valid person and day), consistency of trip counts, plausibility of speeds and distances, and completeness of required fields. A draft codebook documenting all variables and their value labels was also generated automatically.

4.5 TICTOC Processing

TICTOC (Trip Imputation, Coordination, and Tour Organization Compiler) prepared household travel survey data for travel forecasting by imputing selected missing trips, coordinating joint household travel, organizing unlinked trips into linked trips and tours, and adding model-facing attributes to the household, person, day, trip, linked trip, and tour outputs. Before organizing trips into linked trips and tours, TICTOC performed location-purpose correction, imputed missing joint trips for household members who traveled together but did not independently report the trip, and imputed selected child school trips when survey responses indicated school attendance but no corresponding trip was present. Each of these steps is described in detail in the sections that follow.

Joint Trip Detection and Imputation

Household travel surveys rely on individual participants to report their own travel. When household members travel together – for example, a parent driving children to school – each person’s trips should appear in the data. In practice, non-participant household members (particularly young children) often have missing or incomplete trip records.

TICTOC addresses this in two steps:

Detecting joint trips. For each household, the system examined all pairs of household members and identified trips that overlapped in both time and space. Trips were considered joint if they departed from and arrived at similar locations within similar time windows.
Imputing missing joint trips. When one household member had a reported trip that indicated joint travel with another member who did not have a corresponding trip, a new trip record was created for the missing member. The imputed trip was created from the host trip and populated using project-specific column-action rules that specified which attributes were copied directly from the host trip, which were filled with person-specific values (such as the target person’s demographics), and which received default or sampled values. Imputation occurred only when:
- The host trip indicated joint travel with the target person
- The imputed trip would not overlap with the target person’s existing trips
- The imputed trip would not create a spatial discontinuity (teleport) in the target person’s travel chain
- The host trip had not been dropped or flagged as invalid
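
The four imputation conditions can be expressed as a single eligibility check. The sketch below uses hypothetical field names (`joint_with`, `is_valid`, and times in minutes since midnight) and takes the teleport test as a precomputed boolean; it illustrates the logic rather than the TICTOC implementation.

```python
def overlaps(a, b):
    """True if two trips' time windows overlap."""
    return a["depart_min"] < b["arrive_min"] and b["depart_min"] < a["arrive_min"]


def can_impute_joint_trip(host_trip, target_id, target_trips, creates_teleport):
    """Check the four conditions for copying a host trip to a household member."""
    return (
        target_id in host_trip["joint_with"]                            # joint travel indicated
        and not any(overlaps(host_trip, t) for t in target_trips)       # no time overlap
        and not creates_teleport                                        # no spatial discontinuity
        and host_trip["is_valid"]                                       # host not dropped or invalid
    )


host = {"depart_min": 480, "arrive_min": 500, "joint_with": {2}, "is_valid": True}
afternoon_trip = [{"depart_min": 900, "arrive_min": 930}]
conflicting_trip = [{"depart_min": 490, "arrive_min": 510}]

eligible = can_impute_joint_trip(host, 2, afternoon_trip, creates_teleport=False)
blocked_by_overlap = can_impute_joint_trip(host, 2, conflicting_trip, creates_teleport=False)
```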

School Trip Imputation

Children’s school trips are among the most commonly underreported trip types in household travel surveys, particularly for younger children who do not carry smartphones. TICTOC imputed school trips when survey responses indicated that a child attended school and no corresponding school trip was present in the data. Trips were not imputed when the available responses indicated that the child did not attend school, attended a different or unknown school location, was home-schooled, or was not a student.

Imputed school trips used the child’s reported home and school locations from the person table, the household’s reported usual school travel mode where available, and sampled departure times. Other trip attributes, such as occupancy, driver status, and mode details, were derived from project-specific rules rather than copied from another person’s trip. This distinguished school trip imputation from joint trip imputation, where most attributes came from a host trip.

Trip Linking

After trip imputation, TICTOC organized trips into linked trips. Each linked trip was constructed from one or more unlinked trips; segments were combined where “change mode” purposes indicated that they belonged to a single journey. The TICTOC-derived linked trip records were used downstream for tour organization, origin-destination analysis, and weighting. Each trip_linked record summarized one or more unlinked trip records into a single journey with the origin of the first segment, the destination of the last segment, and journey-level mode, distance, and duration.

Figure 3 illustrates this linked trip structure for downstream TICTOC processing.

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    subgraph UNLINKED["Unlinked Trips (trip table)"]
        direction LR
        T1["Trip 1: Walk
        Home -> Bus Stop
        purpose: change_mode
        is_access: 1"]
        T2["Trip 2: Bus
        Stop A -> Stop B
        purpose: change_mode
        is_transit_leg: 1"]
        T3["Trip 3: Walk
        Bus Stop -> Work
        purpose: work
        is_egress: 1"]
        T1 --> T2 --> T3
    end

    subgraph LINKED["Linked Trip (trip_linked table)"]
        LT["Linked Trip
        Mode: Transit (bus)
        Home -> Work
        Access: Walk | Egress: Walk
        Distance: sum of all segments"]
    end

    UNLINKED --> LINKED

    classDef linked_trip fill:#F68B1F,stroke:#C66916,color:#000000,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
    classDef unlinked_trip fill:#E4572E,stroke:#BA3F21,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92

    style UNLINKED fill:#E4572E,stroke:#BA3F21,color:#ffffff,fill-opacity:0.14,stroke-width:1.75px
    style LINKED fill:#F68B1F,stroke:#C66916,color:#000000,fill-opacity:0.16,stroke-width:1.75px

    linkStyle default stroke:#475569,stroke-width:2.25px

    class T1,T2,T3 unlinked_trip
    class LT linked_trip

Figure 3: Unlinked transit segments summarized into a single linked transit trip for downstream TICTOC processing.
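
The linking rule shown in Figure 3 can be sketched in a few lines. The example assumes one person's unlinked trips ordered by departure, with "change_mode" marking segments that continue into the next one; the record layout and function names are hypothetical, and only distance is aggregated here.

```python
def summarize(segments):
    """Collapse consecutive unlinked segments into one linked-trip record."""
    return {
        "origin": segments[0]["origin"],
        "destination": segments[-1]["destination"],
        "d_purpose": segments[-1]["d_purpose"],
        "distance_m": sum(s["distance_m"] for s in segments),
        "n_segments": len(segments),
    }


def link_trips(unlinked):
    """Group segments into linked trips, closing each at a non-change_mode purpose."""
    linked, current = [], []
    for segment in unlinked:
        current.append(segment)
        if segment["d_purpose"] != "change_mode":
            linked.append(summarize(current))
            current = []
    if current:  # trailing segments that never reached a final purpose
        linked.append(summarize(current))
    return linked


unlinked = [
    {"origin": "Home", "destination": "Stop A", "d_purpose": "change_mode", "distance_m": 400.0},
    {"origin": "Stop A", "destination": "Stop B", "d_purpose": "change_mode", "distance_m": 5200.0},
    {"origin": "Stop B", "destination": "Work", "d_purpose": "work", "distance_m": 300.0},
    {"origin": "Work", "destination": "Cafe", "d_purpose": "meal", "distance_m": 250.0},
]
linked = link_trips(unlinked)
```

The first three segments collapse into one Home-to-Work linked trip of 5,900 m, and the walk to the cafe remains its own single-segment linked trip.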

Note

Multi-leg transit trips and intermediate transfer segments.

The example above shows a simple three-segment journey (access → transit → egress). Some linked trips include multiple transit vehicles — for example: bike to bus stop, bus leg, short transfer, second bus leg.

For these multi-leg linked trips:

is_access is assigned only to the first leg, when it is non-transit and immediately precedes a transit leg.
is_egress is assigned only to the last leg, when it is non-transit and immediately follows a transit leg.
An intermediate non-transit segment between two transit legs — such as a walk or bike transfer between buses — will appear as its own row in trip_unlinked under the same linked_trip_id, but will carry none of these flags: it is not is_access, not is_egress, and not is_transit_leg.

These intermediate segments are present in the unlinked trip table, but they should be treated as an occasional byproduct of the Google Routes API unlinking process, or of unusually detailed participant recording, rather than as a guaranteed analytical construct. Their detailed mode fields (mode_1 through mode_N) are typically set to 995 (not applicable); mode_type reflects the routing engine’s classification (usually walk or bike).

To identify intermediate transfer-like segments in trip_unlinked, filter to rows where all four conditions hold:

is_transit == 1 — part of a transit linked trip
is_transit_leg == 0 — not the transit vehicle leg
is_access == 0
is_egress == 0

Time and distance for these segments are included in the linked trip totals in trip_linked.
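
The four-condition filter can be applied as in the sketch below, which represents trip_unlinked rows as plain dictionaries. The sample rows are hypothetical and mirror the bike, bus, transfer, bus example above.

```python
def intermediate_transfer_segments(rows):
    """Return rows that are part of a transit linked trip but carry no role flag."""
    return [
        row for row in rows
        if row["is_transit"] == 1
        and row["is_transit_leg"] == 0
        and row["is_access"] == 0
        and row["is_egress"] == 0
    ]


rows = [
    {"leg": "bike to stop", "is_transit": 1, "is_transit_leg": 0, "is_access": 1, "is_egress": 0},
    {"leg": "first bus", "is_transit": 1, "is_transit_leg": 1, "is_access": 0, "is_egress": 0},
    {"leg": "walk transfer", "is_transit": 1, "is_transit_leg": 0, "is_access": 0, "is_egress": 0},
    {"leg": "second bus", "is_transit": 1, "is_transit_leg": 1, "is_access": 0, "is_egress": 0},
    {"leg": "drive to store", "is_transit": 0, "is_transit_leg": 0, "is_access": 0, "is_egress": 0},
]
transfers = intermediate_transfer_segments(rows)
```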

Linked Trip Mode Assignment

TICTOC assigned a single mode to each linked trip through a two-step process that operated on the raw survey mode values (mode_n columns) across all constituent unlinked trips — not on the mode_type variable derived during post-review processing.

Step 1: Group raw survey modes. TICTOC collected every populated mode_n value across all unlinked segments belonging to the linked trip and mapped each value to an intermediate mode group using the project-configurable crosswalk in Table 13. For example, a respondent who selected “Local bus” (value 23) and “Walk” (value 1) on separate segments would contribute both LOCAL and WALK to the mode group set. All unique mode groups present across all segments were collected into a single set for use in Step 2.

Survey Mode Value	Survey Mode Label
Step 1: Survey mode value to mode group crosswalk
Each raw mode_n value is mapped to an intermediate group before the hierarchy is applied
SCHOOLBUS
24	School bus
LONGDIST
25	Intercity bus (e.g., Greyhound)
31	Airplane/helicopter
41	Intercity rail (e.g., Amtrak)
REGIONAL
55	Express/commuter bus
58	Commuter rail
78	Other public ferry or water taxi
79	Vehicle ferry (took vehicle on board)
LOCAL
23	Local bus
26	Other private shuttle/bus (e.g., a hotel's, an airport's)
28	Other bus
30	Subway
38	University/college shuttle/bus
39	Light rail/trolley
42	Other rail
61	Rapid transit bus (BRT)
62	Employer-provided shuttle/bus
102	Bus, shuttle, or vanpool
105	Rail (e.g., train, subway)
DRIVE
6	Household vehicle 1
7	Household vehicle 2
8	Household vehicle 3
9	Household vehicle 4
10	Household vehicle 5
11	Household vehicle 6
12	Household vehicle 7
13	Household vehicle 8
14	Household vehicle 9
15	Household vehicle 10
16	Other vehicle in household
17	Rental car
18	Carshare service (e.g., Zipcar)
21	Vanpool
22	Other vehicle (not my household's)
27	Medical transportation service
33	Car from work
34	Friend/relative/colleague's car
47	Other motorcycle in household
54	Other motorcycle (not my household's)
59	Peer-to-peer car rental (e.g., Turo)
76	Carpool match (e.g., Waze Carpool)
100	Household vehicle (or motorcycle)
101	Other vehicle (e.g., friend's car, rental, carshare, work car)
BIKE
2	Standard bicycle (my household's)
3	Borrowed bicycle (e.g., a friend's)
4	Other rented bicycle
56	Other personal bicycle (e.g., cargo, tandem, etc.)
82	Electric bicycle (my household's)
103	Bicycle or e-bicycle
PERSONAL MOBILITY
43	Skateboard or rollerblade
44	Golf cart
45	ATV
74	Segway
77	Personal scooter or moped (not shared)
80	Other boat (e.g., kayak)
81	Snowmobile
83	Scooter-share (e.g., Bird, Lime)
107	Micromobility (e.g., scooter, moped, skateboard)
TNC
36	Regular taxi (e.g., Yellow Cab)
49	Uber, Lyft, or other smartphone-app ride service
60	Other hired car service (e.g., black car, limo)
106	Uber/Lyft, taxi or car service
200	Paratransit/Dial-A-Ride (e.g., The RIDE)
SHARED
69	Bike-share - standard bicycle
70	Bike-share - electric bicycle
WALK
1	Walk (or jog/wheelchair)
OTHER
5	Other
75	Other
104	Other
995	Missing Response
DRIVE is an intermediate group only and does not appear directly as a linked trip mode. It is further resolved into SOV, HOV2, or HOV3 in Step 2 based on vehicle occupancy.

Table 13: Step 1: Survey mode values to intermediate mode groups.

Step 2: Apply the mode hierarchy. Given the set of mode groups collected in Step 1, TICTOC walked down the priority-ordered hierarchy in Table 14 and assigned the linked trip the first mode group present in the set. This meant that a more “significant” mode always took precedence: a trip that included any transit segment would be classified as LOCAL or REGIONAL transit regardless of how many walk segments accompanied it, and a long-distance trip would outrank a local transit trip.

The one exception to a simple group-wins rule is DRIVE. When DRIVE was present in the mode group set, the final linked_trip_mode (SOV, HOV2, or HOV3) was determined by the maximum number of travelers (num_travelers) reported across all constituent unlinked trips: one traveler yields SOV, two travelers yield HOV2, and three or more yield HOV3.

If all mode_n values were missing across every segment of the linked trip, the linked_trip_mode was set to missing. This is the case for incomplete survey responses where no mode information is available. Analysts can filter out these records when analyzing mode share or apply imputation rules as needed; these records are unweighted and therefore automatically excluded from weighted analyses.

Step 2: Linked trip mode hierarchy
TICTOC assigns the first matching mode group present in the set
Linked Trip Mode	Priority (1 = highest)
SCHOOLBUS	1
LONGDIST	2
REGIONAL	3
LOCAL	4
HOV3 (3 or more person occupancy vehicle)	5
HOV2 (2-person occupancy vehicle)	6
SOV (Single-occupancy vehicle)	7
BIKE	8
PERSONAL MOBILITY	9
TNC	10
SHARED	11
WALK	12
OTHER	13
HOV3, HOV2, and SOV all derive from the DRIVE mode group. The split is determined by the maximum num_travelers across all constituent unlinked trips: 1 = SOV, 2 = HOV2, 3+ = HOV3. The priority order is project-configurable.

Table 14: Step 2: Linked trip mode hierarchy.

Example. Consider a linked transit trip consisting of three unlinked segments: a walk to the bus stop (mode_n value 1 → WALK), a local bus ride (mode_n value 23 → LOCAL), and a walk from the stop to the destination (mode_n value 1 → WALK). The mode group set is {WALK, LOCAL}. TICTOC checks the hierarchy from the top: SCHOOLBUS — not present; LONGDIST — not present; REGIONAL — not present; LOCAL — present. The linked trip is assigned mode LOCAL.
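
The two-step assignment can be sketched as follows. This is an illustration rather than the TICTOC implementation: the crosswalk is a small subset of Table 13, the hierarchy follows Table 14, and the segment layout (a "modes" list standing in for the mode_n columns) is a hypothetical convenience. Values that are not in the crosswalk subset, including None, are treated as unpopulated, which is an assumption of this sketch.

```python
# Subset of the Table 13 crosswalk: survey mode value -> intermediate group.
MODE_GROUP = {
    1: "WALK", 2: "BIKE", 6: "DRIVE", 23: "LOCAL", 24: "SCHOOLBUS",
    41: "LONGDIST", 49: "TNC", 58: "REGIONAL",
}

# Table 14 priority order, highest priority first.
HIERARCHY = [
    "SCHOOLBUS", "LONGDIST", "REGIONAL", "LOCAL", "HOV3", "HOV2", "SOV",
    "BIKE", "PERSONAL MOBILITY", "TNC", "SHARED", "WALK", "OTHER",
]


def resolve_drive(max_travelers):
    """Split DRIVE into SOV, HOV2, or HOV3 by maximum occupancy."""
    if max_travelers >= 3:
        return "HOV3"
    if max_travelers == 2:
        return "HOV2"
    return "SOV"


def assign_linked_trip_mode(segments):
    """Return the linked trip mode, or None when no mode is populated."""
    # Step 1: collect the unique mode groups across all segments.
    groups = set()
    for seg in segments:
        for value in seg["modes"]:
            group = MODE_GROUP.get(value)
            if group is not None:
                groups.add(group)
    if not groups:
        return None
    # DRIVE is intermediate only; replace it with SOV, HOV2, or HOV3.
    if "DRIVE" in groups:
        max_travelers = max(seg["num_travelers"] for seg in segments)
        groups.discard("DRIVE")
        groups.add(resolve_drive(max_travelers))
    # Step 2: the first hierarchy entry present in the set wins.
    for mode in HIERARCHY:
        if mode in groups:
            return mode
    return None


walk_bus_walk = [
    {"modes": [1], "num_travelers": 1},
    {"modes": [23], "num_travelers": 1},
    {"modes": [1], "num_travelers": 1},
]
print(assign_linked_trip_mode(walk_bus_walk))
```

The walk, bus, walk journey from the example resolves to LOCAL, and a walk followed by a two-person household vehicle segment would resolve to HOV2.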

Linked Trip Purpose Assignment

TICTOC assigned a single destination purpose to each linked trip from the final unlinked segment’s d_purpose. Because the intermediate segments of a transit journey carry d_purpose = "change mode" by convention, the last segment’s destination purpose reflects the true trip destination — where the traveler actually ended up and why.

Origin purpose on the linked trip was taken from the first segment’s o_purpose, following the same convention used for unlinked trips.

As a result, analysts working with trip_linked can use d_purpose and o_purpose directly for trip-purpose summaries without needing to filter out “change mode” records, which are an artifact of the unlinked segment structure and do not appear on linked trip records.
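
The first-segment and last-segment rule can be shown with a short sketch. The segment_order field is a hypothetical ordering key, and the "home" and "work" labels are illustrative stand-ins for the coded purpose values; only "change mode" is taken from the text.

```python
def linked_trip_purposes(segments):
    """Take o_purpose from the first segment and d_purpose from the last."""
    ordered = sorted(segments, key=lambda seg: seg["segment_order"])
    return {
        "o_purpose": ordered[0]["o_purpose"],
        "d_purpose": ordered[-1]["d_purpose"],
    }


# A three-segment transit journey: walk access, bus, walk egress.
segments = [
    {"segment_order": 2, "o_purpose": "change mode", "d_purpose": "change mode"},
    {"segment_order": 1, "o_purpose": "home", "d_purpose": "change mode"},
    {"segment_order": 3, "o_purpose": "change mode", "d_purpose": "work"},
]
print(linked_trip_purposes(segments))
```

The intermediate "change mode" purposes drop out, leaving a home-to-work linked trip.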

Tour Organization

A tour is a sequence of trips that begins and ends at the same anchor location. Most tours are home-based – the traveler departs from home, makes one or more stops, and returns home. At-work subtours begin and end at the workplace (for example, leaving work for lunch and returning). Open-jawed tours occur when a travel day begins or ends away from home, so that the full home-to-home circuit is not completed within the observed day.

TICTOC organized all of a person’s daily trips into tours by:

Identifying anchor locations (departures from and returns to home, or to work for subtours)
Grouping intermediate trips into the tour they belong to
Scoring candidate primary destinations based on activity duration, purpose, and trip characteristics using configurable scoring functions (see Section 4.5.4.1)
Assigning a tour purpose based on the primary destination
Identifying sub-tours within parent tours
Classifying each person’s daily activity pattern (e.g., “mandatory” for days with work/school tours, “non-mandatory” for discretionary travel only, “home” for days with no travel)

Figure 4 illustrates how TICTOC organized unlinked trips into tours and subtours, using one closed tour and one open-jawed tour as examples.

Home-based work tour with at-work subtour:

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    HOME(("Home")) -- "Trip 1: Car" --> WORK["Work"]
    WORK -- "Trip 2: Walk" --> LUNCH["Lunch"]
    LUNCH -- "Trip 3: Walk" --> WORK
    WORK -- "Trip 4: Car" --> HOME

    classDef household fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef person fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef tour fill:#FDD835,stroke:#C2A200,color:#000000,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90

    linkStyle default stroke:#475569,stroke-width:2.4px

    class HOME household
    class WORK person
    class LUNCH tour

Open-jawed tour (day begins away from home):

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%
flowchart LR
    WORK(("Work
    (day starts here)")) -- "Trip 1: Car" --> GROCERY["Grocery"]
    GROCERY -- "Trip 2: Car" --> HOME(("Home"))

    classDef household fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef person fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90
    classDef tour fill:#FDD835,stroke:#C2A200,color:#000000,font-family:Inter,stroke-width:2.75px,font-weight:bold,fill-opacity:0.90

    linkStyle default stroke:#475569,stroke-width:2.4px

    class HOME household
    class WORK person
    class GROCERY tour

Figure 4: Examples of home-based and open-jawed tours used in TICTOC tour organization.
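
A simplified version of the anchor-based grouping is sketched below. It assumes a person's trips are already in time order and that the home anchor can be recognized from a purpose label of "home"; both the label and the dictionary layout are hypothetical, and subtour detection is left out for brevity.

```python
def split_into_tours(trips, anchor="home"):
    """Group one person's ordered trips into tours that close at the anchor."""
    tours, current = [], []
    for trip in trips:
        current.append(trip)
        if trip["d_purpose"] == anchor:
            # Arriving at the anchor closes the tour in progress.
            tours.append(current)
            current = []
    if current:
        # The day ended away from the anchor, so the last tour stays open.
        tours.append(current)

    result = []
    for tour in tours:
        starts_at_anchor = tour[0]["o_purpose"] == anchor
        ends_at_anchor = tour[-1]["d_purpose"] == anchor
        result.append({
            "trip_ids": [trip["trip_id"] for trip in tour],
            "open_jawed": not (starts_at_anchor and ends_at_anchor),
        })
    return result


# The two situations drawn in Figure 4.
closed_day = [
    {"trip_id": 1, "o_purpose": "home", "d_purpose": "work"},
    {"trip_id": 2, "o_purpose": "work", "d_purpose": "meal"},
    {"trip_id": 3, "o_purpose": "meal", "d_purpose": "work"},
    {"trip_id": 4, "o_purpose": "work", "d_purpose": "home"},
]
open_day = [
    {"trip_id": 1, "o_purpose": "work", "d_purpose": "shop"},
    {"trip_id": 2, "o_purpose": "shop", "d_purpose": "home"},
]
print(split_into_tours(closed_day))
print(split_into_tours(open_day))
```

The first day yields one closed home-based tour containing all four trips; the second yields one open-jawed tour because the day starts at work rather than at home.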

Tour Purpose Scoring and Primary Destination

Tour purpose was determined by the primary destination — the most “important” stop on the tour.
Because a tour includes multiple trips, one destination was selected as the representative stop, and its purpose became the tour purpose.

For tours with mandatory travel (work or school), primary-destination selection was straightforward: the mandatory stop took precedence.

Complexity was higher for discretionary tours. For example, a tour might include two shopping stops (e.g., grocery store and auto shop), where purpose alone did not clearly identify which stop was primary.

TICTOC resolved this using a weighted penalty method:

Each candidate destination was scored using a decay function of duration, stratified by purpose.
The destination with the minimum penalty score was selected as primary.
Duration was defined as:
- time spent at the destination (dwell time), plus
- duration of the preceding trip, plus
- duration of the subsequent trip.

This approach gave more favorable scores to destinations reached via longer trips or with longer dwell times, reflecting greater relative importance within the tour.

Purpose priority was built into scoring:

Mandatory destinations (work, school) received more favorable base scores (lower penalties) than discretionary activities, ensuring a work or school stop was selected whenever present.
Discretionary destinations were differentiated by the duration-based decay function, with scoring functions stored in project-configurable files delivered with the data.

The assigned tour purpose was taken from the primary destination’s purpose category (e.g., “work,” “school,” “shop”).
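
The minimum-penalty selection can be illustrated with a toy scoring function. Everything numeric here is an assumption: the exponential form, the decay scales, and the offset that keeps mandatory stops ahead of discretionary ones are placeholders, not the project-configurable functions delivered with the data.

```python
import math

MANDATORY = {"work", "school"}

# Hypothetical decay scales in minutes, stratified by purpose.
DECAY_SCALE_MIN = {"work": 240.0, "school": 240.0, "shop": 60.0, "meal": 45.0}
DEFAULT_SCALE_MIN = 60.0


def penalty(stop):
    """Lower is better. Longer durations decay toward a smaller penalty."""
    duration = stop["dwell_min"] + stop["prev_trip_min"] + stop["next_trip_min"]
    scale = DECAY_SCALE_MIN.get(stop["purpose"], DEFAULT_SCALE_MIN)
    decay = math.exp(-duration / scale)  # in (0, 1] for non-negative durations
    # Mandatory stops score at most 1; discretionary stops score above 1,
    # so a work or school stop always wins when one is present.
    offset = 0.0 if stop["purpose"] in MANDATORY else 1.0
    return offset + decay


def primary_destination(stops):
    """Return the candidate stop with the minimum penalty."""
    return min(stops, key=penalty)


stops = [
    {"name": "grocery store", "purpose": "shop",
     "dwell_min": 20, "prev_trip_min": 10, "next_trip_min": 5},
    {"name": "auto shop", "purpose": "shop",
     "dwell_min": 60, "prev_trip_min": 15, "next_trip_min": 15},
]
print(primary_destination(stops)["name"])
```

With two shopping stops, the auto shop is selected because its combined dwell and travel duration is longer. Adding even a brief work stop to the list would make that stop primary under this sketch.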

At-work subtours — sequences that departed from and returned to the workplace within a parent tour — were identified separately and assigned their own tour purpose using the same scoring logic.

Additional TICTOC Outputs

TICTOC appended model-facing fields to the household, person, day, and trip tables, and produced new linked trip and tour tables. Key additions include:

Daily activity pattern (daily_activity_pattern) on the day table, classifying each person-day as mandatory, non-mandatory, or home
Tour identifiers (tour_id, tour_num) on trips and the tour table
Linked trip identifiers (linked_trip_id) connecting unlinked trip segments to their linked trip record
Stop counts on tours, indicating the number of intermediate stops
Joint travel indicators identifying which trips were taken with other household members
Escorting attributes identifying trips where one household member accompanies another
Imputation flags distinguishing reported trips from imputed joint and school trips (see Section 4.6 for details)
Summary diagnostics documenting imputation rates, tour distributions, and data quality metrics

4.6 Reference: Flags and Classifications

The following tables provide reference definitions for flags and classification variables used throughout the delivered tables.

Trip Flag Reference

Table 15 summarizes the main trip-level flags included in the delivered trip table.

Trip-level flags in the delivered trip table
Flag	Values	Meaning	Filtering Guidance
browser	0/1	Trip created via browser survey (not GPS)	Exclude for GPS-quality analysis
added_trip	0/1	Trip manually added by analyst or participant	No GPS trace; OD-routed distance
split_loop	0/1	Trip created by splitting a loop trip	Original loop no longer exists
unlinked_trip	0/1	Trip is a segment of a transit journey	Use trip_linked for O-D analysis
is_primary_leg	0/1	Highest-priority mode leg in a linked trip	Use to avoid double-counting linked trips
is_access / is_egress	0/1/995	Role in a linked transit trip (995 = not applicable)	--
is_synthetic_transit_leg	0/1	Placeholder leg for missing access/egress	Distance and duration are NA
speed_flag	0/1	Speed exceeds plausible threshold for mode	Review or exclude
teleport	0/1	Gap >= 250m between destination and next trip's origin	May indicate missing trip
copied_from_proxy	0/1	Trip record copied from a proxy reporter	Same trace as reporter's trip

Table 15: Trip-level flags in the delivered trip table.
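
As an illustration of how a flag such as teleport could be reproduced or audited, the sketch below measures the gap between each trip's destination and the next trip's origin. The 250 m threshold comes from Table 15; the great-circle (haversine) distance, the coordinate column names, and the choice to place the flag on the earlier trip are assumptions of this example and may differ from the delivered processing.

```python
import math

EARTH_RADIUS_M = 6_371_000
TELEPORT_THRESHOLD_M = 250  # threshold listed in Table 15


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_teleports(trips):
    """Return one 0/1 flag per trip, for trips ordered in time for one person-day."""
    flags = []
    for cur, nxt in zip(trips, trips[1:]):
        gap = haversine_m(cur["d_lat"], cur["d_lon"], nxt["o_lat"], nxt["o_lon"])
        flags.append(1 if gap >= TELEPORT_THRESHOLD_M else 0)
    if trips:
        flags.append(0)  # the last trip has no following trip to compare
    return flags


trips = [
    {"o_lat": 42.3500, "o_lon": -71.0600, "d_lat": 42.3600, "d_lon": -71.0600},
    {"o_lat": 42.3610, "o_lon": -71.0600, "d_lat": 42.3700, "d_lon": -71.0600},
    {"o_lat": 42.3800, "o_lon": -71.0600, "d_lat": 42.3900, "d_lon": -71.0600},
]
print(flag_teleports(trips))
```

The first gap is roughly 110 m and is not flagged; the second is roughly 1.1 km and is flagged as a possible missing trip.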

TICTOC Imputation Flags

Table 16 summarizes the additional TICTOC-specific fields used to identify imputed, modified, and coordinated records.

TICTOC-specific flags and identifiers
Field	Description
imputed_record_type	Indicates whether the trip is reported (0), imputed as a joint trip, or imputed as a school trip
imputed_host_trip	For joint trip imputations, the trip_id of the household member's trip that served as the basis for the imputed record
imputed_joint_trip	Flag indicating whether this trip was created through joint trip imputation
joint_trip_id	Identifier grouping household members who traveled together on the same trip
daily_activity_pattern	Person-day classification: mandatory, non-mandatory, or home
These fields allow analysts to distinguish reported travel from imputed travel and to identify joint-travel episodes.

Table 16: TICTOC-specific flags and identifiers.

Joint Travel Taxonomy

TICTOC classifies joint travel at both the trip and tour level:

Non-joint: Trip or tour made by the person alone
Partially joint: Some but not all segments of the tour include another household member
Fully joint: All segments of the tour are shared with another household member
Joint tour participants: Identifiers linking all household members sharing a tour
Escorting: Trips where the primary purpose is to transport another household member (e.g., driving a child to school); further classified by whether the escort makes a dedicated round trip or chains the escort with other activities

4.7 Delivered Data Products

Table 17 summarizes the delivered tables and their units of observation.

Delivered tables
Table	Records	Unit of Observation	Source
Household	One per household	Household	Survey + processing
Person	One per person	Person	Survey + processing
Day	One per person per travel day	Person-day	Survey + processing + TICTOC
Vehicle	One per household vehicle	Vehicle	Survey
Trip	One per unlinked trip (includes imputed joint and school trips)	Trip	Survey + processing + TICTOC
Trip Linked	One per linked journey	Linked trip	TICTOC
Location	GPS trace points per trip	Location point	Survey app + processing
Tour	One per tour	Tour	TICTOC
Joint Tour Participant	One per person per joint tour	Person-tour	TICTOC

Table 17: Delivered tables.

All tables are accompanied by a codebook listing every variable, its data type, and its value labels (see Section 7). Weighted versions of these tables are produced separately by the weighting process and documented in Section 5.

For questions about specific variables, processing decisions, or data quality metrics for this study, please contact the project team.

5 Weighting

This section summarizes the weighting and expansion procedures used in the Massachusetts Travel Study dataset. The goal of weighting is to expand the survey sample so that it represents the full resident population of Massachusetts.

The Massachusetts Travel Study leveraged two related weighting workflows. The standard weights represent travel behavior on a typical weekday, using Monday through Thursday travel. The day-of-week weights represent travel behavior on each day of the week, including Friday, Saturday, and Sunday. Both workflows used the same general approach, but they differed in the records included, the weighting geographies, and the way the final weights should be used in analysis.

What are survey weights?

To produce statistics that represent an entire population without surveying every household or individual, survey researchers assign weights to each completed observation. In household travel surveys, the survey weight indicates how many people, households, days, or trips in the population a given respondent or record is estimated to represent. By applying these weights, analysts can generate regional estimates even when the sample is only a small fraction of the full population.
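
A small numeric sketch shows what applying weights means in practice. The three households, the hh_weight and num_vehicles column names, and the weight values are invented for illustration.

```python
# Hypothetical household records; each weight is the number of
# population households the record is estimated to represent.
households = [
    {"hh_id": 1, "hh_weight": 150.0, "num_vehicles": 0},
    {"hh_id": 2, "hh_weight": 220.0, "num_vehicles": 2},
    {"hh_id": 3, "hh_weight": 130.0, "num_vehicles": 1},
]

# Weighted total: the estimated number of households in the population.
estimated_households = sum(h["hh_weight"] for h in households)

# Weighted share of zero-vehicle households.
zero_vehicle_weight = sum(h["hh_weight"] for h in households if h["num_vehicles"] == 0)
weighted_share = zero_vehicle_weight / estimated_households

# Unweighted share, for comparison: each record counts once.
unweighted_share = sum(1 for h in households if h["num_vehicles"] == 0) / len(households)

print(estimated_households, round(weighted_share, 3), round(unweighted_share, 3))
```

The weighted and unweighted shares differ (0.30 versus about 0.33) because the records do not represent equal numbers of households.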

5.1 Overview of Weighting Goals

The weighting process aligns weighted survey estimates with external population totals and distributions across key household, person, day, and trip characteristics. Weighting corrects for differential sampling, differences in survey completion across demographic groups, and systematic differences in trip reporting that arise from the method respondents used to report their travel, such as smartphone app, web diary, or call center.

For the Massachusetts Travel Study, this process produced two sets of final weights. The standard weighting process expanded the survey sample to represent travel on an average weekday across Monday, Tuesday, Wednesday, and Thursday. The day-of-week weighting process expanded the survey sample separately for each day across Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. The day-of-week weights are most useful when an analysis explicitly depends on the day of week, such as comparing weekday and weekend travel or estimating travel totals for a specific day.

The day-of-week workflow built on the standard weighting workflow. Both workflows began with initial expansion, adjusted household weights to demographic targets, accounted for day-pattern reporting differences by diary platform, derived person, day, and trip weights, and then applied trip-level adjustments. The main difference was that the day-of-week workflow repeated the relevant weighting steps separately by day, so the weighted records represent Monday travel, Tuesday travel, Saturday travel, and so on, rather than one average weekday.

Table 18 summarizes the practical difference between the two weighting workflows.

Dimension	Standard weighting	Day-of-week weighting
Travel days represented	Monday through Thursday as an average weekday	Monday through Sunday, weighted separately by day
Recommended use	Default weighted workflow for typical weekday summaries and most standard reporting	Day-specific, weekday/weekend, and weekend travel analysis
Geographic controls	Custom client-defined weighting zones developed for the project	Broader Boston / Not Boston weighting groups
Completion basis	Complete eligible weekday travel days	Complete travel data for the specific day being weighted
Day weights	Person weights are divided across complete eligible weekdays	Day weights are equal to the person weight for that day
Trip weights	Trip totals represent an average weekday	Trip totals represent a specific day of week

Table 18: Comparison of standard and day-of-week weighting workflows.
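
The "divided across complete eligible weekdays" row can be read as an equal split of the person weight, which the sketch below illustrates. The equal split is an assumed reading of Table 18, and the field names (day_num, day_of_week, is_complete) are hypothetical; the delivered day weights also reflect later adjustment steps that are not shown here.

```python
STANDARD_DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday"}


def standard_day_weights(person_weight, days):
    """Split a person weight equally across complete Monday-Thursday days."""
    eligible = {
        d["day_num"]
        for d in days
        if d["is_complete"] and d["day_of_week"] in STANDARD_DAYS
    }
    share = person_weight / len(eligible) if eligible else 0.0
    # Ineligible days stay in the table with a zero weight.
    return {d["day_num"]: (share if d["day_num"] in eligible else 0.0) for d in days}


days = [
    {"day_num": 1, "day_of_week": "Monday", "is_complete": True},
    {"day_num": 2, "day_of_week": "Tuesday", "is_complete": True},
    {"day_num": 3, "day_of_week": "Wednesday", "is_complete": False},
    {"day_num": 4, "day_of_week": "Friday", "is_complete": True},
]
weights = standard_day_weights(300.0, days)
print(weights)
```

Under this reading the day weights for a person sum back to the person weight, so the person contributes one average weekday rather than one weekday per diary day. In the day-of-week workflow, by contrast, the table indicates that the day weight for a given day equals the person weight for that day.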

5.2 What the Weights Represent

Across all steps, the weighting process produces final weights at multiple analytic levels. These weights allow the survey records to represent households, people, person-days, trips, linked trips, and tours in the study area.

Household weight: expands each surveyed household to represent households in the study area.
Person weight: expands each person to represent the population of persons.
Day weight: expands each complete person-day to represent daily travel, with the interpretation depending on whether the standard or day-of-week weighting workflow is used.
Unlinked trip weight: expands individual trip segments, including transit access, transfer, and egress legs.
Linked trip weight: expands complete trips between an origin and destination, with intermediate transit transfers combined into a single trip.
Tour weight: expands sequences of linked trips that begin and end at the same location.

For this study, the standard final weights represent a typical weekday based on Monday through Thursday travel. Standard day weights represent average weekday person-days, and standard trip, linked trip, and tour weights represent travel on an average weekday.

When the analytic question is explicitly about differences across Monday through Sunday, use the alternate day-of-week weighting workflow described in Section 16. The day-of-week weights produce a separate set of weights for each day of the week, so weighted estimates can represent Monday travel, Tuesday travel, weekend travel, or other day-specific comparisons.

Some Weights are Zero

The final dataset may contain weights equal to zero. When a weight is equal to zero, it means that the record is present in the delivered data, but was not eligible to receive that particular weight.

Records may receive zero weights for the following reasons:

Partially complete records. For example, if a household participated for seven days but only provided three days of complete diary data, the incomplete days would be retained in the delivered data but would not receive positive day weights.
For households with children, days without complete proxy-reported child travel. For the standard weights, households with children needed complete reported travel for children on the weighted travel day. Some additional household days involving children may therefore receive zero standard weights if they do not meet the standard completion rules. The day-of-week weights use a relaxed child-completion rule because children’s travel was proxy-reported for only one day.
For standard weights, days outside of the standard “typical weekday” definition. The standard weights represent typical weekday travel based on Monday through Thursday. Friday, Saturday, and Sunday records are therefore not eligible to receive positive standard day, trip, linked trip, or tour weights.
For day-of-week weights, day-specific eligibility. The day-of-week weights are assigned separately for each day of the week. A record with a zero standard weight may still receive a positive day-of-week weight if it meets the completion criteria for that specific day.

Analysts should treat zero weights as specific to the weight being used. A zero value does not necessarily mean that the record is invalid or unusable for all analyses.
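
One practical consequence is sketched below: zero-weight records contribute nothing to weighted totals, but they still inflate a naive record count, so the sample size reported alongside a weighted estimate should count only records with a positive value of the weight in use. The rows and the day_weight and num_trips names are illustrative.

```python
# Hypothetical person-day records; day 2 is present but ineligible
# for this particular weight.
days = [
    {"day_id": 1, "day_weight": 120.0, "num_trips": 4},
    {"day_id": 2, "day_weight": 0.0, "num_trips": 2},
    {"day_id": 3, "day_weight": 80.0, "num_trips": 3},
]

# Zero-weight rows drop out of weighted sums automatically.
weighted_trips = sum(d["day_weight"] * d["num_trips"] for d in days)
weighted_days = sum(d["day_weight"] for d in days)
trips_per_day = weighted_trips / weighted_days

# The supporting sample size should count positive-weight rows only.
n_records = len(days)
n_weighted = sum(1 for d in days if d["day_weight"] > 0)

print(trips_per_day, n_records, n_weighted)
```

Here the weighted trip rate is 3.6 trips per day, supported by two weighted records rather than three.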

5.3 Inputs to Weighting

The Massachusetts Travel Study used two primary inputs in the weighting process: survey data and population target data. These same general inputs supported both the standard weights and the day-of-week weights, but the eligible survey records, weighting geographies, and control totals differed between the two sets of weights.

Survey Data

The survey data consisted of cleaned household, person, day, and trip records that met the completion criteria for weighting. The records eligible for weighting depended on whether the standard weights or day-of-week weights were being created.

For the standard weights, households were included if they provided complete data for at least one Monday, Tuesday, Wednesday, or Thursday travel day. These records support estimates of travel on an average weekday.

For the day-of-week weights, survey records were evaluated separately for each day of the week. A household was included for a given day only when it provided complete data for that specific day. As a result, the set of records that receive positive day-of-week weights can differ across Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Before weighting, missing demographic values needed for weighting were imputed where possible. These included income, gender, race, and ethnicity. The imputed values were used to reduce missingness in the demographic control variables used during weighting.

Population Target Data

The target data provided the household and person control totals used to adjust the survey records to the study area population. Demographic weighting targets were developed from the 2023 ACS 1-year Public Use Microdata Sample (PUMS). Selected auxiliary inputs used in imputation, including block-group income distributions, were drawn from 2023 ACS 5-year data.

The target data provided total household and population counts for each weighting geography, as well as detailed demographic distributions used as control totals in weighting. These controls included household characteristics, person characteristics, and selected travel-related controls where appropriate.

Weighting Geographies

The standard weights and day-of-week weights used different weighting geographies.

The standard weights used custom weighting zones developed for this study (Figure 5). These zones were based on MPO geographies, with smaller areas grouped where needed to maintain enough sample for stable weighting. The weighting zones were designed to balance the need for geographic specificity with the need for stable estimation across a range of demographic and travel behavior targets.

Figure 5: Standard weighting zone groups.

The day-of-week weights used broader geographic groups because estimating weights separately for each day reduces the available sample size. For this reason, the day-of-week weights were developed using a simpler Boston / Not Boston geography (Figure 6). In the weighting memo, the “Boston” group is described as the four inner-most commuter rings; the remaining areas are included in the “Not Boston” group.

Figure 6: Day-of-week weighting zones used for alternate weekday- and weekend-specific weights.

Analyst Tip: Interpreting Weighting Geography

Standard weights are calibrated to the custom weighting zones used for the typical weekday workflow, whereas day-of-week weights are calibrated to the broader Boston / Not Boston geographies by day. Estimates are generally more stable when summarized at or above the geography used in weighting. Estimates for smaller geographies, or for geographies that cut across weighting zones, should be interpreted with additional caution and accompanied by checks of unweighted sample size and weight variability (e.g., standard errors, design effects, or effective sample size).

Targets

Targets are the specific demographic and household distributions that the weighting procedure seeks to align between the survey sample and the underlying population estimates. For the Massachusetts Travel Study, targets were defined using ACS PUMS data to promote statistical representativeness across the weighting geographies.

Each target represents a key dimension of the study area’s population and travel behavior that is important for accurate expansion of survey data to reflect the total population. Target variables span household characteristics, person-level attributes, and selected travel-related controls. At the highest level, the weighting process was constrained to match the total number of households and total number of persons in the study area and within the relevant weighting geographies.

Total households in the study area: approximately 2,816,000
Total persons in the study area: approximately 6,760,613

The standard and day-of-week workflows use similar target concepts, but some day-of-week categories are combined to improve stability after the sample is split by day. For example, the day-of-week process uses broader geographic controls and combines selected target levels where the day-specific sample is smaller.

Household-level targets

Table 19 summarizes the household-level target categories used in the two weighting workflows.

Variable	Standard weighting categories	Day-of-week weighting categories
Household size	1 person; 2 people; 3 people; 4 people; 5 people or more	Same as standard weighting
Income	Under $25,000; $25,000-$49,999; $50,000-$74,999; $75,000-$99,999; $100,000-$199,999; $200,000 or more	Same as standard weighting
Workers	0 workers; 1 worker; 2 workers; 3 workers or more	0 workers; 1 worker; 2 workers or more
Vehicles	No vehicles; at least one vehicle and fewer vehicles than drivers age 16 or older; vehicles greater than or equal to drivers	Same as standard weighting
Presence of children	0 children; 1 or more children	Same as standard weighting
Total households	Total households by weighting geography	Total households by weighting geography and day

Table 19: Household-level weighting targets by workflow.

Person-level targets

Table 20 summarizes the person-level target categories used in the two weighting workflows.

Variable	Standard weighting categories	Day-of-week weighting categories
Gender	Male; female	Same as standard weighting
Age	Under 5; 5-15; 16-17; 18-24; 25-44; 45-64; 65 or older	Under 5; 5-17; 18-24; 25-44; 45-64; 65 or older
Worker status	Full-time worker; part-time worker; non-worker	Same as standard weighting
Commute mode	Work from home; walk; bike; transit; other (including auto); not applicable	Work from home; walk; bike; transit; other; not applicable
University student status	University student; not a university student	Same as standard weighting
Educational attainment	Some college education; no college education	Same as standard weighting
Race	African American; Asian Pacific; White; Other	Same as standard weighting
Ethnicity	Hispanic; Non-Hispanic	Same as standard weighting
Total persons	Total persons by weighting geography	Total persons by weighting geography and day

Table 20: Person-level weighting targets by workflow.

Travel-related controls

The standard weights included a regional transit trip target to address overrepresentation of transit trips in the survey data. The day-of-week weights did not use the same transit-trip control target.

Combined Weighting Targets

The categories listed above summarize the household- and person-level controls used in weighting. Analysts can use them as a quick reference for the dimensions and levels at which the weighted survey was calibrated to known population totals.

In practice, the weighting controls do two things simultaneously:

match the total households and total persons in each weighting zone group; and
match the marginal distributions of these household and person characteristics within each weighting zone group.

The controls do not guarantee that every cross-classification of those characteristics is perfectly represented. For example, age and income may each match target distributions, while age by income may still reflect sampling variability.

Some target categories were simplified, combined, or selectively applied to maintain stable estimation in smaller geographies and to avoid over-constraining the weighting process. Analysts should therefore interpret these categories as the effective levels at which the survey was calibrated to known population totals.
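
To make the distinction between marginal and joint control concrete, the sketch below calibrates four records to two sets of marginal targets using iterative proportional fitting (raking). Raking is used here only as a familiar stand-in for marginal calibration; this section does not name the algorithm used for the study, and the records, categories, and target totals are invented.

```python
def rake(records, targets, iterations=50):
    """Adjust weights so each variable's weighted totals match its targets.

    records: list of dicts with a starting 'weight' and one key per variable.
    targets: {variable: {category: target_total}}.
    """
    weights = [r["weight"] for r in records]
    for _ in range(iterations):
        for var, totals in targets.items():
            # Current weighted total in each category of this variable.
            current = {}
            for r, w in zip(records, weights):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale every record so its category total hits the target.
            weights = [
                w * totals[r[var]] / current[r[var]]
                for r, w in zip(records, weights)
            ]
    return weights


records = [
    {"size": "1", "income": "low", "weight": 25.0},
    {"size": "1", "income": "high", "weight": 25.0},
    {"size": "2+", "income": "low", "weight": 25.0},
    {"size": "2+", "income": "high", "weight": 25.0},
]
targets = {
    "size": {"1": 60.0, "2+": 40.0},
    "income": {"low": 30.0, "high": 70.0},
}
raked = rake(records, targets)
print([round(w, 2) for w in raked])
```

Both margins match their targets after raking, but the weight on any single cell (for example, one-person low-income households) is a by-product of the procedure. No target constrains that cell directly, which is why cross-classified estimates can still depart from the population.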

Analyst Tip: What Weights Can and Cannot Correct

Weighting targets define the population dimensions used to calibrate the survey. These controls make the marginal distributions (e.g., age, gender, income groups) in the weighted data match known population totals. However, there are important limitations:

1. Weighting improves representativeness only within defined categories.
Estimates are most reliable at the level of the weighting targets. More detailed breakdowns (e.g., finer income bins) were not controlled and may still reflect sampling variability or bias. In practice, targets define the finest level of safe aggregation.

2. Joint distributions are not guaranteed to match the population.
Weights align individual targets, not combinations of them. For example, age and race may each match population totals, but age x race may still be misrepresented. Be cautious with highly disaggregated cross-tabulations.

3. Non-targeted variables and small cells may be unstable.
Variables not included in weighting controls are not explicitly bias-corrected. Small or sparse groups remain unstable after weighting, especially when weights are large or variable. We recommend checking cell sizes and relative standard errors (RSEs) before interpreting results, especially when sample sizes are small.

4. Weighting does not correct measurement error.
Targets adjust who is represented, not what was reported. Misreporting or limitations in survey design (e.g., coarse mode categories) are not fixed through weighting.

Other useful diagnostics include the effective sample size, which reflects the equivalent number of equally weighted observations, and the design effect, which captures how weighting inflates variance (see Section 5.6.3 below).

Bottom line:
Weighting improves representativeness along specific dimensions, but it does not guarantee reliable estimates for all subgroups. Use targets as a guide to where estimates are most trustworthy.
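To make point 2 concrete, the following R sketch uses invented counts (not study data) to show a weighted table whose age and race margins both match the population while the individual age-by-race cells do not. The category names are hypothetical.

```r
# Hypothetical population counts (age group x race group); not study data.
population <- matrix(c(30, 20,
                       20, 30),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(age = c("under_40", "40_plus"),
                                     race = c("group_a", "group_b")))

# Hypothetical weighted survey totals calibrated to the same margins.
weighted_survey <- matrix(c(25, 25,
                            25, 25),
                          nrow = 2, byrow = TRUE,
                          dimnames = dimnames(population))

# Both sets of margins agree with the population...
margins_match <- all(rowSums(weighted_survey) == rowSums(population)) &&
  all(colSums(weighted_survey) == colSums(population))

# ...but the individual cells (the joint distribution) do not.
cell_difference <- weighted_survey - population

print(margins_match)
print(cell_difference)
```

A check of this kind is only possible where an external joint distribution is available for comparison; otherwise, treat detailed cross-tabulations with the caution described above.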

5.4 Weighting Process

The flow chart below summarizes the full weighting workflow described in this section. The sections that follow explain each step in more detail.

%%{init: {"theme":"default","flowchart":{"htmlLabels":true,"curve":"basis"}}}%%

flowchart TD

  Survey[("Survey Data<br/>(households, persons, days, trips)")]
  Census[("Target Data<br/>(ACS PUMS and project controls)")]
  Targets[["Household and Person Targets"]]
  Base{"Base Weight Estimation"}
  P1{"Round 1:<br/>Demographic Reweighting"}
  DP{"Day-Pattern Modeling"}
  DayTargets[["Day-Pattern Targets"]]
  P2{"Round 2:<br/>Day-Pattern Reweighting"}
  PersonDay{"Person and Day Weight Derivation"}
  TripAdj{"Round 3:<br/>Trip Adjustment"}
  FinalWeights[/"Final Household, Person,<br/>Day, Trip, Linked Trip,<br/>and Tour Weights"/]

  Survey --> Base
  Census --> Targets
  Base --> P1
  Targets --> P1
  P1 --> DP
  DP --> DayTargets
  Targets --> P2
  DayTargets --> P2
  P2 --> PersonDay
  PersonDay --> TripAdj
  TripAdj --> FinalWeights

  classDef source fill:#024D5F,stroke:#013845,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef control fill:#1B9E77,stroke:#15785B,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef process fill:#0D7993,stroke:#085C70,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92
  classDef weight fill:#695CB4,stroke:#4B4180,color:#ffffff,font-family:Inter,stroke-width:2.5px,font-weight:bold,fill-opacity:0.92

  linkStyle default stroke:#475569,stroke-width:2.25px

  class Survey,Census source
  class Targets,DayTargets control
  class Base,P1,DP,P2,PersonDay,TripAdj process
  class FinalWeights weight

Base Weights

Weighting began with base weights, which reflected the probability that a household was included in the survey. For each sample segment, RSG calculated a base weight as the inverse of the probability of inclusion, which depended on both the probability of selection and the probability of response. For segment \(s\) with \(H_s\) total households and \(R_s\) responding households, the base weight can be understood as:

\[ w_s = \frac{H_s}{R_s} \]

Base weights provided the initial expansion from the sample to the population and served as the seed weights for subsequent rounds of weighting adjustments. For the day-of-week workflow, the same concept was applied separately by day. If a segment had a different number of complete records on Monday than on Tuesday, then the initial expansion could differ by day.
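As a minimal sketch of the base-weight formula, the R example below computes the weight for three hypothetical segments. The segment names and counts are invented for illustration and do not come from the study.

```r
# Hypothetical sample segments; the counts are invented for illustration.
segments <- data.frame(
  segment = c("A", "B", "C"),
  total_households = c(12000, 8000, 5000),    # H_s
  responding_households = c(60, 50, 20),      # R_s
  stringsAsFactors = FALSE
)

# Base weight: households represented by each responding household.
segments$base_weight <- segments$total_households /
  segments$responding_households

# Weighted respondents recover the segment household totals.
segments$expanded_total <- segments$base_weight *
  segments$responding_households

print(segments)
```

Each responding household in segment A stands for 200 households in this example, so the weighted respondents sum back to the segment total.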

Round 1 Weighting: Adjusting for Demographic Bias

Round 1 weighting used PopulationSim to adjust base weights so that weighted survey estimates matched demographic control totals derived from ACS PUMS. PopulationSim performed constrained entropy maximization, adjusting household weights in the smallest way necessary to match a set of household- and person-level targets.

What is Entropy Maximization?

Entropy maximization is a statistical method used to adjust survey weights so that the weighted survey data matches known population totals, such as the number of households, adults, workers, or children in a region.

The key idea is simple: change the initial weights as little as possible while forcing the final weighted totals to match external control totals. Groups that were underrepresented in the sample receive higher weights, while groups that were overrepresented receive lower weights. This approach preserves the structure of the collected data while helping the survey reflect the population.
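The idea can be sketched with raking (iterative proportional fitting), a simpler, related technique that rescales weights to one set of control totals at a time until all of them match. This is an illustrative stand-in and not PopulationSim's implementation, which balances household- and person-level controls together. The records, categories, and targets below are hypothetical.

```r
# Hypothetical respondent households with equal seed (base) weights.
hh <- data.frame(
  hh_id  = 1:6,
  size   = c("1", "1", "2+", "2+", "2+", "1"),
  income = c("low", "high", "low", "high", "high", "low"),
  weight = rep(100, 6),
  stringsAsFactors = FALSE
)

# Hypothetical control totals for each margin (both sum to 700).
targets <- list(
  size   = c("1" = 300, "2+" = 400),
  income = c("low" = 350, "high" = 350)
)

# Largest absolute gap between weighted totals and targets across controls.
max_gap <- function(data) {
  max(sapply(names(targets), function(control) {
    totals <- tapply(data$weight, data[[control]], sum)
    max(abs(as.vector(totals) - targets[[control]][names(totals)]))
  }))
}

# Rake: rescale weights to each margin in turn until all margins match.
for (iteration in 1:100) {
  for (control in names(targets)) {
    totals <- tapply(hh$weight, hh[[control]], sum)
    ratio <- targets[[control]][names(totals)] / as.vector(totals)
    hh$weight <- hh$weight * unname(ratio[hh[[control]]])
  }
  if (max_gap(hh) < 1e-8) break
}

print(hh)
```

Households in categories that fall short of their targets end up with weights above the seed value of 100, and households in over-represented categories end up below it.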

For standard weighting, this reweighting process was applied to the Monday-through-Thursday records together to represent an average weekday. For day-of-week weighting, weights were estimated to match the household- and person-level targets for each day of the week. In the end, a household could have different weights for different days, depending on which days of complete travel data were available and how those records fit the day-specific controls.

The output of Round 1 consisted of target-optimized household weights. These weights aligned the survey with demographic targets and served as inputs to the day-pattern adjustment described below.

Round 2 Weighting: Adjusting for Day-Pattern Bias

Survey trip rates differed across diary platforms, in part because smartphone app users tended to report more complete travel than online diary or call center respondents. To address this issue, RSG applied a day-pattern adjustment before finalizing household, person, and day weights.

RSG classified each person-day into one of three mutually exclusive day-pattern categories: made no trips, made mandatory trips, or made only non-mandatory trips. Mandatory trips are trips to work, work-related activities, school, or school-related activities. The day-pattern model estimated how likely each person-day was to fall into each of these categories after accounting for demographic characteristics and diary platform.
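The three-way classification can be sketched as a small R function. The purpose names and category labels here are hypothetical stand-ins for the study's coded values, and the function covers only the classification rule, not the day-pattern model itself.

```r
# Trip purposes treated as mandatory; these names are hypothetical
# stand-ins for the study's coded purpose values.
mandatory_purposes <- c("work", "work_related", "school", "school_related")

# Classify one person-day from the purposes of its trips.
classify_day_pattern <- function(trip_purposes) {
  if (length(trip_purposes) == 0) {
    "no_trips"
  } else if (any(trip_purposes %in% mandatory_purposes)) {
    "mandatory"
  } else {
    "non_mandatory_only"
  }
}

classify_day_pattern(character(0))            # "no_trips"
classify_day_pattern(c("shopping", "work"))   # "mandatory"
classify_day_pattern(c("shopping", "meal"))   # "non_mandatory_only"
```

A day with at least one mandatory trip is classified as mandatory even if it also includes non-mandatory trips, which keeps the categories mutually exclusive.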

For standard weighting, the day-pattern adjustment represented the Monday-through-Thursday average weekday. For day-of-week weighting, the same general procedure was applied separately by day, with an additional day-of-week term in the model. The resulting day-pattern targets were added to a second PopulationSim run so that the final household weights accounted for both demographic targets and diary-platform reporting differences.

The table below shows the general direction of the day-pattern adjustment for the day-of-week workflow. For Monday through Thursday, the adjustment reduces the share of no-travel days for online diary and call center respondents and increases the share of days with reported travel. Online and call center diaries were collected Monday through Thursday, so no analogous adjustment is needed for Friday through Sunday trip records.

Day	Day type	Call center before	Call center after	Online diary before	Online diary after	Smartphone
Mon	No trips	26%	20%	22%	13%	12%
Mon	Made mandatory trips	18%	23%	42%	42%	45%
Mon	Made only non-mandatory trips	56%	57%	36%	45%	42%
Tue	No trips	26%	21%	22%	13%	12%
Tue	Made mandatory trips	18%	23%	44%	43%	48%
Tue	Made only non-mandatory trips	55%	56%	35%	44%	40%
Wed	No trips	25%	20%	21%	12%	11%
Wed	Made mandatory trips	19%	24%	44%	44%	48%
Wed	Made only non-mandatory trips	56%	56%	35%	44%	41%
Thu	No trips	25%	19%	21%	12%	11%
Thu	Made mandatory trips	21%	27%	43%	43%	47%
Thu	Made only non-mandatory trips	54%	54%	36%	45%	41%

Table 21: Illustrative day-pattern adjustment by weekday and diary platform.

Adjusting Person and Day Weights

After household weights were finalized, person and day weights were derived from the household weights. Person weights were created by assigning the household weight to each household member. Because the survey does not collect travel diaries from unrelated household members, unrelated persons received a person weight of zero and their weight was redistributed evenly among the remaining related household members.

Day weights were then assigned to complete person-days. This was one of the most important differences between the standard and day-of-week workflows. In standard weighting, a person with multiple complete eligible weekdays had their person weight divided across those complete days, so the resulting day records collectively represented that person’s average weekday contribution. In day-of-week weighting, the weights were already specific to a day of week, so the day weight was equal to the person weight for that day.
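The derivation described above can be sketched in R for the standard workflow. The household, its weight, and the column names `related` and `day_num` are hypothetical, and the even redistribution shown is one reading of the rule described in the text.

```r
# Hypothetical household with weight 300: two related members, one unrelated.
persons <- data.frame(
  person_id = c(1, 2, 3),
  related = c(TRUE, TRUE, FALSE),
  hh_weight = 300
)

# Unrelated members get zero; their weight is shared evenly by related members.
unrelated_weight <- sum(persons$hh_weight[!persons$related])
persons$person_weight <- ifelse(
  persons$related,
  persons$hh_weight + unrelated_weight / sum(persons$related),
  0
)

# Complete weekdays: person 1 has three, person 2 has one.
days <- data.frame(person_id = c(1, 1, 1, 2), day_num = c(1, 2, 3, 1))
days$person_weight <- persons$person_weight[
  match(days$person_id, persons$person_id)
]

# Standard workflow: split each person weight across that person's days.
days$n_complete_days <- ave(days$day_num, days$person_id, FUN = length)
days$day_weight <- days$person_weight / days$n_complete_days

print(persons)
print(days)
```

The day weights for each person sum back to that person's weight, which is why the Person and Day totals agree in the standard workflow.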

Round 3 Weighting: Adjusting for Trip-Type Reporting Bias

The final weighting step corrected for under-reporting of specific trip types across diary platforms. Trip records were grouped into work, school, and other trip categories. For each trip type, RSG estimated a weighted model predicting the number of trips per person-day and used the model to calculate a trip adjustment factor.

The adjustment factor was applied to unlinked trip weights, using the final day weight as the starting point. The adjustment was designed to account for under-reporting of stops in the trip diary, such as a brief stop that a respondent forgot to record. For day-of-week weighting, the same adjustment concept was applied to the day-specific weights. Because online diary and call center records were only collected Monday through Thursday, the adjustment was relevant to those days and platforms.

The table below summarizes the trip adjustment factors used in the day-of-week workflow.

Trip type	Online diary adjustment	Call center adjustment
Work trips	1.62	1.51
School trips	1.87	1.00
Other trips	1.50	1.15

Table 22: Day-of-week trip adjustment factors by trip type and diary platform.
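As a rough illustration of how the factors in Table 22 scale trip weights, the R sketch below multiplies a day weight by the factor for each trip's type and platform. The trip records and platform labels are hypothetical, and giving smartphone trips a factor of 1 is an assumption of this sketch.

```r
# Adjustment factors from Table 22 (day-of-week workflow).
factors <- data.frame(
  trip_type = rep(c("work", "school", "other"), times = 2),
  platform  = rep(c("online", "call_center"), each = 3),
  factor    = c(1.62, 1.87, 1.50, 1.51, 1.00, 1.15),
  stringsAsFactors = FALSE
)

# Hypothetical unlinked trips with their final day weights.
trips <- data.frame(
  trip_id    = 1:4,
  trip_type  = c("work", "other", "school", "work"),
  platform   = c("online", "call_center", "online", "smartphone"),
  day_weight = c(100, 100, 100, 100),
  stringsAsFactors = FALSE
)

# Look up each trip's factor by trip type and platform.
trip_key <- paste(trips$trip_type, trips$platform)
factor_key <- paste(factors$trip_type, factors$platform)
trips$factor <- factors$factor[match(trip_key, factor_key)]

# Assumption: platforms without a listed factor (smartphone) are unadjusted.
trips$factor[is.na(trips$factor)] <- 1

trips$trip_weight <- trips$day_weight * trips$factor
print(trips)
```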

After the unlinked trip weights were adjusted, linked trip weights and tour weights were calculated from the updated trip weights. Linked trip weights represent complete origin-to-destination trips, while tour weights represent sequences of linked trips that begin and end at the same location.

5.5 Weighted Totals

The final weights expand the survey to population totals at the household, person, day, trip, linked trip, and tour levels.

Table 23 shows the total weighted households, persons, person-days, and trips for the standard weekday workflow. These totals represent the overall scale of travel in the study area on an average weekday across Monday through Thursday.

Final Weighted Totals by Analysis Level
Weight Level	Weighted Total
Household	2,814,595.3
Person	6,759,611.8
Day	6,759,611.8
Trip	30,078,666.6

Table 23: Weighted totals by analysis level - standard weekday weights

Table 24 summarizes the day-specific totals available for the alternate day-of-week workflow. These totals represent the scale of travel in the study area on each specific day of the week, with separate estimates for Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday travel.

Day-of-Week Weighted Totals
Day	Weight Level	Weighted Total
Monday	Day	6,751,877.4
Monday	Trip	30,625,904.0
Tuesday	Day	6,770,579.5
Tuesday	Trip	31,791,642.6
Wednesday	Day	6,765,954.5
Wednesday	Trip	32,235,056.1
Thursday	Day	6,764,262.0
Thursday	Trip	32,082,955.5
Friday	Day	6,745,710.1
Friday	Trip	33,560,235.3
Saturday	Day	6,739,056.6
Saturday	Trip	33,110,358.2
Sunday	Day	6,746,910.2
Sunday	Trip	27,489,425.3

Table 24: Weighted totals by analysis level and day of week - day-of-week weights
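One quick use of these totals is an average trip rate for each day, computed as weighted trips divided by weighted person-days. The R sketch below copies the values from Table 24; the column names are chosen for illustration.

```r
# Weighted totals copied from Table 24.
totals <- data.frame(
  day = c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday"),
  day_total = c(6751877.4, 6770579.5, 6765954.5, 6764262.0,
                6745710.1, 6739056.6, 6746910.2),
  trip_total = c(30625904.0, 31791642.6, 32235056.1, 32082955.5,
                 33560235.3, 33110358.2, 27489425.3),
  stringsAsFactors = FALSE
)

# Average weighted trips per person-day for each day of the week.
totals$trips_per_person_day <- totals$trip_total / totals$day_total

print(totals[, c("day", "trips_per_person_day")])
```

On these totals, Friday has the highest trip rate and Sunday the lowest.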

5.6 Additional Guidance for Analysts

Choosing the Right Weight

Different analyses require different weight types. Analysts should select the weight that matches both the level of measurement and the weighting workflow.

For most typical weekday summaries, analysts should use the standard weights. Standard household, person, day, trip, linked trip, and tour weights are appropriate when the research question is about average weekday travel, especially for summaries that are not intended to distinguish Monday from Tuesday or weekday from weekend behavior.

Day-of-week weights should be used when the analytic question is explicitly about a specific day of week or about differences across days. Examples include comparing Saturday and Sunday travel, estimating Friday trip rates, or comparing weekday and weekend mode share. When using day-of-week weights, analysts should filter to the relevant day or group of days before applying the corresponding day-of-week weight. A Monday estimate should use Monday records and Monday weights; a Saturday estimate should use Saturday records and Saturday weights.

At each level, the same unit-matching principle applies:

Household weights should be used when households are the unit of analysis or when studying household-level characteristics.
Person weights should be used for demographic characteristics, person-level behaviors, and analyses where individuals, not days or trips, are the unit.
Day weights should be used when analyzing person-day travel behavior, including day patterns, trip rates, and average daily travel.
Trip weights should be used when analyzing trips.
Linked trip weights should be used when analyzing complete origin-to-destination linked trips.
Tour weights should be used when analyzing tours.

Using the wrong weight type can lead to biased estimates. For example, applying person weights to trip tables will misstate total travel because person weights reflect neither the allocation of weight across travel days nor the trip adjustment factors, while using day-of-week weights without filtering to the intended day can mix together records that represent different target days.
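As a sketch of matching the weight to the analytic unit, the R example below computes a weighted average daily trip rate from day-level records using the day weight. The column names `day_weight` and `num_trips` follow the day table, but the records themselves are invented.

```r
# Hypothetical day records; the values are invented for illustration.
day <- data.frame(
  day_id = 1:5,
  num_trips = c(0, 4, 6, 2, 3),
  day_weight = c(200, 100, 50, 0, 150)
)

# Keep positively weighted person-days, then take the weighted mean.
weighted_days <- day[day$day_weight > 0, ]
trip_rate <- sum(weighted_days$num_trips * weighted_days$day_weight) /
  sum(weighted_days$day_weight)

# The same result using base R's weighted.mean().
trip_rate_check <- weighted.mean(day$num_trips, day$day_weight)

print(trip_rate)
```

The unit of analysis is the person-day, so the day weight is the matching weight; a trip-level summary such as mode share would use the trip weight on the trip table instead.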

What the Weights Can and Cannot Correct

The weighting process corrects for several forms of bias:

differences in sampling likelihood across geographies;
differential response rates across demographic groups;
reporting differences across diary platforms, including smartphone, online diary, and call center reporting; and
under-reporting of specific trip types.

However, weighting cannot correct for every possible source of error. It cannot fully correct misreported or miscoded trip purposes, missing data not captured through imputation, recall errors unrelated to diary platform, GPS or routing errors, or sparse samples in very small geographies or rare population groups.

The day-of-week weights also have a specific limitation. Because the sample is split by day, each day-specific weighting run has fewer records than the standard Monday-through-Thursday workflow. The broader Boston / Not Boston geography and combined target categories improve stability, but they do not make all day-specific subgroup estimates equally reliable. Analysts should interpret highly granular day-of-week estimates with caution.

Design Effects and Effective Sample Size

Unequal weights reduce the statistical precision of estimates compared with a simple random sample of the same size. This reduction is summarized by the design effect (DEFF) and the effective sample size (ESS). DEFF reflects how much weight variability inflates variance; ESS reflects the size of an unweighted sample that would yield equivalent precision.
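The guide does not state how these diagnostics were computed. One common definition, Kish's approximation, is offered here as an assumption; it reproduces the relationship among the CV, DEFF, and ESS values reported in Table 25 to within rounding. For \(n\) positively weighted records with weights \(w_i\):

\[ \mathrm{DEFF} = 1 + \mathrm{CV}^2 = \frac{n \sum_{i} w_i^2}{\left(\sum_{i} w_i\right)^2}, \qquad \mathrm{ESS} = \frac{n}{\mathrm{DEFF}} = \frac{\left(\sum_{i} w_i\right)^2}{\sum_{i} w_i^2} \]

Here CV is the standard deviation of the weights divided by their mean. For example, the household CV of 1.18 gives \(1 + 1.18^2 \approx 2.39\), which agrees with the reported DEFF of 2.38 once rounding of the CV is taken into account.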

Because day-of-week weights are estimated separately by day, each day's estimates rest on fewer records and a smaller effective sample size than estimates from the standard weights, so they may have larger variance, particularly for Friday, Saturday, Sunday, or small subgroups. As a rule of thumb, when DEFF exceeds 2.0, analysts should expect a noticeable loss of precision, especially when estimates are based on small subgroups where limited sample size and weight variability compound.

Weight Quality Diagnostics by Analysis Level
Weight Level	CV	DEFF	ESS
Household	1.18	2.38	6,521.83
Person	1.22	2.48	11,917.58
Day	1.60	3.57	13,727.30
Trip	1.81	4.26	46,971.97

Table 25: Weight quality diagnostics by analysis level
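Analysts who want similar diagnostics for their own subsets can approximate them directly from the weights. The R sketch below uses Kish's approximation, which is an assumption rather than the guide's documented method, although it is consistent with the values in Table 25. The function name and example weights are hypothetical.

```r
# Kish's approximate design effect and effective sample size from weights.
weight_diagnostics <- function(w) {
  w <- w[w > 0]                       # positively weighted records only
  n <- length(w)
  deff <- n * sum(w^2) / sum(w)^2     # equals 1 + CV^2
  list(n = n, cv = sqrt(deff - 1), deff = deff, ess = n / deff)
}

# Hypothetical weights, not study data.
example_diag <- weight_diagnostics(c(50, 100, 100, 150, 600, 0))
print(example_diag)

# Cross-check against the household row of Table 25, using the 15,552
# positively weighted households in Table 29 and the reported ESS.
implied_deff <- 15552 / 6521.83       # about 2.38, the reported DEFF
print(round(implied_deff, 2))
```

In the toy example, one large weight pushes DEFF above 2, so five weighted records carry roughly the precision of two or three equally weighted ones.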

The standard diagnostics in Table 25 summarize the default weekday workflow. Table 26 shows the day-specific diagnostics for the alternate day-of-week weights. These values are especially helpful when comparing the relative stability of Monday-through-Thursday estimates with Friday, Saturday, or Sunday estimates.

Day-of-Week Weight Quality Diagnostics
Day	Weight Level	CV	DEFF	ESS
Monday	Day	1.36	2.85	4,461.10
Monday	Trip	1.44	3.08	17,038.78
Tuesday	Day	1.33	2.78	5,760.86
Tuesday	Trip	1.45	3.10	21,093.64
Wednesday	Day	1.39	2.93	5,444.14
Wednesday	Trip	1.56	3.42	19,229.54
Thursday	Day	1.37	2.88	5,253.48
Thursday	Trip	1.49	3.22	19,562.40
Friday	Day	1.37	2.88	3,557.98
Friday	Trip	1.44	3.08	16,416.02
Saturday	Day	1.36	2.86	3,574.00
Saturday	Trip	1.42	3.00	17,224.92
Sunday	Day	1.37	2.88	3,528.14
Sunday	Trip	1.44	3.06	13,759.76

Table 26: Day-of-week weight quality diagnostics by day of week and analysis level

Distribution of Weights

Weight variability differs by dataset level and by weighting workflow. Household and person weights reflect the demographic and geographic calibration process. Day weights reflect the way person weights are assigned to complete travel days. Trip weights inherit day-weight variability and incorporate trip-type adjustment factors.

In the day-of-week workflow, weight distributions also differ by day. Friday, Saturday, and Sunday generally have fewer records because online diary and call center travel data were collected Monday through Thursday, while Friday through Sunday records come from smartphone respondents. This smaller sample size contributes to larger day-of-week weights and greater uncertainty for some estimates. Figure 7 visualizes those differences across dataset levels.

Figure 7: Weight Distributions by Dataset Level

Analysts should be cautious when conducting analyses in which a small number of high-weight observations dominate the estimates. This caution is especially important for day-specific estimates, small geographies, rare subgroups, and cross-tabulations with many categories.

Geographic Considerations and Small-Area Estimates

Because the weights were calibrated to specific weighting geographies, those are the geographies at which the weighted data are most internally consistent. For standard weighting, the relevant geography is the custom client-defined weighting zone structure. For day-of-week weighting, the relevant geography is the broader Boston / Not Boston structure.

This does not prevent analysts from summarizing the data to other geographies, but it does affect interpretation. Cities, towns, neighborhoods, corridors, and other analyst-defined geographies are not individually controlled unless they align with the weighting geographies. Weighted totals for those areas may not match external benchmarks, and estimates may be driven by a small number of high-weight records.

For fine-scale estimates, analysts should check the unweighted sample size, the number of households or persons contributing to the estimate, and the distribution of weights. In some cases, pooling multiple areas, pooling multiple days, or reporting estimates at a broader geography may be more appropriate.
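These checks can be sketched in R as a per-area summary of unweighted sample size, weighted total, and the share of the weighted total carried by the single largest weight. The `area` column and the records are hypothetical.

```r
# Hypothetical household records for two analyst-defined areas.
hh <- data.frame(
  area = c("Town A", "Town A", "Town A", "Town B", "Town B"),
  hh_weight = c(100, 120, 80, 900, 100),
  stringsAsFactors = FALSE
)

# Per-area checks: unweighted n, weighted total, and the share of the
# weighted total carried by the single largest weight.
area_check <- do.call(rbind, lapply(split(hh, hh$area), function(d) {
  data.frame(
    area = d$area[1],
    n_unweighted = nrow(d),
    weighted_total = sum(d$hh_weight),
    max_weight_share = max(d$hh_weight) / sum(d$hh_weight),
    stringsAsFactors = FALSE
  )
}))

print(area_check)
```

In this toy example, one household supplies 90 percent of the Town B total, which is the kind of pattern that argues for pooling areas or reporting at a broader geography.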

Small Population Groups

Rare population groups, rare travel behaviors, and highly specific day-by-geography combinations may have limited representation in the weighted data. This can include groups such as zero-vehicle households, transit commuters, active transportation users, university students, or weekend travelers in a small geography.

For these groups, analysts should consider pooling response categories, pooling across geographies or days where conceptually appropriate, reporting uncertainty measures, or using model-based estimation techniques. The day-of-week weights improve the ability to analyze daily variation in travel, but they do not eliminate the need to evaluate sample size and precision.

5.7 Summary

Weighting for the Massachusetts Travel Study dataset follows a structured and incremental process. Base weights correct for sample design. Round 1 adjustments correct for demographic nonresponse. Round 2 adjustments correct for day-pattern reporting bias. Round 3 adjustments correct for trip-type under-reporting. Together, these steps yield household, person, day, trip, linked trip, and tour weights for estimating population-level travel behavior.

The standard weights should be treated as the default workflow for typical weekday analysis. The day-of-week weights should be used when the research question depends on a specific day of week or on comparisons across days, including weekday/weekend analysis. In either workflow, analysts should match the weight to the analytic unit, use the geography and target structure as a guide to stable interpretation, and check sample size and weight variability before interpreting small or highly detailed estimates.

6 Dataset Overview

This section describes the prepared tables available for analysis and how they relate to one another. It uses the prepared hts object and the current settings.yml configuration.

6.1 Data Structure

Household travel survey data are hierarchical: although the primary sampling unit (see Section 2.1.2) is the household, the collected data also represent the behavior of individual persons. For participants who reported via the rMove smartphone app, data were collected across multiple days, so a single person can contribute several travel days and many trip and activity records.

In the delivered dataset, this hierarchical structure shows up in the form of multiple tables that link to one another using stable identifiers (typically columns ending in _id). Figure 8 summarizes the relationship among the prepared tables.

Figure 8: Data linkages across prepared tables (diagram showing how the prepared study tables relate to each other).

6.2 Summary of Data Tables

Table 27 lists the prepared tables, their units of observation, and their primary identifiers.

Table Name	Record Unit	Primary ID(s)	Weight Column	What's in the Table
Household (`hh`)	Household	`hh_id`	`hh_weight`	One record per household, with household-level attributes (e.g., sampling/strata fields and household characteristics) used for analysis and weighting.
Person (`person`)	Person	`person_id`	`person_weight`	One record per person in each household, including demographic attributes and person-level variables used for analysis and weighting.
Day (`day`)	Person-day	`day_id`	`day_weight`	One record per person-day (a single survey day for a person), used for day-based analysis such as trip rates and daily metrics.
Vehicle (`vehicle`)	Vehicle	`vehicle_id`	`hh_weight`	One record per household vehicle (when delivered), including vehicle identifiers and vehicle characteristics used in vehicle-based analysis and joins.
Location (`location`)	GPS point on a trip	`trip_id`, `collect_time`		One record per GPS point recorded along a trip (when delivered), identified by trip and collection time; used to link detailed location data back to trips.
Unlinked Trip (`trip_unlinked`)	Person-trip	`trip_id`	`trip_weight`	One record per unlinked trip segment (when delivered), typically representing each movement between stops; used for detailed mode/path and trip-chaining analysis.
Linked Trip (`trip_linked`)	Person-trip	`linked_trip_id`	`linked_trip_weight`	One record per linked trip (when delivered), typically aggregating unlinked segments into a single journey between primary origin and destination.
Tour (`tour`)	Person-tour	`tour_id`	`tour_weight`	One record per tour (when delivered), grouping trips into an out-and-back sequence anchored at home or a primary location; used for tour-based analysis.

Table 27: Summary of prepared tables

6.3 Trip Unit of Measure: Person-Trips

Understanding how travel events are represented in the dataset is essential for correctly interpreting trip and tour outputs. This section describes the structure of person-trip records, how shared travel is represented, and what that means for later analyses.

What is a Person-Trip?

Trips are represented at the person-trip level: each row is a single travel event made by a single person. If multiple household members traveled together, the shared movement appears as multiple records, one per participating person.

The guide’s default trip table is trip_unlinked. See the Analyst Handbook for examples that build from this table.

Replication of Shared Trips

When household members travel together (for example, carpooling, walking together, or biking as a group), the data include:

one person-trip record per traveler, and
a unique trip identifier for each person-trip record.

It is therefore common to observe:

identical origin/destination coordinates,
nearly identical start/end times,
matching modes, and
matching purposes

across members of the same household. These patterns often indicate shared travel, not data duplication errors.

The prepared trip tables contain records only for surveyed persons. Non-household travelers (for example, friends, coworkers, carpool partners, or other companions) may be captured indirectly through trip-level metadata fields such as num_non_hh_travelers and num_hh_travelers.

For MassDOT, the hh_member_* columns on the trip table should be treated as supplementary household co-travel metadata rather than as a definitive record of co-travel. Processed, imputed co-travel information is stored in the joint_trip_id and joint_tour_id fields, which are the primary source of information about shared travel in the dataset.
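A minimal R sketch of working with joint-trip identifiers follows. It assumes that person-trip records belonging to the same shared movement carry the same joint_trip_id and that -1 marks a missing value, as listed among the configured missing-value codes; the records are invented.

```r
# Hypothetical person-trip records; joint_trip_id = -1 marks a missing value.
trips <- data.frame(
  trip_id = c(101, 102, 201, 202, 301),
  person_id = c(1, 1, 2, 2, 3),
  joint_trip_id = c(9001, -1, 9001, -1, 9001)
)

# Keep records with a valid joint-trip identifier.
joint <- trips[trips$joint_trip_id != -1, ]

# Count the distinct surveyed persons on each joint trip.
party_size <- tapply(joint$person_id, joint$joint_trip_id,
                     function(p) length(unique(p)))

print(party_size)
```

Because each traveler has a separate person-trip record, counting vehicle trips rather than person-trips requires collapsing records like these to one row per joint trip.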

Example Structure of Joint-Trip Records

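Table 28 shows how person-trip records and their joint_trip_id and hh_member_* fields are laid out. The records shown all have joint_trip_id = -1, the missing-value code (see Table 31), so they illustrate the record structure rather than a matched shared trip. When a joint trip is identified, the participating person-trip records are linked through joint_trip_id.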

Example joint trip records (joint_trip_id = -1)
hh_id	person_id	day_id	trip_id	joint_trip_id	depart_date	depart_hour	depart_minute	arrive_date	arrive_hour	arrive_minute	o_lat	o_lon	d_lat	d_lon	hh_member_1	hh_member_2	hh_member_3	hh_member_4	hh_member_5	hh_member_6	hh_member_7	hh_member_8
24000089	2400008901	240000890101	2400008901001	-1	2024-06-11	14	20	2024-06-11	14	27	42.54132	-70.87968	42.57236	-70.85492	1	0	0	0	0	0	0	0
24000089	2400008901	240000890101	2400008901002	-1	2024-06-11	16	2	2024-06-11	16	15	42.57236	-70.85492	42.54132	-70.87968	1	0	0	0	0	0	0	0
24000089	2400008902	240000890201	2400008902001	-1	2024-06-11	8	0	2024-06-11	8	5	42.54132	-70.87968	42.54144	-70.88607	0	1	0	0	0	0	0	0
24000089	2400008902	240000890201	2400008902002	-1	2024-06-11	9	20	2024-06-11	9	25	42.54144	-70.88607	42.54626	-70.88079	0	1	0	0	0	0	0	0
24000089	2400008902	240000890201	2400008902003	-1	2024-06-11	9	40	2024-06-11	9	45	42.54626	-70.88079	42.54132	-70.87968	0	1	0	0	0	0	0	0
24000089	2400008902	240000890201	2400008902004	-1	2024-06-11	11	20	2024-06-11	11	30	42.54132	-70.87968	42.55242	-70.87901	0	1	0	0	0	0	0	0
24000089	2400008902	240000890201	2400008902005	-1	2024-06-11	12	5	2024-06-11	12	15	42.55242	-70.87901	42.54132	-70.87968	0	1	0	0	0	0	0	0
24000122	2400012201	240001220101	2400012201001	-1	2024-06-11	12	39	2024-06-11	13	1	42.34075	-71.15493	42.33640	-71.17110	1	0	0	0	0	0	0	0
24000122	2400012201	240001220101	2400012201002	-1	2024-06-11	15	55	2024-06-11	16	15	42.33663	-71.17125	42.34063	-71.15506	1	0	0	0	0	0	0	0
24000122	2400012201	240001220101	2400012201036	-1	2024-06-11	16	31	2024-06-11	16	31	42.34075	-71.15493	42.34075	-71.15493	1	0	0	0	0	0	0	0

Table 28: Example joint-trip records.

6.4 Record Counts

Table 29 summarizes the delivered record counts and the number of positively weighted records by table.

Table	Records	Weight Column	Weighted Records	Percent Weighted
Household (`hh`)	18,122	`hh_weight`	15,552	85.8%
Person (`person`)	37,616	`person_weight`	29,560	78.6%
Day (`day`)	134,187	`day_weight`	49,028	36.5%
Vehicle (`vehicle`)	25,849	`hh_weight`	21,669	83.8%
Location (`location`)	8,607,225		NA	NA
Unlinked Trip (`trip_unlinked`)	468,018	`trip_weight`	200,120	42.8%
Linked Trip (`trip_linked`)	419,469	`linked_trip_weight`	173,983	41.5%
Tour (`tour`)	160,091	`tour_weight`	68,007	42.5%
Value Labels (`value_labels`)	2,422		NA	NA
Variable List (`variable_list`)	567		NA	NA

Table 29: Record counts and weighted records by table

6.5 Household Completion Status

The delivered MassDOT data include both complete and incomplete households. For most descriptive and inferential analyses, the recommended analytic universe is the set of households where hts$hh$is_complete == 1.

This household-level completeness rule is different from trip-level completion flags. In particular, trip_unlinked$is_complete describes trip-record completion or usability, not whether the household belongs in the complete-household analytic universe. When an analysis starts from person-, day-, trip-, vehicle-, or tour-level records, use the household table to define the complete-household universe and then carry that restriction down through hh_id.

Table 30 shows how many delivered records belong to complete versus incomplete households across the main prepared tables.

Table	Household Completion Status	Records
hh	Complete household	15,641
hh	Incomplete household	2,481
person	Complete household	31,255
person	Incomplete household	6,361
day	Complete household	96,370
day	Incomplete household	37,817
trip_unlinked	Complete household	411,573
trip_unlinked	Incomplete household	56,445
trip_linked	Complete household	366,186
trip_linked	Incomplete household	53,283
tour	Complete household	139,240
tour	Incomplete household	20,851
vehicle	Complete household	21,770
vehicle	Incomplete household	4,079

Table 30: Delivered records by household completion status.

For lower-level analysis tables, the simplest workflow is:

create complete_hh_ids from hts$hh
filter households directly with dplyr::filter(is_complete == 1)
filter lower-level tables with dplyr::filter(hh_id %in% complete_hh_ids)
use trip-level is_complete only when the question is specifically about trip completion or trip usability
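The workflow above can be sketched in base R as follows; the `dplyr::filter()` calls named in the steps are equivalent. The toy `hts` list stands in for the prepared object, and its values are invented.

```r
# Toy stand-in for the prepared hts object; values are invented.
hts <- list(
  hh = data.frame(hh_id = c(1, 2, 3), is_complete = c(1, 0, 1)),
  person = data.frame(person_id = c(11, 12, 21, 31), hh_id = c(1, 1, 2, 3)),
  trip_unlinked = data.frame(trip_id = 1:5, hh_id = c(1, 1, 2, 3, 3),
                             is_complete = c(1, 0, 1, 1, 1))
)

# Step 1: IDs of complete households.
complete_hh_ids <- hts$hh$hh_id[hts$hh$is_complete == 1]

# Step 2: restrict the household table itself.
hh_complete <- hts$hh[hts$hh$is_complete == 1, ]

# Step 3: carry the restriction down to lower-level tables through hh_id.
person_complete <- hts$person[hts$person$hh_id %in% complete_hh_ids, ]
trip_complete <- hts$trip_unlinked[
  hts$trip_unlinked$hh_id %in% complete_hh_ids,
]

print(nrow(person_complete))
print(nrow(trip_complete))
```

The trip-level is_complete flag is deliberately left unused here: the trip with is_complete == 0 stays in trip_complete because its household is complete.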

6.6 Data Types and Considerations

The dataset includes variables that behave differently in analysis. Understanding common patterns helps avoid common mistakes in summaries and models.

Categorical Variables

Categorical variables store labels rather than magnitudes. They include binary fields, nominal fields (no inherent order), ordinal fields (with a natural order), and many count-like fields that are top-coded or otherwise treated as binned categories.

Use the Codebook to Order Categories

When you build a table, chart, or derived factor from a categorical variable, use codebook$value_labels as the source of truth for both labels and ordering.
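A minimal R sketch of this pattern follows. The column names (`variable`, `value`, `label`, `val_order`), the variable name `income_broad`, and the labels are assumptions made for illustration; check the delivered value-label table for the actual names.

```r
# Hypothetical excerpt of the value-label table; column names are assumed.
value_labels <- data.frame(
  variable = "income_broad",
  value = c(1, 2, 3, 999),
  label = c("Under $50,000", "$50,000-$99,999", "$100,000 or more",
            "Prefer not to answer"),
  val_order = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Coded responses as delivered.
income_codes <- c(3, 1, 999, 2, 1)

# Select this variable's labels and sort them into codebook order.
labels_for_var <- value_labels[value_labels$variable == "income_broad", ]
labels_for_var <- labels_for_var[order(labels_for_var$val_order), ]

# Build a labeled factor whose levels follow the codebook order.
income_factor <- factor(income_codes,
                        levels = labels_for_var$value,
                        labels = labels_for_var$label)

print(table(income_factor))
```

Building the factor from the codebook keeps categories in their intended order in tables and charts, rather than in alphabetical or first-seen order.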

Continuous Numeric Variables

Continuous numeric variables represent numeric measures where arithmetic operations are meaningful (for example, distances, durations, travel time, or speed). These fields can have wide ranges and may include extreme values.

Top-Coded Count Variables

Some variables that look numeric (for example, age brackets or capped household sizes) should be treated as categorical in analysis when they represent binned values rather than true continuous measures.

The table below summarizes the configured outlier diagnostics used to review the tails of selected numeric variables.

Outlier diagnostics.

Variable	Min	Max	P01	P99	IQR	Lower bound	Upper bound	Outliers	% outliers	Worst gap	Severity	Suggested handling
day
num_trips	0	68	0	15	5	−8	12	3,061	2.3%	56	Moderate	Consider trimming >= 99th pct.
person
num_trips	0	310	0	70	17	−24	44	2,560	6.8%	266	High	Trim or winsorize >= 95th pct.
trip
distance_meters	0	19,872,573	93	111,517	8,568	−11,723	22,549	49,331	10.8%	19,850,024	High	Trim or winsorize >= 95th pct.
distance_miles	0	12,348	0	69	5	−7	14	49,331	10.8%	12,334	High	Trim or winsorize >= 95th pct.
duration_minutes	0	8,008	0	158	16	−18	46	34,690	7.6%	7,962	High	Trim or winsorize >= 95th pct.
duration_seconds	1	480,464	1	9,472	979	−1,102	2,814	34,281	7.5%	477,650	High	Trim or winsorize >= 95th pct.
dwell_mins	0	9,162	0	2,367	311	−462	783	52,110	11.1%	8,379	High	Trim or winsorize >= 95th pct.
speed_mph	0	14,766,979	0	539	21	−26	57	19,239	4.2%	14,766,922	High	Trim or winsorize >= 95th pct.
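The suggested handling can be sketched in R as follows. The distances are invented, apart from the largest value, which echoes the distance_miles maximum in the table. The cutoff is an unweighted percentile from `quantile()` with its default interpolation, which is an assumption rather than the guide's documented method.

```r
# Hypothetical trip distances in miles plus one implausible value.
distance_miles <- c(seq(0.5, 49.5, by = 0.5), 12348)

# Unweighted 95th percentile used as the cutoff.
cutoff <- quantile(distance_miles, probs = 0.95, names = FALSE)

# Winsorize: cap values above the cutoff at the cutoff.
distance_winsorized <- pmin(distance_miles, cutoff)

# Trim: drop values above the cutoff instead.
distance_trimmed <- distance_miles[distance_miles <= cutoff]

print(cutoff)
print(max(distance_winsorized))
print(length(distance_trimmed))
```

Winsorizing keeps every record, and therefore every weight, in the analysis, while trimming removes records and changes the weighted total; which is preferable depends on the estimate being produced.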

Missing Values

Table 31 summarizes the configured missing-value codes and their labels in the codebook.

Code	Label(s) in codebook	# Variables
-1	Not imputable; Missing	13
995	Missing Response; 995	310
996	Never; None; None (I do not drive a vehicle)	7
997	Other; Other/prefer to self-describe; Other (e.g., boat, RV, van); Other vehicle	12
998	Don't know	3
999	Prefer not to answer	8

Table 31: Configured missing-value codes and codebook labels.
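Before tabulating or modeling a coded variable, it usually helps to convert missing-value codes to NA. The R sketch below treats -1, 995, 998, and 999 as missing; that selection is an assumption of the example, and the responses are invented.

```r
# Codes treated as missing in this sketch. Whether 996 ("None") or 997
# ("Other") should also be excluded depends on the variable and analysis.
missing_codes <- c(-1, 995, 998, 999)

# Hypothetical coded responses for one categorical variable.
responses <- c(1, 2, 995, 3, 999, 2, -1)

# Replace missing-value codes with NA before tabulating.
responses_clean <- ifelse(responses %in% missing_codes, NA, responses)

print(table(responses_clean, useNA = "ifany"))
```

Leaving these codes in place would let values such as 995 or 999 enter means and other numeric summaries as if they were real measurements.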

7 Codebook

The codebook is the primary reference for understanding the variables, response categories, question and response logic, and data structures used in this dataset.

A clear codebook is essential for reproducible analysis. It helps analysts identify the meaning of each variable, understand where it appears in the dataset, distinguish categorical variables from numeric or top-coded fields, verify skip logic and valid values, and interpret coded values consistently across tables.

Use the Codebook First

Start here whenever you need to answer any of these questions:

What table contains this variable?
What does this field mean?
Who was asked this question, and under what conditions should I expect a value?
Is this field categorical, numeric, top-coded, or part of a grouped response?
What do the stored values mean?
What order should categories appear in a plot or table?
Is this field part of a “select-all-that-apply” group or controlled by survey logic?

7.1 What the Codebook Contains

Variable List

The variable list is the structural reference for the dataset. It describes each delivered data element and helps analysts understand how variables are organized across household, person, day, trip, tour, vehicle, location, or other study tables.

The variable list includes:

variable name
table membership
delivered data type
description of the variable’s meaning, units, or derivation
survey question text, when available
survey logic that governs whether a respondent was asked the question or should have a value
checkbox or select-all-that-apply flags for multiple-response categorical variables

Value Labels

The value-label table is the categorical reference for the dataset. For coded categorical variables, it maps stored values to human-readable labels so analysts can reconstruct ordered factors, standard tabulations, and interpretable plots.

Depending on the study, value-label records may include:

table name, when labels vary by table
variable name
stored value or code
human-readable label
category order
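
As a rough illustration of how value-label records can drive an ordered tabulation, the Python sketch below builds a lookup for one variable and counts stored codes in category order. The record field names (`variable`, `value`, `label`, `order`) and the stored codes are assumptions; check the delivered value-label table for the actual column names.

```python
from collections import Counter

# Hypothetical value-label records; the field names are assumptions.
value_labels = [
    {"variable": "residence_rent_own", "value": 2, "label": "Rent", "order": 2},
    {"variable": "residence_rent_own", "value": 1,
     "label": "Own/buying (paying a mortgage)", "order": 1},
    {"variable": "residence_rent_own", "value": 997, "label": "Other", "order": 3},
]

def label_lookup(records, variable):
    """Return ({value: label}, [labels in category order]) for one variable."""
    rows = sorted(
        (r for r in records if r["variable"] == variable),
        key=lambda r: r["order"],
    )
    return {r["value"]: r["label"] for r in rows}, [r["label"] for r in rows]

labels, ordered = label_lookup(value_labels, "residence_rent_own")
stored = [1, 2, 2, 997, 1, 1]  # made-up stored codes
counts = Counter(labels[v] for v in stored)
table = [(lab, counts.get(lab, 0)) for lab in ordered]
```

Sorting by the category order rather than by label or by count keeps plots and tables aligned with the questionnaire.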

7.2 Variable List

The variable list below is searchable and downloadable. When the raw delivered codebook stores table membership as separate binary columns (hh, person, day), those flags are combined into a single table_membership field for display.

Codebook variable list.
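
The following Python sketch shows one way binary membership flags could be collapsed into a single table_membership string. The flag column names follow the description above; the comma separator and the example rows are assumptions.

```python
def table_membership(row, table_columns=("hh", "person", "day")):
    """Join the names of tables whose binary flag equals 1."""
    return ", ".join(t for t in table_columns if row.get(t) == 1)

# Hypothetical codebook rows with one binary flag per table.
row_a = {"variable": "num_trips", "hh": 1, "person": 1, "day": 1}
row_b = {"variable": "income_broad", "hh": 1, "person": 0, "day": 0}

membership_a = table_membership(row_a)
membership_b = table_membership(row_b)
```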

7.3 Value Labels

The value-label table below lists the available value labels for categorical variables. Use it alongside the variable list to interpret coded values and preserve the intended category order in summaries, charts, and models.

Codebook value labels.

8 Frequency Tables

This chapter provides frequency summaries for variables in each prepared data table. Use the table of contents to jump directly to the section for each data table.

8.1 Household

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	2,481	13.69%	0	0.00%
1	Yes	15,641	86.31%	2,814,595	100.00%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_trips`

Statistic	Unweighted	Weighted
`num_trips`
Number of trips
Statistic	Value	Value
N	18,122.00	2,814,595.28
Min	0.00	0.00
P25	4.00	5.00
Median	10.00	10.00
Mean	25.83	28.96
P75	37.00	38.00
P95	101.00	120.00
Max	355.00	355.00
SD	35.13	40.52
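
The unweighted and weighted columns can differ because each record contributes in proportion to its weight. The Python sketch below shows a weighted mean and a simple weighted quantile. The values and weights are made up, the name `hh_weight` is a hypothetical stand-in for whichever weight field applies, and the quantile rule (smallest value whose cumulative weight share reaches q) is one of several reasonable definitions and may not match the one used to build these tables.

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of weights."""
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total_w = sum(weights)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= q * total_w:
            return v
    return pairs[-1][0]

num_trips = [0, 4, 10, 37]             # made-up household trip counts
hh_weight = [0.0, 150.0, 200.0, 50.0]  # hypothetical weights

w_mean = weighted_mean(num_trips, hh_weight)
w_median = weighted_quantile(num_trips, hh_weight, 0.5)
weighted_n = sum(hh_weight)
```

Records with a weight of zero still count toward the unweighted N but contribute nothing to the weighted N or to any weighted statistic.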

`num_days_complete`

Value	Label	Unweighted		Weighted
`num_days_complete`
Number of complete days
Value	Label	Count	Percent	Count	Percent
0	0 complete days	2,477	13.67%	0	0.00%
1	1 complete day	10,482	57.84%	2,074,334	73.70%
2	2 complete days	69	0.38%	12,725	0.45%
3	3 complete days	79	0.44%	12,724	0.45%
4	4 complete days	121	0.67%	22,069	0.78%
5	5 complete days	269	1.48%	41,327	1.47%
6	6 complete days	813	4.49%	124,650	4.43%
7	7 complete days	3,812	21.04%	526,766	18.72%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`participation_group`

Value	Label	Unweighted		Weighted
`participation_group`
Participation group
Value	Label	Count	Percent	Count	Percent
1	Signup survey completed via browserMove, Diary completed via browserMove	7,569	41.77%	1,426,249	50.67%
2	Signup survey completed via browserMove, Diary completed via call center	130	0.72%	23,856	0.85%
3	Signup survey completed via browserMove, Diary completed via rMove	3,298	18.20%	355,089	12.62%
4	Signup survey completed via call center, Diary completed via browserMove	91	0.50%	15,885	0.56%
5	Signup survey completed via call center, Diary completed via call center	425	2.35%	82,109	2.92%
6	Signup survey completed via call center, Diary completed via rMove	15	0.08%	1,999	0.07%
7	Signup survey completed via rMove, Diary completed via browserMove	1,335	7.37%	217,059	7.71%
8	Signup survey completed via rMove, Diary completed via call center	18	0.10%	3,451	0.12%
9	Signup survey completed via rMove, Diary completed via rMove	5,241	28.92%	688,898	24.48%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`sample_segment`

Value	Label	Unweighted		Weighted
`sample_segment`
Sample segment
Value	Label	Count	Percent	Count	Percent
1	Berkshire General	108	0.60%	20,148	0.72%
2	Berkshire Hard-to-reach	107	0.59%	15,776	0.56%
3	Berkshire Rural	153	0.84%	22,595	0.80%
4	Boston Region General	3,747	20.68%	741,690	26.35%
5	Boston Region Hard-to-reach	3,389	18.70%	441,647	15.69%
6	Boston Region Rural	141	0.78%	20,827	0.74%
7	Boston Region Walk/Bike/Transit	1,123	6.20%	130,153	4.62%
8	Cape Cod General	516	2.85%	88,117	3.13%
9	Cape Cod Hard-to-reach	74	0.41%	8,852	0.31%
10	Cape Cod Rural	85	0.47%	11,660	0.41%
11	Central Massachusetts General	664	3.66%	124,515	4.42%
12	Central Massachusetts Hard-to-reach	565	3.12%	73,071	2.60%
13	Central Massachusetts Rural	272	1.50%	37,407	1.33%
14	Franklin General	54	0.30%	8,771	0.31%
15	Franklin Hard-to-reach	39	0.22%	3,017	0.11%
16	Franklin Rural	133	0.73%	19,912	0.71%
17	Martha’s Vineyard General	73	0.40%	6,387	0.23%
18	Martha’s Vineyard Rural	82	0.45%	4,089	0.15%
19	Merrimack Valley General	461	2.54%	78,234	2.78%
20	Merrimack Valley Hard-to-reach	349	1.93%	55,599	1.98%
21	Merrimack Valley Rural	72	0.40%	8,246	0.29%
22	Montachusett General	324	1.79%	55,578	1.97%
23	Montachusett Hard-to-reach	128	0.71%	16,844	0.60%
24	Montachusett Rural	216	1.19%	26,139	0.93%
25	Nantucket General	80	0.44%	6,525	0.23%
26	Nantucket Rural	27	0.15%	1,326	0.05%
27	Northern Middlesex General	438	2.42%	77,595	2.76%
28	Northern Middlesex Hard-to-reach	331	1.83%	37,193	1.32%
29	Northern Middlesex Rural	28	0.15%	2,792	0.10%
30	Old Colony General	583	3.22%	99,395	3.53%
31	Old Colony Hard-to-reach	262	1.45%	37,333	1.33%
32	Old Colony Rural	59	0.33%	9,823	0.35%
33	Pioneer Valley General	646	3.56%	107,815	3.83%
34	Pioneer Valley Hard-to-reach	792	4.37%	105,120	3.73%
35	Pioneer Valley Rural	265	1.46%	41,054	1.46%
36	Southeastern Massachusetts General	968	5.34%	173,990	6.18%
37	Southeastern Massachusetts Hard-to-reach	521	2.87%	64,372	2.29%
38	Southeastern Massachusetts Rural	247	1.36%	30,988	1.10%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`signup_platform`

Value	Label	Unweighted		Weighted
`signup_platform`
Signup platform
Value	Label	Count	Percent	Count	Percent
browser		10,997	60.68%	1,805,194	64.14%
call		531	2.93%	99,993	3.55%
rmove		6,594	36.39%	909,408	32.31%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`diary_platform`

Value	Label	Unweighted		Weighted
`diary_platform`
Diary platform
Value	Label	Count	Percent	Count	Percent
browser		8,995	49.64%	1,659,193	58.95%
call		573	3.16%	109,416	3.89%
rmove		8,554	47.20%	1,045,986	37.16%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_people`

Value	Label	Unweighted		Weighted
`num_people`
Number of household members
Value	Label	Count	Percent	Count	Percent
1	1 person	6,598	36.41%	814,090	28.92%
2	2 people	6,858	37.84%	949,530	33.74%
3	3 people	2,349	12.96%	476,856	16.94%
4	4 people	1,624	8.96%	356,382	12.66%
5	5 people	497	2.74%	150,374	5.34%
6	6 people	132	0.73%	45,412	1.61%
7	7 people	39	0.22%	12,731	0.45%
8	8 people	17	0.09%	6,913	0.25%
9	9 people	7	0.04%	1,474	0.05%
10	10 people	1	0.01%	834	0.03%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_surveyable`

Value	Label	Unweighted		Weighted
`num_surveyable`
Number of surveyable household members
Value	Label	Count	Percent	Count	Percent
1	1 surveyable person	7,490	41.33%	938,544	33.35%
2	2 surveyable persons	6,445	35.56%	907,685	32.25%
3	3 surveyable persons	2,100	11.59%	431,299	15.32%
4	4 surveyable persons	1,484	8.19%	340,390	12.09%
5	5 surveyable persons	444	2.45%	137,783	4.90%
6	6 surveyable persons	109	0.60%	40,588	1.44%
7	7 surveyable persons	34	0.19%	11,603	0.41%
8	8 surveyable persons	14	0.08%	4,858	0.17%
9	9 surveyable persons	1	0.01%	1,013	0.04%
10	10 surveyable persons	1	0.01%	834	0.03%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_participants`

Value	Label	Unweighted		Weighted
`num_participants`
Number of participants
Value	Label	Count	Percent	Count	Percent
1	1 participant	8,084	44.61%	1,071,806	38.08%
2	2 participants	8,442	46.58%	1,324,835	47.07%
3	3 participants	1,136	6.27%	286,880	10.19%
4	4 participants	355	1.96%	96,861	3.44%
5	5 participants	87	0.48%	27,090	0.96%
6	6 participants	15	0.08%	5,976	0.21%
7	7 participants	3	0.02%	1,146	0.04%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_adults`

Value	Label	Unweighted		Weighted
`num_adults`
Number of adults
Value	Label	Count	Percent	Count	Percent
1	1 adult	7,173	39.58%	940,374	33.41%
2	2 adults	8,870	48.95%	1,370,461	48.69%
3	3 adults	1,406	7.76%	338,107	12.01%
4	4 adults	492	2.71%	114,259	4.06%
5	5 adults	140	0.77%	40,669	1.44%
6	6 adults	31	0.17%	8,982	0.32%
7	7 adults	5	0.03%	936	0.03%
8	8 adults	3	0.02%	411	0.01%
9	9 adults	2	0.01%	397	0.01%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_kids`

Value	Label	Unweighted		Weighted
`num_kids`
Number of children
Value	Label	Count	Percent	Count	Percent
0	0 children	14,737	81.32%	2,057,879	73.11%
1	1 child	1,701	9.39%	341,852	12.15%
2	2 children	1,299	7.17%	291,568	10.36%
3	3 children	305	1.68%	90,658	3.22%
4	4 children	63	0.35%	24,796	0.88%
5	5 children	14	0.08%	6,895	0.24%
6	6 children	3	0.02%	949	0.03%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_workers`

Value	Label	Unweighted		Weighted
`num_workers`
Number of workers
Value	Label	Count	Percent	Count	Percent
0	0 workers	4,697	25.92%	646,321	22.96%
1	1 worker	6,909	38.12%	953,169	33.87%
2	2 workers	5,562	30.69%	934,944	33.22%
3	3 workers	714	3.94%	226,046	8.03%
4	4 workers	180	0.99%	48,507	1.72%
5	5 workers	47	0.26%	4,007	0.14%
6	6 workers	9	0.05%	542	0.02%
7	7 workers	2	0.01%	1,044	0.04%
8	8 workers	2	0.01%	16	0.00%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_students`

Value	Label	Unweighted		Weighted
`num_students`
Number of students
Value	Label	Count	Percent	Count	Percent
0	0 students	15,428	85.13%	2,253,685	80.07%
1	1 student	2,122	11.71%	428,818	15.24%
2	2 students	436	2.41%	103,021	3.66%
3	3 students	85	0.47%	17,159	0.61%
4	4 students	36	0.20%	7,837	0.28%
5	5 students	12	0.07%	3,266	0.12%
6	6 students	2	0.01%	810	0.03%
7	7 students	1	0.01%	0	0.00%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`num_vehicles`

Value	Label	Unweighted		Weighted
`num_vehicles`
Number of vehicles
Value	Label	Count	Percent	Count	Percent
0	0 (no vehicles in my household)	2,437	13.45%	325,071	11.55%
1	1 vehicle	7,984	44.06%	1,096,480	38.96%
2	2 vehicles	5,884	32.47%	1,002,864	35.63%
3	3 vehicles	1,338	7.38%	275,204	9.78%
4	4 vehicles	362	2.00%	84,670	3.01%
5	5 vehicles	85	0.47%	22,896	0.81%
6	6 vehicles	20	0.11%	5,610	0.20%
7	7 vehicles	6	0.03%	1,167	0.04%
8	8 or more vehicles	6	0.03%	635	0.02%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`income_detailed`

Value	Label	Unweighted		Weighted
`income_detailed`
Last year’s household income (detailed categories)
Value	Label	Count	Percent	Count	Percent
1	Less than $15,000	1,135	6.26%	203,316	7.22%
2	$15,000-$24,999	954	5.26%	163,302	5.80%
3	$25,000-$34,999	846	4.67%	140,662	5.00%
4	$35,000-$49,999	1,237	6.83%	172,112	6.11%
5	$50,000-$74,999	2,204	12.16%	320,298	11.38%
6	$75,000-$99,999	2,242	12.37%	296,036	10.52%
7	$100,000-$149,999	2,904	16.02%	380,305	13.51%
8	$150,000-$199,999	1,861	10.27%	262,130	9.31%
9	$200,000-$249,999	1,044	5.76%	199,327	7.08%
10	$250,000 or more	1,391	7.68%	270,305	9.60%
999	Prefer not to answer	2,304	12.71%	406,802	14.45%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`income_followup`

Value	Label	Unweighted		Weighted
`income_followup`
Last year’s household income (broad categories)
Value	Label	Count	Percent	Count	Percent
1	Under $25,000	63	3.00%	14,600	3.62%
2	$25,000-$49,999	81	3.86%	15,029	3.73%
3	$50,000-$74,999	59	2.81%	11,148	2.77%
4	$75,000-$99,999	77	3.67%	12,448	3.09%
5	$100,000-$199,999	126	6.01%	19,522	4.85%
6	$200,000 or more	140	6.68%	33,486	8.31%
999	Prefer not to answer	1,551	73.96%	296,535	73.62%
995	Missing Response	16,025		2,411,828
	Total valid	2,097	100.00%	402,768	100.00%
	Total missing	16,025		2,411,828
	Total	18,122		2,814,595
Logic: if income_detailed = ‘Prefer not to answer’
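
Percentages in these tables are computed over valid responses only, with 995 excluded from the denominator. The Python sketch below reproduces the unweighted percentages for `income_followup` from the counts shown above; the function name is illustrative.

```python
def valid_percents(counts, missing_codes=(995,)):
    """Return ({code: percent of valid}, total valid) excluding missing codes."""
    valid = {k: v for k, v in counts.items() if k not in missing_codes}
    total_valid = sum(valid.values())
    shares = {k: round(100 * v / total_valid, 2) for k, v in valid.items()}
    return shares, total_valid

# Unweighted counts copied from the income_followup table above.
income_followup = {1: 63, 2: 81, 3: 59, 4: 77, 5: 126, 6: 140,
                   999: 1551, 995: 16025}
followup_pct, followup_valid = valid_percents(income_followup)
```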

`income_broad`

Value	Label	Unweighted		Weighted
`income_broad`
Last year’s household income, upcoded from responses to income_detailed and income_followup
Value	Label	Count	Percent	Count	Percent
1	Under $25,000	2,152	11.88%	381,218	13.54%
2	$25,000-$49,999	2,164	11.94%	327,803	11.65%
3	$50,000-$74,999	2,263	12.49%	331,446	11.78%
4	$75,000-$99,999	2,319	12.80%	308,484	10.96%
5	$100,000-$199,999	4,891	26.99%	661,957	23.52%
6	$200,000 or more	2,575	14.21%	503,118	17.88%
999	Prefer not to answer	1,758	9.70%	300,569	10.68%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595
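
The counts above are consistent with `income_broad` collapsing the detailed categories and falling back to the follow-up answer when the detailed question was declined (for example, 1,135 + 954 + 63 = 2,152 households under $25,000). The Python sketch below expresses that logic. The mapping is inferred from the category labels and counts, not taken from documented processing code, so treat it as an assumption.

```python
# Assumed collapse of income_detailed codes into income_broad codes,
# inferred from the category labels and counts in the tables above.
DETAILED_TO_BROAD = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3,
                     6: 4, 7: 5, 8: 5, 9: 6, 10: 6}
PREFER_NOT = 999
MISSING = 995

def income_broad(income_detailed, income_followup):
    """Derive the broad income code from the detailed and follow-up answers."""
    if income_detailed != PREFER_NOT:
        return DETAILED_TO_BROAD[income_detailed]
    if income_followup == MISSING:
        # Declined the detailed question and has no follow-up answer.
        return PREFER_NOT
    return income_followup

examples = [income_broad(8, 995), income_broad(999, 2), income_broad(999, 995)]
```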

`residence_rent_own`

Value	Label	Unweighted		Weighted
`residence_rent_own`
Current residence ownership
Value	Label	Count	Percent	Count	Percent
1	Own/buying (paying a mortgage)	10,436	57.59%	1,697,353	60.31%
2	Rent	6,840	37.74%	960,106	34.11%
3	Housing provided by job or military	23	0.13%	4,155	0.15%
4	Provided by family, relative, or friend without payment or rent	185	1.02%	30,172	1.07%
997	Other	256	1.41%	39,121	1.39%
999	Prefer not to answer	382	2.11%	83,688	2.97%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`home_county`

Value	Label	Unweighted		Weighted
`home_county`
Home location – County
Value	Label	Count	Percent	Count	Percent
25001	Barnstable County	652	3.60%	105,044	3.73%
25003	Berkshire County	367	2.03%	58,390	2.07%
25005	Bristol County	1,518	8.38%	230,388	8.19%
25007	Dukes County	148	0.82%	9,073	0.32%
25009	Essex County	1,633	9.01%	289,466	10.28%
25011	Franklin County	226	1.25%	31,307	1.11%
25013	Hampden County	1,178	6.50%	190,544	6.77%
25015	Hampshire County	523	2.89%	62,675	2.23%
25017	Middlesex County	4,498	24.82%	706,564	25.10%
25019	Nantucket County	106	0.58%	7,739	0.27%
25021	Norfolk County	1,549	8.55%	267,604	9.51%
25023	Plymouth County	1,148	6.33%	189,748	6.74%
25025	Suffolk County	2,464	13.60%	336,827	11.97%
25027	Worcester County	2,112	11.65%	329,227	11.70%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

`residence_type`

Value	Label	Unweighted		Weighted
`residence_type`
Type of current residence
Value	Label	Count	Percent	Count	Percent
1	Single-family house (detached house)	8,594	47.42%	1,459,613	51.86%
2	Single-family house attached to one or more houses (rowhouse or townhouse)	991	5.47%	185,848	6.60%
3	Building with 2-4 units (duplexes, triplexes, quads)	3,210	17.71%	462,111	16.42%
4	Building with 5-49 apartments/condos	3,088	17.04%	410,008	14.57%
5	Building with 50 or more apartments/condos	1,695	9.35%	211,258	7.51%
6	Senior or age-restricted apartments/condos	371	2.05%	53,676	1.91%
7	Manufactured home/mobile home/trailer	90	0.50%	15,918	0.57%
9	Dorm, group quarters, or institutional housing	37	0.20%	5,791	0.21%
997	Other (e.g., boat, RV, van)	46	0.25%	10,373	0.37%
	Total valid	18,122	100.00%	2,814,595	100.00%
	Total	18,122		2,814,595

8.2 Person

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	6,361	16.91%	0	0.00%
1	Yes	31,255	83.09%	6,759,612	100.00%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`num_trips`

Statistic	Unweighted	Weighted
`num_trips`
Number of trips
Statistic	Value	Value
N	37,616.00	6,759,611.78
Min	0.00	0.00
P25	2.00	2.00
Median	4.00	4.00
Mean	12.44	13.01
P75	19.00	21.00
P95	49.00	49.00
Max	310.00	180.00
SD	17.24	17.32

`num_days_complete`

Value	Label	Unweighted		Weighted
`num_days_complete`
Number of complete days
Value	Label	Count	Percent	Count	Percent
0	0 complete days	4,621	12.89%	0	0.00%
1	1 complete day	21,097	58.83%	4,958,389	73.35%
2	2 complete days	114	0.32%	15,867	0.23%
3	3 complete days	126	0.35%	16,716	0.25%
4	4 complete days	176	0.49%	29,619	0.44%
5	5 complete days	367	1.02%	62,778	0.93%
6	6 complete days	1,315	3.67%	235,969	3.49%
7	7 complete days	8,043	22.43%	1,440,275	21.31%
NA	No value assigned	1,757		0
	Total valid	35,859	100.00%	6,759,612	100.00%
	Total missing	1,757		0
	Total	37,616		6,759,612

`hh_is_complete`

Value	Label	Unweighted		Weighted
`hh_is_complete`
Household day completion status
Value	Label	Count	Percent	Count	Percent
0	No	6,357	16.90%	0	0.00%
1	Yes	31,259	83.10%	6,759,612	100.00%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`is_participant`

Value	Label	Unweighted		Weighted
`is_participant`
Active participant (age 18+ and surveyable)
Value	Label	Count	Percent	Count	Percent
0	No	7,274	19.34%	1,343,373	19.87%
1	Yes	30,342	80.66%	5,416,239	80.13%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`num_bicycles`

Value	Label	Unweighted		Weighted
`num_bicycles`
Number of bicycles
Value	Label	Count	Percent	Count	Percent
0	0 bicycles	8,986	46.67%	1,606,548	44.13%
1	1 bicycle	3,489	18.12%	604,712	16.61%
2	2 bicycles	3,515	18.26%	661,238	18.17%
3	3 bicycles	1,401	7.28%	292,476	8.03%
4	4 bicycles	1,059	5.50%	257,860	7.08%
5	5 bicycles	381	1.98%	108,746	2.99%
6	6 bicycles	188	0.98%	50,683	1.39%
7	7 bicycles	84	0.44%	22,650	0.62%
8	8 or more bicycles	151	0.78%	35,251	0.97%
995	Missing Response	18,362		3,119,448
	Total valid	19,254	100.00%	3,640,163	100.00%
	Total missing	18,362		3,119,448
	Total	37,616		6,759,612
Logic: if rMove or (rMove for Web and person 1)

`bicycle_type`

Option	Variable	Unweighted			Weighted
`bicycle_type`
Type of bicycle
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Standard bicycle	Bicycle Type 1	9,919	96.6%	27,348	1,967,594	96.8%	4,725,996
Electric bicycle	Bicycle Type 2	916	8.9%	27,348	191,151	9.4%	4,725,996
Other	Bicycle Type 997	419	4.1%	27,348	85,606	4.2%	4,725,996
Logic: show if number of bicycles > 0
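
For select-all-that-apply groups such as `bicycle_type`, each option's percentage uses the number of records that were shown the question as its base, so the percentages can sum to more than 100. The Python sketch below reproduces the unweighted percentages from the counts above; the function name is illustrative.

```python
def multi_response_percents(selected, n_records, n_missing):
    """Return ({option: percent}, base) with base = records shown the question."""
    base = n_records - n_missing
    shares = {opt: round(100 * n / base, 1) for opt, n in selected.items()}
    return shares, base

# Unweighted counts copied from the bicycle_type table above.
bicycle_type = {"Standard bicycle": 9919, "Electric bicycle": 916, "Other": 419}
bike_pct, bike_base = multi_response_percents(
    bicycle_type, n_records=37616, n_missing=27348
)
```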

`second_home_in_region`

Value	Label	Unweighted		Weighted
`second_home_in_region`
Second home in study region
Value	Label	Count	Percent	Count	Percent
0	No	515	23.26%	97,154	21.96%
1	Yes	1,699	76.74%	345,193	78.04%
995	Missing Response	35,402		6,317,265
	Total valid	2,214	100.00%	442,347	100.00%
	Total missing	35,402		6,317,265
	Total	37,616		6,759,612
Logic: if has second_home

`second_home_state`

Value	Label	Unweighted		Weighted
`second_home_state`
Second home location – State
Value	Label	Count	Percent	Count	Percent
09	Connecticut	55	2.71%	10,214	2.47%
25	Massachusetts	1,699	83.74%	345,193	83.62%
33	New Hampshire	154	7.59%	33,186	8.04%
36	New York	32	1.58%	6,198	1.50%
44	Rhode Island	45	2.22%	7,064	1.71%
50	Vermont	44	2.17%	10,975	2.66%
NA	No value assigned	35,587		6,346,782
	Total valid	2,029	100.00%	412,830	100.00%
	Total missing	35,587		6,346,782
	Total	37,616		6,759,612
Logic: if has second_home

`second_home_county`

Value	Label	Unweighted		Weighted
`second_home_county`
Second home location – County
Value	Label	Count	Percent	Count	Percent
09001	Fairfield County	1	0.05%	0	0.00%
09003	Hartford County	20	0.99%	3,977	0.96%
09005	Litchfield County	4	0.20%	280	0.07%
09007	Middlesex County, Connecticut	1	0.05%	58	0.01%
09009	New Haven County	8	0.39%	854	0.21%
09011	New London County	9	0.44%	1,892	0.46%
09013	Tolland County	7	0.34%	1,733	0.42%
09015	Windham County, Connecticut	5	0.25%	1,422	0.34%
25001	Barnstable County	166	8.18%	26,475	6.41%
25003	Berkshire County	41	2.02%	6,306	1.53%
25005	Bristol County	116	5.72%	22,283	5.40%
25007	Dukes County	14	0.69%	2,343	0.57%
25009	Essex County	138	6.80%	23,709	5.74%
25011	Franklin County	28	1.38%	6,251	1.51%
25013	Hampden County	102	5.03%	24,943	6.04%
25015	Hampshire County	67	3.30%	12,053	2.92%
25017	Middlesex County	323	15.92%	62,086	15.04%
25019	Nantucket County	14	0.69%	4,798	1.16%
25021	Norfolk County	144	7.10%	34,017	8.24%
25023	Plymouth County	135	6.65%	31,466	7.62%
25025	Suffolk County	236	11.63%	50,078	12.13%
25027	Worcester County	175	8.62%	38,384	9.30%
33001	Belknap County	25	1.23%	4,867	1.18%
33003	Carroll County	27	1.33%	7,272	1.76%
33005	Cheshire County	7	0.34%	1,039	0.25%
33007	Coos County	1	0.05%	0	0.00%
33009	Grafton County	25	1.23%	4,913	1.19%
33011	Hillsborough County	22	1.08%	6,800	1.65%
33013	Merrimack County	5	0.25%	1,713	0.41%
33015	Rockingham County	30	1.48%	5,488	1.33%
33017	Strafford County	6	0.30%	655	0.16%
33019	Sullivan County, New Hampshire	6	0.30%	441	0.11%
36001	Albany County	1	0.05%	0	0.00%
36005	Bronx County	1	0.05%	524	0.13%
36029	Erie County	1	0.05%	0	0.00%
36033	Franklin County, New York	2	0.10%	391	0.09%
36035	Fulton County	1	0.05%	713	0.17%
36047	Kings County	3	0.15%	783	0.19%
36055	Monroe County	1	0.05%	83	0.02%
36061	New York County	6	0.30%	1,239	0.30%
36065	Oneida County	1	0.05%	175	0.04%
36067	Onondaga County	2	0.10%	140	0.03%
36079	Putnam County	1	0.05%	152	0.04%
36081	Queens County	1	0.05%	1,098	0.27%
36083	Rensselaer County	1	0.05%	21	0.01%
36091	Saratoga County	4	0.20%	158	0.04%
36103	Suffolk County, New York	3	0.15%	587	0.14%
36111	Ulster County	1	0.05%	0	0.00%
36115	Washington County, New York	2	0.10%	133	0.03%
44001	Bristol County, Rhode Island	3	0.15%	954	0.23%
44003	Kent County	6	0.30%	574	0.14%
44005	Newport County	3	0.15%	941	0.23%
44007	Providence County	18	0.89%	2,932	0.71%
44009	Washington County, Rhode Island	15	0.74%	1,663	0.40%
50001	Addison County	2	0.10%	32	0.01%
50003	Bennington County	2	0.10%	969	0.23%
50005	Caledonia County	3	0.15%	93	0.02%
50007	Chittenden County	2	0.10%	2,171	0.53%
50009	Essex County, Vermont	2	0.10%	71	0.02%
50015	Lamoille County	2	0.10%	0	0.00%
50019	Orleans County	2	0.10%	34	0.01%
50021	Rutland County	4	0.20%	785	0.19%
50023	Washington County, Vermont	7	0.34%	689	0.17%
50025	Windham County, Vermont	4	0.20%	527	0.13%
50027	Windsor County	14	0.69%	5,604	1.36%
NA	No value assigned	35,587		6,346,782
	Total valid	2,029	100.00%	412,830	100.00%
	Total missing	35,587		6,346,782
	Total	37,616		6,759,612
Logic: if has second_home
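
County values are five-digit FIPS codes whose first two digits are the state code, which is why the county counts for each state prefix add up to the matching row in `second_home_state`. The Python sketch below splits a county code and restores a leading zero that is lost if the field is parsed as an integer. Whether the delivered files store these codes as text or as numbers is not stated here, so the zero-padding step is a precaution and an assumption.

```python
STATE_NAMES = {
    "09": "Connecticut", "25": "Massachusetts", "33": "New Hampshire",
    "36": "New York", "44": "Rhode Island", "50": "Vermont",
}

def split_county_fips(code):
    """Split a 5-digit county FIPS code into (state, county) strings."""
    text = str(code).zfill(5)  # restores a leading zero lost to integer parsing
    return text[:2], text[2:]

state, county = split_county_fips(9003)   # Hartford County read as an integer
state_name = STATE_NAMES[state]
ma_state, ma_county = split_county_fips("25017")
```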

`is_proxy`

Value	Label	Unweighted		Weighted
`is_proxy`
Assigned proxy reporter
Value	Label	Count	Percent	Count	Percent
0	No	33,213	88.29%	5,617,136	83.10%
1	Yes	4,403	11.71%	1,142,476	16.90%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`has_proxy`

Value	Label	Unweighted		Weighted
`has_proxy`
Has a proxy
Value	Label	Count	Percent	Count	Percent
0	No	32,794	87.18%	5,416,239	80.13%
1	Yes	4,822	12.82%	1,343,373	19.87%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`has_phone`

Value	Label	Unweighted		Weighted
`has_phone`
Participant has phone
Value	Label	Count	Percent	Count	Percent
0	No	9,220	24.51%	1,779,711	26.33%
1	Yes	28,396	75.49%	4,979,900	73.67%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`phone_type`

Value	Label	Unweighted		Weighted
`phone_type`
Participant phone type
Value	Label	Count	Percent	Count	Percent
0	Does not have a smartphone	1,500	6.31%	315,388	7.17%
1	Has an Android phone	7,268	30.59%	1,348,539	30.66%
2	Has an Apple iPhone	14,534	61.17%	2,613,037	59.42%
3	Has other smartphone type	457	1.92%	120,951	2.75%
995	Missing Response	13,857		2,361,696
	Total valid	23,759	100.00%	4,397,915	100.00%
	Total missing	13,857		2,361,696
	Total	37,616		6,759,612

`relationship`

Value	Label	Unweighted		Weighted
`relationship`
Relationship to household person number 1
Value	Label	Count	Percent	Count	Percent
0	Self	18,122	48.18%	3,054,428	45.19%
1	Spouse, partner	8,846	23.52%	1,517,200	22.45%
2	Child, child in-law	6,962	18.51%	1,720,415	25.45%
3	Parent, parent in-law	1,134	3.01%	262,050	3.88%
4	Sibling, sibling in-law	479	1.27%	116,402	1.72%
5	Other relative (grandchild, cousin)	327	0.87%	89,117	1.32%
6	Nonrelative (friend, roommate, household help)	1,746	4.64%	0	0.00%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`age`

Value	Label	Unweighted		Weighted
`age`
Age of household member
Value	Label	Count	Percent	Count	Percent
1	Age under 5	1,638	4.35%	343,059	5.08%
2	Age 5-15	3,310	8.80%	835,318	12.36%
3	Age 16-17	606	1.61%	164,996	2.44%
4	Age 18-24	2,554	6.79%	528,447	7.82%
5	Age 25-34	6,583	17.50%	926,633	13.71%
6	Age 35-44	5,991	15.93%	944,855	13.98%
7	Age 45-54	4,288	11.40%	899,169	13.30%
8	Age 55-64	5,052	13.43%	873,011	12.92%
9	Age 65-74	5,086	13.52%	856,797	12.68%
10	Age 75-84	2,167	5.76%	333,185	4.93%
11	Age 85 up	341	0.91%	54,143	0.80%
	Total valid	37,616	100.00%	6,759,612	100.00%
	Total	37,616		6,759,612

`gender`

Value	Label	Unweighted		Weighted
`gender`
Gender
Value	Label	Count	Percent	Count	Percent
1	Female	18,361	51.19%	3,272,412	48.41%
2	Male	15,991	44.58%	3,125,722	46.24%
4	Non-binary	316	0.88%	57,210	0.85%
997	Other/prefer to self-describe	84	0.23%	15,344	0.23%
999	Prefer not to answer	1,118	3.12%	288,924	4.27%
995	Missing Response	1,746		0
	Total valid	35,870	100.00%	6,759,612	100.00%
	Total missing	1,746		0
	Total	37,616		6,759,612
Logic: if surveyable

`race`

Option	Variable	Unweighted			Weighted
`race`
Race
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
African American or Black	Race 1	1,412	5.0%	9,629	415,324	7.7%	1,349,718
American Indian or Alaska Native	Race 2	214	0.8%	9,629	90,755	1.7%	1,349,718
Asian	Race 3	2,449	8.8%	9,629	456,179	8.4%	1,349,718
Native Hawaiian or other Pacific Islander	Race 4	73	0.3%	9,629	23,812	0.4%	1,349,718
White	Race 5	21,473	76.7%	9,629	3,763,627	69.6%	1,349,718
Other race	Race 997	643	2.3%	9,629	293,069	5.4%	1,349,718
Prefer not to answer	Race 999	2,494	8.9%	9,629	679,994	12.6%	1,349,718
Logic: if surveyable

`ethnicity`

Option	Variable	Unweighted			Weighted
`ethnicity`
Ethnicity
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Not of Hispanic, Latino, or Spanish origin	Ethnicity 1	23,413	83.7%	9,629	4,134,795	76.4%	1,349,718
Mexican, Mexican American, Chicano	Ethnicity 2	206	0.7%	9,629	57,367	1.1%	1,349,718
Puerto Rican	Ethnicity 3	755	2.7%	9,629	217,328	4.0%	1,349,718
Cuban	Ethnicity 4	85	0.3%	9,629	23,308	0.4%	1,349,718
Another Hispanic, Latino, or Spanish origin	Ethnicity 997	1,012	3.6%	9,629	316,852	5.9%	1,349,718
Prefer not to answer	Ethnicity 999	2,603	9.3%	9,629	685,496	12.7%	1,349,718
Logic: if surveyable

`employment`

Value	Label	Unweighted		Weighted
`employment`
Employment status
Value	Label	Count	Percent	Count	Percent
1	Employed full-time (paid)	16,274	49.82%	2,821,097	50.55%
2	Employed part-time (paid)	3,250	9.95%	814,786	14.60%
3	Self-employed	1,690	5.17%	120,124	2.15%
5	Not employed and not looking for work (e.g., retired, stay-at-home parent, student)	9,382	28.72%	1,441,689	25.83%
6	Unemployed and looking for work	1,440	4.41%	226,176	4.05%
7	Unpaid volunteer or intern	326	1.00%	107,909	1.93%
8	Employed, but not currently working (e.g., on leave, furloughed 100%)	306	0.94%	49,453	0.89%
995	Missing Response	4,948		1,178,377
	Total valid	32,668	100.00%	5,581,235	100.00%
	Total missing	4,948		1,178,377
	Total	37,616		6,759,612
Logic: if 16 years or older

`work_mode`

Value	Label	Unweighted		Weighted
`work_mode`
Mode of transportation to work
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	880	6.06%	146,847	4.60%
26	Shuttle or vanpool	53	0.36%	12,982	0.41%
100	Household vehicle (or motorcycle)	10,074	69.32%	2,441,582	76.49%
101	Other vehicle (e.g., friend’s car, rental, carshare, work car)	314	2.16%	87,597	2.74%
102	Bus	486	3.34%	83,604	2.62%
103	Bicycle or e-bicycle	455	3.13%	51,248	1.61%
104	Other	395	2.72%	105,879	3.32%
105	Rail (e.g., train, subway)	1,665	11.46%	201,122	6.30%
106	Uber/Lyft, taxi, or car service	186	1.28%	55,160	1.73%
107	Micromobility (e.g., scooter, moped, skateboard)	24	0.17%	6,062	0.19%
995	Missing Response	23,084		3,567,530
	Total valid	14,532	100.00%	3,192,082	100.00%
	Total missing	23,084		3,567,530
	Total	37,616		6,759,612
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

`job_type`

Value	Label	Unweighted		Weighted
`job_type`
Work location type
Value	Label	Count	Percent	Count	Percent
1	Go to one work location ONLY (outside of home)	9,582	47.13%	1,992,111	51.56%
2	Work location regularly varies (different offices/jobsites)	2,046	10.06%	378,600	9.80%
3	Work ONLY from home or remotely (telework, self-employed)	3,017	14.84%	596,569	15.44%
4	Drive/bike/travel for work (driver, sales, deliveries)	366	1.80%	36,115	0.93%
5	Telework some days and travel to a work location some days (work location may vary)	5,319	26.16%	860,521	22.27%
995	Missing Response	17,286		2,895,696
	Total valid	20,330	100.00%	3,863,916	100.00%
	Total missing	17,286		2,895,696
	Total	37,616		6,759,612
Logic: if employed full/part/self/volunteer

`num_jobs`

Value	Label	Unweighted		Weighted
`num_jobs`
Number of jobs
Value	Label	Count	Percent	Count	Percent
1	1 job	18,120	87.89%	3,444,005	88.01%
2	2 jobs	2,101	10.19%	403,761	10.32%
3	3 jobs	300	1.46%	49,643	1.27%
4	4 jobs	58	0.28%	7,652	0.20%
5	5 jobs	17	0.08%	3,559	0.09%
6	6 or more jobs	20	0.10%	4,750	0.12%
995	Missing Response	17,000		2,846,242
	Total valid	20,616	100.00%	3,913,369	100.00%
	Total missing	17,000		2,846,242
	Total	37,616		6,759,612
Logic: if employed full/part/furloughed/self/volunteer

`commute_subsidy`

Option	Variable	Unweighted			Weighted
`commute_subsidy`
Commute benefits provided by employer
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Free (fully subsidized) transit passes or fares	Commute Subsidy 1	936	5.3%	19,901	173,071	4.5%	2,893,742
Discounted (partially subsidized) transit passes or fares	Commute Subsidy 2	2,008	11.3%	19,901	340,190	8.8%	2,893,742
Free (fully subsidized) parking at work	Commute Subsidy 3	5,145	29.0%	19,901	1,124,870	29.1%	2,893,742
Discounted (partially subsidized) parking at work	Commute Subsidy 4	907	5.1%	19,901	172,070	4.5%	2,893,742
Ability to work from home	Commute Subsidy 5	3,928	22.2%	19,901	699,993	18.1%	2,893,742
Free/discount transit fare	Commute Subsidy 6	317	1.8%	19,901	59,788	1.5%	2,893,742
Free/discount vanpool	Commute Subsidy 7	106	0.6%	19,901	22,364	0.6%	2,893,742
Cash or incentives for carpooling, walking, or biking to work	Commute Subsidy 8	296	1.7%	19,901	60,997	1.6%	2,893,742
Free/discount Uber, Lyft, or other smartphone-app ride service	Commute Subsidy 9	135	0.8%	19,901	29,590	0.8%	2,893,742
Free/discount carshare membership/use (e.g., ZipCar)	Commute Subsidy 10	113	0.6%	19,901	23,825	0.6%	2,893,742
Free/discount shuttle service	Commute Subsidy 11	606	3.4%	19,901	112,178	2.9%	2,893,742
Free/discount bikeshare membership	Commute Subsidy 12	514	2.9%	19,901	63,799	1.7%	2,893,742
Free/discount bicycle tune-up/maintenance	Commute Subsidy 13	216	1.2%	19,901	26,190	0.7%	2,893,742
Stipend for working at home (e.g., internet, equipment)	Commute Subsidy 14	576	3.3%	19,901	116,280	3.0%	2,893,742
None of the above	Commute Subsidy 996	7,334	41.4%	19,901	1,709,321	44.2%	2,893,742
Don’t know	Commute Subsidy 998	905	5.1%	19,901	235,141	6.1%	2,893,742
Logic: if employed full/part/furloughed/volunteer

`commute_subsidy_use`

Option	Variable	Unweighted			Weighted
`commute_subsidy_use`
Commute benefit used
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Free (fully subsidized) transit passes or fares	Commute Subsidy Use 1	614	6.5%	28,140	102,876	5.4%	4,838,204
Discounted (partially subsidized) transit passes or fares	Commute Subsidy Use 2	846	8.9%	28,140	114,474	6.0%	4,838,204
Free (fully subsidized) parking at work	Commute Subsidy Use 3	4,601	48.6%	28,140	1,018,301	53.0%	4,838,204
Discounted (partially subsidized) parking at work	Commute Subsidy Use 4	466	4.9%	28,140	106,960	5.6%	4,838,204
Ability to work from home	Commute Subsidy Use 5	3,603	38.0%	28,140	644,165	33.5%	4,838,204
Free/discount transit fare	Commute Subsidy Use 6	135	1.4%	28,140	24,966	1.3%	4,838,204
Free/discount vanpool	Commute Subsidy Use 7	22	0.2%	28,140	5,056	0.3%	4,838,204
Cash or incentives for carpooling, walking, or biking to work	Commute Subsidy Use 8	84	0.9%	28,140	15,132	0.8%	4,838,204
Free/discount Uber, Lyft, or other smartphone-app service	Commute Subsidy Use 9	66	0.7%	28,140	16,487	0.9%	4,838,204
Free/discount carshare membership/use (e.g., ZipCar)	Commute Subsidy Use 10	25	0.3%	28,140	2,793	0.1%	4,838,204
Free/discount shuttle service	Commute Subsidy Use 11	191	2.0%	28,140	36,208	1.9%	4,838,204
Free or discount bikeshare membership	Commute Subsidy Use 12	159	1.7%	28,140	19,845	1.0%	4,838,204
Free or discount bicycle tune-up/maintenance	Commute Subsidy Use 13	77	0.8%	28,140	6,471	0.3%	4,838,204
Stipend for working at home (e.g., internet, equipment)	Commute Subsidy Use 14	504	5.3%	28,140	103,935	5.4%	4,838,204
None of the above	Commute Subsidy Use 996	943	10.0%	28,140	212,531	11.1%	4,838,204
Logic: if selected benefit in commute_subsidy
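
The Percent column in these multiple-select tables appears to use the respondents who were asked the question as its denominator, that is, the person total minus the Missing count. The sketch below reproduces that arithmetic in Python. The function name is hypothetical, and the 37,616 person total is taken from the Total row of the single-select person tables; treat both as assumptions for illustration rather than as part of the delivered data.

```python
# Person total from the "Total" row of the single-select tables (assumed
# to be the base for multiple-select tables as well).
TOTAL_PERSONS = 37_616


def selected_share(selected: int, missing: int, total: int = TOTAL_PERSONS) -> float:
    """Percent of asked respondents who selected one option."""
    asked = total - missing  # respondents who passed the skip logic
    if asked <= 0:
        raise ValueError("no respondents were asked this question")
    return round(100 * selected / asked, 1)


# Unweighted counts from the commute_subsidy_use table.
print(selected_share(614, 28_140))    # free transit passes or fares
print(selected_share(4_601, 28_140))  # free parking at work
```

Because respondents can select several options, these shares are not expected to sum to 100%.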

`work_in_region`

Value	Label	Unweighted		Weighted
`work_in_region`
Work in study region
Value	Label	Count	Percent	Count	Percent
0	No	550	3.69%	115,845	4.06%
1	Yes	14,351	96.31%	2,736,787	95.94%
995	Missing Response	22,715		3,906,980
	Total valid	14,901	100.00%	2,852,632	100.00%
	Total missing	22,715		3,906,980
	Total	37,616		6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”
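
In the single-select tables, percentages are computed over valid responses only: rows coded 995 (Missing Response) or NA (No value assigned) are left out of the denominator. The sketch below shows one way to reproduce the unweighted Percent column from raw counts. The function and constant names are hypothetical, and codes are held as strings on the assumption that NA and numeric codes may share a column.

```python
# Codes treated as missing in the frequency tables (assumption: these two
# cover every missing category that appears in this section).
MISSING_CODES = {"995", "NA"}


def valid_percents(counts):
    """Return {code: percent of valid responses}, excluding missing codes."""
    valid = {code: n for code, n in counts.items() if code not in MISSING_CODES}
    total_valid = sum(valid.values())
    if total_valid == 0:
        return {}
    return {code: round(100 * n / total_valid, 2) for code, n in valid.items()}


# Unweighted counts from the work_in_region table.
work_in_region = {"0": 550, "1": 14_351, "995": 22_715}
print(valid_percents(work_in_region))
```

The same function can be applied to weighted counts, although rounding in the published tables may produce small differences in the last digit.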

`work_state`

Value	Label	Unweighted		Weighted
`work_state`
Work location – State
Value	Label	Count	Percent	Count	Percent
09	Connecticut	112	0.75%	26,589	0.93%
25	Massachusetts	14,351	96.52%	2,736,787	96.11%
33	New Hampshire	159	1.07%	35,386	1.24%
36	New York	23	0.15%	3,624	0.13%
44	Rhode Island	210	1.41%	42,694	1.50%
50	Vermont	13	0.09%	2,498	0.09%
NA	No value assigned	22,748		3,912,035
	Total valid	14,868	100.00%	2,847,577	100.00%
	Total missing	22,748		3,912,035
	Total	37,616		6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”

`work_county`

Value	Label	Unweighted		Weighted
`work_county`
Work location – County
Value	Label	Count	Percent	Count	Percent
09003	Hartford County	77	0.52%	19,645	0.69%
09005	Litchfield County	5	0.03%	1,115	0.04%
09007	Middlesex County, Connecticut	1	0.01%	58	0.00%
09009	New Haven County	7	0.05%	778	0.03%
09011	New London County	7	0.05%	2,059	0.07%
09013	Tolland County	7	0.05%	815	0.03%
09015	Windham County, Connecticut	8	0.05%	2,119	0.07%
25001	Barnstable County	373	2.51%	89,589	3.15%
25003	Berkshire County	219	1.47%	40,513	1.42%
25005	Bristol County	756	5.08%	146,068	5.13%
25007	Dukes County	98	0.66%	5,750	0.20%
25009	Essex County	962	6.47%	228,695	8.03%
25011	Franklin County	131	0.88%	21,427	0.75%
25013	Hampden County	739	4.97%	149,235	5.24%
25015	Hampshire County	370	2.49%	67,519	2.37%
25017	Middlesex County	3,898	26.22%	737,491	25.90%
25019	Nantucket County	79	0.53%	7,275	0.26%
25021	Norfolk County	1,127	7.58%	252,836	8.88%
25023	Plymouth County	677	4.55%	153,925	5.41%
25025	Suffolk County	3,502	23.55%	556,554	19.54%
25027	Worcester County	1,420	9.55%	279,909	9.83%
33001	Belknap County	1	0.01%	22	0.00%
33003	Carroll County	2	0.01%	60	0.00%
33005	Cheshire County	5	0.03%	1,166	0.04%
33011	Hillsborough County	70	0.47%	16,901	0.59%
33013	Merrimack County	4	0.03%	504	0.02%
33015	Rockingham County	69	0.46%	14,523	0.51%
33017	Strafford County	7	0.05%	2,107	0.07%
33019	Sullivan County, New Hampshire	1	0.01%	104	0.00%
36001	Albany County	2	0.01%	77	0.00%
36021	Columbia County	1	0.01%	0	0.00%
36029	Erie County	1	0.01%	0	0.00%
36055	Monroe County	1	0.01%	83	0.00%
36059	Nassau County	1	0.01%	91	0.00%
36061	New York County	11	0.07%	2,348	0.08%
36065	Oneida County	1	0.01%	175	0.01%
36067	Onondaga County	2	0.01%	47	0.00%
36081	Queens County	1	0.01%	727	0.03%
36093	Schenectady County	1	0.01%	78	0.00%
36119	Westchester County	1	0.01%	0	0.00%
44001	Bristol County, Rhode Island	6	0.04%	1,011	0.04%
44003	Kent County	14	0.09%	4,782	0.17%
44005	Newport County	35	0.24%	9,986	0.35%
44007	Providence County	151	1.02%	26,481	0.93%
44009	Washington County, Rhode Island	4	0.03%	435	0.02%
50003	Bennington County	3	0.02%	76	0.00%
50007	Chittenden County	3	0.02%	1,106	0.04%
50025	Windham County, Vermont	6	0.04%	1,086	0.04%
50027	Windsor County	1	0.01%	230	0.01%
NA	No value assigned	22,748		3,912,035
	Total valid	14,868	100.00%	2,847,577	100.00%
	Total missing	22,748		3,912,035
	Total	37,616		6,759,612
Logic: if job_type is “only one work location” or “teleworks some days and travels to a work location some days”
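
The `work_county` values are five-digit county codes whose first two digits match the `work_state` values (for example, 09 for Connecticut and 25 for Massachusetts). Reading them as numbers drops the leading zero on Connecticut codes, so it is safer to keep them as text. The helper below is a minimal sketch of splitting a county code into its state and county parts; the function name is hypothetical, and the handling of NA is an assumption about how missing values are stored.

```python
from typing import Optional, Tuple


def split_county_fips(code) -> Optional[Tuple[str, str]]:
    """Split a county code into (state, county) strings; None for NA."""
    text = str(code).strip()
    if text == "NA":  # "No value assigned" in the frequency tables
        return None
    text = text.zfill(5)  # restore a leading zero lost to numeric parsing
    if len(text) != 5 or not text.isdigit():
        raise ValueError(f"not a 5-digit county code: {code!r}")
    return text[:2], text[2:]


print(split_county_fips("25017"))  # Middlesex County, Massachusetts
print(split_county_fips(9003))     # Hartford County, read as an integer
```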

`education`

Value	Label	Unweighted		Weighted
`education`
Highest level of education completed
Value	Label	Count	Percent	Count	Percent
1	Less than high school	467	1.78%	154,014	2.86%
2	High school graduate/GED	2,554	9.75%	830,013	15.42%
3	Some college	2,891	11.04%	602,625	11.20%
4	Vocational/technical training	696	2.66%	206,493	3.84%
5	Associate degree	1,461	5.58%	280,729	5.22%
6	Bachelor’s degree	8,058	30.77%	1,455,989	27.05%
7	Graduate/post-graduate degree	9,233	35.25%	1,573,602	29.24%
999	Prefer not to answer	831	3.17%	278,949	5.18%
995	Missing Response	11,425		1,377,197
	Total valid	26,191	100.00%	5,382,415	100.00%
	Total missing	11,425		1,377,197
	Total	37,616		6,759,612
Logic: if participant

`student`

Value	Label	Unweighted		Weighted
`student`
Student status and location
Value	Label	Count	Percent	Count	Percent
0	Full-time student, currently attending some or all classes in-person	2,156	6.60%	468,359	8.39%
1	Part-time student, currently attending some or all classes in-person	542	1.66%	115,454	2.07%
2	Not a student	29,196	89.37%	4,838,956	86.70%
3	Part-time student, ONLY online classes	529	1.62%	100,687	1.80%
4	Full-time student, ONLY online classes	245	0.75%	57,779	1.04%
995	Missing Response	4,948		1,178,377
	Total valid	32,668	100.00%	5,581,235	100.00%
	Total missing	4,948		1,178,377
	Total	37,616		6,759,612
Logic: if surveyable

`school_mode`

Value	Label	Unweighted		Weighted
`school_mode`
Mode of transportation to school
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	698	13.12%	194,883	12.78%
24	School bus	1,354	25.45%	394,970	25.90%
100	Household vehicle (or motorcycle)	2,368	44.51%	672,154	44.07%
101	Other vehicle (e.g., friend’s car, rental, work car)	114	2.14%	37,341	2.45%
102	Bus, shuttle, or vanpool	336	6.32%	94,756	6.21%
103	Bicycle or e-bicycle	141	2.65%	33,158	2.17%
104	Other	87	1.64%	27,895	1.83%
105	Rail (e.g., train, subway)	174	3.27%	51,546	3.38%
106	Uber/Lyft, taxi, or car service	36	0.68%	14,730	0.97%
107	Micromobility (e.g., scooter, moped, skateboard)	12	0.23%	3,789	0.25%
995	Missing Response	32,296		5,234,389
	Total valid	5,320	100.00%	1,525,223	100.00%
	Total missing	32,296		5,234,389
	Total	37,616		6,759,612
Logic: if adult student and school_freq is not “Never”, or child who attends school or daycare

`school_type`

Value	Label	Unweighted		Weighted
`school_type`
Type of school attended
Value	Label	Count	Percent	Count	Percent
1	Cared for at home	580	7.36%	124,310	6.47%
2	Daycare outside home	615	7.81%	131,643	6.85%
3	Preschool	459	5.83%	93,171	4.85%
4	Home school	129	1.64%	33,651	1.75%
5	Elementary school (public, private, charter)	1,722	21.86%	443,334	23.08%
6	Middle school (public, private, charter)	865	10.98%	219,732	11.44%
7	High school (public, private, charter)	1,148	14.57%	298,245	15.53%
10	Vocational/technical school	121	1.54%	37,088	1.93%
11	2-year college	345	4.38%	77,811	4.05%
12	4-year college	635	8.06%	186,775	9.72%
13	Graduate or professional school	927	11.77%	206,693	10.76%
997	Other	331	4.20%	68,204	3.55%
995	Missing Response	29,739		4,838,956
	Total valid	7,877	100.00%	1,920,656	100.00%
	Total missing	29,739		4,838,956
	Total	37,616		6,759,612
Logic: if age 0-15 or adult student

`school_freq`

Value	Label	Unweighted		Weighted
`school_freq`
Frequency of travel to school
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	163	2.97%	60,188	3.80%
2	5 days a week	4,066	74.10%	1,151,075	72.62%
3	4 days a week	224	4.08%	65,095	4.11%
4	3 days a week	276	5.03%	71,328	4.50%
5	2 days a week	244	4.45%	67,612	4.27%
6	1 day a week	122	2.22%	39,936	2.52%
7	1-3 days a month	58	1.06%	16,753	1.06%
8	Less than monthly	78	1.42%	21,579	1.36%
996	Never	256	4.67%	91,539	5.77%
995	Missing Response	32,129		5,174,507
	Total valid	5,487	100.00%	1,585,104	100.00%
	Total missing	32,129		5,174,507
	Total	37,616		6,759,612
Logic: if adult student or child who attends school or daycare

`remote_class_freq`

Value	Label	Unweighted		Weighted
`remote_class_freq`
Frequency of remote schooling
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	92	1.39%	24,281	1.30%
2	5 days a week	285	4.29%	75,531	4.05%
3	4 days a week	67	1.01%	21,571	1.16%
4	3 days a week	146	2.20%	49,065	2.63%
5	2 days a week	213	3.21%	55,260	2.97%
6	1 day a week	278	4.19%	83,025	4.46%
7	1-3 days a month	92	1.39%	26,356	1.41%
8	Less than monthly	214	3.22%	56,474	3.03%
996	Never	5,249	79.10%	1,471,459	78.98%
995	Missing Response	30,980		4,896,589
	Total valid	6,636	100.00%	1,863,022	100.00%
	Total missing	30,980		4,896,589
	Total	37,616		6,759,612
Logic: if adult student and school_freq is not “6-7 days a week”, or child who is not cared for at home or attending daycare and school_freq is not “6-7 days a week”

`school_in_region`

Value	Label	Unweighted		Weighted
`school_in_region`
School in study region
Value	Label	Count	Percent	Count	Percent
0	No	186	2.88%	50,282	3.12%
1	Yes	6,281	97.12%	1,560,355	96.88%
995	Missing Response	31,149		5,148,975
	Total valid	6,467	100.00%	1,610,637	100.00%
	Total missing	31,149		5,148,975
	Total	37,616		6,759,612
Logic: if attends school in person

`school_state`

Value	Label	Unweighted		Weighted
`school_state`
School location – State
Value	Label	Count	Percent	Count	Percent
09	Connecticut	31	0.48%	8,395	0.53%
25	Massachusetts	6,281	98.02%	1,560,355	97.98%
33	New Hampshire	26	0.41%	6,009	0.38%
36	New York	16	0.25%	5,642	0.35%
44	Rhode Island	45	0.70%	8,368	0.53%
50	Vermont	9	0.14%	3,747	0.24%
NA	No value assigned	31,208		5,167,096
	Total valid	6,408	100.00%	1,592,515	100.00%
	Total missing	31,208		5,167,096
	Total	37,616		6,759,612
Logic: if attends school in person

`school_county`

Value	Label	Unweighted		Weighted
`school_county`
School location – County
Value	Label	Count	Percent	Count	Percent
09001	Fairfield County	3	0.05%	1,158	0.07%
09003	Hartford County	10	0.16%	2,112	0.13%
09009	New Haven County	4	0.06%	277	0.02%
09013	Tolland County	9	0.14%	1,730	0.11%
09015	Windham County, Connecticut	5	0.08%	3,118	0.20%
25001	Barnstable County	110	1.72%	34,649	2.18%
25003	Berkshire County	69	1.08%	15,069	0.95%
25005	Bristol County	458	7.15%	109,482	6.87%
25007	Dukes County	18	0.28%	1,284	0.08%
25009	Essex County	583	9.10%	155,386	9.76%
25011	Franklin County	64	1.00%	12,777	0.80%
25013	Hampden County	446	6.96%	93,753	5.89%
25015	Hampshire County	232	3.62%	47,161	2.96%
25017	Middlesex County	1,584	24.72%	414,857	26.05%
25019	Nantucket County	27	0.42%	3,000	0.19%
25021	Norfolk County	554	8.65%	144,727	9.09%
25023	Plymouth County	402	6.27%	99,477	6.25%
25025	Suffolk County	869	13.56%	239,075	15.01%
25027	Worcester County	865	13.50%	189,658	11.91%
33001	Belknap County	1	0.02%	383	0.02%
33011	Hillsborough County	10	0.16%	2,318	0.15%
33015	Rockingham County	10	0.16%	1,937	0.12%
33017	Strafford County	5	0.08%	1,372	0.09%
36005	Bronx County	1	0.02%	28	0.00%
36021	Columbia County	4	0.06%	2,445	0.15%
36027	Dutchess County	1	0.02%	53	0.00%
36059	Nassau County	1	0.02%	59	0.00%
36061	New York County	3	0.05%	1,118	0.07%
36067	Onondaga County	2	0.03%	1,401	0.09%
36085	Richmond County	1	0.02%	19	0.00%
36103	Suffolk County, New York	1	0.02%	29	0.00%
36111	Ulster County	1	0.02%	338	0.02%
36119	Westchester County	1	0.02%	151	0.01%
44001	Bristol County, Rhode Island	5	0.08%	421	0.03%
44003	Kent County	3	0.05%	224	0.01%
44005	Newport County	4	0.06%	413	0.03%
44007	Providence County	25	0.39%	6,776	0.43%
44009	Washington County, Rhode Island	8	0.12%	534	0.03%
50003	Bennington County	1	0.02%	0	0.00%
50005	Caledonia County	1	0.02%	783	0.05%
50007	Chittenden County	6	0.09%	2,964	0.19%
50023	Washington County, Vermont	1	0.02%	0	0.00%
NA	No value assigned	31,208		5,167,096
	Total valid	6,408	100.00%	1,592,515	100.00%
	Total missing	31,208		5,167,096
	Total	37,616		6,759,612
Logic: if attends school in person

`second_home`

Value	Label	Unweighted		Weighted
`second_home`
Regularly spends the night at a second home (e.g., another parent or grandparent’s house, partner or spouse’s home, or a vacation home)
Value	Label	Count	Percent	Count	Percent
0	Does not regularly spend night at second home	33,656	93.83%	6,317,265	93.46%
1	Regularly spends night at second home	2,214	6.17%	442,347	6.54%
995	Missing Response	1,746		0
	Total valid	35,870	100.00%	6,759,612	100.00%
	Total missing	1,746		0
	Total	37,616		6,759,612

`can_drive`

Value	Label	Unweighted		Weighted
`can_drive`
Household member drives
Value	Label	Count	Percent	Count	Percent
0	No, does not drive	3,963	12.80%	795,399	14.25%
1	Yes, drives	26,991	87.20%	4,785,835	85.75%
995	Missing Response	6,662		1,178,377
	Total valid	30,954	100.00%	5,581,235	100.00%
	Total missing	6,662		1,178,377
	Total	37,616		6,759,612
Logic: if surveyable and 16 or over

`vehicle`

Value	Label	Unweighted		Weighted
`vehicle`
Vehicle driven most often
Value	Label	Count	Percent	Count	Percent
6	Household vehicle 1	14,653	66.52%	2,712,106	60.32%
7	Household vehicle 2	5,575	25.31%	1,246,157	27.72%
8	Household vehicle 3	873	3.96%	265,603	5.91%
9	Household vehicle 4	143	0.65%	56,558	1.26%
10	Household vehicle 5	32	0.15%	12,614	0.28%
11	Household vehicle 6	7	0.03%	2,027	0.05%
12	Household vehicle 7	3	0.01%	229	0.01%
18	A carshare vehicle (e.g., ZipCar)	20	0.09%	7,385	0.16%
996	None (I do not drive a vehicle)	345	1.57%	100,287	2.23%
997	Other vehicle	376	1.71%	93,024	2.07%
995	Missing Response	15,589		2,263,622
	Total valid	22,027	100.00%	4,495,989	100.00%
	Total missing	15,589		2,263,622
	Total	37,616		6,759,612
Logic: if household has vehicle and person drives

`transit_freq`

Value	Label	Unweighted		Weighted
`transit_freq`
Frequency of transit trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	615	2.33%	113,016	2.09%
2	5 days a week	910	3.45%	150,156	2.78%
3	4 days a week	611	2.32%	104,343	1.93%
4	3 days a week	979	3.72%	165,283	3.06%
5	2 days a week	896	3.40%	173,234	3.20%
6	1 day a week	940	3.57%	158,319	2.93%
7	1-3 days a month	2,079	7.89%	392,880	7.27%
8	Less than monthly	7,408	28.11%	1,501,751	27.78%
9	Never	11,914	45.21%	2,646,305	48.96%
995	Missing Response	11,264		1,354,325
	Total valid	26,352	100.00%	5,405,287	100.00%
	Total missing	11,264		1,354,325
	Total	37,616		6,759,612
Logic: if participant

`tnc_freq`

Value	Label	Unweighted		Weighted
`tnc_freq`
Frequency of TNC trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	78	0.62%	26,977	1.11%
2	5 days a week	94	0.74%	26,143	1.08%
3	4 days a week	99	0.78%	28,341	1.17%
4	3 days a week	217	1.72%	45,579	1.88%
5	2 days a week	341	2.70%	77,206	3.19%
6	1 day a week	522	4.13%	96,383	3.98%
7	1-3 days a month	2,852	22.57%	516,289	21.31%
8	Less than monthly	8,433	66.74%	1,605,379	66.28%
995	Missing Response	24,980		4,337,317
	Total valid	12,636	100.00%	2,422,295	100.00%
	Total missing	24,980		4,337,317
	Total	37,616		6,759,612
Logic: if uses smartphone-app ride services

`bike_freq`

Value	Label	Unweighted		Weighted
`bike_freq`
Frequency of bike trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	371	1.42%	65,897	1.23%
2	5 days a week	339	1.30%	53,781	1.00%
3	4 days a week	340	1.30%	57,521	1.07%
4	3 days a week	614	2.35%	115,527	2.15%
5	2 days a week	632	2.42%	123,733	2.30%
6	1 day a week	711	2.72%	148,050	2.75%
7	1-3 days a month	1,807	6.92%	357,144	6.64%
8	Less than monthly	5,980	22.89%	1,247,989	23.21%
996	Never	15,336	58.69%	3,208,010	59.65%
995	Missing Response	11,486		1,381,960
	Total valid	26,130	100.00%	5,377,652	100.00%
	Total missing	11,486		1,381,960
	Total	37,616		6,759,612
Logic: if participant

`vanpool_freq`

Value	Label	Unweighted		Weighted
`vanpool_freq`
Frequency of vanpool trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	5	2.73%	1,953	4.49%
2	5 days a week	14	7.65%	5,351	12.31%
3	4 days a week	5	2.73%	979	2.25%
4	3 days a week	15	8.20%	3,919	9.02%
5	2 days a week	9	4.92%	2,681	6.17%
6	1 day a week	14	7.65%	4,594	10.57%
7	1-3 days a month	26	14.21%	7,463	17.17%
8	Less than monthly	95	51.91%	16,518	38.01%
995	Missing Response	37,433		6,716,154
	Total valid	183	100.00%	43,458	100.00%
	Total missing	37,433		6,716,154
	Total	37,616		6,759,612
Logic: if uses vanpool

`bikeshare_freq`

Value	Label	Unweighted		Weighted
`bikeshare_freq`
Frequency of bike-share trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	41	2.89%	5,884	2.72%
2	5 days a week	34	2.39%	5,378	2.48%
3	4 days a week	35	2.46%	4,573	2.11%
4	3 days a week	61	4.29%	11,262	5.20%
5	2 days a week	71	5.00%	10,483	4.84%
6	1 day a week	81	5.70%	16,591	7.66%
7	1-3 days a month	270	19.00%	33,143	15.30%
8	Less than monthly	828	58.27%	129,339	59.70%
995	Missing Response	36,195		6,542,959
	Total valid	1,421	100.00%	216,653	100.00%
	Total missing	36,195		6,542,959
	Total	37,616		6,759,612
Logic: if uses bikeshare

`scootshare_freq`

Value	Label	Unweighted		Weighted
`scootshare_freq`
Frequency of scooter-share trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	1	0.29%	605	1.06%
2	5 days a week	1	0.29%	19	0.03%
3	4 days a week	1	0.29%	406	0.71%
4	3 days a week	4	1.16%	1,295	2.26%
5	2 days a week	3	0.87%	656	1.14%
6	1 day a week	3	0.87%	995	1.74%
7	1-3 days a month	16	4.62%	4,392	7.66%
8	Less than monthly	317	91.62%	48,979	85.41%
995	Missing Response	37,270		6,702,265
	Total valid	346	100.00%	57,347	100.00%
	Total missing	37,270		6,702,265
	Total	37,616		6,759,612
Logic: if uses scooter share

`walk_freq`

Value	Label	Unweighted		Weighted
`walk_freq`
Frequency of walk trips
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	7,774	29.50%	1,420,733	26.28%
2	5 days a week	3,229	12.25%	648,088	11.99%
3	4 days a week	1,989	7.55%	385,005	7.12%
4	3 days a week	2,767	10.50%	556,070	10.29%
5	2 days a week	1,993	7.56%	428,385	7.93%
6	1 day a week	1,408	5.34%	286,319	5.30%
7	1-3 days a month	1,620	6.15%	343,158	6.35%
8	Less than monthly	5,572	21.14%	1,337,530	24.74%
995	Missing Response	11,264		1,354,325
	Total valid	26,352	100.00%	5,405,287	100.00%
	Total missing	11,264		1,354,325
	Total	37,616		6,759,612
Logic: if participant

`micromobility_devices`

Option	Variable	Unweighted			Weighted
`micromobility_devices`
Micromobility device used
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Scooter	Micromobility Devices 1	451	2.3%	18,367	92,481	2.5%	3,119,926
Moped	Micromobility Devices 2	77	0.4%	18,367	22,679	0.6%	3,119,926
Skateboard or rollerblades	Micromobility Devices 3	395	2.1%	18,367	88,525	2.4%	3,119,926
None	Micromobility Devices 996	18,233	94.7%	18,367	3,428,633	94.2%	3,119,926
Other	Micromobility Devices 997	192	1.0%	18,367	38,477	1.1%	3,119,926
Logic: if rMove or (rMove for Web and person 1)

`share`

Option	Variable	Unweighted			Weighted
`share`
Share service used
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Uber, Lyft, or other smartphone-app ride service	Share 2	12,636	48.0%	11,265	2,422,295	44.8%	1,354,482
Carshare (e.g., Zipcar)	Share 3	540	2.0%	11,265	91,937	1.7%	1,354,482
Peer-to-peer car rental (e.g., Turo)	Share 4	392	1.5%	11,265	79,819	1.5%	1,354,482
Bikeshare or bike rental service	Share 5	1,421	5.4%	11,265	216,653	4.0%	1,354,482
Vanpool	Share 6	183	0.7%	11,265	43,458	0.8%	1,354,482
Scooter share (e.g., Bird, Lime)	Share 7	346	1.3%	11,265	57,347	1.1%	1,354,482
None of the above	Share 996	13,370	50.7%	11,265	2,905,296	53.8%	1,354,482
Logic: if participant
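
The `share` selections define who was asked the service-specific frequency questions, so the Selected count for each `share` option should equal the Total valid count of the matching frequency variable. The sketch below checks this using the unweighted counts published in this section. The pairing of each `share` option with a frequency variable is inferred from the Logic lines, and the dictionary names are hypothetical.

```python
# Unweighted "Selected" counts from the share table, keyed by the frequency
# variable each option appears to gate (pairing inferred from Logic lines).
share_selected = {
    "tnc_freq": 12_636,        # Share 2
    "carshare_freq": 540,      # Share 3
    "bikeshare_freq": 1_421,   # Share 5
    "vanpool_freq": 183,       # Share 6
    "scootshare_freq": 346,    # Share 7
}

# Unweighted "Total valid" counts from the corresponding frequency tables.
freq_total_valid = {
    "tnc_freq": 12_636,
    "carshare_freq": 540,
    "bikeshare_freq": 1_421,
    "vanpool_freq": 183,
    "scootshare_freq": 346,
}

# Collect any variable whose universe does not match its gating option.
mismatches = {
    name: (share_selected[name], freq_total_valid[name])
    for name in share_selected
    if share_selected[name] != freq_total_valid[name]
}
print(mismatches or "all frequency universes match the share selections")
```

A check like this is a quick way to confirm that skip logic was applied as documented before filtering on a frequency variable.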

`transit_pass`

Value	Label	Unweighted		Weighted
`transit_pass`
Ownership/type of transit pass
Value	Label	Count	Percent	Count	Percent
0	No	21,033	79.82%	4,466,017	82.62%
1	Yes	5,318	20.18%	939,236	17.38%
995	Missing Response	11,265		1,354,359
	Total valid	26,351	100.00%	5,405,253	100.00%
	Total missing	11,265		1,354,359
	Total	37,616		6,759,612
Logic: if participant

`disability`

Value	Label	Unweighted		Weighted
`disability`
Disability status
Value	Label	Count	Percent	Count	Percent
0	No	23,001	87.82%	4,660,243	86.58%
1	Yes	2,004	7.65%	399,484	7.42%
999	Prefer not to answer	1,186	4.53%	322,687	6.00%
995	Missing Response	11,425		1,377,197
	Total valid	26,191	100.00%	5,382,415	100.00%
	Total missing	11,425		1,377,197
	Total	37,616		6,759,612
Logic: if participant

`participate`

Value	Label	Unweighted		Weighted
`participate`
Willingness to participate in future studies
Value	Label	Count	Percent	Count	Percent
0	No	6,256	23.74%	1,509,453	27.93%
1	Yes	20,094	76.26%	3,895,535	72.07%
995	Missing Response	11,266		1,354,623
	Total valid	26,350	100.00%	5,404,988	100.00%
	Total missing	11,266		1,354,623
	Total	37,616		6,759,612
Logic: if participant

`barriers`

Option	Variable	Unweighted			Weighted
`barriers`
Barrier to making trips
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Parking was too expensive	Barriers 1	1,307	5.0%	11,266	287,038	5.3%	1,354,623
Vehicle was not working	Barriers 2	533	2.0%	11,266	135,621	2.5%	1,354,623
Vehicle was not available	Barriers 3	909	3.4%	11,266	226,251	4.2%	1,354,623
Bus was not running	Barriers 4	639	2.4%	11,266	124,018	2.3%	1,354,623
No access to app-based rides (e.g., Uber, Lyft)	Barriers 5	195	0.7%	11,266	44,792	0.8%	1,354,623
Inadequate bike parking	Barriers 6	259	1.0%	11,266	39,444	0.7%	1,354,623
Concerns about safety	Barriers 7	652	2.5%	11,266	133,189	2.5%	1,354,623
This has not happened in the past 7 days	Barriers 8	19,410	73.7%	11,266	3,904,961	72.2%	1,354,623
Not enough biking paths/lanes	Barriers 9	624	2.4%	11,266	97,005	1.8%	1,354,623
No connection to transit	Barriers 10	1,384	5.3%	11,266	240,435	4.4%	1,354,623
Other	Barriers 997	1,511	5.7%	11,266	281,936	5.2%	1,354,623
Prefer not to answer	Barriers 999	1,392	5.3%	11,266	375,305	6.9%	1,354,623

`bike_comfort_lane`

Value	Label	Unweighted		Weighted
`bike_comfort_lane`
Comfort level riding a bike on a major street with four lanes and a wide bike lane physically separated from traffic by a raised curb, planters, or parked cars
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	5,185	26.94%	999,917	27.47%
2	Somewhat comfortable	5,444	28.28%	1,017,879	27.97%
3	Somewhat uncomfortable	3,214	16.70%	593,735	16.31%
4	Very uncomfortable	5,406	28.08%	1,028,155	28.25%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_local`

Value	Label	Unweighted		Weighted
`bike_comfort_local`
Comfort level riding a bike on a quiet residential street with bicycle route markings, wide speed humps, and other things to discourage and slow down car traffic
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	9,824	51.04%	1,885,801	51.81%
2	Somewhat comfortable	4,671	24.27%	842,882	23.16%
3	Somewhat uncomfortable	1,747	9.08%	325,418	8.94%
4	Very uncomfortable	3,007	15.62%	585,585	16.09%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_major`

Value	Label	Unweighted		Weighted
`bike_comfort_major`
Comfort level riding a bike on a major street with four lanes and no bike lane
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	1,845	9.58%	402,348	11.05%
2	Somewhat comfortable	1,072	5.57%	225,779	6.20%
3	Somewhat uncomfortable	2,098	10.90%	387,134	10.64%
4	Very uncomfortable	14,234	73.95%	2,624,425	72.11%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_minor`

Value	Label	Unweighted		Weighted
`bike_comfort_minor`
Comfort level riding a bike on a minor street with two lanes and no bike lane
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	2,508	13.03%	519,592	14.28%
2	Somewhat comfortable	4,536	23.56%	839,196	23.06%
3	Somewhat uncomfortable	5,884	30.57%	1,047,824	28.79%
4	Very uncomfortable	6,321	32.84%	1,233,073	33.88%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_neighborhood`

Value	Label	Unweighted		Weighted
`bike_comfort_neighborhood`
Comfort level riding a bike on a quiet residential street
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	9,321	48.42%	1,802,397	49.52%
2	Somewhat comfortable	5,606	29.12%	1,025,269	28.17%
3	Somewhat uncomfortable	1,713	8.90%	298,738	8.21%
4	Very uncomfortable	2,609	13.55%	513,282	14.10%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_paths`

Value	Label	Unweighted		Weighted
`bike_comfort_paths`
Comfort level riding a bike on a path or trail separate from the street
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	12,031	62.50%	2,240,073	61.55%
2	Somewhat comfortable	3,197	16.61%	621,699	17.08%
3	Somewhat uncomfortable	1,022	5.31%	193,601	5.32%
4	Very uncomfortable	2,999	15.58%	584,312	16.05%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_street`

Value	Label	Unweighted		Weighted
`bike_comfort_street`
Comfort level riding a bike on a minor street with two lanes and a striped bike lane
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	4,057	21.08%	787,733	21.64%
2	Somewhat comfortable	6,532	33.93%	1,208,222	33.20%
3	Somewhat uncomfortable	4,413	22.93%	849,892	23.35%
4	Very uncomfortable	4,247	22.06%	793,840	21.81%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1

`bike_comfort_striped`

Value	Label	Unweighted		Weighted
`bike_comfort_striped`
Comfort level riding a bike on a major street with four lanes and a striped bike lane
Value	Label	Count	Percent	Count	Percent
1	Very comfortable	1,864	9.68%	397,197	10.91%
2	Somewhat comfortable	3,144	16.33%	624,084	17.15%
3	Somewhat uncomfortable	5,844	30.36%	1,070,340	29.41%
4	Very uncomfortable	8,397	43.62%	1,548,065	42.53%
NA	No value assigned	18,367		3,119,926
	Total valid	19,249	100.00%	3,639,686	100.00%
	Total missing	18,367		3,119,926
	Total	37,616		6,759,612
Logic: if rMove or bMove person 1
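
The eight `bike_comfort_*` variables share one four-point scale, so they can be compared with a single summary such as the share of respondents who are very or somewhat comfortable (codes 1 and 2). The sketch below computes that share from the unweighted counts in the tables above and ranks the facility types. The function name and the choice of a "top two" summary are illustrative assumptions, not a method prescribed by the study.

```python
# Unweighted counts for codes 1-4 (very comfortable ... very uncomfortable).
comfort_counts = {
    "bike_comfort_lane": [5_185, 5_444, 3_214, 5_406],
    "bike_comfort_local": [9_824, 4_671, 1_747, 3_007],
    "bike_comfort_major": [1_845, 1_072, 2_098, 14_234],
    "bike_comfort_minor": [2_508, 4_536, 5_884, 6_321],
    "bike_comfort_neighborhood": [9_321, 5_606, 1_713, 2_609],
    "bike_comfort_paths": [12_031, 3_197, 1_022, 2_999],
    "bike_comfort_street": [4_057, 6_532, 4_413, 4_247],
    "bike_comfort_striped": [1_864, 3_144, 5_844, 8_397],
}


def top_two_share(counts):
    """Percent of valid responses in the two most comfortable categories."""
    return round(100 * (counts[0] + counts[1]) / sum(counts), 1)


# Rank facility types from most to least comfortable.
ranking = sorted(
    ((name, top_two_share(c)) for name, c in comfort_counts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, share in ranking:
    print(f"{name}: {share}%")
```

For population estimates, the weighted counts would replace the unweighted counts used here.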

`bike_factors`

Option	Variable	Unweighted			Weighted
`bike_factors`
Factor to increase biking frequency
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Safer physical riding conditions for bicyclists (e.g., bike paths separated from motor vehicles)	Bike Factors 1	9,708	37.2%	11,488	1,809,828	33.7%	1,382,113
An expanded bike network with more routes between my origin and destination	Bike Factors 2	6,592	25.2%	11,488	1,200,157	22.3%	1,382,113
Better knowledge of the best bike route to my destination	Bike Factors 3	2,711	10.4%	11,488	526,131	9.8%	1,382,113
More public secure bike storage	Bike Factors 4	3,525	13.5%	11,488	639,037	11.9%	1,382,113
More attractive routes (visually pleasing, improve non-traffic related safety)	Bike Factors 5	4,041	15.5%	11,488	762,196	14.2%	1,382,113
Lower cost electric bikes or similar equipment (e.g., electric scooters)	Bike Factors 6	2,492	9.5%	11,488	506,174	9.4%	1,382,113
Expanded bike share system	Bike Factors 7	1,126	4.3%	11,488	199,890	3.7%	1,382,113
Better maintenance of existing bicycle infrastructure (e.g., clearing paths of debris and/or snow during the winter)	Bike Factors 8	2,975	11.4%	11,488	520,146	9.7%	1,382,113
Don’t have access to a bike, but may in the future	Bike Factors 9	2,450	9.4%	11,488	477,218	8.9%	1,382,113
Don’t have access to a bike and will not in the future	Bike Factors 10	2,339	9.0%	11,488	474,398	8.8%	1,382,113
Other, specify	Bike Factors 11	1,833	7.0%	11,488	341,433	6.3%	1,382,113
None of the above	Bike Factors 12	9,385	35.9%	11,488	2,090,915	38.9%	1,382,113

`bike_purpose`

Option	Variable	Unweighted			Weighted
`bike_purpose`
Purpose of bicycle use in the past 30 days
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
To go to/from grocery/food shopping	Bike Purpose 1	1,168	24.3%	32,805	187,440	20.3%	5,838,353
To go to/from other shopping (e.g., pharmacy)	Bike Purpose 2	1,096	22.8%	32,805	166,794	18.1%	5,838,353
To go to/from medical appointment	Bike Purpose 3	479	10.0%	32,805	73,470	8.0%	5,838,353
To visit friends or relatives	Bike Purpose 4	1,068	22.2%	32,805	190,001	20.6%	5,838,353
To go to/from work	Bike Purpose 5	1,018	21.2%	32,805	145,278	15.8%	5,838,353
For other work-related reason	Bike Purpose 6	218	4.5%	32,805	36,932	4.0%	5,838,353
Other	Bike Purpose 7	336	7.0%	32,805	54,701	5.9%	5,838,353
For exercise or recreation	Bike Purpose 8	4,146	86.2%	32,805	798,119	86.6%	5,838,353
Logic: if bike_freq is more frequent than “Never” or “Less than monthly”

`bike_safety`

Option	Variable	Unweighted			Weighted
`bike_safety`
Safety concerns preventing bicycle use
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
No safe route to ride my bike	Bike Safety 1	2,179	55.1%	33,658	409,734	58.2%	6,055,623
Biking routes don’t put enough separation between me and moving cars	Bike Safety 2	2,194	55.4%	33,658	371,128	52.7%	6,055,623
Crossing intersections is too stressful	Bike Safety 3	1,661	42.0%	33,658	287,887	40.9%	6,055,623
Concerns about distracted or impaired drivers	Bike Safety 4	2,184	55.2%	33,658	386,395	54.9%	6,055,623
Speed of vehicle traffic is too high	Bike Safety 5	1,807	45.7%	33,658	307,386	43.7%	6,055,623
Poor bike path conditions or no bike paths available	Bike Safety 6	1,655	41.8%	33,658	288,539	41.0%	6,055,623
Poor or no lighting near bike paths	Bike Safety 7	543	13.7%	33,658	108,948	15.5%	6,055,623
Other, specify	Bike Safety 8	684	17.3%	33,658	118,720	16.9%	6,055,623
Logic: if why_no_bike = safety concern

`bike_store`

Option	Variable	Unweighted			Weighted
`bike_store`
Bicycle storage location
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Inside house/apartment (includes garage, porch, storage area)	Bike Store 1	9,179	89.4%	27,348	1,830,334	90.0%	4,725,996
Bike rack	Bike Store 2	348	3.4%	27,348	68,176	3.4%	4,725,996
Bike locker	Bike Store 3	57	0.6%	27,348	17,239	0.8%	4,725,996
Secured bike room	Bike Store 4	256	2.5%	27,348	38,807	1.9%	4,725,996
Locked to other object (e.g., post, tree)	Bike Store 5	202	2.0%	27,348	38,210	1.9%	4,725,996
In a parking garage/lot	Bike Store 6	258	2.5%	27,348	46,412	2.3%	4,725,996
Unlocked on-street	Bike Store 7	54	0.5%	27,348	15,861	0.8%	4,725,996
Other	Bike Store 997	395	3.8%	27,348	77,301	3.8%	4,725,996
Logic: if household has at least one bike

`carshare_freq`

Value	Label	Unweighted		Weighted
`carshare_freq`
Carshare use frequency
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	7	1.30%	3,013	3.28%
2	5 days a week	11	2.04%	3,638	3.96%
3	4 days a week	7	1.30%	2,112	2.30%
4	3 days a week	9	1.67%	1,290	1.40%
5	2 days a week	13	2.41%	4,477	4.87%
6	1 day a week	16	2.96%	2,993	3.26%
7	1-3 days a month	93	17.22%	12,167	13.23%
8	Less than monthly	384	71.11%	62,247	67.71%
995	Missing Response	37,076		6,667,675
	Total valid	540	100.00%	91,937	100.00%
	Total missing	37,076		6,667,675
	Total	37,616		6,759,612
Logic: if uses carshare

`commute_days`

Option	Variable	Unweighted			Weighted
`commute_days`
Day commuted to workplace last week
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Monday	Commute Days 1	8,451	58.1%	23,081	1,900,053	59.6%	3,571,608
Tuesday	Commute Days 2	9,896	68.1%	23,081	2,118,335	66.4%	3,571,608
Wednesday	Commute Days 3	9,809	67.5%	23,081	2,114,884	66.3%	3,571,608
Thursday	Commute Days 4	9,626	66.2%	23,081	2,046,202	64.2%	3,571,608
Friday	Commute Days 5	7,829	53.9%	23,081	1,755,828	55.1%	3,571,608
Saturday	Commute Days 6	1,874	12.9%	23,081	467,739	14.7%	3,571,608
Sunday	Commute Days 7	1,371	9.4%	23,081	359,638	11.3%	3,571,608
None	Commute Days 996	1,737	12.0%	23,081	407,236	12.8%	3,571,608
Logic: if employment = full/part/self/volunteer and job_type IS NOT “work only from home” or “drive/bike/travel for work”
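
In multiple-response tables such as `commute_days`, each option is a separate variable, so the percentages do not sum to 100%. The Percent column appears to use the records that were asked the question as its denominator, that is, the table total minus the Missing column. The sketch below checks this for Monday, assuming the 37,616 unweighted and 6,759,612 weighted totals reported in the single-response tables of this section. The function name is illustrative.

```python
def multiselect_percent(selected, total_records, missing):
    """Percent selecting an option among records that were asked the question."""
    asked = total_records - missing
    return round(100 * selected / asked, 1)

PERSON_RECORDS = 37_616       # unweighted total from the single-response tables
PERSON_WEIGHTED = 6_759_612   # weighted total from the single-response tables

# "Monday" row of the commute_days table
monday_unweighted = multiselect_percent(8_451, PERSON_RECORDS, 23_081)
monday_weighted = multiselect_percent(1_900_053, PERSON_WEIGHTED, 3_571_608)
print(monday_unweighted, monday_weighted)
```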

`ev_subsidies`

Value	Label	Unweighted		Weighted
`ev_subsidies`
Familiarity with rebates/subsidies for purchasing an electric vehicle
Value	Label	Count	Percent	Count	Percent
1	Extremely familiar	902	5.51%	176,407	5.71%
2	Very familiar	1,425	8.70%	256,220	8.29%
3	Moderately familiar	3,350	20.46%	630,927	20.41%
4	Slightly familiar	4,092	25.00%	736,192	23.82%
5	Not at all familiar	6,602	40.33%	1,291,067	41.77%
995	Missing Response	21,245		3,668,798
	Total valid	16,371	100.00%	3,090,813	100.00%
	Total missing	21,245		3,668,798
	Total	37,616		6,759,612
Logic: if rMove or (rMove for Web and person 1)

`ev_typical_charge`

Option	Variable	Unweighted			Weighted
`ev_typical_charge`
Electric vehicle charging location
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
At home	Ev Typical Charge 1	1,094	85.3%	36,333	206,093	83.8%	6,513,799
At work	Ev Typical Charge 2	223	17.4%	36,333	47,221	19.2%	6,513,799
At a commute location (e.g., Park and Ride lot, parking garage)	Ev Typical Charge 3	121	9.4%	36,333	24,849	10.1%	6,513,799
At a shopping location (e.g., grocery store, shopping mall)	Ev Typical Charge 4	355	27.7%	36,333	69,027	28.1%	6,513,799
At a public location (e.g., hospital, library, government building)	Ev Typical Charge 5	265	20.7%	36,333	44,445	18.1%	6,513,799
At a hotel/inn	Ev Typical Charge 6	46	3.6%	36,333	7,920	3.2%	6,513,799
Other	Ev Typical Charge 997	134	10.4%	36,333	26,749	10.9%	6,513,799
Logic: if fuel type of primary vehicle driven is electric or PHEV

`home_vehicle_park`

Value	Label	Unweighted		Weighted
`home_vehicle_park`
Typical household vehicle parking location
Value	Label	Count	Percent	Count	Percent
1	Home driveway/garage	13,053	79.73%	2,494,556	80.71%
2	Parking lot/garage	1,821	11.12%	306,801	9.93%
3	On-street parking	1,497	9.14%	289,456	9.37%
995	Missing Response	21,245		3,668,798
	Total valid	16,371	100.00%	3,090,813	100.00%
	Total missing	21,245		3,668,798
	Total	37,616		6,759,612
Logic: if household has vehicles and person drives; if rMove or (rMove for Web and person 1)

`home_vehicle_park_pay`

Value	Label	Unweighted		Weighted
`home_vehicle_park_pay`
Pays to park vehicle at home
Value	Label	Count	Percent	Count	Percent
0	No	14,782	90.29%	2,837,575	91.81%
1	Yes	1,589	9.71%	253,239	8.19%
995	Missing Response	21,245		3,668,798
	Total valid	16,371	100.00%	3,090,813	100.00%
	Total missing	21,245		3,668,798
	Total	37,616		6,759,612
Logic: if household has vehicles and person drives; if rMove or (rMove for Web and person 1)

`home_vehicle_park_permit`

Value	Label	Unweighted		Weighted
`home_vehicle_park_permit`
Purchased a residential parking pass to park vehicle
Value	Label	Count	Percent	Count	Percent
0	No	924	61.72%	194,757	67.28%
1	Yes	573	38.28%	94,698	32.72%
995	Missing Response	36,119		6,470,156
	Total valid	1,497	100.00%	289,456	100.00%
	Total missing	36,119		6,470,156
	Total	37,616		6,759,612
Logic: if household typically parks on-street; if rMove or (rMove for Web and person 1)

`peerrent_freq`

Value	Label	Unweighted		Weighted
`peerrent_freq`
Peer-to-peer car rental use frequency
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	4	1.02%	1,316	1.65%
2	5 days a week	8	2.04%	1,984	2.49%
3	4 days a week	7	1.79%	3,064	3.84%
4	3 days a week	4	1.02%	1,593	2.00%
5	2 days a week	8	2.04%	3,969	4.97%
6	1 day a week	5	1.28%	852	1.07%
7	1-3 days a month	13	3.32%	2,005	2.51%
8	Less than monthly	343	87.50%	65,035	81.48%
995	Missing Response	37,224		6,679,793
	Total valid	392	100.00%	79,819	100.00%
	Total missing	37,224		6,679,793
	Total	37,616		6,759,612
Logic: if uses peer-to-peer car rental

`telework_days`

Option	Variable	Unweighted			Weighted
`telework_days`
Day teleworked last week
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Monday	Telework Days 1	3,336	22.9%	23,072	624,107	19.6%	3,567,862
Tuesday	Telework Days 2	2,693	18.5%	23,072	532,572	16.7%	3,567,862
Wednesday	Telework Days 3	2,693	18.5%	23,072	523,295	16.4%	3,567,862
Thursday	Telework Days 4	2,850	19.6%	23,072	563,824	17.7%	3,567,862
Friday	Telework Days 5	4,043	27.8%	23,072	750,186	23.5%	3,567,862
Saturday	Telework Days 6	439	3.0%	23,072	93,830	2.9%	3,567,862
Sunday	Telework Days 7	368	2.5%	23,072	71,640	2.2%	3,567,862
None	Telework Days 996	8,721	60.0%	23,072	2,093,287	65.6%	3,567,862
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

`telework_freq_pre_covid`

Value	Label	Unweighted		Weighted
`telework_freq_pre_covid`
Days worked from home before March 2020
Value	Label	Count	Percent	Count	Percent
1	6-7 days a week	527	2.85%	99,004	2.44%
2	5 days a week	1,968	10.65%	437,467	10.77%
3	4 days a week	437	2.36%	108,479	2.67%
4	3 days a week	537	2.91%	116,195	2.86%
5	2 days a week	692	3.74%	141,134	3.48%
6	1 day a week	804	4.35%	158,658	3.91%
7	1-3 days a month	898	4.86%	162,641	4.01%
8	Less than monthly	1,270	6.87%	221,872	5.46%
996	None	11,347	61.40%	2,615,184	64.40%
995	Missing Response	19,136		2,698,977
	Total valid	18,480	100.00%	4,060,634	100.00%
	Total missing	19,136		2,698,977
	Total	37,616		6,759,612
Logic: if job_type IS NOT “work only from home” or “drive/bike/travel for work”

`transit_factors`

Option	Variable	Unweighted			Weighted
`transit_factors`
Factor to increase transit use
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Lower cost of transit or free transit pass	Transit Factors 1	6,317	24.0%	11,264	1,250,738	23.1%	1,354,325
More reliable transit service	Transit Factors 2	8,584	32.6%	11,264	1,577,841	29.2%	1,354,325
More frequent transit service	Transit Factors 3	8,506	32.3%	11,264	1,548,646	28.7%	1,354,325
Faster arrival at my destination	Transit Factors 4	7,419	28.2%	11,264	1,396,850	25.8%	1,354,325
Transit service provided during different times of the day/week	Transit Factors 5	3,241	12.3%	11,264	619,205	11.5%	1,354,325
Transit stops closer to my home/work	Transit Factors 6	7,859	29.8%	11,264	1,521,859	28.2%	1,354,325
Higher gas or parking prices	Transit Factors 7	1,298	4.9%	11,264	262,769	4.9%	1,354,325
User-friendly transit mobile app	Transit Factors 8	2,227	8.5%	11,264	421,477	7.8%	1,354,325
Safer environment in the vehicles	Transit Factors 9	1,952	7.4%	11,264	390,700	7.2%	1,354,325
Safer environment at stops and stations	Transit Factors 10	2,670	10.1%	11,264	512,366	9.5%	1,354,325
Other	Transit Factors 11	1,157	4.4%	11,264	221,092	4.1%	1,354,325
None of the above	Transit Factors 12	9,409	35.7%	11,264	2,096,162	38.8%	1,354,325

`transit_purpose`

Option	Variable	Unweighted			Weighted
`transit_purpose`
Purpose for using transit in the past 30 days
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
To go to/from grocery/food shopping	Transit Purpose 1	1,949	27.7%	30,586	365,210	29.0%	5,502,381
To go to/from other shopping (e.g., pharmacy)	Transit Purpose 2	2,215	31.5%	30,586	378,016	30.1%	5,502,381
To go to/from medical appointment	Transit Purpose 3	2,072	29.5%	30,586	374,251	29.8%	5,502,381
To visit friends or relatives	Transit Purpose 4	2,813	40.0%	30,586	472,747	37.6%	5,502,381
To go to/from work	Transit Purpose 5	3,409	48.5%	30,586	542,391	43.1%	5,502,381
For other work-related reason	Transit Purpose 6	788	11.2%	30,586	134,041	10.7%	5,502,381
Other	Transit Purpose 7	1,616	23.0%	30,586	294,865	23.5%	5,502,381
Logic: if transit_freq is not never or less than monthly

`walk_purpose`

Option	Variable	Unweighted			Weighted
`walk_purpose`
Reason for walking in the past 30 days
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
To go to/from grocery/food shopping	Walk Purpose 1	6,099	29.4%	16,838	1,075,270	26.4%	2,692,153
To go to/from other shopping (e.g., pharmacy)	Walk Purpose 2	5,861	28.2%	16,838	1,002,666	24.7%	2,692,153
To go to/from medical appointment	Walk Purpose 3	2,465	11.9%	16,838	443,452	10.9%	2,692,153
To visit friends or relatives	Walk Purpose 4	4,215	20.3%	16,838	778,577	19.1%	2,692,153
To go to/from work	Walk Purpose 5	3,240	15.6%	16,838	540,391	13.3%	2,692,153
For other work-related reason	Walk Purpose 6	1,451	7.0%	16,838	281,178	6.9%	2,692,153
Other	Walk Purpose 7	1,601	7.7%	16,838	309,967	7.6%	2,692,153
For exercise or recreation	Walk Purpose 8	18,414	88.6%	16,838	3,524,723	86.7%	2,692,153
Logic: if walk_freq > less than monthly

`why_no_bike`

Option	Variable	Unweighted			Weighted
`why_no_bike`
Reasons for not using a bicycle in the past 30 days
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Do not have a personal bicycle or it was not working	Why No Bike 1	10,472	49.1%	16,300	2,113,175	47.4%	2,303,613
Other member(s) of my household were using the bicycle	Why No Bike 2	193	0.9%	16,300	52,293	1.2%	2,303,613
Monetary cost was too high	Why No Bike 3	244	1.1%	16,300	50,627	1.1%	2,303,613
Travel time was too long	Why No Bike 4	1,649	7.7%	16,300	326,123	7.3%	2,303,613
Disability	Why No Bike 5	1,593	7.5%	16,300	317,736	7.1%	2,303,613
Safety concerns	Why No Bike 6	3,958	18.6%	16,300	703,989	15.8%	2,303,613
Weather	Why No Bike 7	3,110	14.6%	16,300	646,433	14.5%	2,303,613
Other	Why No Bike 8	5,697	26.7%	16,300	1,279,565	28.7%	2,303,613
Logic: if bike_freq = never or less than monthly

8.3 Day

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	37,817	28.18%	0	0.00%
1	Yes	96,370	71.82%	6,759,612	100.00%
	Total valid	134,187	100.00%	6,759,612	100.00%
	Total	134,187		6,759,612
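
In the table above, the 37,817 incomplete day records have a weighted count of 0, which suggests that incomplete days carry zero weight. Under that reading, weighted totals are the same with or without an `is_complete == 1` filter, while unweighted counts are not. The sketch below illustrates this on toy records. The column name `day_weight` and all values are hypothetical and are not taken from the delivered files.

```python
# Toy day records; `day_weight` is a hypothetical column name used for illustration
days = [
    {"day_id": 1, "is_complete": 1, "day_weight": 70.5},
    {"day_id": 2, "is_complete": 0, "day_weight": 0.0},
    {"day_id": 3, "is_complete": 1, "day_weight": 65.0},
]

def weighted_total(records, weight_key="day_weight"):
    """Sum the weights across a list of records."""
    return sum(r[weight_key] for r in records)

complete_days = [r for r in days if r["is_complete"] == 1]

all_total = weighted_total(days)                # includes the zero-weight record
complete_total = weighted_total(complete_days)  # complete records only
print(len(days), len(complete_days), all_total, complete_total)
```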

`num_trips`

Statistic	Unweighted	Weighted
`num_trips`
Number of trips
Statistic	Value	Value
N	134,187.00	6,759,611.78
Min	0.00	0.00
P25	0.00	2.00
Median	2.00	3.00
Mean	3.49	3.56
P75	5.00	5.00
P95	10.00	9.00
Max	68.00	50.00
SD	3.61	2.94

`hh_is_complete`

Value	Label	Unweighted		Weighted
`hh_is_complete`
Household day completion status
Value	Label	Count	Percent	Count	Percent
0	No	37,801	28.17%	0	0.00%
1	Yes	96,386	71.83%	6,759,612	100.00%
	Total valid	134,187	100.00%	6,759,612	100.00%
	Total	134,187		6,759,612

`is_participant`

Value	Label	Unweighted		Weighted
`is_participant`
Active participant (age 18+ and surveyable)
Value	Label	Count	Percent	Count	Percent
0	No	22,785	16.98%	1,343,373	19.87%
1	Yes	111,402	83.02%	5,416,239	80.13%
	Total valid	134,187	100.00%	6,759,612	100.00%
	Total	134,187		6,759,612

`begin_day`

Value	Label	Unweighted		Weighted
`begin_day`
Where participant began their day
Value	Label	Count	Percent	Count	Percent
1	Home	90,957	87.38%	6,330,983	93.66%
2	Someone else’s home	2,012	1.93%	77,276	1.14%
3	Work	863	0.83%	97,971	1.45%
4	Their other home (e.g., other parent, second home)	985	0.95%	91,212	1.35%
5	Traveling (e.g., red-eye flight)	104	0.10%	2,098	0.03%
7	Temporary lodging (e.g., hotel, vacation rental)	1,545	1.48%	54,957	0.81%
997	Other	7,626	7.33%	105,113	1.56%
995	Missing Response	30,095		0
	Total valid	104,092	100.00%	6,759,612	100.00%
	Total missing	30,095		0
	Total	134,187		6,759,612

`end_day`

Value	Label	Unweighted		Weighted
`end_day`
Where participant ended their day
Value	Label	Count	Percent	Count	Percent
1	Home	88,910	85.93%	6,153,682	91.04%
2	Someone else’s home	2,219	2.14%	104,717	1.55%
3	Work	1,174	1.13%	157,317	2.33%
4	Their other home (e.g., other parent, second home)	1,022	0.99%	90,836	1.34%
5	Traveling (e.g., red-eye flight)	133	0.13%	3,092	0.05%
7	Temporary lodging (e.g., hotel, vacation rental)	1,683	1.63%	63,794	0.94%
997	Other	8,329	8.05%	186,174	2.75%
995	Missing Response	30,717		0
	Total valid	103,470	100.00%	6,759,612	100.00%
	Total missing	30,717		0
	Total	134,187		6,759,612

`school_daily`

Value	Label	Unweighted		Weighted
`school_daily`
Student traveled to school
Value	Label	Count	Percent	Count	Percent
1	Yes	10,416	63.54%	942,003	80.68%
2	No	5,976	36.46%	225,552	19.32%
995	Missing Response	117,795		5,592,056
	Total valid	16,392	100.00%	1,167,556	100.00%
	Total missing	117,795		5,592,056
	Total	134,187		6,759,612
Logic: if attends school in-person or daycare at least some of the time

`attend_school`

Option	Variable	Unweighted			Weighted
`attend_school`
Traveled to school
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Yes, person traveled to school on date	Attend School 1	1,580	76.6%	132,125	368,213	71.5%	6,244,549
Yes, person traveled to another location for school (e.g., friend’s house, parent’s work) on date	Attend School 2	32	1.6%	132,125	9,756	1.9%	6,244,549
No, person did not travel to school on date	Attend School 3	440	21.3%	132,125	133,478	25.9%	6,244,549
Don’t know	Attend School 998	9	0.4%	132,125	3,692	0.7%	6,244,549
Prefer not to answer	Attend School 999	21	1.0%	132,125	5,077	1.0%	6,244,549
Logic: if person attends in-person school or daycare at least some of the time AND school was not selected as a purpose on travel day

`attend_school_no`

Option	Variable	Unweighted			Weighted
`attend_school_no`
Reason for not attending school
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Person was sick or quarantining	Attend School No 1	126	28.6%	133,747	44,002	33.0%	6,626,133
Person attended class online from home	Attend School No 2	22	5.0%	133,747	7,478	5.6%	6,626,133
Person attended class online from another location (e.g., friend’s)	Attend School No 3	0	0.0%	133,747	0	0.0%	6,626,133
School was scheduled to be closed (e.g., vacation, holiday)	Attend School No 4	121	27.5%	133,747	37,793	28.3%	6,626,133
School was closed or adjusted due to weather event (e.g., snow delay)	Attend School No 5	15	3.4%	133,747	3,228	2.4%	6,626,133
Other	Attend School No 997	142	32.3%	133,747	37,719	28.3%	6,626,133
Don’t know	Attend School No 998	2	0.5%	133,747	221	0.2%	6,626,133
Prefer not to answer	Attend School No 999	16	3.6%	133,747	4,202	3.1%	6,626,133
Logic: if did not attend school or daycare on travel day

`telecommute_time`

Value	Label	Unweighted		Weighted
`telecommute_time`
Time spent teleworking on travel day (minutes, where 600 = 10+ hours)
Value	Label	Count	Percent	Count	Percent
0	0 minutes	36,963	60.41%	2,082,982	54.34%
15	15 minutes	173	0.28%	14,690	0.38%
30	30 minutes	452	0.74%	28,667	0.75%
45	45 minutes	273	0.45%	13,484	0.35%
60	1 hour	920	1.50%	63,342	1.65%
75	1 hour, 15 minutes	240	0.39%	15,339	0.40%
90	1 hour, 30 minutes	435	0.71%	32,634	0.85%
105	1 hour, 45 minutes	131	0.21%	8,525	0.22%
120	2 hours	885	1.45%	60,278	1.57%
135	2 hours, 15 minutes	190	0.31%	11,952	0.31%
150	2 hours, 30 minutes	297	0.49%	21,445	0.56%
165	2 hours, 45 minutes	112	0.18%	7,692	0.20%
180	3 hours	635	1.04%	40,453	1.06%
195	3 hours, 15 minutes	144	0.24%	8,487	0.22%
210	3 hours, 30 minutes	233	0.38%	17,737	0.46%
225	3 hours, 45 minutes	102	0.17%	8,045	0.21%
240	4 hours	843	1.38%	62,463	1.63%
255	4 hours, 15 minutes	213	0.35%	15,172	0.40%
270	4 hours, 30 minutes	263	0.43%	13,700	0.36%
285	4 hours, 45 minutes	126	0.21%	7,005	0.18%
300	5 hours	588	0.96%	42,538	1.11%
315	5 hours, 15 minutes	158	0.26%	9,271	0.24%
330	5 hours, 30 minutes	220	0.36%	16,733	0.44%
345	5 hours, 45 minutes	97	0.16%	8,668	0.23%
360	6 hours	715	1.17%	53,002	1.38%
375	6 hours, 15 minutes	148	0.24%	6,662	0.17%
390	6 hours, 30 minutes	340	0.56%	30,945	0.81%
405	6 hours, 45 minutes	136	0.22%	8,701	0.23%
420	7 hours	941	1.54%	66,569	1.74%
435	7 hours, 15 minutes	245	0.40%	20,007	0.52%
450	7 hours, 30 minutes	894	1.46%	63,777	1.66%
465	7 hours, 45 minutes	314	0.51%	22,621	0.59%
480	8 hours	6,435	10.52%	457,377	11.93%
495	8 hours, 15 minutes	927	1.52%	66,071	1.72%
510	8 hours, 30 minutes	1,440	2.35%	119,390	3.11%
525	8 hours, 45 minutes	364	0.59%	28,425	0.74%
540	9 hours	1,340	2.19%	102,783	2.68%
555	9 hours, 15 minutes	252	0.41%	19,765	0.52%
570	9 hours, 30 minutes	271	0.44%	25,250	0.66%
585	9 hours, 45 minutes	67	0.11%	4,377	0.11%
600	10+ hours	1,663	2.72%	125,863	3.28%
NA	No value assigned	73,002		2,926,726
	Total valid	61,185	100.00%	3,832,886	100.00%
	Total missing	73,002		2,926,726
	Total	134,187		6,759,612
Logic: if employment = full/part/self/volunteer
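
`telecommute_time` is recorded in 15-minute steps and top-coded at 600, which the variable label defines as 10 or more hours. Averages computed from the raw values therefore understate the time teleworked by people in the top category. The sketch below converts values to hours and flags top-coded records. The helper names are illustrative, and `None` stands in for NA.

```python
TOP_CODE = 600  # "10+ hours" per the variable label

def telework_hours(minutes):
    """Convert telecommute_time from minutes to hours; None stands in for NA."""
    if minutes is None:
        return None
    return minutes / 60

def is_top_coded(minutes):
    """True when the value is the 10+ hours top code rather than an exact duration."""
    return minutes == TOP_CODE

values = [0, 90, 480, 600, None]
hours = [telework_hours(v) for v in values]
flags = [is_top_coded(v) for v in values]
print(hours, flags)
```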

`delivery`

Option	Variable	Unweighted			Weighted
`delivery`
Type of delivery
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Take-out/prepared food delivered to home	Delivery 2	3,719	4.7%	54,542	183,675	5.0%	3,080,918
Someone came to do work at home (e.g., babysitter, housecleaning, lawn)	Delivery 3	2,161	2.7%	54,542	134,121	3.6%	3,080,918
Groceries delivered to home	Delivery 4	1,770	2.2%	54,542	89,553	2.4%	3,080,918
Received packages at home (e.g., USPS, FedEx, UPS)	Delivery 5	22,955	28.8%	54,542	1,263,596	34.3%	3,080,918
Received personal packages at work	Delivery 6	284	0.4%	54,542	13,333	0.4%	3,080,918
Received packages at another location (e.g., Amazon Locker, package pick-up point)	Delivery 7	1,162	1.5%	54,542	65,506	1.8%	3,080,918
Other item delivered to home (e.g., appliance)	Delivery 8	359	0.5%	54,542	23,480	0.6%	3,080,918
None of the above	Delivery 996	51,506	64.7%	54,542	2,188,102	59.5%	3,080,918
Logic: if rMove or (rMove for Web and person 1)

`made_travel`

Value	Label	Unweighted		Weighted
`made_travel`
Made trips on travel day
Value	Label	Count	Percent	Count	Percent
1	Yes, person made trips on day	34	14.05%	5,464	11.62%
2	No, person did not go anywhere or make trips on day	179	73.97%	37,029	78.78%
998	Don’t know	14	5.79%	3,056	6.50%
999	Prefer not to answer	15	6.20%	1,455	3.09%
995	Missing Response	133,945		6,712,609
	Total valid	242	100.00%	47,003	100.00%
	Total missing	133,945		6,712,609
	Total	134,187		6,759,612
Logic: if using rMove and has zero trips for the day and did not say they went to school/daycare in attend_school and begin_day = end_day and begin_day is not other
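
The display logic for `made_travel` combines several conditions, which can be easier to audit as a single predicate. The sketch below is one reading of the logic line above. The parameter names are hypothetical, and the code 997 for "Other" is taken from the `begin_day` and `end_day` tables. How records with a missing `begin_day` were handled is not stated in the logic line, so the sketch does not treat them specially.

```python
OTHER = 997  # "Other" code in the begin_day / end_day tables

def asked_made_travel(uses_rmove, num_trips, reported_school_travel, begin_day, end_day):
    """Return True when a day record meets every condition in the made_travel logic."""
    return bool(
        uses_rmove                      # participant used rMove
        and num_trips == 0              # zero trips recorded for the day
        and not reported_school_travel  # did not report going to school/daycare
        and begin_day == end_day        # day began and ended at the same place type
        and begin_day != OTHER          # and that place type is not "Other"
    )

print(asked_made_travel(True, 0, False, 1, 1))      # home to home, no trips
print(asked_made_travel(True, 0, False, 997, 997))  # began and ended at "Other"
```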

`no_travel`

Option	Variable	Unweighted			Weighted
`no_travel`
Reason for no travel on date
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
I did make trips on date	No Travel 1	1,825	11.8%	118,777	24,596	2.5%	5,784,720
Not scheduled to work/took day off	No Travel 2	1,981	12.9%	118,777	131,043	13.4%	5,784,720
Worked at home for pay (e.g., telework)	No Travel 3	3,292	21.4%	118,777	205,801	21.1%	5,784,720
Hung out around home	No Travel 4	7,396	48.0%	118,777	419,892	43.1%	5,784,720
Scheduled school closure (e.g., holiday)	No Travel 5	32	0.2%	118,777	6,978	0.7%	5,784,720
No available transportation (e.g., no car, no bus)	No Travel 6	284	1.8%	118,777	16,633	1.7%	5,784,720
Sick or quarantining (self or others)	No Travel 7	1,150	7.5%	118,777	105,358	10.8%	5,784,720
Waited for visitor/delivery (e.g., plumber)	No Travel 8	359	2.3%	118,777	20,321	2.1%	5,784,720
Kids did online/remote/home school	No Travel 9	104	0.7%	118,777	25,012	2.6%	5,784,720
Weather conditions (e.g., snowstorm)	No Travel 11	646	4.2%	118,777	31,984	3.3%	5,784,720
Person may have made trips, but I don’t know when or where	No Travel 12	9	0.1%	118,777	338	0.0%	5,784,720
Other reason	No Travel 99	1,343	8.7%	118,777	145,476	14.9%	5,784,720
Logic: if made zero trips on day

`congestion`

Value	Label	Unweighted		Weighted
`congestion`
Person adjusted travel time to account for congestion
Value	Label	Count	Percent	Count	Percent
1	Yes	6,373	12.49%	580,074	18.11%
2	No	44,656	87.51%	2,623,013	81.89%
995	Missing Response	83,158		3,556,524
	Total valid	51,029	100.00%	3,203,088	100.00%
	Total missing	83,158		3,556,524
	Total	134,187		6,759,612
Logic: if employment = full/part/self/volunteer and job_type IS NOT “work only from home” or “drive/bike/travel for work”

8.4 Vehicle

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	4,079	15.78%	0	0.00%
1	Yes	21,770	84.22%	4,427,881	100.00%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

`make`

Value	Label	Unweighted		Weighted
`make`
Vehicle make
Value	Label	Count	Percent	Count	Percent
AMC		2	0.01%	204	0.00%
Acura		327	1.27%	62,015	1.40%
Alfa Romeo		17	0.07%	2,144	0.05%
Audi		467	1.81%	75,830	1.71%
BMW		626	2.42%	114,004	2.57%
Bentley		7	0.03%	2,077	0.05%
Buick		200	0.77%	32,372	0.73%
Cadillac		126	0.49%	24,342	0.55%
Chevrolet		1,410	5.45%	250,217	5.65%
Chrysler		163	0.63%	33,167	0.75%
Dodge		321	1.24%	53,897	1.22%
Ferrari		2	0.01%	189	0.00%
Fiat		31	0.12%	4,617	0.10%
Ford		1,886	7.30%	330,921	7.47%
Freightliner		1	0.00%	321	0.01%
GMC		337	1.30%	53,338	1.20%
Genesis		14	0.05%	2,956	0.07%
Geo		2	0.01%	1,271	0.03%
Honda		3,610	13.97%	633,202	14.30%
Hummer		6	0.02%	1,662	0.04%
Hyundai		1,227	4.75%	194,923	4.40%
Infiniti		109	0.42%	20,850	0.47%
Isuzu		5	0.02%	409	0.01%
Jaguar		20	0.08%	2,099	0.05%
Jeep		1,029	3.98%	172,222	3.89%
Kia		714	2.76%	130,383	2.94%
Lafayette		1	0.00%	1,118	0.03%
Lamborghini		1	0.00%	136	0.00%
Land Rover		53	0.21%	6,965	0.16%
Lexus		497	1.92%	75,714	1.71%
Lincoln		94	0.36%	13,861	0.31%
Lotus		3	0.01%	855	0.02%
Lucid		4	0.02%	1,430	0.03%
MINI		135	0.52%	17,602	0.40%
Maserati		3	0.01%	570	0.01%
Mazda		838	3.24%	132,841	3.00%
Mercedes-Benz		376	1.45%	62,603	1.41%
Mercury		40	0.15%	5,776	0.13%
Mitsubishi		89	0.34%	17,457	0.39%
Nissan		1,308	5.06%	228,550	5.16%
Oldsmobile		4	0.02%	583	0.01%
Opel		1	0.00%	16	0.00%
Other		316	1.22%	60,823	1.37%
Plymouth		1	0.00%	43	0.00%
Polestar		5	0.02%	1,078	0.02%
Pontiac		48	0.19%	8,113	0.18%
Porsche		71	0.27%	9,179	0.21%
Ram		117	0.45%	20,969	0.47%
Rivian		12	0.05%	2,974	0.07%
Rolls Royce		1	0.00%	146	0.00%
Saab		33	0.13%	6,969	0.16%
Saturn		23	0.09%	2,820	0.06%
Scion		29	0.11%	6,239	0.14%
Subaru		2,405	9.30%	379,027	8.56%
Suzuki		23	0.09%	1,943	0.04%
Tesla		354	1.37%	64,440	1.46%
Toyota		5,114	19.78%	888,015	20.06%
Volkswagen		775	3.00%	140,159	3.17%
Volvo		405	1.57%	68,163	1.54%
smart		11	0.04%	1,074	0.02%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

`year`

Value	Label	Unweighted		Weighted
`year`
Vehicle year
Value	Label	Count	Percent	Count	Percent
1980	1980 or earlier	149	0.58%	29,593	0.67%
1981	1981	15	0.06%	2,339	0.05%
1982	1982	12	0.05%	2,961	0.07%
1983	1983	10	0.04%	1,887	0.04%
1984	1984	6	0.02%	961	0.02%
1985	1985	4	0.02%	423	0.01%
1986	1986	12	0.05%	1,268	0.03%
1987	1987	19	0.07%	2,509	0.06%
1988	1988	10	0.04%	1,995	0.05%
1989	1989	24	0.09%	3,761	0.08%
1990	1990	22	0.09%	2,922	0.07%
1991	1991	12	0.05%	2,532	0.06%
1992	1992	11	0.04%	1,113	0.03%
1993	1993	20	0.08%	3,975	0.09%
1994	1994	25	0.10%	3,554	0.08%
1995	1995	30	0.12%	4,998	0.11%
1996	1996	22	0.09%	5,035	0.11%
1997	1997	41	0.16%	6,251	0.14%
1998	1998	64	0.25%	8,929	0.20%
1999	1999	89	0.34%	14,061	0.32%
2000	2000	113	0.44%	18,252	0.41%
2001	2001	123	0.48%	18,651	0.42%
2002	2002	156	0.60%	30,941	0.70%
2003	2003	226	0.87%	40,845	0.92%
2004	2004	280	1.08%	49,504	1.12%
2005	2005	335	1.30%	54,163	1.22%
2006	2006	392	1.52%	66,604	1.50%
2007	2007	518	2.00%	92,623	2.09%
2008	2008	608	2.35%	101,414	2.29%
2009	2009	537	2.08%	95,956	2.17%
2010	2010	826	3.20%	128,194	2.90%
2011	2011	901	3.49%	151,340	3.42%
2012	2012	1,085	4.20%	195,852	4.42%
2013	2013	1,271	4.92%	217,805	4.92%
2014	2014	1,429	5.53%	243,425	5.50%
2015	2015	1,645	6.36%	284,814	6.43%
2016	2016	1,768	6.84%	307,108	6.94%
2017	2017	1,888	7.30%	316,643	7.15%
2018	2018	1,838	7.11%	317,770	7.18%
2019	2019	1,817	7.03%	313,589	7.08%
2020	2020	1,456	5.63%	242,033	5.47%
2021	2021	1,517	5.87%	270,814	6.12%
2022	2022	1,392	5.39%	226,936	5.13%
2023	2023	1,446	5.59%	253,377	5.72%
2024	2024	1,393	5.39%	242,014	5.47%
2025	2025	290	1.12%	45,873	1.04%
2026	2026	2	0.01%	273	0.01%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

`fuel_type`

Value	Label	Unweighted		Weighted
`fuel_type`
Vehicle fuel type
Value	Label	Count	Percent	Count	Percent
1	Gas	22,758	88.04%	3,919,833	88.53%
2	Hybrid (HEV)	1,587	6.14%	249,862	5.64%
3	Plug-in hybrid (PHEV)	429	1.66%	66,937	1.51%
4	Electric (EV)	838	3.24%	150,449	3.40%
5	Diesel	178	0.69%	29,326	0.66%
6	Flex fuel (FFV)	41	0.16%	8,128	0.18%
7	Other (e.g., natural gas, bio-diesel)	14	0.05%	1,917	0.04%
8	Fuel cell electric vehicle (FCEV)	4	0.02%	1,430	0.03%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

`vehicle_ownership`

Value	Label	Unweighted		Weighted
`vehicle_ownership`
Vehicle ownership status
Value	Label	Count	Percent	Count	Percent
1	Fully owned (not making payments)	18,754	72.55%	3,179,327	71.80%
2	Owned (making payments)	5,873	22.72%	1,026,912	23.19%
3	Leased	946	3.66%	157,106	3.55%
4	Employer provided	145	0.56%	34,203	0.77%
5	Unsure	86	0.33%	19,615	0.44%
997	Other	45	0.17%	10,718	0.24%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

`transponder`

Value	Label	Unweighted		Weighted
`transponder`
Vehicle has a toll transponder
Value	Label	Count	Percent	Count	Percent
0	No	6,569	25.41%	1,189,817	26.87%
1	Yes	19,280	74.59%	3,238,064	73.13%
	Total valid	25,849	100.00%	4,427,881	100.00%
	Total	25,849		4,427,881

8.5 Location

8.6 Unlinked Trip

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	56,445	12.06%	0	0.00%
1	Yes	411,573	87.94%	30,078,667	100.00%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667

`hh_is_complete`

Value	Label	Unweighted		Weighted
`hh_is_complete`
Household day completion status
Value	Label	Count	Percent	Count	Percent
0	No	56,403	12.05%	0	0.00%
1	Yes	411,615	87.95%	30,078,667	100.00%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667

`day_is_complete`

Value	Label	Unweighted		Weighted
`day_is_complete`
Day survey completion status
Value	Label	Count	Percent	Count	Percent
0	No	57,329	12.25%	0	0.00%
1	Yes	410,689	87.75%	30,078,667	100.00%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667

`o_state`

Value	Label	Unweighted		Weighted
`o_state`
Origin state
Value	Label	Count	Percent	Count	Percent
09	Connecticut	3,054	0.65%	121,909	0.41%
25	Massachusetts	441,684	94.37%	29,136,155	96.87%
33	New Hampshire	6,070	1.30%	248,801	0.83%
36	New York	2,681	0.57%	84,119	0.28%
44	Rhode Island	3,833	0.82%	171,068	0.57%
50	Vermont	1,225	0.26%	39,803	0.13%
None		9,471	2.02%	276,811	0.92%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667
Logic: if state borders MA

`d_state`

Value	Label	Unweighted		Weighted
`d_state`
Destination state
Value	Label	Count	Percent	Count	Percent
09	Connecticut	3,062	0.65%	117,353	0.39%
25	Massachusetts	441,295	94.29%	29,132,220	96.85%
33	New Hampshire	6,104	1.30%	238,671	0.79%
36	New York	2,719	0.58%	83,483	0.28%
44	Rhode Island	3,847	0.82%	165,287	0.55%
50	Vermont	1,233	0.26%	40,226	0.13%
None		9,758	2.08%	301,427	1.00%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667
Logic: if state borders MA

`mode_1`

Value	Label	Unweighted		Weighted
`mode_1`
Trip mode 1
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	80,801	18.79%	3,846,508	13.44%
2	Standard bicycle (my household’s)	5,537	1.29%	265,539	0.93%
3	Borrowed bicycle (e.g., a friend’s)	49	0.01%	1,308	0.00%
4	Other rented bicycle	52	0.01%	832	0.00%
5	Other	2,261	0.53%	255,011	0.89%
6	Household vehicle 1	194,017	45.11%	13,099,797	45.76%
7	Household vehicle 2	68,231	15.86%	5,869,061	20.50%
8	Household vehicle 3	9,049	2.10%	1,041,681	3.64%
9	Household vehicle 4	1,690	0.39%	247,470	0.86%
10	Household vehicle 5	325	0.08%	47,077	0.16%
11	Household vehicle 6	24	0.01%	6,460	0.02%
12	Household vehicle 7	22	0.01%	378	0.00%
13	Household vehicle 8	2	0.00%	0	0.00%
16	Other vehicle in household	3,702	0.86%	366,635	1.28%
17	Rental car	2,325	0.54%	95,841	0.33%
18	Carshare service (e.g., Zipcar)	282	0.07%	14,341	0.05%
21	Vanpool	53	0.01%	7,007	0.02%
22	Other vehicle (not my household’s)	1,805	0.42%	96,634	0.34%
23	Local bus	6,011	1.40%	294,762	1.03%
24	School bus	7,335	1.71%	521,043	1.82%
25	Intercity bus (e.g., Greyhound)	41	0.01%	1,592	0.01%
26	Other private shuttle/bus (e.g., a hotel’s, an airport’s)	362	0.08%	11,969	0.04%
27	Medical transportation service	378	0.09%	57,801	0.20%
28	Other bus	138	0.03%	8,203	0.03%
30	Subway	7,669	1.78%	300,955	1.05%
31	Airplane/helicopter	679	0.16%	37,701	0.13%
33	Car from work	6,293	1.46%	422,522	1.48%
34	Friend/relative/colleague’s car	18,101	4.21%	912,685	3.19%
36	Regular taxi (e.g., Yellow Cab)	254	0.06%	40,321	0.14%
38	University/college shuttle/bus	269	0.06%	23,868	0.08%
39	Light rail/trolley	1,232	0.29%	52,316	0.18%
41	Intercity rail (e.g., Amtrak)	172	0.04%	7,563	0.03%
42	Other rail	54	0.01%	3,486	0.01%
43	Skateboard or rollerblade	72	0.02%	15,419	0.05%
44	Golf cart	237	0.06%	4,987	0.02%
45	ATV	16	0.00%	991	0.00%
47	Other motorcycle in household	221	0.05%	12,950	0.05%
49	Uber, Lyft, or other smartphone-app ride service	3,742	0.87%	285,261	1.00%
54	Other motorcycle (not my household’s)	37	0.01%	2,914	0.01%
55	Express/commuter bus	206	0.05%	13,997	0.05%
56	Other personal bicycle (e.g., cargo, tandem, etc.)	266	0.06%	10,524	0.04%
58	Commuter rail	1,884	0.44%	111,181	0.39%
59	Peer-to-peer car rental (e.g., Turo)	34	0.01%	355	0.00%
60	Other hired car service (e.g., black car, limo)	146	0.03%	18,870	0.07%
61	Rapid transit bus (BRT)	144	0.03%	5,938	0.02%
62	Employer-provided shuttle/bus	391	0.09%	29,514	0.10%
69	Bike-share - standard bicycle	616	0.14%	26,780	0.09%
70	Bike-share - electric bicycle	200	0.05%	6,195	0.02%
74	Segway	1	0.00%	67	0.00%
75	Other micromobility device	37	0.01%	502	0.00%
76	Carpool match (e.g., Waze Carpool)	63	0.01%	4,735	0.02%
77	Personal scooter or moped (not shared)	372	0.09%	18,940	0.07%
78	Other public ferry or water taxi	228	0.05%	11,557	0.04%
79	Vehicle ferry (took vehicle on board)	74	0.02%	12,938	0.05%
80	Other boat (e.g., kayak)	352	0.08%	3,151	0.01%
81	Snowmobile	5	0.00%	38	0.00%
82	Electric bicycle (my household’s)	1,517	0.35%	66,776	0.23%
83	Scooter-share (e.g., Bird, Lime)	5	0.00%	510	0.00%
200	Paratransit/Dial-A-Ride (e.g., The RIDE)	36	0.01%	3,960	0.01%
995	Missing Response	37,901		1,451,252
	Total valid	430,117	100.00%	28,627,414	100.00%
	Total missing	37,901		1,451,252
	Total	468,018		30,078,667

`mode_2`

Value	Label	Unweighted		Weighted
`mode_2`
Trip mode 2
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	7,949	49.07%	366,657	42.19%
2	Standard bicycle (my household’s)	139	0.86%	3,267	0.38%
3	Borrowed bicycle (e.g., a friend’s)	4	0.02%	466	0.05%
4	Other rented bicycle	6	0.04%	866	0.10%
5	Other	236	1.46%	8,228	0.95%
6	Household vehicle 1	1,072	6.62%	89,998	10.35%
7	Household vehicle 2	552	3.41%	41,279	4.75%
8	Household vehicle 3	33	0.20%	5,563	0.64%
9	Household vehicle 4	7	0.04%	67	0.01%
10	Household vehicle 5	5	0.03%	166	0.02%
16	Other vehicle in household	31	0.19%	6,850	0.79%
17	Rental car	57	0.35%	3,201	0.37%
18	Carshare service (e.g., Zipcar)	10	0.06%	2,802	0.32%
21	Vanpool	33	0.20%	2,046	0.24%
22	Other vehicle (not my household’s)	157	0.97%	15,035	1.73%
23	Local bus	446	2.75%	52,124	6.00%
24	School bus	204	1.26%	50,492	5.81%
25	Intercity bus (e.g., Greyhound)	31	0.19%	844	0.10%
26	Other private shuttle/bus (e.g., a hotel’s, an airport’s)	28	0.17%	5,839	0.67%
27	Medical transportation service	30	0.19%	2,136	0.25%
28	Other bus	131	0.81%	4,196	0.48%
30	Subway	2,169	13.39%	71,381	8.21%
31	Airplane/helicopter	33	0.20%	2,730	0.31%
33	Car from work	31	0.19%	10,675	1.23%
34	Friend/relative/colleague’s car	484	2.99%	26,065	3.00%
36	Regular taxi (e.g., Yellow Cab)	10	0.06%	4,013	0.46%
38	University/college shuttle/bus	116	0.72%	7,594	0.87%
39	Light rail/trolley	152	0.94%	4,364	0.50%
41	Intercity rail (e.g., Amtrak)	13	0.08%	447	0.05%
42	Other rail	74	0.46%	3,423	0.39%
43	Skateboard or rollerblade	4	0.02%	850	0.10%
45	ATV	1	0.01%	0	0.00%
47	Other motorcycle in household	14	0.09%	957	0.11%
49	Uber, Lyft, or other smartphone-app ride service	149	0.92%	6,467	0.74%
54	Other motorcycle (not my household’s)	2	0.01%	50	0.01%
55	Express/commuter bus	204	1.26%	8,937	1.03%
56	Other personal bicycle (e.g., cargo, tandem, etc.)	28	0.17%	182	0.02%
58	Commuter rail	1,091	6.73%	30,602	3.52%
59	Peer-to-peer car rental (e.g., Turo)	13	0.08%	2,201	0.25%
60	Other hired car service (e.g., black car, limo)	12	0.07%	617	0.07%
61	Rapid transit bus (BRT)	24	0.15%	2,449	0.28%
62	Employer-provided shuttle/bus	218	1.35%	12,054	1.39%
69	Bike-share - standard bicycle	63	0.39%	1,471	0.17%
70	Bike-share - electric bicycle	15	0.09%	111	0.01%
75	Other micromobility device	8	0.05%	29	0.00%
76	Carpool match (e.g., Waze Carpool)	7	0.04%	528	0.06%
77	Personal scooter or moped (not shared)	2	0.01%	7	0.00%
78	Other public ferry or water taxi	31	0.19%	1,954	0.22%
79	Vehicle ferry (took vehicle on board)	4	0.02%	0	0.00%
80	Other boat (e.g., kayak)	15	0.09%	166	0.02%
82	Electric bicycle (my household’s)	16	0.10%	260	0.03%
200	Paratransit/Dial-A-Ride (e.g., The RIDE)	36	0.22%	6,441	0.74%
995	Missing Response	451,818		29,209,522
	Total valid	16,200	100.00%	869,145	100.00%
	Total missing	451,818		29,209,522
	Total	468,018		30,078,667

`mode_3`

Value	Label	Unweighted		Weighted
`mode_3`
Trip mode 3
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	2,085	67.02%	88,986	56.75%
2	Standard bicycle (my household’s)	10	0.32%	35	0.02%
4	Other rented bicycle	2	0.06%	171	0.11%
5	Other	21	0.68%	499	0.32%
6	Household vehicle 1	128	4.11%	13,993	8.92%
7	Household vehicle 2	105	3.38%	8,823	5.63%
8	Household vehicle 3	6	0.19%	34	0.02%
16	Other vehicle in household	14	0.45%	1,794	1.14%
17	Rental car	3	0.10%	1,392	0.89%
18	Carshare service (e.g., Zipcar)	2	0.06%	855	0.55%
21	Vanpool	5	0.16%	68	0.04%
22	Other vehicle (not my household’s)	18	0.58%	1,183	0.75%
23	Local bus	32	1.03%	7,889	5.03%
25	Intercity bus (e.g., Greyhound)	8	0.26%	311	0.20%
26	Other private shuttle/bus (e.g., a hotel’s, an airport’s)	4	0.13%	1,455	0.93%
27	Medical transportation service	1	0.03%	76	0.05%
28	Other bus	14	0.45%	1,707	1.09%
30	Subway	82	2.64%	5,116	3.26%
31	Airplane/helicopter	14	0.45%	162	0.10%
33	Car from work	8	0.26%	225	0.14%
34	Friend/relative/colleague’s car	110	3.54%	4,450	2.84%
36	Regular taxi (e.g., Yellow Cab)	4	0.13%	53	0.03%
38	University/college shuttle/bus	37	1.19%	1,835	1.17%
39	Light rail/trolley	28	0.90%	436	0.28%
42	Other rail	13	0.42%	39	0.02%
45	ATV	1	0.03%	0	0.00%
47	Other motorcycle in household	4	0.13%	144	0.09%
49	Uber, Lyft, or other smartphone-app ride service	19	0.61%	2,968	1.89%
54	Other motorcycle (not my household’s)	1	0.03%	54	0.03%
55	Express/commuter bus	50	1.61%	2,058	1.31%
58	Commuter rail	139	4.47%	2,103	1.34%
59	Peer-to-peer car rental (e.g., Turo)	3	0.10%	819	0.52%
60	Other hired car service (e.g., black car, limo)	3	0.10%	1,993	1.27%
61	Rapid transit bus (BRT)	6	0.19%	2,249	1.43%
62	Employer-provided shuttle/bus	88	2.83%	1,780	1.14%
69	Bike-share - standard bicycle	2	0.06%	0	0.00%
70	Bike-share - electric bicycle	21	0.68%	217	0.14%
74	Segway	3	0.10%	360	0.23%
75	Other micromobility device	3	0.10%	7	0.00%
76	Carpool match (e.g., Waze Carpool)	6	0.19%	435	0.28%
78	Other public ferry or water taxi	6	0.19%	19	0.01%
83	Scooter-share (e.g., Bird, Lime)	1	0.03%	0	0.00%
200	Paratransit/Dial-A-Ride (e.g., The RIDE)	1	0.03%	14	0.01%
995	Missing Response	464,907		29,921,857
	Total valid	3,111	100.00%	156,809	100.00%
	Total missing	464,907		29,921,857
	Total	468,018		30,078,667

`mode_4`

Value	Label	Unweighted		Weighted
`mode_4`
Trip mode 4
Value	Label	Count	Percent	Count	Percent
1	Walk (or jog/wheelchair)	299	69.53%	8,536	32.01%
4	Other rented bicycle	1	0.23%	0	0.00%
5	Other	1	0.23%	0	0.00%
6	Household vehicle 1	3	0.70%	2,037	7.64%
7	Household vehicle 2	13	3.02%	3,974	14.90%
8	Household vehicle 3	1	0.23%	1,502	5.63%
16	Other vehicle in household	2	0.47%	1,396	5.23%
21	Vanpool	1	0.23%	0	0.00%
22	Other vehicle (not my household’s)	1	0.23%	0	0.00%
23	Local bus	3	0.70%	3,066	11.50%
24	School bus	2	0.47%	93	0.35%
30	Subway	1	0.23%	26	0.10%
31	Airplane/helicopter	9	2.09%	40	0.15%
34	Friend/relative/colleague’s car	14	3.26%	547	2.05%
36	Regular taxi (e.g., Yellow Cab)	1	0.23%	1,284	4.81%
38	University/college shuttle/bus	7	1.63%	71	0.27%
39	Light rail/trolley	2	0.47%	37	0.14%
42	Other rail	1	0.23%	21	0.08%
43	Skateboard or rollerblade	7	1.63%	200	0.75%
49	Uber, Lyft, or other smartphone-app ride service	4	0.93%	1,317	4.94%
55	Express/commuter bus	3	0.70%	182	0.68%
56	Other personal bicycle (e.g., cargo, tandem, etc.)	1	0.23%	23	0.09%
58	Commuter rail	20	4.65%	832	3.12%
60	Other hired car service (e.g., black car, limo)	4	0.93%	15	0.06%
62	Employer-provided shuttle/bus	3	0.70%	22	0.08%
70	Bike-share - electric bicycle	1	0.23%	0	0.00%
76	Carpool match (e.g., Waze Carpool)	18	4.19%	1,034	3.88%
78	Other public ferry or water taxi	4	0.93%	53	0.20%
83	Scooter-share (e.g., Bird, Lime)	3	0.70%	360	1.35%
995	Missing Response	467,588		30,051,997
	Total valid	430	100.00%	26,670	100.00%
	Total missing	467,588		30,051,997
	Total	468,018		30,078,667

`transit_egress`

Value	Label	Unweighted		Weighted
`transit_egress`
Mode used to leave transit stop
Value	Label	Count	Percent	Count	Percent
1	Walked (or jogged/wheelchair)	34,864	76.05%	1,889,560	72.72%
2	Bicycle	466	1.02%	43,954	1.69%
3	Transferred to another bus	3,710	8.09%	229,396	8.83%
4	Micromobility (e.g., scooter, moped, skateboard)	119	0.26%	3,600	0.14%
5	Transferred to other transit (e.g., rail, air)	3,753	8.19%	211,727	8.15%
6	Uber/Lyft, taxi, or car service	159	0.35%	9,390	0.36%
7	Drove my own household’s vehicle (or motorcycle)	1,081	2.36%	59,567	2.29%
8	Drove another vehicle (or motorcycle)	152	0.33%	5,958	0.23%
9	Got picked up in my own household’s vehicle (or motorcycle)	453	0.99%	44,618	1.72%
10	Got picked up in another vehicle (or motorcycle)	341	0.74%	18,666	0.72%
997	Other	746	1.63%	82,064	3.16%
995	Missing Response	422,174		27,480,167
	Total valid	45,844	100.00%	2,598,499	100.00%
	Total missing	422,174		27,480,167
	Total	468,018		30,078,667
Logic: if mode = bus or rail

`transit_access`

Value	Label	Unweighted		Weighted
`transit_access`
Mode used to access transit stop
Value	Label	Count	Percent	Count	Percent
1	Walked (or jogged/wheelchair)	34,764	75.83%	1,895,901	72.96%
2	Bicycle	551	1.20%	49,890	1.92%
3	Transferred from another bus	3,586	7.82%	225,998	8.70%
4	Micromobility (e.g., scooter, moped, skateboard)	121	0.26%	2,447	0.09%
5	Transferred from other transit (e.g., rail, air)	3,253	7.10%	130,975	5.04%
6	Uber/Lyft, taxi, or car service	187	0.41%	10,477	0.40%
7	Drove and parked my own household’s vehicle (or motorcycle)	1,413	3.08%	96,979	3.73%
8	Drove and parked another vehicle (or motorcycle)	195	0.43%	6,678	0.26%
9	Got dropped off in my own household’s vehicle (or motorcycle)	605	1.32%	71,714	2.76%
10	Got dropped off in another vehicle (or motorcycle)	451	0.98%	20,358	0.78%
997	Other	721	1.57%	87,082	3.35%
995	Missing Response	422,171		27,480,167
	Total valid	45,847	100.00%	2,598,499	100.00%
	Total missing	422,171		27,480,167
	Total	468,018		30,078,667
Logic: if mode = bus or rail

`ev_charge_station`

Value	Label	Unweighted		Weighted
`ev_charge_station`
Electric vehicle charging stations at stop
Value	Label	Count	Percent	Count	Percent
1	No	6,580	57.65%	390,097	53.46%
2	Yes, and I did NOT charge the vehicle here before my next trip	2,981	26.12%	211,432	28.98%
3	Yes, and I charged the vehicle here before my next trip	1,565	13.71%	105,143	14.41%
998	Don’t know	288	2.52%	22,967	3.15%
995	Missing Response	456,604		29,349,027
	Total valid	11,414	100.00%	729,640	100.00%
	Total missing	456,604		29,349,027
	Total	468,018		30,078,667
Logic: if used household electric vehicle on trip

`ev_charge_station_level`

Option	Variable	Unweighted			Weighted
`ev_charge_station_level`
Charge station level
Option	Variable	Selected	Percent	Missing	Selected	Percent	Missing
Level 1 (2-5 miles of range per 1 hour of charging)	Ev Charge Station Level 1	978	21.5%	463,472	59,955	18.9%	29,762,091
Level 2 (10-20 miles of range per 1 hour of charging)	Ev Charge Station Level 2	3,178	69.9%	463,472	217,500	68.7%	29,762,091
Level 3/DC Fast (60+ miles of range per 1 hour of charging)	Ev Charge Station Level 3	276	6.1%	463,472	16,827	5.3%	29,762,091
Don’t know	Ev Charge Station Level 998	226	5.0%	463,472	31,673	10.0%	29,762,091
Logic: if EV charge stations were at destination
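
For select-all-that-apply variables such as `ev_charge_station_level`, the Percent column appears to be the number of records selecting an option divided by the number of records that were asked the question (total records minus the Missing count). The following is a minimal sketch that reproduces the unweighted percentages from the figures in the table above; the variable names are illustrative and not part of the delivered data.

```python
# Unweighted figures copied from the ev_charge_station_level table above.
TOTAL_TRIP_RECORDS = 468_018
MISSING = 463_472  # records not asked the question

selected = {
    "Level 1": 978,
    "Level 2": 3_178,
    "Level 3/DC Fast": 276,
    "Don't know": 226,
}

# Records that were asked the question form the denominator.
asked = TOTAL_TRIP_RECORDS - MISSING

# Share of asked records that selected each option, rounded as in the table.
percent = {option: round(100 * n / asked, 1) for option, n in selected.items()}

for option, pct in percent.items():
    print(f"{option}: {pct}%")
```

Because respondents could select more than one option, the selected counts sum to more than the number of records asked, and the percentages sum to more than 100%.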

`ev_charge_station_decision`

Value	Label	Unweighted		Weighted
`ev_charge_station_decision`
Electric vehicle charging stations influenced decision to stop here
Value	Label	Count	Percent	Count	Percent
1	Agree	209	56.18%	5,894	36.38%
2	Neutral	77	20.70%	4,613	28.47%
3	Disagree	86	23.12%	5,697	35.16%
995	Missing Response	467,646		30,062,463
	Total valid	372	100.00%	16,204	100.00%
	Total missing	467,646		30,062,463
	Total	468,018		30,078,667
Logic: if used EV charge station at destination and destination is not home/work/school location
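
The Percent columns in the single-response tables are computed over valid responses only, with code 995 (Missing Response) excluded from the denominator. The sketch below reproduces the unweighted percentages for `ev_charge_station_decision` from the counts shown above; the helper name `valid_percent` is hypothetical.

```python
MISSING_CODE = 995  # "Missing Response" in the frequency tables

# Unweighted counts copied from the ev_charge_station_decision table above.
counts = {1: 209, 2: 77, 3: 86, MISSING_CODE: 467_646}


def valid_percent(counts, value, missing_codes=(MISSING_CODE,)):
    """Percent of records with `value` among valid (non-missing) records."""
    total_valid = sum(n for code, n in counts.items() if code not in missing_codes)
    return round(100 * counts[value] / total_valid, 2)


total_valid = sum(n for code, n in counts.items() if code != MISSING_CODE)
agree = valid_percent(counts, 1)
neutral = valid_percent(counts, 2)
disagree = valid_percent(counts, 3)
print(total_valid, agree, neutral, disagree)
```

Weighted percentages follow the same pattern using the weighted counts, although values recomputed from the rounded counts printed in the tables may differ in the last digit.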

`o_purpose_category`

Value	Label	Unweighted		Weighted
`o_purpose_category`
Origin purpose category
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,100	0.88%	3,542	0.01%
1	Home	143,857	30.81%	10,302,583	34.75%
2	Work	28,035	6.00%	2,781,172	9.38%
3	Work related	23,911	5.12%	1,248,352	4.21%
4	School	14,890	3.19%	1,436,866	4.85%
5	School related	2,508	0.54%	160,014	0.54%
6	Escort	29,130	6.24%	2,668,397	9.00%
7	Shop	46,898	10.04%	2,790,710	9.41%
8	Meal	34,126	7.31%	1,510,383	5.10%
9	Social recreation	55,188	11.82%	2,248,138	7.58%
10	Errand	23,948	5.13%	1,749,146	5.90%
11	Change mode	38,956	8.34%	1,768,184	5.96%
12	Overnight	15,835	3.39%	546,346	1.84%
13	Other	5,512	1.18%	429,660	1.45%
995	Missing Response	1,124		435,173
	Total valid	466,894	100.00%	29,643,494	100.00%
	Total missing	1,124		435,173
	Total	468,018		30,078,667

`o_purpose_category_reported`

Value	Label	Unweighted		Weighted
`o_purpose_category_reported`
Reported Origin purpose category
Value	Label	Count	Percent	Count	Percent
1	Home	94,081	24.06%	4,194,035	18.93%
2	Work	26,098	6.67%	2,562,067	11.56%
3	Work related	23,114	5.91%	1,323,044	5.97%
4	School	5,786	1.48%	1,021,250	4.61%
5	School related	743	0.19%	46,364	0.21%
6	Escort	30,972	7.92%	2,814,054	12.70%
7	Shop	43,951	11.24%	2,793,144	12.60%
8	Meal	32,443	8.30%	1,529,340	6.90%
9	Social recreation	53,574	13.70%	2,309,221	10.42%
10	Errand	22,669	5.80%	1,756,065	7.92%
11	Change mode	21,638	5.53%	582,843	2.63%
12	Overnight	14,968	3.83%	551,387	2.49%
13	Other	20,952	5.36%	677,175	3.06%
995	Missing Response	49,740		2,259,010
NA	No value assigned	27,289		5,659,666
	Total valid	390,989	100.00%	22,159,990	100.00%
	Total missing	77,029		7,918,677
	Total	468,018		30,078,667

`o_purpose`

Value	Label	Unweighted		Weighted
`o_purpose`
Origin purpose
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,100	0.88%	3,542	0.01%
1	Went home	143,493	30.73%	10,273,842	34.66%
10	Went to primary workplace	27,960	5.99%	2,759,190	9.31%
11	Went to work-related activity (e.g., meeting, delivery, worksite)	20,168	4.32%	1,045,138	3.53%
13	Volunteering	1,108	0.24%	69,623	0.23%
14	Other work-related	2,041	0.44%	92,839	0.31%
21	Attend K-12 school	14,311	3.07%	1,333,066	4.50%
22	Attend college/university	216	0.05%	30,375	0.10%
23	Attend other type of class (e.g., cooking class)	187	0.04%	49,633	0.17%
24	Attend other education-related activity (e.g., field trip)	2,367	0.51%	149,243	0.50%
25	Attend vocational education class	6	0.00%	2,581	0.01%
26	Attend daycare or preschool	122	0.03%	12,564	0.04%
30	Grocery shopping	22,057	4.72%	1,422,666	4.80%
31	Got gas	5,191	1.11%	284,795	0.96%
32	Other routine shopping (e.g., pharmacy)	17,607	3.77%	948,339	3.20%
33	Errand without appointment (e.g., post office)	10,359	2.22%	606,781	2.05%
34	Medical visit (e.g., doctor, dentist)	6,559	1.40%	625,642	2.11%
36	Shopping for major item (e.g., furniture, car)	1,120	0.24%	47,210	0.16%
37	Errand with appointment (e.g., haircut)	3,156	0.68%	187,653	0.63%
44	Other activity only (e.g., attend meeting, pick-up or drop-off item)	2,089	0.45%	169,112	0.57%
45	Pick someone up	11,171	2.39%	975,065	3.29%
46	Drop someone off	12,813	2.74%	1,162,883	3.92%
47	Accompany someone only (e.g., go along for the ride)	1,207	0.26%	186,865	0.63%
48	BOTH pick up AND drop off	1,162	0.25%	110,499	0.37%
50	Dined out, got coffee, or take-out	33,739	7.23%	1,492,947	5.04%
51	Exercise or recreation (e.g., gym, jog, bike, walk dog)	26,615	5.70%	1,213,552	4.09%
52	Social activity (e.g., visit friends/relatives)	9,516	2.04%	373,796	1.26%
53	Leisure/entertainment/cultural (e.g., cinema, museum, park)	11,167	2.39%	340,058	1.15%
54	Religious/civic/volunteer activity	2,735	0.59%	109,375	0.37%
56	Family activity (e.g., watch child’s game)	3,210	0.69%	109,401	0.37%
60	Changed or transferred mode (e.g., waited for bus or exited bus)	37,939	8.13%	1,740,310	5.87%
61	Other errand	599	0.13%	199,101	0.67%
62	Other leisure	162	0.03%	49,568	0.17%
99	Other reason	14,865	3.18%	921,839	3.11%
150	Went to another residence (e.g., someone else’s home, second home)	13,394	2.87%	487,580	1.64%
152	Went to temporary lodging (e.g., hotel, vacation rental)	2,382	0.51%	54,355	0.18%
995	Missing Response	1,125		437,638
	Total valid	466,893	100.00%	29,641,028	100.00%
	Total missing	1,125		437,638
	Total	468,018		30,078,667

`d_purpose_category`

Value	Label	Unweighted		Weighted
`d_purpose_category`
Destination purpose category
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,155	0.89%	2,715	0.01%
1	Home	143,208	30.60%	10,294,853	34.23%
2	Work	27,914	5.96%	2,869,092	9.54%
3	Work related	24,260	5.18%	1,276,062	4.24%
4	School	15,041	3.21%	1,396,923	4.64%
5	School related	2,612	0.56%	163,089	0.54%
6	Escort	29,844	6.38%	2,743,198	9.12%
7	Shop	47,233	10.09%	2,825,102	9.39%
8	Meal	34,457	7.36%	1,550,718	5.16%
9	Social recreation	55,836	11.93%	2,327,498	7.74%
10	Errand	24,226	5.18%	1,772,820	5.89%
11	Change mode	38,973	8.33%	1,774,445	5.90%
12	Overnight	16,926	3.62%	686,303	2.28%
13	Other	3,333	0.71%	395,846	1.32%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667

`d_purpose`

Value	Label	Unweighted		Weighted
`d_purpose`
Destination purpose
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,155	0.89%	2,715	0.01%
1	Went home	142,791	30.51%	10,261,722	34.12%
10	Went to primary workplace	27,837	5.95%	2,843,592	9.45%
11	Went to work-related activity (e.g., meeting, delivery, worksite)	20,496	4.38%	1,071,417	3.56%
13	Volunteering	1,114	0.24%	70,714	0.24%
14	Other work-related	2,045	0.44%	94,131	0.31%
21	Attend K-12 school	14,522	3.10%	1,305,384	4.34%
22	Attend college/university	186	0.04%	29,521	0.10%
23	Attend other type of class (e.g., cooking class)	192	0.04%	41,442	0.14%
24	Attend other education-related activity (e.g., field trip)	2,468	0.53%	153,143	0.51%
25	Attend vocational education class	5	0.00%	1,868	0.01%
26	Attend daycare or preschool	85	0.02%	11,090	0.04%
30	Grocery shopping	22,211	4.75%	1,441,348	4.79%
31	Got gas	5,216	1.11%	286,933	0.95%
32	Other routine shopping (e.g., pharmacy)	17,721	3.79%	957,118	3.18%
33	Errand without appointment (e.g., post office)	10,433	2.23%	607,594	2.02%
34	Medical visit (e.g., doctor, dentist)	6,658	1.42%	643,020	2.14%
36	Shopping for major item (e.g., furniture, car)	1,154	0.25%	49,969	0.17%
37	Errand with appointment (e.g., haircut)	3,202	0.68%	189,561	0.63%
44	Other activity only (e.g., attend meeting, pick-up or drop-off item)	2,119	0.45%	173,752	0.58%
45	Pick someone up	10,917	2.33%	971,495	3.23%
46	Drop someone off	13,661	2.92%	1,213,381	4.03%
47	Accompany someone only (e.g., go along for the ride)	1,258	0.27%	199,477	0.66%
48	BOTH pick up AND drop off	1,181	0.25%	118,874	0.40%
50	Dined out, got coffee, or take-out	34,064	7.28%	1,533,230	5.10%
51	Exercise or recreation (e.g., gym, jog, bike, walk dog)	26,826	5.73%	1,230,149	4.09%
52	Social activity (e.g., visit friends/relatives)	9,705	2.07%	395,686	1.32%
53	Leisure/entertainment/cultural (e.g., cinema, museum, park)	11,297	2.41%	361,574	1.20%
54	Religious/civic/volunteer activity	2,760	0.59%	112,379	0.37%
56	Family activity (e.g., watch child’s game)	3,271	0.70%	118,416	0.39%
60	Changed or transferred mode (e.g., waited for bus or exited bus)	37,943	8.11%	1,745,414	5.80%
61	Other errand	604	0.13%	200,856	0.67%
62	Other leisure	171	0.04%	52,952	0.18%
99	Other reason	12,893	2.75%	908,620	3.02%
150	Went to another residence (e.g., someone else’s home, second home)	14,214	3.04%	599,317	1.99%
152	Went to temporary lodging (e.g., hotel, vacation rental)	2,643	0.56%	80,814	0.27%
	Total valid	468,018	100.00%	30,078,667	100.00%
	Total	468,018		30,078,667

`d_purpose_reported`

Value	Label	Unweighted		Weighted
`d_purpose_reported`
Reported destination purpose
Value	Label	Count	Percent	Count	Percent
1	Went home	114,799	27.45%	9,069,557	32.60%
10	Went to primary workplace	25,944	6.20%	2,679,546	9.63%
11	Went to work-related activity (e.g., meeting, delivery, worksite)	19,588	4.68%	1,151,720	4.14%
13	Volunteering	1,136	0.27%	77,973	0.28%
14	Other work-related	2,045	0.49%	102,658	0.37%
21	Attend K-12 school	2,130	0.51%	640,377	2.30%
22	Attend college/university	1,899	0.45%	195,703	0.70%
23	Attend other type of class (e.g., cooking class)	1,195	0.29%	82,308	0.30%
24	Attend other education-related activity (e.g., field trip)	625	0.15%	36,815	0.13%
25	Attend vocational education class	49	0.01%	17,711	0.06%
26	Attend daycare or preschool	484	0.12%	126,576	0.45%
30	Grocery shopping	21,280	5.09%	1,459,042	5.24%
31	Got gas	4,783	1.14%	275,631	0.99%
32	Other routine shopping (e.g., pharmacy)	16,232	3.88%	943,451	3.39%
33	Errand without appointment (e.g., post office)	9,783	2.34%	603,161	2.17%
34	Medical visit (e.g., doctor, dentist)	6,345	1.52%	645,755	2.32%
36	Shopping for major item (e.g., furniture, car)	1,003	0.24%	46,317	0.17%
37	Errand with appointment (e.g., haircut)	2,987	0.71%	189,394	0.68%
44	Other activity only (e.g., attend meeting, pick-up or drop-off item)	1,874	0.45%	173,469	0.62%
45	Pick someone up	11,085	2.65%	996,358	3.58%
46	Drop someone off	15,411	3.68%	1,295,856	4.66%
47	Accompany someone only (e.g., go along for the ride)	1,257	0.30%	206,387	0.74%
48	BOTH pick up AND drop off	1,178	0.28%	120,799	0.43%
50	Dined out, got coffee, or take-out	32,313	7.73%	1,559,848	5.61%
51	Exercise or recreation (e.g., gym, jog, bike, walk dog)	28,311	6.77%	1,347,052	4.84%
52	Social activity (e.g., visit friends/relatives)	8,455	2.02%	382,320	1.37%
53	Leisure/entertainment/cultural (e.g., cinema, museum, park)	10,148	2.43%	330,813	1.19%
54	Religious/civic/volunteer activity	2,608	0.62%	115,950	0.42%
56	Family activity (e.g., watch child’s game)	2,825	0.68%	107,775	0.39%
60	Changed or transferred mode (e.g., waited for bus or exited bus)	20,715	4.95%	553,550	1.99%
61	Other errand	4,917	1.18%	323,054	1.16%
62	Other leisure	1,775	0.42%	87,674	0.32%
99	Other reason	27,265	6.52%	1,192,208	4.29%
150	Went to another residence (e.g., someone else’s home, second home)	13,497	3.23%	604,034	2.17%
152	Went to temporary lodging (e.g., hotel, vacation rental)	2,337	0.56%	79,193	0.28%
995	Missing Response	49,740		2,258,632
	Total valid	418,278	100.00%	27,820,035	100.00%
	Total missing	49,740		2,258,632
	Total	468,018		30,078,667

`bike_park_loc`

Value	Label	Unweighted		Weighted
`bike_park_loc`
Bicycle parking location
Value	Label	Count	Percent	Count	Percent
1	Inside house/apartment (includes garage, porch, storage area)	2,616	30.65%	127,424	32.75%
2	Bike rack	2,038	23.88%	93,766	24.10%
3	Bike locker	118	1.38%	3,410	0.88%
4	Secured bike room	421	4.93%	14,418	3.71%
5	Locked to other object (e.g., post, tree)	771	9.03%	37,603	9.67%
6	Bike-share designated docking station	845	9.90%	34,941	8.98%
7	Unlocked on-street	392	4.59%	7,241	1.86%
8	In a parking garage/lot	314	3.68%	13,693	3.52%
10	Carried it with me	683	8.00%	37,260	9.58%
997	Other	336	3.94%	19,285	4.96%
995	Missing Response	459,484		29,689,626
	Total valid	8,534	100.00%	389,040	100.00%
	Total missing	459,484		29,689,626
	Total	468,018		30,078,667
Logic: if mode or transit_access or transit_egress = bicycle

`scooter_park_location`

Value	Label	Unweighted		Weighted
`scooter_park_location`
Scooter parking location
Value	Label	Count	Percent	Count	Percent
1	Inside house/apartment (includes garage, porch, storage area)	166	29.48%	9,146	24.84%
2	Bike/scooter rack	62	11.01%	1,495	4.06%
3	Locker for bikes/scooters	7	1.24%	140	0.38%
4	Secured room	28	4.97%	2,560	6.95%
5	Locked to other object (e.g., post, tree)	18	3.20%	95	0.26%
6	Scooter-share designated docking station	3	0.53%	86	0.23%
7	Unlocked on-street	17	3.02%	33	0.09%
8	In a parking garage/lot	33	5.86%	1,809	4.91%
10	Carried it with me	210	37.30%	21,150	57.45%
997	Other	19	3.37%	301	0.82%
995	Missing Response	467,455		30,041,851
	Total valid	563	100.00%	36,816	100.00%
	Total missing	467,455		30,041,851
	Total	468,018		30,078,667
Logic: if mode or transit_access or transit_egress = micromobility

`park_cost`

Statistic	Unweighted	Weighted
`park_cost`
Amount paid to park
Statistic	Value	Value
N	21,337.00	950,830.20
Min	0.00	0.00
P25	0.00	0.00
Median	0.00	0.00
Mean	3.09	5.64
P75	0.00	4.00
P95	16.00	30.00
Max	800.00	800.00
SD	15.50	20.40
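
Weighted and unweighted statistics can diverge, as the P75 row above shows (0.00 unweighted, 4.00 weighted). The following sketch illustrates one common way to compute a weighted mean and weighted quantile. The cost and weight values are made up for illustration, and the quantile convention used here (the smallest value whose cumulative weight reaches the target share) is an assumption, not necessarily the method used to produce the tables in this guide.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q * total weight."""
    pairs = sorted(zip(values, weights))
    threshold = q * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= threshold:
            return value
    return pairs[-1][0]


# Hypothetical parking costs and trip weights (not from the delivered data).
costs = [0.0, 0.0, 0.0, 4.0, 20.0]
weights = [10.0, 30.0, 20.0, 25.0, 15.0]

mean_cost = weighted_mean(costs, weights)
median_cost = weighted_quantile(costs, weights, 0.50)
p75_cost = weighted_quantile(costs, weights, 0.75)
p95_cost = weighted_quantile(costs, weights, 0.95)
print(mean_cost, median_cost, p75_cost, p95_cost)
```

In this toy example three of the five records have a cost of zero, so the unweighted P75 would be 0.0 under the same convention, while the weighted P75 is 4.0 because the paying records carry larger weights.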

`taxi_cost`

Statistic	Unweighted	Weighted
`taxi_cost`
Amount paid for taxi
Statistic	Value	Value
N	3,124.00	267,658.93
Min	0.00	0.00
P25	11.00	11.00
Median	16.00	15.00
Mean	23.84	22.76
P75	25.00	27.00
P95	52.00	52.00
Max	800.00	800.00
SD	42.72	28.90

`taxi_pay`

Value	Label	Unweighted		Weighted
`taxi_pay`
Knows amount paid for taxi
Value	Label	Count	Percent	Count	Percent
1	Knows amount paid	3,124	87.17%	267,659	88.25%
2	Don’t know	460	12.83%	35,644	11.75%
995	Missing Response	464,434		29,775,364
	Total valid	3,584	100.00%	303,302	100.00%
	Total missing	464,434		29,775,364
	Total	468,018		30,078,667
Logic: if taxi_type = I paid, employer paid, split/shared

`taxi_type`

Value	Label	Unweighted		Weighted
`taxi_type`
Type of taxi used on trip
Value	Label	Count	Percent	Count	Percent
1	I paid the fare myself (no reimbursement)	2,789	62.25%	253,877	69.98%
2	Employer paid (I am reimbursed)	402	8.97%	31,308	8.63%
3	Split/shared fare with other(s)	394	8.79%	18,117	4.99%
4	Someone else paid 100% (all of fare)	748	16.70%	48,219	13.29%
5	Other	147	3.28%	11,252	3.10%
995	Missing Response	463,538		29,715,894
	Total valid	4,480	100.00%	362,773	100.00%
	Total missing	463,538		29,715,894
	Total	468,018		30,078,667
Logic: if mode or transit_access or transit_egress = taxi

`tnc_type`

Value	Label	Unweighted		Weighted
`tnc_type`
Shared smartphone-app ride service
Value	Label	Count	Percent	Count	Percent
1	Pooled (e.g., UberPool, Lyft Shared)	175	4.34%	16,734	5.63%
2	Regular (e.g., UberX, UberXL, Lyft, LyftXL)	3,725	92.39%	262,700	88.45%
3	Premium (e.g., UberBlack, Lyft Lux)	58	1.44%	3,412	1.15%
998	Don’t know	74	1.84%	14,174	4.77%
995	Missing Response	463,986		29,781,646
	Total valid	4,032	100.00%	297,021	100.00%
	Total missing	463,986		29,781,646
	Total	468,018		30,078,667
Logic: if mode_taxi = Uber/Lyft

`transit_type`

Value	Label	Unweighted		Weighted
`transit_type`
Payment method for transit
Value	Label	Count	Percent	Count	Percent
1	Free (no cost at all)	5,977	13.51%	428,445	19.73%
2	Used transit pass (any type) (e.g., LinkPass, CharlieCard, Senior CharlieCard, Transportation Access Pass (TAP), etc.)	28,983	65.52%	1,282,215	59.04%
3	Cash, credit card, or ticket(s)	8,947	20.23%	432,569	19.92%
4	Don’t know	145	0.33%	16,929	0.78%
6	Used a transfer from a previous transit trip	185	0.42%	11,506	0.53%
995	Missing Response	423,781		27,907,002
	Total valid	44,237	100.00%	2,171,664	100.00%
	Total missing	423,781		27,907,002
	Total	468,018		30,078,667
Logic: if mode = bus (except school bus) or rail

`vehicle_park_pay`

Value	Label	Unweighted		Weighted
`vehicle_park_pay`
Knows amount paid to park vehicle
Value	Label	Count	Percent	Count	Percent
1	Knows amount paid	4,974	89.11%	366,183	92.65%
2	Don’t know	608	10.89%	29,060	7.35%
995	Missing Response	462,436		29,683,424
	Total valid	5,582	100.00%	395,243	100.00%
	Total missing	462,436		29,683,424
	Total	468,018		30,078,667
Logic: if vehicle_park_type = Paid via cash, credit card, tickets or parking service

8.7 Linked Trip

`is_complete`

Value	Label	Unweighted		Weighted
`is_complete`
Record is complete
Value	Label	Count	Percent	Count	Percent
0	No	53,283	12.70%	0	0.00%
1	Yes	366,186	87.30%	28,050,396	100.00%
	Total valid	419,469	100.00%	28,050,396	100.00%
	Total	419,469		28,050,396

`o_purpose_category`

Value	Label	Unweighted		Weighted
`o_purpose_category`
Origin purpose category
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,094	0.98%	3,542	0.01%
1	Home	143,832	34.38%	10,299,144	37.29%
2	Work	27,818	6.65%	2,775,581	10.05%
3	Work related	23,509	5.62%	1,234,736	4.47%
4	School	14,796	3.54%	1,432,089	5.19%
5	School related	2,469	0.59%	158,005	0.57%
6	Escort	28,697	6.86%	2,646,418	9.58%
7	Shop	46,652	11.15%	2,784,853	10.08%
8	Meal	33,766	8.07%	1,498,014	5.42%
9	Social recreation	47,478	11.35%	2,059,517	7.46%
10	Errand	23,735	5.67%	1,743,895	6.31%
11	Change mode	412	0.10%	10,431	0.04%
12	Overnight	15,722	3.76%	543,199	1.97%
13	Other	5,368	1.28%	426,636	1.54%
995	Missing Response	1,121		434,336
	Total valid	418,348	100.00%	27,616,061	100.00%
	Total missing	1,121		434,336
	Total	419,469		28,050,396

`o_purpose`

Value	Label	Unweighted		Weighted
`o_purpose`
Origin purpose
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,094	0.98%	3,542	0.01%
1	Went home	143,492	34.30%	10,273,824	37.21%
10	Went to primary workplace	27,743	6.63%	2,753,599	9.97%
11	Went to work-related activity (e.g., meeting, delivery, worksite)	19,814	4.74%	1,030,193	3.73%
13	Volunteering	1,098	0.26%	68,730	0.25%
14	Other work-related	1,984	0.47%	91,570	0.33%
21	Attend K-12 school	14,219	3.40%	1,328,676	4.81%
22	Attend college/university	214	0.05%	29,988	0.11%
23	Attend other type of class (e.g., cooking class)	187	0.04%	49,633	0.18%
24	Attend other education-related activity (e.g., field trip)	2,332	0.56%	147,296	0.53%
25	Attend vocational education class	6	0.00%	2,581	0.01%
26	Attend daycare or preschool	122	0.03%	12,564	0.05%
30	Grocery shopping	21,939	5.24%	1,419,595	5.14%
31	Got gas	5,168	1.24%	284,457	1.03%
32	Other routine shopping (e.g., pharmacy)	17,525	4.19%	946,287	3.43%
33	Errand without appointment (e.g., post office)	10,262	2.45%	604,634	2.19%
34	Medical visit (e.g., doctor, dentist)	6,538	1.56%	624,686	2.26%
36	Shopping for major item (e.g., furniture, car)	1,107	0.26%	47,017	0.17%
37	Errand with appointment (e.g., haircut)	3,139	0.75%	187,384	0.68%
44	Other activity only (e.g., attend meeting, pick-up or drop-off item)	2,068	0.49%	167,551	0.61%
45	Pick someone up	11,058	2.64%	971,192	3.52%
46	Drop someone off	12,568	3.00%	1,147,069	4.15%
47	Accompany someone only (e.g., go along for the ride)	1,187	0.28%	186,643	0.68%
48	BOTH pick up AND drop off	1,146	0.27%	110,146	0.40%
50	Dined out, got coffee, or take-out	33,379	7.98%	1,480,578	5.36%
51	Exercise or recreation (e.g., gym, jog, bike, walk dog)	19,393	4.64%	1,035,732	3.75%
52	Social activity (e.g., visit friends/relatives)	9,346	2.23%	369,901	1.34%
53	Leisure/entertainment/cultural (e.g., cinema, museum, park)	10,985	2.63%	336,780	1.22%
54	Religious/civic/volunteer activity	2,722	0.65%	109,241	0.40%
56	Family activity (e.g., watch child’s game)	3,150	0.75%	106,894	0.39%
60	Changed or transferred mode (e.g., waited for bus or exited bus)	377	0.09%	9,715	0.04%
61	Other errand	599	0.14%	199,101	0.72%
62	Other leisure	162	0.04%	49,568	0.18%
99	Other reason	13,560	3.24%	888,438	3.22%
150	Went to another residence (e.g., someone else’s home, second home)	13,312	3.18%	485,264	1.76%
152	Went to temporary lodging (e.g., hotel, vacation rental)	2,352	0.56%	53,523	0.19%
995	Missing Response	1,122		436,801
	Total valid	418,347	100.00%	27,613,595	100.00%
	Total missing	1,122		436,801
	Total	419,469		28,050,396

`d_purpose_category`

Value	Label	Unweighted		Weighted
`d_purpose_category`
Destination purpose category
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,149	0.99%	2,715	0.01%
1	Home	123,068	29.37%	9,357,341	33.38%
2	Work	27,697	6.61%	2,863,501	10.21%
3	Work related	23,831	5.69%	1,258,867	4.49%
4	School	14,947	3.57%	1,392,145	4.97%
5	School related	2,572	0.61%	161,063	0.57%
6	Escort	29,411	7.02%	2,721,219	9.71%
7	Shop	46,987	11.21%	2,819,246	10.06%
8	Meal	34,097	8.14%	1,538,349	5.49%
9	Social recreation	48,126	11.48%	2,138,877	7.63%
10	Errand	24,013	5.73%	1,767,570	6.31%
12	Overnight	16,813	4.01%	683,157	2.44%
13	Other	3,189	0.76%	392,822	1.40%
14	Loop trip	20,140	4.81%	937,512	3.34%
995	Missing Response	429		16,012
	Total valid	419,040	100.00%	28,034,384	100.00%
	Total missing	429		16,012
	Total	419,469		28,050,396

`d_purpose`

Value	Label	Unweighted		Weighted
`d_purpose`
Destination purpose
Value	Label	Count	Percent	Count	Percent
-1	Not imputable	4,149	0.99%	2,715	0.01%
1	Went home	142,791	34.04%	10,261,722	36.58%
10	Went to primary workplace	27,620	6.58%	2,838,001	10.12%
11	Went to work-related activity (e.g., meeting, delivery, worksite)	20,142	4.80%	1,056,472	3.77%
13	Volunteering	1,104	0.26%	69,821	0.25%
14	Other work-related	1,988	0.47%	92,861	0.33%
21	Attend K-12 school	14,430	3.44%	1,300,993	4.64%
22	Attend college/university	184	0.04%	29,133	0.10%
23	Attend other type of class (e.g., cooking class)	192	0.05%	41,442	0.15%
24	Attend other education-related activity (e.g., field trip)	2,432	0.58%	151,178	0.54%
25	Attend vocational education class	5	0.00%	1,868	0.01%
26	Attend daycare or preschool	85	0.02%	11,090	0.04%
30	Grocery shopping	22,093	5.27%	1,438,277	5.13%
31	Got gas	5,193	1.24%	286,595	1.02%
32	Other routine shopping (e.g., pharmacy)	17,639	4.21%	955,067	3.40%
33	Errand without appointment (e.g., post office)	10,336	2.46%	605,447	2.16%
34	Medical visit (e.g., doctor, dentist)	6,637	1.58%	642,064	2.29%
36	Shopping for major item (e.g., furniture, car)	1,141	0.27%	49,776	0.18%
37	Errand with appointment (e.g., haircut)	3,185	0.76%	189,292	0.67%
44	Other activity only (e.g., attend meeting, pick-up or drop-off item)	2,098	0.50%	172,191	0.61%
45	Pick someone up	10,804	2.58%	967,622	3.45%
46	Drop someone off	13,416	3.20%	1,197,567	4.27%
47	Accompany someone only (e.g., go along for the ride)	1,238	0.30%	199,255	0.71%
48	BOTH pick up AND drop off	1,165	0.28%	118,521	0.42%
50	Dined out, got coffee, or take-out	33,704	8.03%	1,520,861	5.42%
51	Exercise or recreation (e.g., gym, jog, bike, walk dog)	19,604	4.67%	1,052,329	3.75%
52	Social activity (e.g., visit friends/relatives)	9,535	2.27%	391,791	1.40%
53	Leisure/entertainment/cultural (e.g., cinema, museum, park)	11,115	2.65%	358,295	1.28%
54	Religious/civic/volunteer activity	2,747	0.65%	112,245	0.40%
56	Family activity (e.g., watch child’s game)	3,211	0.77%	115,908	0.41%
60	Changed or transferred mode (e.g., waited for bus or exited bus)	377	0.09%	13,982	0.05%
61	Other errand	604	0.14%	200,856	0.72%
62	Other leisure	171	0.04%	52,952	0.19%
99	Other reason	11,589	2.76%	875,219	3.12%
150	Went to another residence (e.g., someone else’s home, second home)	14,132	3.37%	597,002	2.13%
152	Went to temporary lodging (e.g., hotel, vacation rental)	2,613	0.62%	79,982	0.29%
	Total valid	419,469	100.00%	28,050,396	100.00%
	Total	419,469		28,050,396

`linked_trip_mode`

Value	Label	Unweighted		Weighted
`linked_trip_mode`
Linked trip mode
Value	Label	Count	Percent	Count	Percent
-1	Missing	13,644	3.25%	210,517	0.75%
1	School bus	7,533	1.80%	570,723	2.03%
11	Regional transit	2,951	0.70%	171,584	0.61%
12	Local transit	12,020	2.87%	635,369	2.27%
22	HOV 3+ persons	61,536	14.67%	3,804,399	13.56%
23	HOV 2 persons	92,892	22.15%	6,080,657	21.68%
24	SOV	148,073	35.30%	12,270,538	43.74%
25	Bike	6,853	1.63%	314,951	1.12%
26	Personal mobility	874	0.21%	37,895	0.14%
27	Shared	680	0.16%	27,348	0.10%
28	TNC	3,755	0.90%	328,760	1.17%
29	Walk	66,268	15.80%	3,392,171	12.09%
30	Long Distance Passenger	881	0.21%	49,803	0.18%
99	Other	1,509	0.36%	155,682	0.56%
	Total valid	419,469	100.00%	28,050,396	100.00%
	Total	419,469		28,050,396

`joint_status`

Value	Label	Unweighted		Weighted
`joint_status`
Indicates whether tour is individual, partially joint, or fully joint
Value	Label	Count	Percent	Count	Percent
1	Not joint	268,409	63.99%	18,211,766	64.93%
2	Partially joint	6,153	1.47%	198,957	0.71%
3	Fully joint	144,907	34.55%	9,639,673	34.37%
	Total valid	419,469	100.00%	28,050,396	100.00%
	Total	419,469		28,050,396

`escort_category`

No escort, escorted drop-off, escorted pick-up, escorting drop-off, or escorting pick-up
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
0	No escort	406,564	96.92%	26,256,025	93.60%
1	Escorted dropoff	3,567	0.85%	451,376	1.61%
2	Escorted pickup	3,304	0.79%	388,883	1.39%
3	Escorting dropoff	3,126	0.75%	511,424	1.82%
4	Escorting pickup	2,908	0.69%	442,689	1.58%
	Total valid	419,469	100.00%	28,050,396	100.00%
	Total	419,469		28,050,396

8.8 Tour

`is_complete`

Record is complete
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
0	No	20,851	13.02%	0	0.00%
1	Yes	139,240	86.98%	11,466,591	100.00%
	Total valid	160,091	100.00%	11,466,591	100.00%
	Total	160,091		11,466,591

`joint_status`

Indicates whether tour is individual, partially joint, or fully joint
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
1	Not joint	96,014	59.97%	6,744,141	58.82%
2	Partially joint	36,966	23.09%	2,845,148	24.81%
3	Fully joint	27,111	16.93%	1,877,303	16.37%
	Total valid	160,091	100.00%	11,466,591	100.00%
	Total	160,091		11,466,591

`tour_category`

Tour category (mandatory, non-mandatory)
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
1	Individual mandatory	43,714	27.31%	4,123,034	35.96%
2	Individual non-mandatory	86,335	53.93%	5,275,480	46.01%
3	At work subtour	2,931	1.83%	190,774	1.66%
4	Joint	27,111	16.93%	1,877,303	16.37%
	Total valid	160,091	100.00%	11,466,591	100.00%
	Total	160,091		11,466,591

`tour_mode`

Tour mode
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
-1	Missing	3,974	2.48%	62,085	0.54%
1	School bus	4,024	2.51%	337,732	2.95%
11	Regional transit	1,851	1.16%	112,487	0.98%
12	Local transit	6,743	4.21%	378,399	3.30%
22	HOV 3+ persons	27,491	17.17%	1,945,481	16.97%
23	HOV 2 persons	35,382	22.10%	2,555,322	22.28%
24	SOV	46,627	29.13%	4,171,222	36.38%
25	Bike	2,965	1.85%	134,493	1.17%
26	Personal mobility	234	0.15%	18,640	0.16%
27	Shared	289	0.18%	12,966	0.11%
28	TNC	1,215	0.76%	129,771	1.13%
29	Walk	28,186	17.61%	1,507,669	13.15%
30	Long Distance Passenger	734	0.46%	45,190	0.39%
99	Other	376	0.23%	55,134	0.48%
	Total valid	160,091	100.00%	11,466,591	100.00%
	Total	160,091		11,466,591

`tour_purpose`

Tour purpose
Value	Label	Unweighted count	Unweighted percent	Weighted count	Weighted percent
-1	Missing	1,331	0.83%	1,672	0.01%
2	Work	21,087	13.19%	2,459,087	21.47%
3	Work-related	11,117	6.95%	601,549	5.25%
4	School	13,928	8.71%	1,252,401	10.93%
5	School-related	2,240	1.40%	177,458	1.55%
6	Escort	14,327	8.96%	1,377,469	12.03%
7	Shop	20,209	12.64%	1,284,455	11.21%
8	Meal	11,980	7.49%	574,132	5.01%
9	Social/Recreation	23,896	14.95%	1,249,734	10.91%
10	Errand	12,291	7.69%	966,850	8.44%
12	Overnight	3,145	1.97%	161,080	1.41%
13	Other	807	0.50%	146,253	1.28%
14	Loop	23,499	14.70%	1,202,129	10.50%
995	Missing Response	234		12,323
	Total valid	159,857	100.00%	11,454,269	100.00%
	Total missing	234		12,323
	Total	160,091		11,466,591
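
As a quick check on how to read these frequency tables, the sketch below recomputes the Work row of the tour_purpose table. It assumes, based on the totals shown, that the percent columns use "Total valid" as the denominator, so the 995 (Missing Response) records are excluded while the -1 (Missing) records are kept. The variable names are illustrative only.

```r
# Counts copied from the tour_purpose frequency table above
work_unweighted <- 21087
work_weighted <- 2459087
total_unweighted <- 160091
missing_response_unweighted <- 234  # records coded 995
valid_weighted <- 11454269          # "Total valid", weighted

# Valid denominator = all records minus the 995 (Missing Response) records
valid_unweighted <- total_unweighted - missing_response_unweighted

# Percentages as shown in the table, rounded to two decimals
pct_unweighted <- round(100 * work_unweighted / valid_unweighted, 2)
pct_weighted <- round(100 * work_weighted / valid_weighted, 2)

print(c(
  valid_unweighted = valid_unweighted,
  pct_unweighted = pct_unweighted,
  pct_weighted = pct_weighted
))
```

Dividing by the full total of 160,091 instead would give 13.17%, which does not match the 13.19% shown for Work.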

9 How to Use This Guide

This handbook provides practical, end-to-end guidance for analysts working with the Massachusetts Travel Study dataset. It shows how to work from the prepared study tables and codebook tables, and how to join, filter, weight, and analyze the data with reproducible examples. It is intended as the primary resource for descriptive analysis, common metrics, and design-aware inference using these data.

This guide is written in R, but the same principles apply in other statistical software such as Python, Stata, or SAS. The examples below assume the prepared tables and codebook objects are already available in hts and codebook.

Use this handbook alongside the dataset overview in Section 6 and the codebook in Section 7. The dataset overview explains which tables are included, the codebook explains what variables mean, and the later sections of this handbook explain analytic units, weights, and common metrics.

10 Setup and Initial Exploration

10.1 System Requirements and Software

This guide focuses on using R for analysis. Many of the same ideas also apply in other software such as Python, Stata, or SAS.

To follow the examples in this guide, you will need:

R (tested with version 4.4.3)
An R development environment such as Positron or RStudio

The following packages are used throughout:

data.table for large table workflows
dplyr and tidyr for data manipulation
srvyr for survey-weighted analysis
ggplot2 and gt for figures and tables
stringr for string handling
lubridate for date/time processing

Install these packages if they are not already available in your environment.
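
One way to see which of these packages still need installing is sketched below. It only reports the missing packages; the install.packages() call is left commented out so that nothing is installed without your confirmation. The object names required_packages and missing_packages are illustrative.

```r
# Packages named in the list above
required_packages <- c(
  "data.table", "dplyr", "tidyr", "srvyr",
  "ggplot2", "gt", "stringr", "lubridate"
)

# Packages that are not yet installed in the current library paths
installed <- rownames(installed.packages())
missing_packages <- setdiff(required_packages, installed)

if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
  # Uncomment to install from your configured CRAN mirror:
  # install.packages(missing_packages)
} else {
  message("All required packages are installed.")
}
```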

suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)
  library(tidyr)
  library(srvyr)
  library(ggplot2)
  library(gt)
  library(stringr)
  library(lubridate)
})

10.2 Load Data

This code assumes you have manually unzipped the dataset to a local folder. Adjust the data_dir variable to point to your unzipped dataset location.

The list of .csv files should include:

hh.csv
person.csv
day.csv
vehicle.csv
trip_unlinked.csv
trip_linked.csv
tour.csv
location.csv

Additionally, you should have two .csv files for the codebook:

value_labels.csv
variable_list.csv

The code below reads all CSV files into a list-of-data.frames called hts (for household travel survey), plus a separate list codebook for the codebook tables. If your study also includes a standalone sample plan CSV, read that file separately rather than expecting it inside the delivered dataset ZIP.

We use data.table::fread() to read large CSV files efficiently. It can be replaced with read.csv() or another reader, but only if you handle large integer IDs yourself. Both base::read.csv() and readr::read_csv() read long numeric IDs as floating-point numbers, which can lose precision and produce duplicate IDs, particularly for linked trips. If you use those functions, read the ID columns as character (colClasses in read.csv(), col_types in readr::read_csv()) or use the bit64 package to handle 64-bit integers.

# Folder where you manually unzipped the dataset
data_dir <- "data_cache"

csv_paths <- list.files(
  data_dir,
  pattern = "\\.csv$",
  full.names = TRUE,
  recursive = TRUE
)

object_names <- tolower(gsub("\\.csv$", "", basename(csv_paths)))
object_names <- make.names(object_names, unique = TRUE)

# Read all csvs
all_data <- setNames(
  lapply(csv_paths, data.table::fread),
  object_names
)

# Separate codebook tables
codebook <- list(
  value_labels = all_data$value_labels,
  variable_list = all_data$variable_list
)

# Separate core HTS tables
hts <- all_data[setdiff(names(all_data), c("value_labels", "variable_list"))]

# Optional: standalone weighting sample plan CSV (skipped if the file is absent)
sample_plan_path <- "path/to/sample_plan.csv"
sample_plan <- if (file.exists(sample_plan_path)) {
  data.table::fread(sample_plan_path)
} else {
  NULL
}

rm(all_data)
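
The caution about long IDs can be illustrated with a small base R sketch. The temporary file, the linked_trip_id column name, and the 19-digit values are invented for the demonstration; delivered IDs may be shorter and may use different column names.

```r
# Hypothetical file: two 19-digit IDs that differ only in the last digit
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(
    "linked_trip_id,hh_id",
    "1234567890123456781,24000089",
    "1234567890123456782,24000089"
  ),
  tmp
)

# Default read: the long IDs are parsed as doubles and lose their last digits
ids_as_numeric <- read.csv(tmp)

# Safer read: keep the ID columns as character strings
ids_as_text <- read.csv(
  tmp,
  colClasses = c(linked_trip_id = "character", hh_id = "character")
)

class(ids_as_numeric$linked_trip_id)          # numeric (floating point)
anyDuplicated(ids_as_numeric$linked_trip_id)  # non-zero: the two IDs collide
anyDuplicated(ids_as_text$linked_trip_id)     # 0: the IDs stay distinct

unlink(tmp)
```

Reading IDs as character also keeps joins between tables exact, at the cost of slightly more memory than an integer representation.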

For MassDOT, one of the first setup steps after loading the data should be defining the complete-household analytic universe. Most person-, day-, trip-, and vehicle-level analyses should be limited to households where hts$hh$is_complete == 1.

complete_hh_ids <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::pull(hh_id)

Use complete_hh_ids when you need to restrict lower-level tables to complete households.
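
A minimal sketch of that restriction follows, using invented toy tables in place of hts$hh and hts$person so that it runs on its own. Only the hh_id, person_id, and is_complete column names come from the delivered data; all values and the toy_ object names are made up.

```r
# Toy stand-ins for hts$hh and hts$person (values are invented)
toy_hh <- data.frame(
  hh_id = c("24000001", "24000002", "24000003"),
  is_complete = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)
toy_person <- data.frame(
  person_id = c("2400000101", "2400000102", "2400000201", "2400000301"),
  hh_id = c("24000001", "24000001", "24000002", "24000003"),
  stringsAsFactors = FALSE
)

# Same logic as complete_hh_ids, written in base R
toy_complete_hh_ids <- toy_hh$hh_id[toy_hh$is_complete == 1]

# Keep only person rows whose household is complete
toy_person_complete <- toy_person[toy_person$hh_id %in% toy_complete_hh_ids, ]

nrow(toy_person)           # 4 rows before filtering
nrow(toy_person_complete)  # 3 rows after dropping household 24000002
```

With the delivered data, the dplyr equivalent would be dplyr::filter(hts$person, hh_id %in% complete_hh_ids).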

10.3 Inspect Tables

Once the files are loaded, inspect the tables before starting analysis.

Get List of Tables

Start by listing the prepared tables that are available in hts.

table_names <- data.frame(
  table = names(hts),
  stringsAsFactors = FALSE
)

Table 32 confirms which prepared HTS tables are loaded for analysis.

gt::gt(table_names) %>%
  gt::tab_header(title = "Loaded HTS Tables")

Loaded HTS Tables
table
hh
person
day
vehicle
location
trip_unlinked
trip_linked
tour
value_labels
variable_list

Table 32: Loaded HTS tables.

Glimpse Data

Each table includes a mix of identifiers, survey variables, and often one or more weight columns. A quick glance at the person table is usually a good starting point.

dplyr::glimpse(hts$person)
#> Rows: 37,616
#> Columns: 269
#> $ person_id                 <chr> "2400008901", "2400008902", "2400012201", "2…
#> $ person_num                <int> 1, 2, 1, 1, 2, 3, 4, 1, 1, 2, 1, 1, 2, 1, 1,…
#> $ hh_id                     <chr> "24000089", "24000089", "24000122", "2400014…
#> $ surveyable                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_participant            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_proxy                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ has_proxy                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ has_phone                 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ phone_type                <int> 1, 1, 2, 995, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, …
#> $ hh_is_complete            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ is_complete               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ num_days_complete         <int> 1, 1, 7, 7, 1, 1, 1, 1, 1, 1, 1, 7, 7, 7, 6,…
#> $ num_trips                 <int> 2, 5, 39, 45, 2, 4, 2, 0, 4, 3, 3, 36, 13, 1…
#> $ relationship              <int> 0, 1, 0, 0, 2, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0,…
#> $ age                       <int> 8, 9, 5, 9, 5, 4, 8, 7, 9, 9, 6, 10, 9, 10, …
#> $ gender                    <int> 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2,…
#> $ race_other                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ ethnicity_other           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ employment                <int> 5, 5, 5, 1, 1, 2, 1, 1, 5, 2, 1, 5, 5, 5, 1,…
#> $ work_mode                 <int> 995, 995, 995, 104, 100, 100, 1, 100, 995, 1…
#> $ job_type                  <int> 995, 995, 995, 5, 1, 5, 1, 1, 995, 2, 1, 995…
#> $ num_jobs                  <int> 995, 995, 995, 1, 1, 2, 1, 1, 995, 1, 2, 995…
#> $ work_lon                  <dbl> NA, NA, NA, -71.05214, -71.79986, -72.67345,…
#> $ work_lat                  <dbl> NA, NA, NA, 42.35606, 42.26745, 41.76257, 42…
#> $ work_in_region            <int> 995, 995, 995, 1, 1, 0, 1, 1, 995, 995, 1, 9…
#> $ work_state                <chr> NA, NA, NA, "25", "25", "09", "25", "25", NA…
#> $ work_county               <chr> NA, NA, NA, "25025", "25027", "09003", "2502…
#> $ work_bg_2010              <chr> NA, NA, NA, "250250701018", "250277317001", …
#> $ work_bg_2020              <chr> NA, NA, NA, "250250701042", "250277317002", …
#> $ work_puma_2012            <chr> NA, NA, NA, "03302", "00300", "00302", "0030…
#> $ work_puma_2022            <chr> NA, NA, NA, "00802", "00505", "20201", "0050…
#> $ education                 <int> 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 6, 2, 3, 6,…
#> $ student                   <int> 2, 2, 0, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
#> $ school_mode               <int> 995, 995, 1, 995, 995, 995, 995, 995, 995, 9…
#> $ school_type               <int> 995, 995, 13, 995, 995, 13, 995, 995, 995, 9…
#> $ school_freq               <int> 995, 995, 4, 995, 995, 995, 995, 995, 995, 9…
#> $ remote_class_freq         <int> 995, 995, 996, 995, 995, 2, 995, 995, 995, 9…
#> $ school_in_region          <int> 995, 995, 1, 995, 995, 995, 995, 995, 995, 9…
#> $ school_state              <chr> NA, NA, "25", NA, NA, NA, NA, NA, NA, NA, NA…
#> $ school_county             <chr> NA, NA, "25017", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_puma_2012          <chr> NA, NA, "03400", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_puma_2022          <chr> NA, NA, "00613", NA, NA, NA, NA, NA, NA, NA,…
#> $ school_bg_2010            <chr> NA, NA, "250173736002", NA, NA, NA, NA, NA, …
#> $ school_bg_2020            <chr> NA, NA, "250173736002", NA, NA, NA, NA, NA, …
#> $ school_lon                <dbl> NA, NA, -71.16924, NA, NA, NA, NA, NA, NA, N…
#> $ school_lat                <dbl> NA, NA, 42.33609, NA, NA, NA, NA, NA, NA, NA…
#> $ second_home               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ second_home_in_region     <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ second_home_state         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_county        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_puma_2012     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_puma_2022     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_bg_2010       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_bg_2020       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_lon           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ second_home_lat           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ can_drive                 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ vehicle                   <int> 6, 7, 995, 6, 6, 9, 7, 8, 6, 7, 6, 6, 6, 6, …
#> $ transit_freq              <int> 4, 8, 5, 4, 9, 9, 9, 9, 8, 9, 8, 9, 9, 9, 8,…
#> $ tnc_freq                  <int> 7, 8, 8, 8, 995, 995, 995, 995, 995, 995, 8,…
#> $ bike_freq                 <int> 8, 996, 8, 996, 8, 996, 8, 996, 996, 996, 8,…
#> $ vanpool_freq              <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bikeshare_freq            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ scootshare_freq           <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ walk_freq                 <int> 4, 5, 1, 2, 5, 5, 1, 1, 2, 2, 1, 1, 8, 8, 4,…
#> $ transit_pass              <int> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ disability                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,…
#> $ participate               <int> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1,…
#> $ barriers_1                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ barriers_10               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_2                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_3                <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_4                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_5                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_6                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_7                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_8                <int> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1,…
#> $ barriers_9                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_997              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_999              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ barriers_other            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bicycle_other             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bicycle_type_1            <int> 1, 995, 1, 995, 995, 995, 995, 1, 1, 995, 99…
#> $ bicycle_type_2            <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bicycle_type_997          <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_comfort_lane         <int> 3, NA, 2, 1, NA, NA, NA, 2, 4, NA, 2, 4, 4, …
#> $ bike_comfort_local        <int> 1, NA, 1, 2, NA, NA, NA, 1, 1, NA, 2, 3, 4, …
#> $ bike_comfort_major        <int> 4, NA, 4, 1, NA, NA, NA, 4, 4, NA, 4, 4, 4, …
#> $ bike_comfort_minor        <int> 3, NA, 4, 1, NA, NA, NA, 2, 3, NA, 3, 4, 4, …
#> $ bike_comfort_neighborhood <int> 2, NA, 2, 2, NA, NA, NA, 1, 1, NA, 1, 3, 4, …
#> $ bike_comfort_paths        <int> 1, NA, 1, 2, NA, NA, NA, 1, 1, NA, 1, 1, 4, …
#> $ bike_comfort_street       <int> 2, NA, 3, 2, NA, NA, NA, 1, 2, NA, 2, 4, 4, …
#> $ bike_comfort_striped      <int> 3, NA, 3, 1, NA, NA, NA, 3, 4, NA, 3, 4, 4, …
#> $ bike_factors_1            <int> 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_10           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_11           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_12           <int> 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1,…
#> $ bike_factors_2            <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_3            <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_4            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_5            <int> 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_6            <int> 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0,…
#> $ bike_factors_7            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_8            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_9            <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ bike_factors_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bike_purpose_1            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_2            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_3            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_4            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_5            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_6            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_7            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_8            <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ bike_purpose_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ bike_safety_1             <int> 0, 995, 1, 995, 995, 995, 995, 995, 1, 0, 0,…
#> $ bike_safety_2             <int> 1, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_3             <int> 0, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_4             <int> 0, 995, 1, 995, 995, 995, 995, 995, 0, 0, 1,…
#> $ bike_safety_5             <int> 1, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_6             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_7             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 0, 0,…
#> $ bike_safety_8             <int> 0, 995, 0, 995, 995, 995, 995, 995, 0, 1, 0,…
#> $ bike_safety_other         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "At our …
#> $ bike_store_1              <int> 1, 995, 1, 995, 995, 995, 995, 1, 0, 995, 99…
#> $ bike_store_2              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_3              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_4              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_5              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_6              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_7              <int> 0, 995, 0, 995, 995, 995, 995, 0, 0, 995, 99…
#> $ bike_store_997            <int> 0, 995, 0, 995, 995, 995, 995, 0, 1, 995, 99…
#> $ carshare_freq             <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ commute_days_1            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_2            <int> 995, 995, 995, 1, 1, 0, 1, 0, 995, 0, 1, 995…
#> $ commute_days_3            <int> 995, 995, 995, 1, 1, 0, 1, 0, 995, 0, 1, 995…
#> $ commute_days_4            <int> 995, 995, 995, 1, 1, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_5            <int> 995, 995, 995, 0, 1, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_days_6            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_days_7            <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_days_996          <int> 995, 995, 995, 0, 0, 1, 0, 1, 995, 1, 0, 995…
#> $ commute_subsidy_1         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_10        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_11        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_12        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_13        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_14        <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_2         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_3         <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ commute_subsidy_4         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 1, 995…
#> $ commute_subsidy_5         <int> 995, 995, 995, 1, 0, 0, 0, 1, 995, 0, 0, 995…
#> $ commute_subsidy_6         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_7         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_8         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_9         <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_996       <int> 995, 995, 995, 0, 1, 1, 0, 0, 995, 1, 0, 995…
#> $ commute_subsidy_998       <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ commute_subsidy_use_1     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_10    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_11    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_12    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_13    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_14    <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_2     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_3     <int> 995, 995, 995, 0, 995, 995, 1, 1, 995, 995, …
#> $ commute_subsidy_use_4     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_5     <int> 995, 995, 995, 1, 995, 995, 0, 1, 995, 995, …
#> $ commute_subsidy_use_6     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_7     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_8     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_9     <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ commute_subsidy_use_996   <int> 995, 995, 995, 0, 995, 995, 0, 0, 995, 995, …
#> $ ethnicity_1               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ ethnicity_2               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_3               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_4               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_997             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ethnicity_999             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ ev_subsidies              <int> 4, 995, 995, 5, 995, 995, 995, 5, 5, 995, 2,…
#> $ ev_typical_charge_1       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_2       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_3       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_4       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_5       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_6       <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ ev_typical_charge_997     <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ home_vehicle_park         <int> 1, 995, 995, 3, 995, 995, 995, 1, 1, 995, 1,…
#> $ home_vehicle_park_pay     <int> 0, 995, 995, 0, 995, 995, 995, 0, 0, 995, 0,…
#> $ home_vehicle_park_permit  <int> 995, 995, 995, 1, 995, 995, 995, 995, 995, 9…
#> $ micromobility_devices_1   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_2   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_3   <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ micromobility_devices_996 <int> 1, 995, 1, 1, 995, 995, 995, 1, 1, 995, 1, 1…
#> $ micromobility_devices_997 <int> 0, 995, 0, 0, 995, 995, 995, 0, 0, 995, 0, 0…
#> $ num_bicycles              <int> 1, 995, 1, 0, 995, 995, 995, 2, 2, 995, 0, 0…
#> $ peerrent_freq             <int> 995, 995, 995, 995, 995, 995, 995, 995, 995,…
#> $ race_1                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_2                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_3                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ race_4                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_5                    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,…
#> $ race_997                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ race_999                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_2                   <int> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,…
#> $ share_3                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_4                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_5                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ share_6                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_7                   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ share_996                 <int> 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,…
#> $ telework_days_1           <int> 995, 995, 995, 1, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_2           <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_3           <int> 995, 995, 995, 0, 0, 0, 1, 1, 995, 0, 0, 995…
#> $ telework_days_4           <int> 995, 995, 995, 0, 0, 0, 0, 1, 995, 0, 0, 995…
#> $ telework_days_5           <int> 995, 995, 995, 1, 0, 0, 1, 0, 995, 0, 0, 995…
#> $ telework_days_6           <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ telework_days_7           <int> 995, 995, 995, 0, 0, 0, 0, 0, 995, 0, 0, 995…
#> $ telework_days_996         <int> 995, 995, 995, 0, 1, 1, 0, 0, 995, 1, 1, 995…
#> $ telework_freq_pre_covid   <int> 995, 995, 3, 8, 996, 8, 8, 7, 995, 996, 996,…
#> $ transit_factors_1         <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,…
#> $ transit_factors_10        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ transit_factors_11        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_12        <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0,…
#> $ transit_factors_2         <int> 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,…
#> $ transit_factors_3         <int> 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,…
#> $ transit_factors_4         <int> 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_5         <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_6         <int> 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,…
#> $ transit_factors_7         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,…
#> $ transit_factors_8         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_9         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ transit_factors_other     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ transit_purpose_1         <int> 0, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_2         <int> 1, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_3         <int> 0, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_4         <int> 0, 995, 1, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_5         <int> 0, 995, 0, 1, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_6         <int> 0, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_7         <int> 1, 995, 0, 0, 995, 995, 995, 995, 995, 995, …
#> $ transit_purpose_other     <chr> "cultural events", NA, NA, NA, NA, NA, NA, N…
#> $ walk_purpose_1            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_2            <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_3            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_4            <int> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_5            <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 995, 995…
#> $ walk_purpose_6            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_7            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 995, 995…
#> $ walk_purpose_8            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 995, 995…
#> $ walk_purpose_other        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ why_no_bike_1             <int> 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,…
#> $ why_no_bike_2             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_3             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ why_no_bike_4             <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_5             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
#> $ why_no_bike_6             <int> 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,…
#> $ why_no_bike_7             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ why_no_bike_8             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,…
#> $ person_type               <int> 4, 4, 3, 1, 1, 3, 1, 1, 4, 2, 1, 5, 4, 5, 1,…
#> $ person_weight             <dbl> 157.42583, 157.42583, 103.34663, 188.21553, …
#> $ race_imputed              <chr> "white", "white", "white", "white", "white",…
#> $ ethnicity_imputed         <chr> "not_hispanic", "not_hispanic", "not_hispani…
#> $ gender_imputed            <chr> "female", "male", "female", "female", "male"…
#> $ person_weight_tue         <dbl> 187.88768, 187.88768, 59.87428, 458.44733, 2…
#> $ person_weight_fri         <dbl> NA, NA, 75.95139, 655.71500, NA, NA, NA, NA,…
#> $ person_weight_mon         <dbl> NA, NA, 65.09989, 546.06550, NA, NA, NA, NA,…
#> $ person_weight_sat         <dbl> NA, NA, 76.29594, 651.46690, NA, NA, NA, NA,…
#> $ person_weight_sun         <dbl> NA, NA, 74.33908, 640.43646, NA, NA, NA, NA,…
#> $ person_weight_thu         <dbl> NA, NA, 64.49532, 489.71295, NA, NA, NA, NA,…
#> $ person_weight_wed         <dbl> NA, NA, 58.57788, 463.14780, NA, NA, NA, NA,…

View Table Dimensions

Table sizes reflect the hierarchical structure of the dataset: households contain people, people contain travel days, and days may contain zero or more trips.

# One row per loaded table: name, row count, and column count
table_dimensions <- data.frame(
  table = names(hts),
  rows = vapply(hts, nrow, integer(1)),
  columns = vapply(hts, ncol, integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

Use Table 33 to confirm that the row counts follow the expected household-to-person-to-day-to-trip hierarchy.

gt::gt(table_dimensions) %>%
  gt::fmt_number(
    columns = c(rows, columns),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    rows = "Rows",
    columns = "Columns"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )

Table	Rows	Columns
hh	18,122	52
person	37,616	269
day	134,187	65
vehicle	25,849	19
location	8,607,225	9
trip_unlinked	468,018	139
trip_linked	419,469	58
tour	160,091	57
value_labels	2,422	3
variable_list	567	19

Table 33: HTS table dimensions.
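
Row counts alone cannot show that every child record has a parent. A minimal sketch of a key-based check follows, using invented toy tables. The hh_id and person_id columns appear in the person table shown earlier; the assumption that the day table also carries person_id should be checked against the codebook. The count_orphans() helper is hypothetical.

```r
# Toy tables standing in for hts$hh, hts$person, and hts$day (values invented)
toy_hh <- data.frame(hh_id = c("H1", "H2"), stringsAsFactors = FALSE)
toy_person <- data.frame(
  person_id = c("H1P1", "H1P2", "H2P1"),
  hh_id = c("H1", "H1", "H2"),
  stringsAsFactors = FALSE
)
toy_day <- data.frame(
  person_id = c("H1P1", "H1P1", "H2P1", "H9P1"),  # last row has no parent
  stringsAsFactors = FALSE
)

# Count child rows whose key value is missing from the parent table
count_orphans <- function(child, parent, key) {
  sum(!(child[[key]] %in% parent[[key]]))
}

orphan_persons <- count_orphans(toy_person, toy_hh, "hh_id")
orphan_days <- count_orphans(toy_day, toy_person, "person_id")

orphan_persons  # 0: every person belongs to a listed household
orphan_days     # 1: one day row points to an unknown person
```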

For MassDOT, it is also useful to compare delivered row counts with the complete-household subset before calculating any substantive estimate.

# For each table with an hh_id column, count rows in complete households;
# tables without hh_id get NA
complete_household_table_dimensions <- data.frame(
  table = names(hts),
  complete_household_rows = vapply(
    hts,
    function(tbl) {
      if ("hh_id" %in% names(tbl)) {
        sum(tbl$hh_id %in% complete_hh_ids)
      } else {
        NA_integer_
      }
    },
    integer(1)
  ),
  row.names = NULL,
  stringsAsFactors = FALSE
)

Table 34 shows the number of rows that belong to complete households in each table with a household identifier.

gt::gt(complete_household_table_dimensions) %>%
  gt::fmt_number(
    columns = complete_household_rows,
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    complete_household_rows = "Complete-Household Rows"
  )

Table	Complete-Household Rows
hh	15,641
person	31,255
day	96,370
vehicle	21,770
location	NA
trip_unlinked	411,573
trip_linked	366,186
tour	139,240
value_labels	NA
variable_list	NA

Table 34: Rows belonging to complete households.
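To make the comparison between Table 33 and Table 34 concrete, the two sets of counts can be combined into a share of rows retained by the complete-household rule. The sketch below retypes the published counts into two small data frames (the object names are illustrative) and uses base R merge() so it runs without the survey data.

```r
# Row counts copied from Table 33 (all delivered rows) and Table 34
# (rows belonging to complete households). Tables without hh_id are omitted.
delivered <- data.frame(
  table = c("hh", "person", "day", "vehicle", "trip_unlinked", "trip_linked", "tour"),
  rows = c(18122, 37616, 134187, 25849, 468018, 419469, 160091),
  stringsAsFactors = FALSE
)
complete <- data.frame(
  table = c("hh", "person", "day", "vehicle", "trip_unlinked", "trip_linked", "tour"),
  complete_household_rows = c(15641, 31255, 96370, 21770, 411573, 366186, 139240),
  stringsAsFactors = FALSE
)

# Join the two summaries by table name and compute the share of rows retained.
retained <- merge(delivered, complete, by = "table", sort = FALSE)
retained$share_retained <- retained$complete_household_rows / retained$rows
print(transform(retained, share_retained = round(share_retained, 3)))
```

The shares differ by table, which is a reminder that filtering to complete households changes the person, day, and trip samples by different proportions.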

View Sample Records

Before summarizing a table, it is often useful to preview a few records and confirm that the key fields look as expected.

person_preview <- head(hts$person)

Table 35 shows the first few person records from the prepared data.

gt::gt(person_preview) %>%
  gt::tab_header(title = "Sample Person Records")

person_id	person_num	hh_id	surveyable	is_participant	is_proxy	has_proxy	has_phone	phone_type	hh_is_complete	is_complete	num_days_complete	num_trips	relationship	age	gender	race_other	ethnicity_other	employment	work_mode	job_type	num_jobs	work_lon	work_lat	work_in_region	work_state	work_county	work_bg_2010	work_bg_2020	work_puma_2012	work_puma_2022	education	student	school_mode	school_type	school_freq	remote_class_freq	school_in_region	school_state	school_county	school_puma_2012	school_puma_2022	school_bg_2010	school_bg_2020	school_lon	school_lat	second_home	second_home_in_region	second_home_state	second_home_county	second_home_puma_2012	second_home_puma_2022	second_home_bg_2010	second_home_bg_2020	second_home_lon	second_home_lat	can_drive	vehicle	transit_freq	tnc_freq	bike_freq	vanpool_freq	bikeshare_freq	scootshare_freq	walk_freq	transit_pass	disability	participate	barriers_1	barriers_10	barriers_2	barriers_3	barriers_4	barriers_5	barriers_6	barriers_7	barriers_8	barriers_9	barriers_997	barriers_999	barriers_other	bicycle_other	bicycle_type_1	bicycle_type_2	bicycle_type_997	bike_comfort_lane	bike_comfort_local	bike_comfort_major	bike_comfort_minor	bike_comfort_neighborhood	bike_comfort_paths	bike_comfort_street	bike_comfort_striped	bike_factors_1	bike_factors_10	bike_factors_11	bike_factors_12	bike_factors_2	bike_factors_3	bike_factors_4	bike_factors_5	bike_factors_6	bike_factors_7	bike_factors_8	bike_factors_9	bike_factors_other	bike_purpose_1	bike_purpose_2	bike_purpose_3	bike_purpose_4	bike_purpose_5	bike_purpose_6	bike_purpose_7	bike_purpose_8	bike_purpose_other	bike_safety_1	bike_safety_2	bike_safety_3	bike_safety_4	bike_safety_5	bike_safety_6	bike_safety_7	bike_safety_8	bike_safety_other	bike_store_1	bike_store_2	bike_store_3	bike_store_4	bike_store_5	bike_store_6	bike_store_7	bike_store_997	carshare_freq	commute_days_1	commute_days_2	commute_days_3	commute_days_4	commute_days_5	commute_days_6	commute_days_7	commute_days_996	commute_subsidy_1	
commute_subsidy_10	commute_subsidy_11	commute_subsidy_12	commute_subsidy_13	commute_subsidy_14	commute_subsidy_2	commute_subsidy_3	commute_subsidy_4	commute_subsidy_5	commute_subsidy_6	commute_subsidy_7	commute_subsidy_8	commute_subsidy_9	commute_subsidy_996	commute_subsidy_998	commute_subsidy_use_1	commute_subsidy_use_10	commute_subsidy_use_11	commute_subsidy_use_12	commute_subsidy_use_13	commute_subsidy_use_14	commute_subsidy_use_2	commute_subsidy_use_3	commute_subsidy_use_4	commute_subsidy_use_5	commute_subsidy_use_6	commute_subsidy_use_7	commute_subsidy_use_8	commute_subsidy_use_9	commute_subsidy_use_996	ethnicity_1	ethnicity_2	ethnicity_3	ethnicity_4	ethnicity_997	ethnicity_999	ev_subsidies	ev_typical_charge_1	ev_typical_charge_2	ev_typical_charge_3	ev_typical_charge_4	ev_typical_charge_5	ev_typical_charge_6	ev_typical_charge_997	home_vehicle_park	home_vehicle_park_pay	home_vehicle_park_permit	micromobility_devices_1	micromobility_devices_2	micromobility_devices_3	micromobility_devices_996	micromobility_devices_997	num_bicycles	peerrent_freq	race_1	race_2	race_3	race_4	race_5	race_997	race_999	share_2	share_3	share_4	share_5	share_6	share_7	share_996	telework_days_1	telework_days_2	telework_days_3	telework_days_4	telework_days_5	telework_days_6	telework_days_7	telework_days_996	telework_freq_pre_covid	transit_factors_1	transit_factors_10	transit_factors_11	transit_factors_12	transit_factors_2	transit_factors_3	transit_factors_4	transit_factors_5	transit_factors_6	transit_factors_7	transit_factors_8	transit_factors_9	transit_factors_other	transit_purpose_1	transit_purpose_2	transit_purpose_3	transit_purpose_4	transit_purpose_5	transit_purpose_6	transit_purpose_7	transit_purpose_other	walk_purpose_1	walk_purpose_2	walk_purpose_3	walk_purpose_4	walk_purpose_5	walk_purpose_6	walk_purpose_7	walk_purpose_8	walk_purpose_other	why_no_bike_1	why_no_bike_2	why_no_bike_3	why_no_bike_4	why_no_bike_5	why_no_bike_6	why_no_bike_7	why_no_bike_8	person_type	person_weight	
race_imputed	ethnicity_imputed	gender_imputed	person_weight_tue	person_weight_fri	person_weight_mon	person_weight_sat	person_weight_sun	person_weight_thu	person_weight_wed
Sample Person Records
2400008901	1	24000089	1	1	0	0	1	1	1	1	1	2	0	8	1	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	4	7	8	995	995	995	4	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	1	0	0	3	1	4	3	2	1	2	3	1	0	0	0	1	1	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	0	1	0	0	1	0	0	0	NA	1	0	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	4	995	995	995	995	995	995	995	1	0	995	0	0	0	1	0	1	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	0	0	0	0	1	1	0	1	1	0	0	0	NA	0	1	0	0	0	0	1	cultural events	0	1	0	1	0	0	0	1	NA	0	0	0	0	0	1	0	0	4	157.42583	white	not_hispanic	female	187.88768	NA	NA	NA	NA	NA	NA
2400008902	2	24000089	1	1	0	0	1	1	1	1	1	5	1	9	2	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	7	8	8	996	995	995	995	5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	1	0	0	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	0	0	0	0	1	1	1	1	1	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	1	0	0	0	1	NA	0	0	0	0	0	0	0	1	4	157.42583	white	not_hispanic	male	187.88768	NA	NA	NA	NA	NA	NA
2400012201	1	24000122	1	1	0	0	1	2	1	1	7	39	0	5	1	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	0	1	13	4	996	1	25	25017	03400	00613	250173736002	250173736002	-71.16924	42.33609	0	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	5	8	8	995	995	995	1	1	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	1	0	0	2	1	4	4	2	1	3	3	1	0	0	0	1	0	0	0	0	1	0	0	NA	995	995	995	995	995	995	995	995	NA	1	1	1	1	0	0	0	0	NA	1	0	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	0	0	0	1	0	1	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	3	1	0	0	0	1	0	1	0	0	0	0	0	NA	1	1	0	1	0	0	0	NA	1	1	0	1	1	0	0	1	NA	0	0	0	1	0	1	0	0	3	103.34663	white	not_hispanic	female	59.87428	75.95139	65.09989	76.29594	74.33908	64.49532	58.57788
2400014001	1	24000140	1	1	0	0	1	995	1	1	7	45	0	9	1	NA	NA	1	104	5	1	-71.05214	42.35606	1	25	25025	250250701018	250250701042	03302	00802	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	4	8	996	995	995	995	2	1	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	1	2	1	1	2	2	2	1	0	0	0	1	0	0	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	1	0	0	0	0	0	5	995	995	995	995	995	995	995	3	0	1	0	0	0	1	0	0	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	1	0	0	0	1	0	0	0	8	0	0	0	1	0	0	0	0	0	0	0	0	NA	0	0	0	0	1	0	0	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	1	188.21553	white	not_hispanic	female	458.44733	655.71500	546.06550	651.46690	640.43646	489.71295	463.14780
2400015802	2	24000158	1	1	0	0	1	2	1	1	1	2	2	5	2	NA	NA	1	100	1	1	-71.79986	42.26745	1	25	25027	250277317001	250277317002	00300	00505	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	9	995	8	995	995	995	5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	1	0	0	0	0	0	1	0	1	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	996	0	0	0	0	0	0	1	0	0	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	1	50.69754	white	not_hispanic	male	216.88728	NA	NA	NA	NA	NA	NA
2400015803	3	24000158	1	1	0	0	1	2	1	1	1	4	2	4	2	NA	NA	2	100	5	2	-72.67345	41.76257	0	09	09003	090035021002	090035021001	00302	20201	6	3	995	13	995	2	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	9	9	995	996	995	995	995	5	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	0	0	0	0	1	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	8	0	0	0	0	0	0	0	0	1	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	3	50.69754	white	not_hispanic	male	216.88728	NA	NA	NA	NA	NA	NA

Table 35: Sample person records.
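Because the person table has 269 columns, a full preview is hard to scan. One option is to preview only the key fields. The sketch below uses a small toy stand-in for hts$person with a few values copied from Table 35; person_toy, key_fields, and the other_field placeholder column are illustrative names, not part of the delivered data.

```r
# Toy stand-in for hts$person, using values from the first rows of Table 35.
person_toy <- data.frame(
  person_id = c(2400008901, 2400008902, 2400012201),
  hh_id = c(24000089, 24000089, 24000122),
  person_num = c(1, 2, 1),
  age = c(8, 9, 5),
  num_trips = c(2, 5, 39),
  person_weight = c(157.42583, 157.42583, 103.34663),
  other_field = c(0, 0, 0)  # placeholder for the remaining columns
)

# Keep only the fields needed for a quick visual check. intersect() skips any
# requested column that is absent instead of raising an error.
key_fields <- c("person_id", "hh_id", "person_num", "age", "num_trips", "person_weight")
person_key_preview <- head(
  person_toy[, intersect(key_fields, names(person_toy)), drop = FALSE]
)
print(person_key_preview)
```

The same pattern should apply to hts$person directly by replacing person_toy with the delivered table.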

Inspect Weight Columns

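Weights are included in the household, person, day, trip, and tour tables; the vehicle table carries the household weight. The loop below summarizes the first column in each table whose name ends in _weight, so day-of-week weights such as person_weight_tue are not summarized.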

weight_summaries <- data.frame(
  table = character(),
  weight_column = character(),
  min = numeric(),
  median = numeric(),
  mean = numeric(),
  max = numeric(),
  zero_count = integer(),
  stringsAsFactors = FALSE
)

for (table_name in names(hts)) {
  weight_columns <- grep("_weight$", names(hts[[table_name]]), value = TRUE)

  if (length(weight_columns) == 0L) {
    next
  }

  weight_col <- weight_columns[[1]]
  weight_vec <- hts[[table_name]][[weight_col]]

  weight_summaries <- rbind(
    weight_summaries,
    data.frame(
      table = table_name,
      weight_column = weight_col,
      min = min(weight_vec, na.rm = TRUE),
      median = stats::median(weight_vec, na.rm = TRUE),
      mean = mean(weight_vec, na.rm = TRUE),
      max = max(weight_vec, na.rm = TRUE),
      zero_count = sum(weight_vec == 0, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  )
}

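Table 36 is a quick way to check whether the main weight columns are present and populated. The Zero Count column reports records whose weight is exactly zero; those records contribute nothing to weighted totals.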

gt::gt(weight_summaries) %>%
  gt::fmt_number(
    columns = c(min, median, mean, max),
    decimals = 3
  ) %>%
  gt::fmt_number(
    columns = zero_count,
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    table = "Table",
    weight_column = "Weight Column",
    min = "Min",
    median = "Median",
    mean = "Mean",
    max = "Max",
    zero_count = "Zero Count"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )

Table	Weight Column	Min	Median	Mean	Max	Zero Count
hh	hh_weight	15.780	106.787	180.980	1,129.933	0
person	person_weight	0.000	112.787	217.239	4,230.993	1,556
day	day_weight	0.000	30.002	108.071	3,570.328	13,520
vehicle	hh_weight	15.780	116.151	204.342	1,129.933	0
trip_unlinked	trip_weight	0.000	28.591	117.434	5,604.895	56,013
trip_linked	linked_trip_weight	0.000	29.896	124.146	5,604.895	51,963
tour	tour_weight	0.000	30.100	128.477	5,373.668	21,243

Table 36: Weight summaries by table.
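The summary above covers one weight column per table. If the day-of-week weights also need checking, a similar summary can be run over every column whose name contains _weight. The sketch below is one possible approach on a toy stand-in for hts$person; the weight values are copied from Table 35 except for the final row, which is a made-up zero-weight record added for illustration.

```r
# Toy stand-in for hts$person. The last row (person_id 9999999901) is hypothetical.
person_toy <- data.frame(
  person_id = c(2400008901, 2400012201, 2400014001, 9999999901),
  person_weight = c(157.42583, 103.34663, 188.21553, 0),
  person_weight_tue = c(187.88768, 59.87428, 458.44733, 0),
  person_weight_fri = c(NA, 75.95139, 655.71500, NA)
)

# Match every column containing "_weight", not only names that end in it.
weight_columns <- grep("_weight", names(person_toy), value = TRUE)

# Build one summary row per weight column.
all_weight_summary <- do.call(rbind, lapply(weight_columns, function(weight_col) {
  weight_vec <- person_toy[[weight_col]]
  data.frame(
    weight_column = weight_col,
    non_missing = sum(!is.na(weight_vec)),
    zero_count = sum(weight_vec == 0, na.rm = TRUE),
    weight_sum = sum(weight_vec, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}))
print(all_weight_summary)
```

Reporting non_missing alongside zero_count separates records with no weight for a given day of week (NA) from records that were assigned a weight of zero.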

The codebook object can also be inspected immediately after loading.

variable_list_preview <- head(codebook$variable_list)
value_labels_preview <- head(codebook$value_labels)

Review Table 37 and Table 38 before analysis so you can confirm variable definitions and labeled response values.

gt::gt(variable_list_preview) %>%
  gt::tab_header(title = "Variable List Preview")

order	source	variable	is_checkbox	hh	person	day	vehicle	location	unlinked_trip	linked_trip	tour	logic	description	data_type	write_to_export	exclude_from_frequencies	category	exclude
Variable List Preview
1	pipeline	hh_id	0	1	1	1	1	0	1	1	1	NA	Household ID	integer	TRUE	TRUE	NA	NA
2	pipeline	is_complete	0	1	1	1	1	0	1	1	1	NA	Record is complete	integer/categorical	TRUE	FALSE	NA	NA
3	pipeline	num_trips	0	1	1	1	0	0	0	0	0	NA	Number of trips	integer	TRUE	FALSE	NA	NA
4	pipeline	num_days_complete	0	1	1	0	0	0	0	0	0	NA	Number of complete days	integer/categorical	TRUE	FALSE	NA	NA
5	pipeline	first_travel_date	0	1	0	0	0	0	1	0	0	NA	First travel date	date	TRUE	TRUE	NA	NA
6	pipeline	last_travel_date	0	1	0	0	0	0	1	0	0	NA	Last travel date	date	TRUE	TRUE	NA	NA

Table 37: Variable list preview.

gt::gt(value_labels_preview) %>%
  gt::tab_header(title = "Value Labels Preview")

variable	value	label
Value Labels Preview
added_trip	0	No
added_trip	1	Yes
age	1	Age under 5
age	10	Age 75-84
age	11	Age 85 up
age	2	Age 5-15

Table 38: Value label preview.
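The value labels can be used to decode a coded variable. The sketch below is a minimal base R example that rebuilds the six rows shown in Table 38 as a toy stand-in for codebook$value_labels and looks up labels for a few hypothetical age codes. It assumes the value column may be stored as text (the preview order 1, 10, 11, 2 is consistent with text sorting, but this is not confirmed), so both sides are converted to character before matching.

```r
# Toy stand-in for codebook$value_labels, using the rows shown in Table 38.
value_labels_toy <- data.frame(
  variable = c("added_trip", "added_trip", "age", "age", "age", "age"),
  value = c("0", "1", "1", "10", "11", "2"),
  label = c("No", "Yes", "Age under 5", "Age 75-84", "Age 85 up", "Age 5-15"),
  stringsAsFactors = FALSE
)

# Hypothetical coded responses for a handful of person records.
age_codes <- c(2, 11, 1, 10)

# Restrict the lookup to one variable, then match codes to labels.
# as.character() guards against a numeric-versus-character mismatch.
age_lookup <- value_labels_toy[value_labels_toy$variable == "age", ]
age_labels <- age_lookup$label[
  match(as.character(age_codes), as.character(age_lookup$value))
]
print(data.frame(age = age_codes, age_label = age_labels))
```

Codes with no entry in the lookup come back as NA, which makes unlabeled values easy to spot.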

11 Data Structure and Joins

The dataset is organized as a set of related tables. Each table represents a different unit of observation, such as households, people, travel days, vehicles, trips, locations, or tours. Most analyses require using more than one table, so it is important to understand how records are linked before joining tables. The examples below use the same dplyr join pattern used throughout the analyst handbook.

For MassDOT, the main linkage pattern is:

Households are the primary sampling unit and are identified by hh_id.
Persons are nested within households and linked by hh_id.
Each person can have multiple travel days; each day is identified by day_id and linked to its person by person_id.
Each day can record zero or more trips; trips are linked to the day and person tables by day_id and person_id.
Trips can be analyzed at the linked or unlinked level depending on the research question.
Location records provide point-level context along trips and are linked to trips by trip_id, the only identifier column in the location table (see Table 39).
Tours summarize sequences of linked trips that begin and end at the same anchor location.
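The linkage pattern above implies that every lower-level record should have a matching parent record. A quick way to test that is to count orphans, meaning child rows whose parent key is missing from the parent table. The sketch below uses toy stand-ins for the hh, person, and day tables; count_orphans() is an illustrative helper, not a function from the delivered code, and the last day record is deliberately broken.

```r
# Toy stand-ins for hts$hh, hts$person, and hts$day. The last day record is
# given a person_id that does not exist, to show a failed check.
hh_toy <- data.frame(hh_id = c(24000089, 24000122))
person_toy <- data.frame(
  person_id = c(2400008901, 2400008902, 2400012201),
  hh_id = c(24000089, 24000089, 24000122)
)
day_toy <- data.frame(
  day_id = c(240000890101, 240000890201, 240001220101, 249999990101),
  person_id = c(2400008901, 2400008902, 2400012201, 2499999901)
)

# Count child records whose parent key is missing from the parent table.
count_orphans <- function(child, parent, key) {
  sum(!(child[[key]] %in% parent[[key]]))
}

orphan_counts <- c(
  person_without_hh = count_orphans(person_toy, hh_toy, "hh_id"),
  day_without_person = count_orphans(day_toy, person_toy, "person_id")
)
print(orphan_counts)
```

In the delivered data both counts would be expected to be zero; a nonzero count means a left join from the child table will produce unmatched rows.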

Figure 9 repeats the dataset overview figure so the table hierarchy is visible while working through joins.

The first step in any join workflow is to inspect the identifier columns that connect the tables.

id_columns_summary <- data.frame(
  table = character(),
  id_columns = character(),
  stringsAsFactors = FALSE
)

for (table_name in names(hts)) {
  id_columns <- grep("_id$", names(hts[[table_name]]), value = TRUE)

  if (length(id_columns) == 0L) {
    next
  }

  id_columns_summary <- rbind(
    id_columns_summary,
    data.frame(
      table = table_name,
      id_columns = paste(id_columns, collapse = ", "),
      stringsAsFactors = FALSE
    )
  )
}

Use Table 39 to confirm which keys are available before you start joining tables.

gt::gt(id_columns_summary) %>%
  gt::cols_label(
    table = "Table",
    id_columns = "Identifier Columns"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )

Table	Identifier Columns
hh	hh_id
person	person_id, hh_id
day	day_id, person_id, hh_id
vehicle	hh_id, vehicle_id
location	trip_id
trip_unlinked	trip_id, day_id, hh_id, person_id, linked_trip_id, joint_trip_id, tour_id
trip_linked	linked_trip_id, hh_id, person_id, day_id, joint_trip_id, tour_id
tour	tour_id, hh_id, person_id, day_id, out_chauffeur_id, inb_chauffeur_id, out_chauffeur_tour_id, inb_chauffeur_tour_id, parent_tour_id, joint_tour_id

Table 39: Identifier columns by table.
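Knowing which identifier columns exist is only part of the check; it also helps to confirm that each table's own key is unique before using it in a join. The sketch below is one way to do that in base R. The toy tables and the primary_keys mapping are assumptions made for illustration (hh_id for hh and person_id for person), and the duplicated person_id is deliberate.

```r
# Toy stand-ins for two hts tables; the duplicated person_id is deliberate.
tables_toy <- list(
  hh = data.frame(hh_id = c(24000089, 24000122, 24000140)),
  person = data.frame(
    person_id = c(2400008901, 2400008902, 2400008902),
    hh_id = c(24000089, 24000089, 24000089)
  )
)

# Assumed primary key for each table.
primary_keys <- c(hh = "hh_id", person = "person_id")

# anyDuplicated() returns 0 when every key value is unique.
key_is_unique <- vapply(
  names(primary_keys),
  function(table_name) {
    key_values <- tables_toy[[table_name]][[primary_keys[[table_name]]]]
    anyDuplicated(key_values) == 0L
  },
  logical(1)
)
print(key_is_unique)
```

A table whose key is not unique will multiply rows when it is used as the lookup side of a join.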

11.1 Common Join Patterns

The most common joins follow the hierarchy from households to lower-level records.

For MassDOT, many substantive analyses should also carry the household completion rule through the join process. A common pattern is to join hts$hh$is_complete or filter lower-level tables with hh_id %in% complete_hh_ids before summarizing.
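The filtering pattern can be sketched on toy data as follows. The example assumes that a complete household is one with is_complete == 1 in the household table, which matches the filter used in the join example but should be checked against the completion rule documented for the study. The names hh_toy, day_toy, and complete_hh_ids_toy are illustrative, and household 24000999 is made up.

```r
# Toy stand-ins for hts$hh and hts$day; household 24000999 is hypothetical.
hh_toy <- data.frame(
  hh_id = c(24000089, 24000122, 24000999),
  is_complete = c(1, 1, 0)
)
day_toy <- data.frame(
  day_id = c(240000890101, 240001220101, 240009990101),
  hh_id = c(24000089, 24000122, 24000999)
)

# Assumption: a complete household is one with is_complete == 1 in the hh table.
complete_hh_ids_toy <- hh_toy$hh_id[hh_toy$is_complete == 1]

# Apply the same rule to a lower-level table before summarizing it.
day_complete <- day_toy[day_toy$hh_id %in% complete_hh_ids_toy, , drop = FALSE]
print(nrow(day_complete))
```

Building the identifier vector once and reusing it keeps the completion rule consistent across every table in an analysis.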

For example, to join household characteristics to people, first select the household fields that belong in the person-level analysis file.

household_join_fields <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::mutate(hh_is_complete = is_complete)

if ("income_detailed" %in% names(hts$hh)) {
  household_join_fields <- household_join_fields %>%
    dplyr::mutate(household_income = income_detailed)
}

if ("num_vehicles" %in% names(hts$hh)) {
  household_join_fields <- household_join_fields %>%
    dplyr::mutate(household_vehicles = num_vehicles)
}

household_join_fields <- household_join_fields %>%
  dplyr::select(
    hh_id,
    hh_is_complete,
    dplyr::any_of(c("household_income", "household_vehicles"))
  )

person_with_household <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    household_join_fields,
    by = "hh_id"
  )

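Table 40 shows the person table after household variables have been joined in. Because the person table already carries hh_is_complete, the join keeps both copies and suffixes them as hh_is_complete.x (from the person table) and hh_is_complete.y (from the household lookup). Drop the column from the lookup if only one copy is wanted.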

gt::gt(head(person_with_household)) %>%
  gt::tab_header(title = "Person Records Joined to Household Variables")

person_id	person_num	hh_id	surveyable	is_participant	is_proxy	has_proxy	has_phone	phone_type	hh_is_complete.x	is_complete	num_days_complete	num_trips	relationship	age	gender	race_other	ethnicity_other	employment	work_mode	job_type	num_jobs	work_lon	work_lat	work_in_region	work_state	work_county	work_bg_2010	work_bg_2020	work_puma_2012	work_puma_2022	education	student	school_mode	school_type	school_freq	remote_class_freq	school_in_region	school_state	school_county	school_puma_2012	school_puma_2022	school_bg_2010	school_bg_2020	school_lon	school_lat	second_home	second_home_in_region	second_home_state	second_home_county	second_home_puma_2012	second_home_puma_2022	second_home_bg_2010	second_home_bg_2020	second_home_lon	second_home_lat	can_drive	vehicle	transit_freq	tnc_freq	bike_freq	vanpool_freq	bikeshare_freq	scootshare_freq	walk_freq	transit_pass	disability	participate	barriers_1	barriers_10	barriers_2	barriers_3	barriers_4	barriers_5	barriers_6	barriers_7	barriers_8	barriers_9	barriers_997	barriers_999	barriers_other	bicycle_other	bicycle_type_1	bicycle_type_2	bicycle_type_997	bike_comfort_lane	bike_comfort_local	bike_comfort_major	bike_comfort_minor	bike_comfort_neighborhood	bike_comfort_paths	bike_comfort_street	bike_comfort_striped	bike_factors_1	bike_factors_10	bike_factors_11	bike_factors_12	bike_factors_2	bike_factors_3	bike_factors_4	bike_factors_5	bike_factors_6	bike_factors_7	bike_factors_8	bike_factors_9	bike_factors_other	bike_purpose_1	bike_purpose_2	bike_purpose_3	bike_purpose_4	bike_purpose_5	bike_purpose_6	bike_purpose_7	bike_purpose_8	bike_purpose_other	bike_safety_1	bike_safety_2	bike_safety_3	bike_safety_4	bike_safety_5	bike_safety_6	bike_safety_7	bike_safety_8	bike_safety_other	bike_store_1	bike_store_2	bike_store_3	bike_store_4	bike_store_5	bike_store_6	bike_store_7	bike_store_997	carshare_freq	commute_days_1	commute_days_2	commute_days_3	commute_days_4	commute_days_5	commute_days_6	commute_days_7	commute_days_996	commute_subsidy_1	
commute_subsidy_10	commute_subsidy_11	commute_subsidy_12	commute_subsidy_13	commute_subsidy_14	commute_subsidy_2	commute_subsidy_3	commute_subsidy_4	commute_subsidy_5	commute_subsidy_6	commute_subsidy_7	commute_subsidy_8	commute_subsidy_9	commute_subsidy_996	commute_subsidy_998	commute_subsidy_use_1	commute_subsidy_use_10	commute_subsidy_use_11	commute_subsidy_use_12	commute_subsidy_use_13	commute_subsidy_use_14	commute_subsidy_use_2	commute_subsidy_use_3	commute_subsidy_use_4	commute_subsidy_use_5	commute_subsidy_use_6	commute_subsidy_use_7	commute_subsidy_use_8	commute_subsidy_use_9	commute_subsidy_use_996	ethnicity_1	ethnicity_2	ethnicity_3	ethnicity_4	ethnicity_997	ethnicity_999	ev_subsidies	ev_typical_charge_1	ev_typical_charge_2	ev_typical_charge_3	ev_typical_charge_4	ev_typical_charge_5	ev_typical_charge_6	ev_typical_charge_997	home_vehicle_park	home_vehicle_park_pay	home_vehicle_park_permit	micromobility_devices_1	micromobility_devices_2	micromobility_devices_3	micromobility_devices_996	micromobility_devices_997	num_bicycles	peerrent_freq	race_1	race_2	race_3	race_4	race_5	race_997	race_999	share_2	share_3	share_4	share_5	share_6	share_7	share_996	telework_days_1	telework_days_2	telework_days_3	telework_days_4	telework_days_5	telework_days_6	telework_days_7	telework_days_996	telework_freq_pre_covid	transit_factors_1	transit_factors_10	transit_factors_11	transit_factors_12	transit_factors_2	transit_factors_3	transit_factors_4	transit_factors_5	transit_factors_6	transit_factors_7	transit_factors_8	transit_factors_9	transit_factors_other	transit_purpose_1	transit_purpose_2	transit_purpose_3	transit_purpose_4	transit_purpose_5	transit_purpose_6	transit_purpose_7	transit_purpose_other	walk_purpose_1	walk_purpose_2	walk_purpose_3	walk_purpose_4	walk_purpose_5	walk_purpose_6	walk_purpose_7	walk_purpose_8	walk_purpose_other	why_no_bike_1	why_no_bike_2	why_no_bike_3	why_no_bike_4	why_no_bike_5	why_no_bike_6	why_no_bike_7	why_no_bike_8	person_type	person_weight	
race_imputed	ethnicity_imputed	gender_imputed	person_weight_tue	person_weight_fri	person_weight_mon	person_weight_sat	person_weight_sun	person_weight_thu	person_weight_wed	hh_is_complete.y	household_income	household_vehicles
Person Records Joined to Household Variables
2400008901	1	24000089	1	1	0	0	1	1	1	1	1	2	0	8	1	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	4	7	8	995	995	995	4	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	1	0	0	3	1	4	3	2	1	2	3	1	0	0	0	1	1	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	0	1	0	0	1	0	0	0	NA	1	0	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	4	995	995	995	995	995	995	995	1	0	995	0	0	0	1	0	1	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	0	0	0	0	1	1	0	1	1	0	0	0	NA	0	1	0	0	0	0	1	cultural events	0	1	0	1	0	0	0	1	NA	0	0	0	0	0	1	0	0	4	157.42583	white	not_hispanic	female	187.88768	NA	NA	NA	NA	NA	NA	1	999	2
2400008902	2	24000089	1	1	0	0	1	1	1	1	1	5	1	9	2	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	7	8	8	996	995	995	995	5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	1	0	0	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	0	0	0	0	1	1	1	1	1	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	1	0	0	0	1	NA	0	0	0	0	0	0	0	1	4	157.42583	white	not_hispanic	male	187.88768	NA	NA	NA	NA	NA	NA	1	999	2
2400012201	1	24000122	1	1	0	0	1	2	1	1	7	39	0	5	1	NA	NA	5	995	995	995	NA	NA	995	NA	NA	NA	NA	NA	NA	7	0	1	13	4	996	1	25	25017	03400	00613	250173736002	250173736002	-71.16924	42.33609	0	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	5	8	8	995	995	995	1	1	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	1	0	0	2	1	4	4	2	1	3	3	1	0	0	0	1	0	0	0	0	1	0	0	NA	995	995	995	995	995	995	995	995	NA	1	1	1	1	0	0	0	0	NA	1	0	0	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	0	0	0	1	0	1	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	995	995	995	995	995	995	995	995	3	1	0	0	0	1	0	1	0	0	0	0	0	NA	1	1	0	1	0	0	0	NA	1	1	0	1	1	0	0	1	NA	0	0	0	1	0	1	0	0	3	103.34663	white	not_hispanic	female	59.87428	75.95139	65.09989	76.29594	74.33908	64.49532	58.57788	1	4	0
2400014001	1	24000140	1	1	0	0	1	995	1	1	7	45	0	9	1	NA	NA	1	104	5	1	-71.05214	42.35606	1	25	25025	250250701018	250250701042	03302	00802	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	4	8	996	995	995	995	2	1	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	1	2	1	1	2	2	2	1	0	0	0	1	0	0	0	0	0	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	1	0	0	0	0	0	5	995	995	995	995	995	995	995	3	0	1	0	0	0	1	0	0	995	0	0	0	0	1	0	0	1	0	0	0	0	0	0	1	0	0	0	1	0	0	0	8	0	0	0	1	0	0	0	0	0	0	0	0	NA	0	0	0	0	1	0	0	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	1	188.21553	white	not_hispanic	female	458.44733	655.71500	546.06550	651.46690	640.43646	489.71295	463.14780	1	9	1
2400015802	2	24000158	1	1	0	0	1	2	1	1	1	2	2	5	2	NA	NA	1	100	1	1	-71.79986	42.26745	1	25	25027	250277317001	250277317002	00300	00505	7	2	995	995	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	6	9	995	8	995	995	995	5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	1	0	0	0	0	0	1	0	1	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	996	0	0	0	0	0	0	1	0	0	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	1	50.69754	white	not_hispanic	male	216.88728	NA	NA	NA	NA	NA	NA	1	999	4
2400015803	3	24000158	1	1	0	0	1	2	1	1	1	4	2	4	2	NA	NA	2	100	5	2	-72.67345	41.76257	0	09	09003	090035021002	090035021001	00302	20201	6	3	995	13	995	2	995	NA	NA	NA	NA	NA	NA	NA	NA	0	995	NA	NA	NA	NA	NA	NA	NA	NA	1	9	9	995	996	995	995	995	5	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	NA	NA	995	995	995	NA	NA	NA	NA	NA	NA	NA	NA	0	0	0	0	0	0	0	0	1	0	0	0	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	NA	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	1	0	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	8	0	0	0	0	0	0	0	0	1	0	0	0	NA	995	995	995	995	995	995	995	NA	0	0	0	0	0	0	0	1	NA	1	0	0	0	0	0	0	0	3	50.69754	white	not_hispanic	male	216.88728	NA	NA	NA	NA	NA	NA	1	999	4

Table 40: Person records with household fields.
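The hh_is_complete.x and hh_is_complete.y columns in Table 40 arise because the same column name exists on both sides of the join. One way to avoid the duplicate is to drop overlapping non-key columns from the lookup before joining. The sketch below shows the idea with base R merge(), which applies the same .x and .y suffixes as the dplyr join, on toy stand-ins; all object names are illustrative.

```r
# Toy stand-ins for hts$person and the household lookup.
person_toy <- data.frame(
  person_id = c(2400008901, 2400008902, 2400012201),
  hh_id = c(24000089, 24000089, 24000122),
  hh_is_complete = c(1, 1, 1)
)
household_lookup_toy <- data.frame(
  hh_id = c(24000089, 24000122),
  hh_is_complete = c(1, 1),
  household_vehicles = c(2, 0)
)

# Find lookup columns that the person table already has, excluding the join key.
overlap <- setdiff(intersect(names(person_toy), names(household_lookup_toy)), "hh_id")
lookup_trimmed <- household_lookup_toy[
  , setdiff(names(household_lookup_toy), overlap), drop = FALSE
]

# all.x = TRUE keeps every person row, like a left join.
person_joined <- merge(person_toy, lookup_trimmed, by = "hh_id", all.x = TRUE)
print(names(person_joined))
```

With the overlap removed, the joined table keeps a single hh_is_complete column and no suffixed names.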

To join person characteristics to trips, build a person-level lookup first and then join it to the trip table.

person_join_fields <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    person_id,
    hh_id,
    person_age = age,
    person_gender = gender,
    person_employment = employment
  )

trip_with_person <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    person_join_fields,
    by = "person_id"
  )

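Table 41 shows the trip table after person-level fields have been added. Because hh_id is present in both tables but the join key is person_id alone, the result carries hh_id.x (from the trip table) and hh_id.y (from the person lookup). Joining by both person_id and hh_id, or dropping hh_id from the lookup, avoids the duplicate.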

gt::gt(head(trip_with_person)) %>%
  gt::tab_header(title = "Trip Records Joined to Person Variables")

trip_id	day_id	trip_num	hh_id.x	first_travel_date	last_travel_date	person_id	travel_date	travel_dow	day_num	hh_day_complete	hh_is_complete	day_is_complete	trip_survey_complete	depart_time	depart_date	depart_dow	depart_hour	depart_minute	depart_seconds	arrive_time	arrive_date	arrive_dow	arrive_hour	arrive_minute	arrive_second	distance_meters	distance_miles	duration_seconds	duration_minutes	dwell_mins	speed_mph	speed_flag	o_in_region	o_state	o_county	o_puma_2012	o_puma_2022	o_bg_2010	o_bg_2020	o_lon	o_lat	d_in_region	d_state	d_county	d_puma_2012	d_puma_2022	d_bg_2010	d_bg_2020	d_lon	d_lat	mode_type	mode_1	mode_2	mode_3	mode_4	mode_other_specify	transit_egress	transit_access	num_travelers	num_hh_travelers	num_non_hh_travelers	hh_member_1	hh_member_2	hh_member_3	hh_member_4	hh_member_5	hh_member_6	hh_member_7	hh_member_8	driver	ev_charge_station	ev_charge_station_decision	o_purpose_category_reported	o_purpose_category	o_purpose_reported	o_purpose	d_purpose_category_reported	d_purpose_reported	d_purpose_category	d_purpose	d_purpose_other	park_location	park_type	bike_park_loc	scooter_park_location	park_cost	taxi_cost	taxi_pay	taxi_type	tnc_type	transit_type	user_merged	user_split	user_deleted	added_type	copied_from_proxy	unlinked_split	split_loop	days_first_trip	days_last_trip	is_transit_leg	linked_trip_id	is_transit	is_access	is_egress	has_access	has_egress	transit_quality_flag	has_synthetic_access	has_synthetic_egress	added_trip	person_num	ev_charge_station_level_1	ev_charge_station_level_2	ev_charge_station_level_3	ev_charge_station_level_998	other_bicycle	vehicle_park_pay	distance_beeline	joint_trip_num	joint_trip_id	corrected_hh_members	imputed_record_type	imputed_host_trip	imputed_joint_trip	home_distance	linked_trip_num	tour_num	tour_id	trip_weight	trip_weight_tue	trip_weight_fri	trip_weight_mon	trip_weight_sat	trip_weight_sun	trip_weight_thu	trip_weight_wed	is_complete	hh_id.y	person_age	person_gender	person_employment
Trip Records Joined to Person Variables
2400008901001	240000890101	1	24000089	2024-06-11	2024-06-11	2400008901	2024-06-11	2	1	1	1	1	1	2024-06-11 18:20:00	2024-06-11	2	14	20	0	2024-06-11 18:27:00	2024-06-11	2	14	27	0	4710	2.9266656	420	7	95	25.085705	0	1	25	25009	00703	00706	250092174003	250092174023	-70.87968	42.54132	1	25	25009	00703	00706	250092176004	250092176012	-70.85492	42.57236	8	6	995	995	995	None	995	995	1	1	0	1	0	0	0	0	0	0	0	1	995	995	NA	1	NA	1	9	54	9	54	None	1	995	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	1	0	0	2400008901010101	0	995	995	995	995	None	995	995	1	1	995	995	995	995	None	995	4007.6875	-1	-1	0	0	-1	0	2.9798525	1	1	24000089010101	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	8	1	5
2400008901002	240000890101	2	24000089	2024-06-11	2024-06-11	2400008901	2024-06-11	2	1	1	1	1	1	2024-06-11 20:02:00	2024-06-11	2	16	2	0	2024-06-11 20:15:00	2024-06-11	2	16	15	0	4730	2.9390930	780	13	645	13.565045	0	1	25	25009	00703	00706	250092176004	250092176012	-70.85492	42.57236	1	25	25009	00703	00706	250092174003	250092174023	-70.87968	42.54132	8	6	995	995	995	None	995	995	1	1	0	1	0	0	0	0	0	0	0	1	995	995	9	9	54	54	1	1	1	1	None	1	995	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	0	1	0	2400008901010102	0	995	995	995	995	None	995	995	1	1	995	995	995	995	None	995	4007.6875	-1	-1	0	0	-1	0	0.0000000	2	1	24000089010101	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	8	1	5
2400008902001	240000890201	1	24000089	2024-06-11	2024-06-11	2400008902	2024-06-11	2	1	1	1	1	1	2024-06-11 12:00:00	2024-06-11	2	8	0	0	2024-06-11 12:05:00	2024-06-11	2	8	5	0	666	0.4138342	300	5	75	4.966011	0	1	25	25009	00703	00706	250092174003	250092174023	-70.87968	42.54132	1	25	25009	00703	00706	250092174004	250092174022	-70.88607	42.54144	8	7	995	995	995	None	995	995	1	1	0	0	1	0	0	0	0	0	0	1	995	995	NA	1	NA	1	8	50	8	50	None	4	1	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	1	0	0	2400008902010101	0	995	995	995	995	None	995	995	1	2	995	995	995	995	None	995	524.2716	-1	-1	0	0	-1	0	0.7113409	1	1	24000089020101	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	9	2	5
2400008902002	240000890201	2	24000089	2024-06-11	2024-06-11	2400008902	2024-06-11	2	1	1	1	1	1	2024-06-11 13:20:00	2024-06-11	2	9	20	0	2024-06-11 13:25:00	2024-06-11	2	9	25	0	1025	0.6369071	300	5	15	7.642885	0	1	25	25009	00703	00706	250092174004	250092174022	-70.88607	42.54144	1	25	25009	00703	00706	250092174002	250092174013	-70.88079	42.54626	8	7	995	995	995	None	995	995	1	1	0	0	1	0	0	0	0	0	0	1	995	995	8	8	50	50	10	33	10	33	None	4	1	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	0	0	0	2400008902010102	0	995	995	995	995	None	995	995	1	2	995	995	995	995	None	995	689.5092	-1	-1	0	0	-1	0	0.2184301	2	1	24000089020101	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	9	2	5
2400008902003	240000890201	3	24000089	2024-06-11	2024-06-11	2400008902	2024-06-11	2	1	1	1	1	1	2024-06-11 13:40:00	2024-06-11	2	9	40	0	2024-06-11 13:45:00	2024-06-11	2	9	45	0	1084	0.6735680	300	5	95	8.082817	0	1	25	25009	00703	00706	250092174002	250092174013	-70.88079	42.54626	1	25	25009	00703	00706	250092174003	250092174023	-70.87968	42.54132	8	7	995	995	995	None	995	995	1	1	0	0	1	0	0	0	0	0	0	1	995	995	10	10	33	33	1	1	1	1	None	1	995	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	0	0	0	2400008902010103	0	995	995	995	995	None	995	995	1	2	995	995	995	995	None	995	557.4029	-1	-1	0	0	-1	0	0.0000000	3	1	24000089020101	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	9	2	5
2400008902004	240000890201	4	24000089	2024-06-11	2024-06-11	2400008902	2024-06-11	2	1	1	1	1	1	2024-06-11 15:20:00	2024-06-11	2	11	20	0	2024-06-11 15:30:00	2024-06-11	2	11	30	0	1782	1.1072862	600	10	35	6.643717	0	1	25	25009	00703	00706	250092174003	250092174023	-70.87968	42.54132	1	25	25009	00703	00706	250092173003	250092173003	-70.87901	42.55242	8	7	995	995	995	None	995	995	1	1	0	0	1	0	0	0	0	0	0	1	995	995	1	1	1	1	10	37	10	37	None	4	1	995	995	NA	NA	995	995	995	995	995	995	0	NA	0	0	0	0	0	0	2400008902010201	0	995	995	995	995	None	995	995	1	2	995	995	995	995	None	995	1236.8675	-1	-1	0	0	-1	0	0.4115587	1	2	24000089020102	247.1357	316.6946	NA	NA	NA	NA	NA	NA	1	24000089	9	2	5

Table 41: Trip records with person fields.

If the dataset includes separate linked and unlinked trip tables, choose the trip table that matches the analysis question before joining. Linked trips are usually the better starting point for origin-destination, purpose, and whole-trip analyses. Unlinked trips are more appropriate when the question depends on leg detail, transfer behavior, or segment-level mode information.

If tours are included, they summarize groups of trips into larger travel patterns. Tour joins should be approached cautiously because the same household, person, day, or trip can appear in multiple downstream analytic datasets depending on the question being asked.

11.2 Join Cautions

Always choose the analytic unit before joining tables. A join can change the number of rows in an analysis dataset if the relationship between tables is one-to-many or many-to-many.

For example:

Joining household data to person data creates one row per person, not one row per household.
Joining person data to trip data creates one row per trip, not one row per person.
Joining trip data to location data may duplicate trip records if multiple locations are associated with one trip or household.
Joining lower-level tables back to higher-level tables can change the interpretation of counts and rates.
Shared travel fields can create the appearance of duplicated movements because one physical trip may be represented on multiple person-trip records.

Before calculating summaries, check that the resulting row count still matches the intended analytic unit.

trip_row_count_before <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  nrow()

person_join_fields <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(person_id, hh_id)

trip_with_person <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    person_join_fields,
    # Join on both shared keys so hh_id is not duplicated as hh_id.x / hh_id.y
    by = c("hh_id", "person_id")
  )

trip_row_count_after <- nrow(trip_with_person)

row_count_check <- data.frame(
  metric = c("Before join", "After join", "Difference"),
  n = c(
    trip_row_count_before,
    trip_row_count_after,
    trip_row_count_after - trip_row_count_before
  ),
  stringsAsFactors = FALSE
)

Table 42 helps confirm that the join preserved the trip-level analytic unit.

gt::gt(row_count_check) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    metric = "Metric",
    n = "Rows"
  )

Metric	Rows
Before join	411,573
After join	411,573
Difference	0

Table 42: Trip row counts before and after the join.

If the row count changes unexpectedly, review the join keys and confirm that the joined table has the intended unit of observation. A row-count check is often the quickest way to catch a mistaken join before it affects a summary table or chart.

For MassDOT, it is often worth checking both row preservation and universe preservation: after the join, confirm that the analysis file still contains only records from complete households unless the question intentionally includes incomplete households.
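
The row-preservation and universe-preservation checks can be sketched on toy data. This is a minimal base R illustration rather than the guide's own workflow: the `trip` and `person` data frames, their values, and the `age_group` column are invented, and `complete_hh_ids` is set by hand. A duplicated key in the joined table is one common reason a left join adds rows.

```r
# Toy stand-ins for the prepared tables; every value here is made up.
trip <- data.frame(
  trip_id   = c(1, 2, 3, 4),
  person_id = c(11, 11, 12, 31),
  hh_id     = c(1, 1, 1, 3)
)
person <- data.frame(
  person_id = c(11, 12, 12, 31),          # person 12 is duplicated on purpose
  age_group = c("35-44", "25-34", "25-34", "65+"),
  stringsAsFactors = FALSE
)
complete_hh_ids <- c(1)                   # household 3 is treated as incomplete

# 1. Key check: the joined table should have one row per join key.
duplicated_keys <- unique(person$person_id[duplicated(person$person_id)])

# 2. Row-preservation check: a left join on a duplicated key adds rows.
joined <- merge(trip, person, by = "person_id", all.x = TRUE)
row_difference <- nrow(joined) - nrow(trip)

# 3. Universe check: count joined rows from households outside the universe.
rows_outside_universe <- sum(!joined$hh_id %in% complete_hh_ids)

print(duplicated_keys)
print(row_difference)
print(rows_outside_universe)
```

In this toy case the duplicated `person_id` adds one row, and one joined row belongs to a household outside the complete-household universe. In a real analysis file both numbers would normally be zero before any summary is calculated.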

11.3 Joining Trip and Vehicle Tables

When the selected DUG trip table preserves household vehicle numbering in the detailed mode fields, reshape those mode fields to long format, extract the household vehicle number, and then join to the vehicle table.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(
    grepl("^mode_[0-9]+$", variable),
    variable %in% trip_vehicle_mode_columns
  ) %>%
  dplyr::transmute(
    mode_num = variable,
    mode_value = as.character(value),
    mode_label = label
  )

vehicle_trips_long <- hts$trip_unlinked %>%
  dplyr::select(
    hh_id,
    person_id,
    day_id,
    trip_id,
    dplyr::all_of(trip_vehicle_mode_columns)
  ) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(trip_vehicle_mode_columns),
    names_to = "mode_num",
    values_to = "mode_value"
  ) %>%
  dplyr::mutate(mode_value = as.character(mode_value)) %>%
  dplyr::left_join(
    mode_value_labels,
    by = c("mode_num", "mode_value")
  ) %>%
  dplyr::mutate(
    vehicle_num = ifelse(
      grepl("Household vehicle [0-9]+", mode_label),
      as.integer(stringr::str_extract(mode_label, "[0-9]+")),
      NA_integer_
    )
  ) %>%
  dplyr::left_join(
    hts$vehicle %>%
      dplyr::select(hh_id, vehicle_num, vehicle_id),
    by = c("hh_id", "vehicle_num")
  ) %>%
  dplyr::filter(!is.na(vehicle_id))

Table 43 shows the trip and vehicle records after the long-format reshape and join steps are complete.

gt::gt(head(vehicle_trips_long)) %>%
  gt::tab_header(title = "Trip-to-Vehicle Linkage Preview")

Trip-to-Vehicle Linkage Preview
hh_id	person_id	day_id	trip_id	mode_num	mode_value	mode_label	vehicle_num	vehicle_id
24000089	2400008901	240000890101	2400008901001	mode_1	6	Household vehicle 1	1	2400008901
24000089	2400008901	240000890101	2400008901002	mode_1	6	Household vehicle 1	1	2400008901
24000089	2400008902	240000890201	2400008902001	mode_1	7	Household vehicle 2	2	2400008902
24000089	2400008902	240000890201	2400008902002	mode_1	7	Household vehicle 2	2	2400008902
24000089	2400008902	240000890201	2400008902003	mode_1	7	Household vehicle 2	2	2400008902
24000089	2400008902	240000890201	2400008902004	mode_1	7	Household vehicle 2	2	2400008902

Table 43: Trip-to-vehicle linkage preview.

11.4 Recommended Practice

For most analyses:

Start with the table that already represents the desired analytic unit.
Join only the fields needed from other tables.
Check row counts before and after joins.
Match the weight to the analytic unit after the joins are complete.
Use the codebook to confirm variable definitions and value labels.

12 Choosing the Right Analytic Unit

Section 6 describes the structure of households, persons, days, trips, linked trips, tours, and vehicles. This section shifts from structure to practice: how do you choose the correct analytic unit for the question you want to answer? Many analysis errors in an HTS come not from weighting but from starting with the wrong table. These examples assume the prepared tables are already available in hts.

Choosing the analytic unit is the first design decision in any analysis. The correct unit aligns with three things:

Who or what is being measured (a household, a person, a person-day, a trip, a tour, or a vehicle)
What the variable conceptually describes (a household attribute, a person characteristic, a daily behavior, a movement, or a chain of movements)
The level at which the sampling weights represent the population

Table 44 connects each analytic unit to its best use cases, without repeating definitions already given in the Dataset Overview.

Analytic Unit	Starting Table	Typical Use
Household	hh	Household characteristics and household-level summaries
Person	person	Demographics, employment, student status, and person-level summaries
Person-day	day	Daily behavior, zero-trip days, deliveries, and trip-rate denominators
Linked trip	trip_linked	Origin-destination, purpose, and whole-trip summaries
Unlinked trip	trip_unlinked	Leg-level mode detail, transfers, and segment-level summaries
Tour	tour	Tour-pattern analysis across linked travel chains
Vehicle	vehicle	Household fleet summaries and vehicle characteristics

Table 44: Prepared tables and common uses by analytic unit.

12.1 Household-Level Analyses

Use households as the analytic unit when the phenomenon is shared or decided collectively: income, vehicle fleet, home location, delivery behavior, household makeup, or whether a household has zero vehicles. Even if a household variable is influenced by individual people, the household is still the right level because sampling occurred at the household level.

Example question: “What is the average household income in the study area?”

12.2 Person-Level Analyses

Analyses about people, such as demographics, employment status, student status, or attitudinal questions, belong at the person level. Each person’s weight represents them in the population. Use day or trip tables only when the metric you want to measure exists at those levels.

Example question: “What percentage of people in the study area are employed?”

12.3 Day-Level Analyses

Use person-days when studying daily behavior: trip rates, telework frequency, deliveries, or analyses that depend on people who made zero trips. The day table keeps all sampled days in scope, not only days with trips.

Example question: “What is the average number of trips per person-day?”
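
A minimal sketch of why the day table is the right denominator for trip rates. It uses invented `day` and `trip` data frames in base R, and it weights each day's trip count by a hypothetical `day_weight`. The delivered trip weights may include further adjustments, so treat this only as an illustration of the denominator.

```r
# Toy person-day and trip tables; all values are invented for illustration.
day <- data.frame(
  day_id     = c(101, 102, 103, 104),
  day_weight = c(100, 200, 100, 100)
)
trip <- data.frame(
  trip_id = 1:6,
  day_id  = c(101, 101, 101, 102, 102, 104)   # day 103 has no trips
)

# Count trips per day, then attach the counts to every day record.
trip_counts <- aggregate(trip_id ~ day_id, data = trip, FUN = length)
names(trip_counts)[2] <- "n_trips"
day_trips <- merge(day, trip_counts, by = "day_id", all.x = TRUE)

# Days with no matching trips are true zeros, not missing values.
day_trips$n_trips[is.na(day_trips$n_trips)] <- 0

# Rate with all days in the denominator.
rate_all_days <- weighted.mean(day_trips$n_trips, day_trips$day_weight)

# Rate that would result from starting at the trip table,
# which silently drops zero-trip days.
has_trips <- day_trips$n_trips > 0
rate_trip_days_only <- weighted.mean(
  day_trips$n_trips[has_trips],
  day_trips$day_weight[has_trips]
)

print(c(all_days = rate_all_days, trip_days_only = rate_trip_days_only))
```

Dropping the zero-trip day raises the apparent rate, which is why starting from the trip table overstates trips per person-day.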

12.4 Trip-Level Analyses

Most movement-based analyses start with trips. Linked trips are usually the better starting point for origin-destination, purpose, and whole-trip summaries. Unlinked trips are appropriate for leg-level mode detail, transfer behavior, or segment-specific metrics.

Example question: “What is the average trip distance for work trips?”

12.5 Tour-Level Analyses

Use tours when the analysis focuses on full activity patterns or concepts aligned with activity-based modeling: stop-making, work subtours, escorting, home-based versus non-home-based travel, or mode hierarchy across a chain of trips.

Example question: “What percentage of tours include a stop at a school?”

12.6 Vehicle-Level Analyses

Use vehicles as the analytic unit for vehicle fleet summaries, EV prevalence, fuel-type distributions, household fleet size, or daily mileage when paired with trip data. Vehicles belong to households, so vehicle analyses usually rely on household weights.

Example question: “What is the average daily mileage for electric vehicles?”

13 Working with Variables

13.1 Categorical Response Data

The majority of data collected in an HTS are categorical variables, where respondents select from a predefined list of options. These appear as:

Single-response categorical variables (SRCVs) where respondents select one option from a predefined list
Multiple-response categorical variables (MRCVs) where respondents can select multiple options
Grouped categorical variables, sometimes called “question batteries,” stored as sets of related indicator columns
Count variables with top-coding where the highest category is open-ended (e.g., “5 or more vehicles”)

General Considerations

When working with categorical response data, keep the following best practices in mind:

Start with the codebook. Confirm variable definitions, valid values, table membership, and skip logic there before opening the questionnaire for extra context.
Do not mistake missing for no. Only recode missing to zero when the respondent was logically not asked the question. For example, if a respondent was not asked about how long they teleworked on a given day because they are not employed, then it is appropriate to recode missing telework duration to zero. However, if a respondent was asked about telework duration but did not answer, then the missing value should be retained as missing rather than recoded to zero.
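
The telework rule above can be sketched as follows. This is an illustrative base R example with invented data: the `employed` flag and `telework_minutes` column are hypothetical names, and it assumes missing answers are already stored as `NA`. Special codes in the delivered data would first need to be mapped using the codebook.

```r
# Toy person-day records; column names and values are assumptions.
telework <- data.frame(
  person_id        = 1:4,
  employed         = c(1, 1, 0, 0),        # hypothetical 0/1 employment flag
  telework_minutes = c(120, NA, NA, NA)    # NA = no recorded answer
)

# Respondents who are not employed were logically not asked the question,
# so only their missing values are recoded to zero.
not_asked <- telework$employed == 0 & is.na(telework$telework_minutes)

telework$telework_minutes_clean <- ifelse(
  not_asked,
  0,
  telework$telework_minutes   # asked but unanswered stays NA
)

print(telework)
```

Person 2 was asked but did not answer, so the value stays missing. Persons 3 and 4 were never asked, so zero is a defensible recode.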

Single-Response Categorical Variables (SRCVs)

Single-response categorical variables record one selection from a predefined list. Examples include gender, employment status, or broad income category.

For example, the household income variable income_broad can be labeled using the codebook before it is summarized.

income_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "income_broad") %>%
  dplyr::transmute(
    income_code = value,
    income_key = as.character(value),
    income_broad_label = label
  )

hh_income <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::mutate(income_key = as.character(income_broad)) %>%
  dplyr::left_join(
    income_value_labels %>%
      dplyr::select(income_key, income_broad_label),
    by = "income_key"
  ) %>%
  dplyr::mutate(
    income_broad_label = factor(
      income_broad_label,
      levels = income_value_labels$income_broad_label
    )
  )

hh_income_counts <- hh_income %>%
  dplyr::group_by(income_broad_label) %>%
  dplyr::summarize(n = dplyr::n(), .groups = "drop")

After joining the value labels, group the labeled variable to produce the counts used in the final table. Table 45 shows the resulting counts of households by broad income category.

gt::gt(hh_income_counts) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    income_broad_label = "Household Income",
    n = "Households"
  ) %>%
  gt::tab_header(title = "Household Counts by Income Category")

Household Counts by Income Category
Household Income	Households
Under $25,000	1,855
$25,000-$49,999	1,892
$50,000-$74,999	1,963
$75,000-$99,999	2,028
$100,000-$199,999	4,210
$200,000 or more	2,170
Prefer not to answer	1,523

Table 45: Household counts by income category.

Multiple-Response Categorical Variables (MRCVs)

Multiple-response variables are often delivered as groups of checkbox columns. When checkbox-style variables are present, reshaping them to long format is usually the clearest way to count selections and label the results.

delivery_checkbox_regex <- "^delivery_"

delivery_variable_list <- codebook$variable_list %>%
  dplyr::filter(
    day == 1,
    is_checkbox == 1,
    stringr::str_detect(variable, delivery_checkbox_regex)
  ) %>%
  dplyr::select(variable, description)

delivery_none_of_above <- "delivery_996"

delivery_variables <- delivery_variable_list %>%
  dplyr::filter(variable != delivery_none_of_above) %>%
  dplyr::pull(variable)

delivery_descriptions <- delivery_variable_list %>%
  dplyr::filter(variable %in% delivery_variables)

delivery_long <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(day_id, dplyr::all_of(delivery_variables)) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(delivery_variables),
    names_to = "variable",
    values_to = "selected"
  ) %>%
  dplyr::filter(selected == 1) %>%
  dplyr::left_join(
    delivery_descriptions,
    by = "variable"
  ) %>%
  dplyr::count(description, name = "n_days") %>%
  dplyr::arrange(dplyr::desc(n_days))

In the MassDOT data, these delivery questions are stored on the day table as checkbox columns named delivery_*. This example excludes delivery_996, which is the codebook field for “None of the above,” so the table covers only the delivery types that respondents actually selected.

Table 46 makes it easier to review how often each delivery type was selected across reported person-days.

gt::gt(delivery_long) %>%
  gt::fmt_number(columns = n_days, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    description = "Delivery Variable",
    n_days = "Days"
  ) %>%
  gt::tab_header(title = "Delivery Checkboxes Selected Across Days")

Delivery Checkboxes Selected Across Days
Delivery Variable	Days
Type of delivery: Received packages at home (e.g., USPS, FedEx, UPS)	20,759
Type of delivery: Take-out/prepared food delivered to home	3,351
Type of delivery: Someone came to do work at home (e.g., babysitter, housecleaning, lawn)	1,995
Type of delivery: Groceries delivered to home	1,588
Type of delivery: Received packages at another location (e.g., Amazon Locker, package pick-up point)	1,012
Type of delivery: Other item delivered to home (e.g., appliance)	316
Type of delivery: Received personal packages at work	248

Table 46: Delivery checkbox counts.
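
Because a respondent can tick several boxes, selection counts do not sum to the number of days. The sketch below uses invented, unweighted data and hypothetical column names (`delivery_a`, `delivery_b`) to show one way to express each option as a share of days and to compute the share of days with any selection.

```r
# Toy day table with two hypothetical checkbox columns (1 = selected).
day <- data.frame(
  day_id     = 1:5,
  delivery_a = c(1, 1, 0, 0, 0),
  delivery_b = c(1, 0, 0, 1, 0)
)
checkbox_cols <- c("delivery_a", "delivery_b")

# Denominator is the number of days, not the number of selections.
n_days <- nrow(day)
n_selections <- colSums(day[checkbox_cols] == 1)
share_of_days <- n_selections / n_days

# Share of days with at least one box ticked.
share_any <- mean(rowSums(day[checkbox_cols] == 1) > 0)

print(share_of_days)
print(share_any)
```

The per-option shares add up to more than the share of days with any delivery, because day 1 is counted under both options.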

Missing Categorical Data

Missing categorical data should be handled explicitly rather than silently dropped. The codebook labels are often the quickest way to confirm whether a value represents a valid response category, an inapplicable record, or a nonresponse code.

gender_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "gender") %>%
  dplyr::transmute(
    gender_key = as.character(value),
    label
  )

gender_counts <- hts$person %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::count(gender, name = "n") %>%
  dplyr::mutate(gender_key = as.character(gender)) %>%
  dplyr::left_join(
    gender_value_labels,
    by = "gender_key"
  ) %>%
  dplyr::select(-gender_key) %>%
  dplyr::arrange(gender)

Table 47 helps distinguish valid response codes from missing or special-case values.

gt::gt(gender_counts) %>%
  gt::fmt_number(columns = n, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    gender = "Gender Code",
    n = "Persons",
    label = "Gender Label"
  ) %>%
  gt::tab_header(title = "Gender Counts with Codebook Labels")

Gender Counts with Codebook Labels
Gender Code	Persons	Gender Label
1	15,253	Female
2	13,165	Male
4	273	Non-binary
995	1,552	Missing Response
997	71	Other/prefer to self-describe
999	941	Prefer not to answer

Table 47: Gender counts with labels.
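
One way to handle these codes explicitly is to report shares both with and without nonresponse. The sketch below re-enters the unweighted counts from Table 47 by hand. Treating codes 995 and 999 as nonresponse is an analyst assumption for illustration and should be confirmed against the codebook.

```r
# Unweighted counts copied from Table 47.
gender_counts <- data.frame(
  gender = c(1, 2, 4, 995, 997, 999),
  n      = c(15253, 13165, 273, 1552, 71, 941)
)

# Assumption: these two codes are treated as nonresponse in this example.
nonresponse_codes <- c(995, 999)
gender_counts$is_nonresponse <- gender_counts$gender %in% nonresponse_codes

# Share of all person records, nonresponse included.
gender_counts$share_all <- gender_counts$n / sum(gender_counts$n)

# Share among valid responses only; nonresponse rows are left as NA.
valid_total <- sum(gender_counts$n[!gender_counts$is_nonresponse])
gender_counts$share_valid <- ifelse(
  gender_counts$is_nonresponse,
  NA,
  gender_counts$n / valid_total
)

print(gender_counts)
```

Reporting both columns makes the denominator choice visible instead of silently dropping the nonresponse records.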

Count Variables with Top-Coding

Variables such as num_vehicles are often stored as integer-coded categories rather than unconstrained numeric counts. Treat them as categories unless the codebook clearly indicates that they are true numeric measures.

vehicle_count_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "num_vehicles") %>%
  dplyr::transmute(
    vehicle_count_code = value,
    vehicle_count_key = as.character(value),
    num_vehicles_label = label
  )

vehicle_count_distribution <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::transmute(vehicle_count_code = num_vehicles) %>%
  dplyr::count(vehicle_count_code, name = "n_households") %>%
  dplyr::mutate(vehicle_count_key = as.character(vehicle_count_code)) %>%
  dplyr::left_join(
    vehicle_count_labels %>%
      dplyr::select(vehicle_count_key, num_vehicles_label),
    by = "vehicle_count_key"
  ) %>%
  dplyr::select(-vehicle_count_key) %>%
  dplyr::arrange(vehicle_count_code)

Table 48 treats the coded vehicle-count variable as a set of labeled categories rather than a continuous measure.

gt::gt(vehicle_count_distribution) %>%
  gt::fmt_number(columns = n_households, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    vehicle_count_code = "Vehicles",
    n_households = "Households",
    num_vehicles_label = "Vehicle Count Label"
  ) %>%
  gt::tab_header(title = "Household Vehicle Count Distribution")

Household Vehicle Count Distribution
Vehicles	Households	Vehicle Count Label
0	2,161	0 (no vehicles in my household)
1	7,122	1 vehicle
2	4,925	2 vehicles
3	1,066	3 vehicles
4	274	4 vehicles
5	68	5 vehicles
6	16	6 vehicles
7	4	7 vehicles
8	5	8 or more vehicles

Table 48: Household vehicle count categories.
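
A short sketch of two common ways to work with a top-coded count, using the unweighted counts from Table 48 entered by hand. The "3 or more" grouping is an arbitrary choice for illustration. The mean is only a lower bound because code 8 stands for "8 or more vehicles."

```r
# Unweighted counts copied from Table 48.
vehicle_counts <- data.frame(
  vehicle_count_code = 0:8,
  n_households = c(2161, 7122, 4925, 1066, 274, 68, 16, 4, 5)
)

# Collapse sparse upper categories into one labeled group.
vehicle_counts$vehicle_group <- cut(
  vehicle_counts$vehicle_count_code,
  breaks = c(-Inf, 0, 1, 2, Inf),
  labels = c("0", "1", "2", "3 or more")
)
collapsed <- aggregate(
  n_households ~ vehicle_group,
  data = vehicle_counts,
  FUN = sum
)

# Treating the codes as numbers gives a lower bound on the mean,
# because households coded 8 may own more than eight vehicles.
lower_bound_mean <- sum(
  vehicle_counts$vehicle_count_code * vehicle_counts$n_households
) / sum(vehicle_counts$n_households)

print(collapsed)
print(lower_bound_mean)
```

Collapsing keeps the variable categorical and avoids publishing very small cells, while the lower-bound mean should be labeled as such if it is reported at all.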

13.2 Numeric Variables

The HTS dataset contains several numeric variables, such as trip distances, durations, and speeds. These variables can include extreme values or outliers that affect analysis results, so it is good practice to inspect definitions and distributions before calculating summaries.

For most analytic examples in this chapter, the code filters to complete households first. If your goal is delivered-data quality assurance rather than population analysis, you may choose a broader universe.

Consult the Codebook First

Before analyzing any numeric variable, verify its meaning and units from the codebook. Many errors stem from assuming what a variable represents rather than confirming it.

distance_variables <- codebook$variable_list %>%
  dplyr::filter(
    stringr::str_detect(variable, "distance"),
    data_type %in% c("integer", "numeric")
  ) %>%
  dplyr::select(variable, description)

Review Table 49 before choosing which distance field to summarize.

gt::gt(distance_variables) %>%
  gt::tab_header(title = "Distance Variables in the Codebook")

Distance Variables in the Codebook
variable	description
distance_meters	Distance (meters)
distance_beeline	Beeline distance (meters)
distance_miles	Distance (miles)
home_distance	Trip distance from home (miles)

Table 49: Distance variables in the codebook.

Inspect the Data

Before calculating any metric:

check for missing values
generate a quick summary()
inspect the distribution with a histogram or boxplot
review the minimum and maximum for plausibility

Start with a direct summary of the trip-distance field to understand its central tendency and range.

summary(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(distance_miles)
)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
#>     0.000     0.680     2.225     8.577     6.004 12348.275     11747

Then visualize the distribution in Figure 10 so you can see the shape of the variable and the upper tail.

trip_distance_plot_data <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(distance_miles), distance_miles > 0)

trip_distance_histogram <- ggplot2::ggplot(
  trip_distance_plot_data,
  ggplot2::aes(x = distance_miles)
) +
  ggplot2::geom_histogram(bins = 40, fill = "#17384e", color = "white") +
  ggplot2::scale_x_log10(
    labels = scales::label_number(big.mark = ",")
  ) +
  ggplot2::labs(
    title = "Distribution of Trip Distance",
    subtitle = "Trip-distance histogram with a log-scaled distance axis",
    x = "Distance (miles, log scale)",
    y = "Trips"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    panel.grid.minor = ggplot2::element_blank()
  )

trip_distance_histogram

Handle Outliers

Outlier handling depends on the study context and the variable being analyzed. In many cases, the first step is simply to identify the upper tail before deciding whether trimming, filtering, or a different summary statistic is appropriate.

trip_distance_quantiles <- stats::quantile(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(distance_miles),
  probs = c(0.5, 0.9, 0.95, 0.99),
  na.rm = TRUE
)

trip_distance_summary <- data.frame(
  statistic = names(trip_distance_quantiles),
  miles = as.numeric(trip_distance_quantiles),
  stringsAsFactors = FALSE
)

long_trips <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  # Select the 99th percentile by name rather than by position
  dplyr::filter(distance_miles > trip_distance_quantiles[["99%"]]) %>%
  dplyr::transmute(
    trip_id,
    household_id = hh_id,
    person_id,
    distance_miles,
    duration_minutes
  )

Start by reviewing Table 50 to locate the upper tail of the distance distribution.

gt::gt(trip_distance_summary) %>%
  gt::fmt_number(columns = miles, decimals = 2) %>%
  gt::cols_label(
    statistic = "Statistic",
    miles = "Miles"
  ) %>%
  gt::tab_header(title = "Trip Distance Quantiles")

Trip Distance Quantiles
Statistic	Miles
50%	2.22
90%	15.01
95%	24.69
99%	69.37

Table 50: Trip distance quantiles.

Then inspect Table 51 for a small sample of trips above the 99th percentile to see which records deserve follow-up.

gt::gt(dplyr::slice_head(long_trips, n = 10)) %>%
  gt::fmt_number(
    columns = c(distance_miles, duration_minutes),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Trips Above the 99th Percentile of Distance")

Trips Above the 99th Percentile of Distance
trip_id	household_id	person_id	distance_miles	duration_minutes
2400322201024	24003222	2400322201	87.77	216.00
2400322201026	24003222	2400322201	90.21	81.00
2400322202009	24003222	2400322202	90.21	81.00
2400346501001	24003465	2400346501	106.80	140.00
2400361001075	24003610	2400361001	73.16	79.00
2400453801032	24004538	2400453801	74.66	89.00
2400478201047	24004782	2400478201	1,189.56	243.00
2400478201048	24004782	2400478201	84.40	83.00
2400478201053	24004782	2400478201	1,129.97	0.00
2400478202036	24004782	2400478202	1,189.56	243.00

Table 51: Trips above the 99th percentile.
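
To see how much a single extreme record can move a summary, the sketch below compares the mean, the median, and a capped mean on a small invented vector. The 100-mile cap is a hypothetical analyst choice, not a threshold recommended by this guide.

```r
# Invented trip distances with one extreme value.
distance_miles <- c(0.4, 0.9, 1.5, 2.2, 3.8, 6.0, 12.5, 24.7, 69.4, 1189.6)

# Hypothetical cap: values above it are replaced by the cap itself.
distance_cap <- 100
distance_capped <- pmin(distance_miles, distance_cap)

summary_compare <- data.frame(
  statistic = c("Mean", "Median", "Mean (capped at 100 miles)"),
  miles = c(
    mean(distance_miles),
    stats::median(distance_miles),
    mean(distance_capped)
  ),
  stringsAsFactors = FALSE
)

print(summary_compare)
```

The median barely notices the extreme record, while the uncapped mean is dominated by it. Which summary to report depends on the question, and any cap or filter should be documented alongside the result.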

Example: Trip Speeds by Mode

Speed summaries are a good example of why both labels and outlier checks matter.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_code = value,
    mode_key = as.character(value),
    mode_label = label
  )

speed_by_mode <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(speed_mph), speed_mph >= 0) %>%
  dplyr::mutate(mode_key = as.character(mode_type)) %>%
  dplyr::left_join(
    mode_value_labels %>%
      dplyr::select(mode_key, mode_label),
    by = "mode_key"
  ) %>%
  dplyr::group_by(mode_label) %>%
  dplyr::summarize(
    mean_speed_mph = mean(speed_mph, na.rm = TRUE),
    median_speed_mph = stats::median(speed_mph, na.rm = TRUE),
    n_trips = dplyr::n(),
    .groups = "drop"
  ) %>%
  dplyr::arrange(dplyr::desc(mean_speed_mph))

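Table 52 combines codebook labels with simple speed summaries so that mode-level differences are easy to review. The large gaps between mean and median speeds (for example, a mean of 130.8 mph against a median of 19.7 mph for Car) indicate that a small number of implausible records dominate the unfiltered means, so review outliers before reporting mode-level averages.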

gt::gt(speed_by_mode) %>%
  gt::fmt_number(
    columns = c(mean_speed_mph, median_speed_mph),
    decimals = 1
  ) %>%
  gt::fmt_number(columns = n_trips, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    mode_label = "Mode",
    mean_speed_mph = "Mean Speed (mph)",
    median_speed_mph = "Median Speed (mph)",
    n_trips = "Trips"
  ) %>%
  gt::tab_header(title = "Trip Speeds by Mode")

Trip Speeds by Mode
Mode	Mean Speed (mph)	Median Speed (mph)	Trips
Long Distance Passenger	92,224.8	260.0	926
Shuttle / Vanpool	2,335.7	9.6	939
Car share	1,325.1	13.5	372
Missing Response	694.8	11.9	15,716
Ferry	649.5	12.3	265
Walk	385.4	2.5	84,494
Other	175.1	9.7	2,739
Car	130.8	19.7	264,558
TNC	111.9	13.5	3,337
Bike share	101.6	6.1	785
Transit	97.0	8.9	17,443
Bike	66.2	8.4	6,602
Taxi	40.7	12.6	325
School bus	11.6	7.9	1,319
Scooter share	6.2	6.2	6

Table 52: Trip speeds by mode.
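
One possible follow-up is to flag records whose implied speed cannot be trusted before summarizing. The sketch below uses invented trips and a hypothetical single threshold of 100 mph. A real review would likely need mode-specific thresholds, since long-distance modes can legitimately exceed that speed.

```r
# Invented trip records; values are for illustration only.
trips <- data.frame(
  trip_id          = 1:5,
  distance_miles   = c(5.0, 1129.97, 0.5, 20.0, 90.21),
  duration_minutes = c(15, 0, 10, 0.01, 81)
)

# Recompute speed from distance and duration (division by zero gives Inf).
trips$speed_check_mph <- trips$distance_miles / (trips$duration_minutes / 60)

# Hypothetical plausibility threshold applied to every mode.
max_plausible_mph <- 100

trips$speed_flag <- ifelse(
  trips$duration_minutes <= 0,
  "zero duration",
  ifelse(trips$speed_check_mph > max_plausible_mph, "implausibly fast", "ok")
)

# Summaries restricted to records that passed the checks.
ok_speeds <- trips$speed_check_mph[trips$speed_flag == "ok"]
median_ok_speed <- stats::median(ok_speeds)

print(trips)
print(median_ok_speed)
```

Flagging rather than deleting keeps the decision visible, so the same records can be reviewed again if the threshold changes.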

13.3 Date and Time Data

Trip departure and arrival times are stored as a set of split date and time components rather than a single datetime object. This structure avoids timezone conversion issues when importing data across different software environments and makes it straightforward to work with time-of-day components (e.g., filtering by hour) without parsing a full timestamp.

Time Variables on the Trip Table

The following fields together define when each trip departed and arrived:

Variable	Type	Description
`travel_date`	Date (`YYYY-MM-DD`)	Diary-day date carried on the trip record.
`depart_date`	Date (`YYYY-MM-DD`)	Calendar date of departure.
`depart_hour`	Integer (0-23)	Hour of departure in 24-hour time, local to the study area.
`depart_minute`	Integer (0-59)	Minute of departure.
`depart_seconds`	Integer (0-59)	Second of departure.
`arrive_date`	Date (`YYYY-MM-DD`)	Calendar date of arrival. Will differ from `depart_date` for trips crossing midnight.
`arrive_hour`	Integer (0-23)	Hour of arrival in 24-hour time.
`arrive_minute`	Integer (0-59)	Minute of arrival.
`arrive_second`	Integer (0-59)	Second of arrival.

Timezone. When reconstructing timestamps in this guide, use the study timezone from settings.yml, which is America/New_York for this MassDOT delivery.

Travel day boundary. In the prepared MassDOT trip table, trips departing before 3:00 AM are attached to the previous diary day. For those records, travel_date reflects the diary day while depart_date reflects the wall-clock calendar date.

Cross-midnight trips. For trips that depart before midnight and arrive after midnight, arrive_date will be one calendar day later than depart_date. When reconstructing durations, always use the full datetime (date + time) rather than differencing hours alone.
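
The travel day boundary can be illustrated by shifting the departure time back three hours before taking the date. This is a sketch under assumptions: the departure times are invented, and the exact treatment of a departure at 3:00 AM sharp is assumed rather than confirmed. In practice, the delivered travel_date field should be preferred over a re-derived value, and the fixed three-hour shift may behave differently on daylight saving transition dates.

```r
study_timezone <- "America/New_York"

# Invented wall-clock departure times around the 3:00 AM boundary.
depart_datetime <- as.POSIXct(
  c("2024-06-11 23:50:00", "2024-06-12 01:30:00", "2024-06-12 03:00:00"),
  tz = study_timezone
)

# Shift back three hours so departures before 3:00 AM fall on the prior date,
# then format in the study timezone to avoid an implicit UTC conversion.
diary_date <- as.Date(
  format(depart_datetime - 3 * 3600, "%Y-%m-%d", tz = study_timezone)
)

boundary_check <- data.frame(
  depart_datetime = format(depart_datetime, "%Y-%m-%d %H:%M"),
  diary_date = diary_date
)

print(boundary_check)
```

The 1:30 AM departure on June 12 is assigned to the June 11 diary day, which matches the relationship between travel_date and depart_date described above.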

Reconstructing Timestamps in R

When you need a full POSIXct timestamp, for example to calculate duration, filter by time window, or plot a time series, recombine the split fields explicitly using lubridate:

trip_with_datetime <- hts$trip_unlinked %>%
  dplyr::mutate(
    depart_datetime = lubridate::ymd_hms(
      paste(
        depart_date,
        sprintf("%02d:%02d:%02d", depart_hour, depart_minute, depart_seconds)
      ),
      tz = study_timezone
    ),
    arrive_datetime = lubridate::ymd_hms(
      paste(
        arrive_date,
        sprintf("%02d:%02d:%02d", arrive_hour, arrive_minute, arrive_second)
      ),
      tz = study_timezone
    )
  )

The preview below shows the reconstructed departure and arrival timestamps for the first few trip records.

gt::gt(
  trip_with_datetime %>%
    dplyr::select(trip_id, depart_datetime, arrive_datetime) %>%
    utils::head()
) %>%
  gt::tab_header(title = "Reconstructed Trip Timestamps")

Table 53: Reconstructed trip timestamps.

Common Pitfalls

Using depart_hour alone for peak-period analysis is fine for most purposes, but for trips near the hour boundary (e.g., 7:58 AM), use the full datetime if precision matters.
Differencing arrive_hour - depart_hour will produce incorrect (negative) durations for cross-midnight trips. Always difference full datetimes or use the pre-calculated duration_minutes field instead.
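
The cross-midnight pitfall can be shown with a single invented trip. The sketch uses base R date-time functions instead of lubridate so that it runs without extra packages, and it sets the timezone by hand.

```r
study_timezone <- "America/New_York"

# One invented trip that departs at 11:50 PM and arrives at 12:20 AM.
trip <- data.frame(
  depart_date = "2024-06-11", depart_hour = 23, depart_minute = 50,
  depart_seconds = 0,
  arrive_date = "2024-06-12", arrive_hour = 0, arrive_minute = 20,
  arrive_second = 0,
  stringsAsFactors = FALSE
)

# Naive approach: differencing the hour fields gives a negative value.
naive_hour_difference <- trip$arrive_hour - trip$depart_hour

# Safer approach: rebuild full timestamps and difference those.
depart_datetime <- as.POSIXct(
  sprintf("%s %02d:%02d:%02d", trip$depart_date,
          trip$depart_hour, trip$depart_minute, trip$depart_seconds),
  tz = study_timezone
)
arrive_datetime <- as.POSIXct(
  sprintf("%s %02d:%02d:%02d", trip$arrive_date,
          trip$arrive_hour, trip$arrive_minute, trip$arrive_second),
  tz = study_timezone
)
duration_check_minutes <- as.numeric(
  difftime(arrive_datetime, depart_datetime, units = "mins")
)

print(naive_hour_difference)
print(duration_check_minutes)
```

The hour difference is negative while the full-datetime difference gives the expected half hour, which is why the date fields must be part of any duration calculation.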

14 Weights and Inference

14.1 Getting Started

Weights exist so that the sample can represent the target population. In household travel surveys, analysts usually work with household, person, day, trip, and sometimes tour weights. The correct weight depends on the final analytic unit, not just the first table that was opened.

For analyses that compare travel behavior across specific weekdays, use the day-of-week workflow described in Section 16. The standard household, person, day, and trip weights remain the default for overall average-day estimates.

For MassDOT, most weighted estimates should also be restricted to complete households. The examples below define that universe from hts$hh$is_complete and then carry it into lower-level files through hh_id.

Choosing the Right Weight

Begin by matching each analytic unit to the starting table and weight column used in the prepared data.

weight_lookup <- data.frame(
  analytic_unit = character(),
  starting_table = character(),
  weight_variable = character(),
  stringsAsFactors = FALSE
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Household",
    starting_table = "hh",
    weight_variable = "hh_weight"
  )
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Person",
    starting_table = "person",
    weight_variable = "person_weight"
  )
)

weight_lookup <- rbind(
  weight_lookup,
  data.frame(
    analytic_unit = "Person-day",
    starting_table = "day",
    weight_variable = "day_weight"
  )
)

if ("trip_linked" %in% names(hts)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Linked trip",
      starting_table = "trip_linked",
      weight_variable = "trip_weight"
    )
  )
}

if ("trip_unlinked" %in% names(hts)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Unlinked trip",
      starting_table = "trip_unlinked",
      weight_variable = "trip_weight"
    )
  )
}

if ("tour" %in% names(hts) && "tour_weight" %in% names(hts$tour)) {
  weight_lookup <- rbind(
    weight_lookup,
    data.frame(
      analytic_unit = "Tour",
      starting_table = "tour",
      weight_variable = "tour_weight"
    )
  )
}

Use Table 54 to confirm the correct weight before building any weighted estimate.

gt::gt(weight_lookup) %>%
  gt::cols_label(
    analytic_unit = "Analytic Unit",
    starting_table = "Starting Table",
    weight_variable = "Weight Variable"
  ) %>%
  gt::tab_header(title = "Recommended Weight by Analytic Unit")

Recommended Weight by Analytic Unit
Analytic Unit	Starting Table	Weight Variable
Household	hh	hh_weight
Person	person	person_weight
Person-day	day	day_weight
Linked trip	trip_linked	trip_weight
Unlinked trip	trip_unlinked	trip_weight
Tour	tour	tour_weight

Table 54: Recommended weights by analytic unit.
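
The lookup can also be used programmatically so that a script never pairs a table with the wrong weight. The sketch below rebuilds the lookup from the values in Table 54; `get_weight_variable()` is a hypothetical helper, not part of the delivered data or any package.

```r
# Lookup values copied from Table 54.
weight_lookup <- data.frame(
  analytic_unit = c("Household", "Person", "Person-day",
                    "Linked trip", "Unlinked trip", "Tour"),
  starting_table = c("hh", "person", "day",
                     "trip_linked", "trip_unlinked", "tour"),
  weight_variable = c("hh_weight", "person_weight", "day_weight",
                      "trip_weight", "trip_weight", "tour_weight"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the weight column for a starting table,
# failing loudly when the table is not in the lookup.
get_weight_variable <- function(table_name, lookup = weight_lookup) {
  match_row <- lookup[lookup$starting_table == table_name, ]
  if (nrow(match_row) != 1) {
    stop("No unique weight found for table: ", table_name)
  }
  match_row$weight_variable
}

get_weight_variable("day")
```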

Calculating Simple Weighted Estimates

Before moving to design-aware inference, it is useful to confirm the weighted numerator and denominator directly.

zero_vehicle_share <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::summarize(
    weighted_zero_vehicle_households = sum(
      hh_weight * (num_vehicles == 0),
      na.rm = TRUE
    ),
    weighted_households = sum(hh_weight, na.rm = TRUE),
    share_zero_vehicle = weighted_zero_vehicle_households / weighted_households
  )

Table 55 shows the numerator, denominator, and resulting share in one table.

gt::gt(zero_vehicle_share) %>%
  gt::fmt_number(
    columns = c(weighted_zero_vehicle_households, weighted_households),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::fmt_percent(columns = share_zero_vehicle, decimals = 1) %>%
  gt::tab_header(title = "Simple Weighted Share of Zero-Vehicle Households")

Simple Weighted Share of Zero-Vehicle Households
weighted_zero_vehicle_households	weighted_households	share_zero_vehicle
325,071	2,814,595	11.5%

Table 55: Weighted zero-vehicle household share.
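
One detail worth checking in this pattern is how missing values are treated. With na.rm = TRUE, a household with a missing vehicle count drops out of the numerator but still contributes its weight to the denominator. The sketch below uses made-up records to show the difference; whether num_vehicles has missing values in the delivered data is not stated here, so treat this as a check to run rather than a known issue.

```r
# Made-up household records; the third household has a missing vehicle count.
hh <- data.frame(
  hh_weight = c(100, 200, 300, 400),
  num_vehicles = c(0, 1, NA, 2)
)

# Pattern used above: the NA row drops out of the numerator only.
share_all <- sum(hh$hh_weight * (hh$num_vehicles == 0), na.rm = TRUE) /
  sum(hh$hh_weight, na.rm = TRUE)

# Alternative: restrict both sums to households with a known vehicle count.
known <- !is.na(hh$num_vehicles)
share_known <- sum(hh$hh_weight[known] * (hh$num_vehicles[known] == 0)) /
  sum(hh$hh_weight[known])

c(share_all = share_all, share_known = share_known)
```

Which denominator is appropriate depends on whether households with unknown values belong in the reporting universe; the important point is to make the choice deliberately.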

14.2 Survey-Aware Methods for Inference

Simple weighted proportions are often enough for descriptive summaries, but they are not enough when you need valid standard errors, confidence intervals, or hypothesis tests. Those cases require a survey design object that respects clustering, stratification, and weights.

When Do You Need Survey-Aware Methods?

Use survey-aware methods when:

reporting standard errors or confidence intervals
comparing estimates across groups
fitting regression models
working with small subgroups where design effects matter
estimating totals, means, or proportions for publication or external reporting

Specifying the Survey Design

In the Massachusetts Travel Study, the household is the primary sampling unit (PSU; see Section 2.1.2). Even when analyzing person-, day-, or trip-level records, observations remain clustered within households.

The examples below use:

hh_id as the PSU
the weight that matches the analytic unit
sample_segment as the design strata

Start by defining a household-level survey design object with the fields needed for the analysis.

hh_design <- hts$hh %>%
  dplyr::filter(is_complete == 1) %>%
  dplyr::transmute(
    hh_id,
    sample_segment,
    analysis_weight = hh_weight,
    vehicle_count = num_vehicles
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  )

For trip- or day-level analysis, join the design fields needed for clustering, stratification, and weights before building the survey object.

trip_design <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  dplyr::mutate(analysis_weight = trip_weight) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  )

Set the lonely-PSU handling before running summaries that request standard errors or confidence intervals.

options(
  survey.lonely.psu = "adjust",
  srvyr.lonely.psu = "adjust"
)
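
Before relying on that setting, it can help to see which strata are affected. The base R sketch below counts distinct PSUs per stratum on made-up design fields; the values are illustrative only.

```r
# Made-up design fields: stratum "C" contains a single household (PSU).
design_fields <- data.frame(
  hh_id = c(1, 2, 3, 4, 5),
  sample_segment = c("A", "A", "B", "B", "C")
)

# Count distinct PSUs per stratum.
unique_psus <- unique(design_fields[, c("hh_id", "sample_segment")])
psus_per_stratum <- table(unique_psus$sample_segment)

# Strata with exactly one PSU are the ones affected by the lonely-PSU setting.
lonely_strata <- names(psus_per_stratum)[as.vector(psus_per_stratum) == 1]
lonely_strata
```

Running the same count on a filtered analysis file shows whether a subgroup restriction has created single-PSU strata that were not present in the full sample.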

Using the Survey Design for Weighted Estimates

Once the design object is defined, use survey_mean(), survey_total(), or survey_prop() instead of manually calculating standard errors.

hh_vehicle_summary <- hh_design %>%
  dplyr::group_by(vehicle_count) %>%
  dplyr::summarize(
    weighted_households = srvyr::survey_total(vartype = c("se", "ci")),
    weighted_share = srvyr::survey_prop(vartype = c("se", "ci")),
    .groups = "drop"
  )

Table 56 shows the weighted totals and shares with their uncertainty measures.

gt::gt(hh_vehicle_summary) %>%
  gt::fmt_number(
    columns = c(weighted_households, weighted_households_se, weighted_households_low, weighted_households_upp),
    decimals = 1,
    sep_mark = ","
  ) %>%
  gt::fmt_percent(
    columns = c(weighted_share, weighted_share_se, weighted_share_low, weighted_share_upp),
    decimals = 1
  ) %>%
  gt::tab_header(title = "Weighted Household Vehicle Summary")

Weighted Household Vehicle Summary
vehicle_count	weighted_households	weighted_households_se	weighted_households_low	weighted_households_upp	weighted_share	weighted_share_se	weighted_share_low	weighted_share_upp
0	325,070.5	10,284.6	304,911.6	345,229.5	11.5%	0.4%	10.9%	12.3%
1	1,096,480.0	17,548.7	1,062,082.5	1,130,877.4	39.0%	0.6%	37.8%	40.1%
2	1,002,864.0	20,037.9	963,587.3	1,042,140.6	35.6%	0.6%	34.5%	36.8%
3	275,203.7	12,361.9	250,972.9	299,434.5	9.8%	0.4%	9.0%	10.6%
4	84,670.2	7,340.9	70,281.2	99,059.3	3.0%	0.3%	2.5%	3.6%
5	22,895.8	3,708.6	15,626.5	30,165.1	0.8%	0.1%	0.6%	1.1%
6	5,609.9	2,114.4	1,465.5	9,754.3	0.2%	0.1%	0.1%	0.4%
7	1,166.8	848.5	−496.3	2,829.8	0.0%	0.0%	0.0%	0.2%
8	634.5	343.3	−38.5	1,307.5	0.0%	0.0%	0.0%	0.1%

Table 56: Weighted household vehicle summary.

Filtering Data vs. Filtering the Survey Design

Filtering records before defining the survey design is not always the same as defining the survey design first and then subsetting it. The difference matters most when a subgroup should remain part of the original design rather than being treated as a new standalone sample.

For example, an adult-only trip analysis should define adulthood from the person table and then carry that flag into the trip-level survey design.

adult_age_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "age") %>%
  dplyr::transmute(
    age_key = as.character(value),
    age_label = sub("^Age\\s+", "", label)
  ) %>%
  dplyr::mutate(
    age_label = ifelse(age_label == "85 up", "85 or older", age_label)
  )

adult_trip_design <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::transmute(
        person_id,
        age_key = as.character(age)
      ) %>%
      dplyr::left_join(
        adult_age_labels,
        by = "age_key"
      ),
    by = "person_id"
  ) %>%
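  dplyr::mutate(
    # Flag adults instead of dropping other trips before the design exists.
    is_adult = age_label %in% c(
      "18-24",
      "25-34",
      "35-44",
      "45-54",
      "55-64",
      "65-74",
      "75-84",
      "85 or older"
    )
  ) %>%
  dplyr::filter(
    !is.na(trip_weight),
    trip_weight > 0
  ) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = trip_weight
  ) %>%
  # Subset the design object so the full sample still defines the design structure.
  dplyr::filter(is_adult)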
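
One way to see why the order matters is to compare the number of PSUs per stratum in the full analysis file with the number left after filtering records first. The base R sketch below uses made-up trip records; the `is_adult` flag and `count_psus()` helper are illustrative assumptions.

```r
# Made-up trip records: household 3 has no adult trips.
trips <- data.frame(
  hh_id = c(1, 1, 2, 3, 3, 4),
  sample_segment = c("A", "A", "A", "B", "B", "B"),
  is_adult = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Helper: number of distinct households (PSUs) in each stratum.
count_psus <- function(df) {
  tapply(df$hh_id, df$sample_segment, function(x) length(unique(x)))
}

psus_full_design <- count_psus(trips)
psus_filtered_first <- count_psus(trips[trips$is_adult, ])

rbind(psus_full_design, psus_filtered_first)
```

In this toy example, filtering first leaves stratum B with a single PSU, so a design built from the filtered records would describe a different sample structure than the one that was actually fielded.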

Calculating Estimate Reliability (RSE)

One simple reliability check is the relative standard error (RSE): the standard error of an estimate divided by the estimate itself. Larger values indicate less precise estimates.

trip_mode_rse <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  dplyr::mutate(mode_type = as.character(mode_type)) %>%
  dplyr::left_join(
    codebook$value_labels %>%
      dplyr::filter(variable == "mode") %>%
      dplyr::transmute(
        analysis_mode_key = as.character(value),
        analysis_mode = label
      ),
    by = c("mode_type" = "analysis_mode_key")
  ) %>%
  dplyr::mutate(
    analysis_weight = trip_weight
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(analysis_weight),
    !is.na(analysis_mode),
    analysis_weight > 0
  ) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = analysis_weight
  ) %>%
  dplyr::group_by(analysis_mode) %>%
  dplyr::summarize(
    share = srvyr::survey_prop(vartype = "se"),
    .groups = "drop"
  ) %>%
  dplyr::mutate(
    rse = ifelse(share > 0, share_se / share, NA_real_)
  )

Table 57 reports each estimate beside its standard error and relative standard error.

gt::gt(trip_mode_rse) %>%
  gt::fmt_percent(columns = c(share, share_se, rse), decimals = 1) %>%
  gt::cols_label(
    analysis_mode = "Trip Mode",
    share = "Share",
    share_se = "Std. Error",
    rse = "RSE"
  ) %>%
  gt::tab_header(title = "Relative Standard Errors for Trip Mode Shares")

Relative Standard Errors for Trip Mode Shares
Trip Mode	Share	Std. Error	RSE
Bike	1.2%	0.1%	8.7%
Bike share	0.1%	0.0%	31.7%
Car	73.3%	0.5%	0.7%
Car share	0.1%	0.0%	40.3%
Ferry	0.1%	0.0%	26.8%
Long Distance Passenger	0.2%	0.0%	12.7%
Missing Response	2.3%	0.1%	4.8%
Other	0.9%	0.1%	9.4%
School bus	1.3%	0.1%	7.1%
Scooter share	0.0%	0.0%	49.7%
Shuttle / Vanpool	0.3%	0.0%	15.0%
TNC	0.9%	0.1%	10.4%
Taxi	0.2%	0.0%	19.0%
Transit	2.8%	0.1%	3.9%
Walk	16.4%	0.4%	2.4%

Table 57: Relative standard errors by trip mode.
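
RSE values are easier to act on when they are turned into reporting flags. The sketch below uses RSE values copied from Table 57 for a subset of modes. The 15% and 30% cut points are hypothetical thresholds chosen for illustration, not a study standard; substitute the rule your project uses.

```r
# RSE values copied from Table 57 for a subset of modes.
trip_mode_rse <- data.frame(
  analysis_mode = c("Car", "Walk", "Bike", "Taxi",
                    "Bike share", "Car share", "Scooter share"),
  rse = c(0.007, 0.024, 0.087, 0.190, 0.317, 0.403, 0.497)
)

# Hypothetical thresholds; replace with the reporting standard for your project.
caution_threshold <- 0.15
suppress_threshold <- 0.30

# Left-closed intervals: an RSE equal to a threshold falls in the higher band.
trip_mode_rse$reliability <- cut(
  trip_mode_rse$rse,
  breaks = c(-Inf, caution_threshold, suppress_threshold, Inf),
  labels = c("Reliable", "Use with caution", "Unreliable"),
  right = FALSE
)

trip_mode_rse
```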

Working with Small Sample Sizes

When estimates are unstable:

Broaden the reporting domain.
Collapse sparse categories where it is substantively reasonable.
Keep zero-valued days or households in the denominator when they are part of the analytic universe.
Consider model-based approaches rather than repeated subgroup slicing.
Report uncertainty clearly instead of presenting small-cell estimates as precise.
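
As an illustration of collapsing sparse categories, the base R sketch below combines three low-volume modes into one group. The weighted trip counts are the within-region values reported in the mode share table in Section 15.1; the "Shared mobility" grouping is a hypothetical choice, not a study definition.

```r
# Within-region weighted trips for a subset of modes (from Section 15.1).
mode_trips <- data.frame(
  mode_label = c("Car", "Walk", "Bike share", "Car share", "Scooter share"),
  wtd_trips = c(21007329, 4769101, 30984, 24759, 1452),
  stringsAsFactors = FALSE
)

# Hypothetical grouping; choose categories that make substantive sense.
sparse_modes <- c("Bike share", "Car share", "Scooter share")
mode_trips$mode_group <- ifelse(
  mode_trips$mode_label %in% sparse_modes,
  "Shared mobility",
  mode_trips$mode_label
)

# Sum weighted trips within the collapsed groups.
collapsed <- aggregate(wtd_trips ~ mode_group, data = mode_trips, FUN = sum)
collapsed
```

When standard errors are needed, apply the grouping to the record-level data before building the survey design so that the uncertainty reflects the combined category.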

15 Common Travel Metrics

15.1 Mode Share

Example: Mode Share by Trip Geography

In the Massachusetts Travel Study dataset, trips that stay entirely within the study region are much shorter on average than trips with at least one end outside the region. Mixing these two populations in a single mode-share calculation can obscure differences that matter for regional planning.

The trip table includes two binary flags to separate these populations:

Trip Type	`o_in_region`	`d_in_region`
Fully within region	1	1
Leaving region	1	0
Entering region	0	1
Fully outside region	0	0

Fully outside-region trips are uncommon. For most regional planning applications, fully within-region trips are the primary analytic population, with cross-boundary trips treated separately.
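
The four flag combinations above can be turned into a single trip-type label. A minimal base R sketch with made-up trips follows; the `trip_type` column name is an assumption, and trips with a missing flag are left unclassified rather than forced into a category.

```r
# Made-up trips covering each flag combination, plus one with a missing flag.
trips <- data.frame(
  trip_id = 1:5,
  o_in_region = c(1, 1, 0, 0, NA),
  d_in_region = c(1, 0, 1, 0, 1)
)

# Start with NA so trips with a missing flag stay unclassified.
trips$trip_type <- NA_character_
both_known <- !is.na(trips$o_in_region) & !is.na(trips$d_in_region)

# Classify only the trips where both flags are known.
trips$trip_type[both_known] <- ifelse(
  trips$o_in_region[both_known] == 1,
  ifelse(trips$d_in_region[both_known] == 1,
         "Fully within region", "Leaving region"),
  ifelse(trips$d_in_region[both_known] == 1,
         "Entering region", "Fully outside region")
)

trips
```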

Start by pulling mode labels from the codebook to use in the final output. In the Massachusetts Travel Study dataset, the trip column is mode_type, but the corresponding value labels are stored under mode in the codebook.

mode_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_key  = as.character(value),
    mode_label = label
  )

Next, classify each trip by its geographic pattern and join the mode labels.

trips_classified <- hts[[default_trip_table_name]] %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::filter(!is.na(o_in_region), !is.na(d_in_region)) %>%
  dplyr::mutate(
    trip_geography = dplyr::case_when(
      o_in_region == 1 & d_in_region == 1 ~ "Within region",
      TRUE                                 ~ "At least one end outside region"
    ),
    mode_key = as.character(mode_type)
  ) %>%
  dplyr::left_join(mode_value_labels, by = "mode_key") %>%
  dplyr::filter(!is.na(mode_label), !is.na(trip_weight), trip_weight > 0)

Then calculate weighted trip counts and mode shares within each geographic group.

mode_share_by_region <- trips_classified %>%
  dplyr::group_by(trip_geography, mode_label) %>%
  dplyr::summarize(
    wtd_trips = sum(trip_weight, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::group_by(trip_geography) %>%
  dplyr::mutate(mode_share = wtd_trips / sum(wtd_trips)) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(trip_geography, dplyr::desc(mode_share))

The table below shows how mode choice differs between trips that stay within the region and those that cross its boundary.

gt::gt(mode_share_by_region) %>%
  gt::fmt_percent(columns = mode_share, decimals = 1) %>%
  gt::fmt_number(columns = wtd_trips, decimals = 0, sep_mark = ",") %>%
  gt::cols_label(
    trip_geography = "Trip Geography",
    mode_label     = "Mode",
    wtd_trips      = "Weighted Trips",
    mode_share     = "Mode Share"
  ) %>%
  gt::tab_header(title = "Mode Share by Trip Geography") %>%
  gt::opt_row_striping()

Mode Share by Trip Geography
Trip Geography	Mode	Weighted Trips	Mode Share
At least one end outside region	Car	1,044,672	77.5%
At least one end outside region	Walk	149,343	11.1%
At least one end outside region	Long Distance Passenger	48,373	3.6%
At least one end outside region	TNC	27,346	2.0%
At least one end outside region	Transit	25,895	1.9%
At least one end outside region	Other	14,578	1.1%
At least one end outside region	Missing Response	12,135	0.9%
At least one end outside region	Shuttle / Vanpool	7,982	0.6%
At least one end outside region	Taxi	7,355	0.5%
At least one end outside region	Bike	5,215	0.4%
At least one end outside region	School bus	3,558	0.3%
At least one end outside region	Ferry	741	0.1%
At least one end outside region	Car share	363	0.0%
At least one end outside region	Scooter share	31	0.0%
At least one end outside region	Bike share	4	0.0%
Within region	Car	21,007,329	73.1%
Within region	Walk	4,769,101	16.6%
Within region	Transit	822,747	2.9%
Within region	Missing Response	685,980	2.4%
Within region	School bus	394,266	1.4%
Within region	Bike	341,726	1.2%
Within region	Other	256,593	0.9%
Within region	TNC	252,902	0.9%
Within region	Shuttle / Vanpool	70,780	0.2%
Within region	Taxi	46,813	0.2%
Within region	Bike share	30,984	0.1%
Within region	Car share	24,759	0.1%
Within region	Ferry	22,638	0.1%
Within region	Long Distance Passenger	3,005	0.0%
Within region	Scooter share	1,452	0.0%

Notes

NA flags indicate missing or unmatched coordinates; these records are excluded here and should be excluded from any analysis using o_in_region or d_in_region.
o_state / d_state provide finer detail for characterizing cross-border travel when you need more than a simple in-region / out-of-region split.
Reliability. Out-of-region trips are a smaller share of the sample, so their mode share estimates carry more uncertainty. Check reliability using the RSE approach in Section 14 before reporting.

15.2 Trip Rates

Understanding trip rates requires aligning the unit of analysis with the survey’s hierarchical structure and weighting design. This section introduces the recommended analytic units for trips and person-days, outlines how to calculate weighted trip rates correctly, and highlights key pitfalls to avoid.

For analyses that compare trip-making across specific weekdays, use the day-of-week workflow described in Section 16. The standard day and trip weights remain the default for overall average-day trip-rate estimates.

For the Massachusetts Travel Study, these examples use complete households as the default analytic universe.

To calculate a weighted trip rate, divide the weighted count of trips by the weighted count of person-days. This approach keeps both travelers and non-travelers represented correctly.

When both linked and unlinked trip tables are available, linked trips are usually the better numerator for whole-trip rates. Unlinked trips are more appropriate when the analysis is explicitly about trip legs or segments.

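Start by calculating the weighted numerator and denominator directly. The example uses the unlinked trip table; substitute the linked trip table when it is available and the analysis calls for whole-trip rates.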

weighted_trip_rate <- sum(
  hts$trip_unlinked %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(trip_weight),
  na.rm = TRUE
) / sum(
  hts$day %>%
    dplyr::filter(hh_id %in% complete_hh_ids) %>%
    dplyr::pull(day_weight),
  na.rm = TRUE
)

Table 59 reports the resulting weighted trip rate.

gt::gt(data.frame(weighted_trip_rate = weighted_trip_rate)) %>%
  gt::fmt_number(columns = weighted_trip_rate, decimals = 2) %>%
  gt::tab_header(title = "Weighted Trip Rate")

Weighted Trip Rate
weighted_trip_rate
4.45

Table 59: Weighted trip rate.

Why the Denominator (Household, Person, Day) Weights Matter

Trip rates depend on both the number of trips recorded and the number of diary days those trips came from. Without day weights, respondents who provide more usable diary days can exert disproportionate influence, and zero-trip days can drop out of the denominator.

Why Trip Weights Matter

Trip weights expand recorded trips to population-level trip totals. A correct trip-rate calculation therefore uses:

trip weights in the numerator
household-, person-, or day-level weights in the denominator, depending on the metric

Why Zero-Travel Days Matter

Even after correcting for nonresponse and trip underreporting, people who did not travel on a given day remain part of the analytic universe. Excluding zero-trip days overstates trip rates because the denominator omits valid person-days with no travel.
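
The effect is easy to see with a small numeric example. The sketch below uses made-up person-days and trips with equal weights; the third day has no trips.

```r
# Made-up person-days; the third day has no trips.
days <- data.frame(
  day_id = c(1, 2, 3),
  day_weight = c(100, 100, 100)
)
trips <- data.frame(
  day_id = c(1, 1, 2, 2, 2, 2),
  trip_weight = c(100, 100, 100, 100, 100, 100)
)

weighted_trips <- sum(trips$trip_weight)

# Correct denominator: every valid person-day, including the zero-trip day.
rate_all_days <- weighted_trips / sum(days$day_weight)

# Incorrect denominator: only days that appear in the trip table.
travel_days <- days[days$day_id %in% trips$day_id, ]
rate_travel_days_only <- weighted_trips / sum(travel_days$day_weight)

c(all_days = rate_all_days, travel_days_only = rate_travel_days_only)
```

Dropping the zero-trip day raises the rate from 2 to 3 trips per person-day in this toy case, which is the overstatement described above.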

Constructing a Person-Day Trip Rate Dataset

A typical workflow begins by aggregating trips to the day level, joining that summary back to the day table, and filling in zeros for days without travel.

weighted_trips <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    weighted_trips = sum(trip_weight, na.rm = TRUE),
    .groups = "drop"
  )

day_trip_rates <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    weighted_trips,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    weighted_trips = ifelse(is.na(weighted_trips), 0, weighted_trips),
    weighted_trips_per_day = weighted_trips / day_weight
  )

Table 60 shows the day-level dataset after weighted trips have been joined back to the person-day denominator.

gt::gt(dplyr::slice_head(day_trip_rates, n = 10)) %>%
  gt::fmt_number(
    columns = c(day_weight, weighted_trips, weighted_trips_per_day),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Day-Level Weighted Trip Rates")

Day-Level Weighted Trip Rates
day_id	person_id	travel_date	day_num	travel_dow	person_num	surveyable	is_participant	hh_id	travel_day	hh_day_complete	num_complete_trip_surveys	num_trips	is_complete	hh_is_complete	proxy_complete	begin_day	end_day	school_daily	telecommute_time	made_travel	num_reasons_no_travel	attend_school_1	attend_school_2	attend_school_3	attend_school_998	attend_school_999	attend_school_no_1	attend_school_no_2	attend_school_no_4	attend_school_no_5	attend_school_no_997	attend_school_no_998	attend_school_no_999	congestion	delivery_2	delivery_3	delivery_4	delivery_5	delivery_6	delivery_7	delivery_8	delivery_996	no_travel_1	no_travel_11	no_travel_12	no_travel_2	no_travel_3	no_travel_4	no_travel_5	no_travel_6	no_travel_7	no_travel_8	no_travel_9	no_travel_99	attend_school_no_3	daily_activity_pattern	day_weight	day_weight_tue	day_weight_fri	day_weight_mon	day_weight_sat	day_weight_sun	day_weight_thu	day_weight_wed	weighted_trips	weighted_trips_per_day
240000890101	2400008901	2024-06-11	1	2	1	1	1	24000089	1	1	2	2	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	1	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	2	157.43	187.88768	NA	NA	NA	NA	NA	NA	494.27	3.14
240000890201	2400008902	2024-06-11	1	2	2	1	1	24000089	1	1	5	5	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	995	2	157.43	187.88768	NA	NA	NA	NA	NA	NA	1,235.68	7.85
240001220101	2400012201	2024-06-11	1	2	1	1	1	24000122	1	1	5	7	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	1	25.84	59.87428	NA	NA	NA	NA	NA	NA	180.86	7.00
240001220102	2400012201	2024-06-12	2	3	1	1	1	24000122	1	1	5	5	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	1	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	1	25.84	NA	NA	NA	NA	NA	NA	58.57788	129.18	5.00
240001220103	2400012201	2024-06-13	3	4	1	1	1	24000122	1	1	5	5	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	2	25.84	NA	NA	NA	NA	NA	64.49532	NA	129.18	5.00
240001220104	2400012201	2024-06-14	4	5	1	1	1	24000122	1	1	2	2	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	3	NA	NA	75.95139	NA	NA	NA	NA	NA	0.00	NA
240001220105	2400012201	2024-06-15	5	6	1	1	1	24000122	1	1	8	10	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	1	0	0	0	0	995	995	995	995	995	995	995	995	995	995	995	995	995	2	NA	NA	NA	NA	76.29594	NA	NA	NA	0.00	NA
240001220106	2400012201	2024-06-16	6	7	1	1	1	24000122	1	1	6	6	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	2	NA	NA	NA	NA	NA	74.33908	NA	NA	0.00	NA
240001220107	2400012201	2024-06-17	7	1	1	1	1	24000122	1	1	4	4	1	1	995	1	1	995	NA	995	0	995	995	995	995	995	995	995	995	995	995	995	995	995	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	2	25.84	NA	NA	65.09989	NA	NA	NA	NA	103.35	4.00
240001400101	2400014001	2024-06-14	1	5	1	1	1	24000140	1	1	2	2	1	1	995	1	1	995	390	995	0	995	995	995	995	995	995	995	995	995	995	995	995	2	0	0	0	0	0	0	0	1	995	995	995	995	995	995	995	995	995	995	995	995	995	2	NA	NA	655.71500	NA	NA	NA	NA	NA	0.00	NA

Table 60: Day-level weighted trip rates.

15.3 Person-Miles Traveled (PMT) and Vehicle-Miles Traveled (VMT)

Analysis of person-miles and vehicle-miles traveled proceeds similarly to the analysis of trip rates, with some additional considerations for occupancy and drive-mode identification.

Calculating PMT

Because the trip table is a person-trip table, total person-miles traveled can be calculated by summing the product of trip distance and trip weight.

total_pmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::summarize(
    total_pmt = sum(
      distance_miles * trip_weight,
      na.rm = TRUE
    )
  )

Table 61 reports the weighted person-miles represented by the trip table.

gt::gt(total_pmt) %>%
  gt::fmt_number(columns = total_pmt, decimals = 2) %>%
  gt::tab_header(title = "Total PMT")

Total PMT
total_pmt
294,281,801.73

Table 61: Total weighted PMT.

Calculating VMT

Vehicle-miles traveled require an occupancy adjustment so that a vehicle carrying several travelers is not counted once per occupant. In the prepared data, num_travelers is the usual starting point for occupancy.

trip_mode_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "mode") %>%
  dplyr::transmute(
    mode_code = value,
    mode_key = as.character(value),
    mode_label = label
  )

total_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::mutate(
    mode_key = as.character(mode_type),
    occupancy = num_travelers
  ) %>%
  dplyr::left_join(
    trip_mode_labels %>%
      dplyr::select(mode_key, mode_label),
    by = "mode_key"
  ) %>%
  dplyr::filter(
    !is.na(occupancy),
    occupancy > 0,
    occupancy != 995,
    stringr::str_detect(
      mode_label,
      "Drive|Car|SOV|Hov|Motorcycle"
    )
  ) %>%
  dplyr::mutate(vmt = distance_miles / occupancy) %>%
  dplyr::summarize(
    total_vmt = sum(vmt * trip_weight, na.rm = TRUE)
  )

Table 62 reports the weighted VMT after filtering to drive-mode records and adjusting each trip by occupancy.

gt::gt(total_vmt) %>%
  gt::fmt_number(columns = total_vmt, decimals = 2) %>%
  gt::tab_header(title = "Total VMT")

Total VMT
total_vmt
151,301,985.08

Table 62: Total weighted VMT.

Disaggregating PMT and VMT by Population Subgroups

To disaggregate PMT or VMT by population subgroup, aggregate the trip data to the day level first and then join the resulting day-level totals back to the day or household table.

day_trip_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::mutate(occupancy = num_travelers) %>%
  dplyr::filter(!is.na(occupancy), occupancy > 0, occupancy != 995) %>%
  dplyr::mutate(vmt = distance_miles / occupancy) %>%
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    total_wtd_vmt_on_day = sum(
      vmt * trip_weight,
      na.rm = TRUE
    ),
    .groups = "drop"
  )

day_trip_vmt <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    day_id,
    day_weight,
    household_id = hh_id
  ) %>%
  dplyr::left_join(
    day_trip_vmt,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    total_wtd_vmt_on_day = ifelse(is.na(total_wtd_vmt_on_day), 0, total_wtd_vmt_on_day),
    wtd_vmt_per_day = total_wtd_vmt_on_day / day_weight
  )

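Table 63 shows the day-level file that can be joined to other denominator tables for subgroup analysis. Unlike the total VMT calculation above, this day-level example applies the occupancy adjustment to trips of every mode; add the same drive-mode filter before treating the result as VMT.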

gt::gt(dplyr::slice_head(day_trip_vmt, n = 10)) %>%
  gt::fmt_number(
    columns = c(day_weight, total_wtd_vmt_on_day, wtd_vmt_per_day),
    decimals = 2
  ) %>%
  gt::tab_header(title = "Day-Level VMT Summary")

Day-Level VMT Summary
day_id	day_weight	household_id	total_wtd_vmt_on_day	wtd_vmt_per_day
240000890101	157.43	24000089	1,449.64	9.21
240000890201	157.43	24000089	973.44	6.18
240001220101	25.84	24000122	182.02	7.05
240001220102	25.84	24000122	90.33	3.50
240001220103	25.84	24000122	95.57	3.70
240001220104	NA	24000122	0.00	NA
240001220105	NA	24000122	0.00	NA
240001220106	NA	24000122	0.00	NA
240001220107	25.84	24000122	125.95	4.87
240001400101	NA	24000140	0.00	NA

Table 63: Day-level weighted VMT summary.
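
Once a subgroup variable has been joined to the day-level file, a subgroup rate can be computed as a ratio of weighted sums. The base R sketch below uses made-up values; `income_group` is a hypothetical subgroup column, and days without a standard day weight (shown as NA in Table 63) are excluded before summing.

```r
# Made-up day-level file; income_group is a hypothetical subgroup variable
# joined from the household table.
day_vmt <- data.frame(
  day_id = 1:5,
  income_group = c("Lower", "Lower", "Higher", "Higher", "Higher"),
  day_weight = c(100, 50, 200, NA, 100),
  total_wtd_vmt_on_day = c(500, 0, 3000, 0, 600)
)

# Keep only days that carry a standard day weight.
weighted_days <- day_vmt[!is.na(day_vmt$day_weight) & day_vmt$day_weight > 0, ]

# Ratio of weighted sums within each subgroup.
group_totals <- aggregate(
  cbind(total_wtd_vmt_on_day, day_weight) ~ income_group,
  data = weighted_days,
  FUN = sum
)
group_totals$vmt_per_person_day <-
  group_totals$total_wtd_vmt_on_day / group_totals$day_weight

group_totals
```

Summing numerator and denominator separately keeps zero-travel days in the rate and avoids averaging per-day ratios that are undefined when a weight is missing.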

15.4 Identifying Work Days and Telework Status

The Massachusetts Travel Study data do not include a separate day-level work_time field, but they do include a day-level telework duration field. In the codebook, telecommute_time is defined as time spent teleworking on the travel day, so for this study it can be used directly as a diary-day telework measure rather than as a prior-week proxy.

To identify work days, join the day table to person employment status and then flag days with at least one work-purpose trip. In the Massachusetts Travel Study data, d_purpose_category == 2 is Work and d_purpose_category == 3 is Work related, so it is usually best to use code 2 for travel to a work location and keep Work related separate unless the analysis specifically needs broader job-related travel.

The preparation steps below build the worker-day file in a linear sequence so each analytic decision stays visible. This example focuses on respondents coded as employed full-time, employed part-time, or self-employed (employment values 1, 2, and 3).

Start by selecting the day-level telework field and joining employment status from the person table.

worker_day_status <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(day_id, hh_id, person_id, telecommute_time) %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::select(person_id, employment),
    by = "person_id"
  )

Then restrict the file to workers and add day-level work-trip flags from the default trip table.

worker_day_status <- worker_day_status %>%
  dplyr::filter(employment %in% c(1, 2, 3)) %>%
  dplyr::left_join(
    hts[[default_trip_table_name]] %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::group_by(day_id) %>%
      dplyr::summarize(
        has_work_trip = any(d_purpose_category == 2, na.rm = TRUE),
        has_work_related_trip = any(d_purpose_category == 3, na.rm = TRUE),
        .groups = "drop"
      ),
    by = "day_id"
  )

Next, derive the telework and work-day flags used in the classification table.

worker_day_status <- worker_day_status %>%
  dplyr::mutate(
    has_work_trip = ifelse(is.na(has_work_trip), FALSE, has_work_trip),
    has_work_related_trip = ifelse(is.na(has_work_related_trip), FALSE, has_work_related_trip),
    telework_min = telecommute_time,
    teleworked_any = dplyr::if_else(
      is.na(telework_min),
      NA,
      telework_min > 0
    ),
    telework_flag = dplyr::case_when(
      is.na(teleworked_any) ~ "Missing telework response",
      teleworked_any ~ "telecommute_time > 0",
      TRUE ~ "telecommute_time == 0"
    ),
    work_trip_flag = dplyr::if_else(
      has_work_trip,
      "Work trip present",
      "No work trip"
    ),
    work_day_type = dplyr::case_when(
      is.na(teleworked_any) & has_work_trip ~ "Missing telework response / work trip present",
      is.na(teleworked_any) & !has_work_trip ~ "Missing telework response / no work trip",
      teleworked_any & !has_work_trip ~ "Telework only",
      teleworked_any & has_work_trip ~ "Hybrid",
      !teleworked_any & has_work_trip ~ "In-person only",
      !teleworked_any & !has_work_trip ~ "Non-work day"
    )
  )

Finally, collapse the worker-day records into the cross-tab used in the handbook table.

worker_day_telework_crosstab <- worker_day_status %>%
  dplyr::count(telework_flag, work_trip_flag, name = "n_worker_days") %>%
  tidyr::pivot_wider(
    names_from = work_trip_flag,
    values_from = n_worker_days,
    values_fill = 0
  ) %>%
  dplyr::mutate(
    total = `No work trip` + `Work trip present`
  )

Table 64 shows the observed cross-tab for complete-household worker days in the prepared MassDOT data. These are unweighted record counts, so use them to understand coding patterns rather than as population estimates.

gt::gt(worker_day_telework_crosstab) %>%
  gt::fmt_number(
    columns = c(`No work trip`, `Work trip present`, total),
    decimals = 0,
    sep_mark = ","
  ) %>%
  gt::cols_label(
    telework_flag = "Telework Status",
    `No work trip` = "No Work Trip",
    `Work trip present` = "Work Trip Present",
    total = "Total"
  ) %>%
  gt::tab_header(title = "Worker-Day Telework and Work-Trip Cross-Tab")

Worker-Day Telework and Work-Trip Cross-Tab
Telework Status	No Work Trip	Work Trip Present	Total
Missing telework response	1,088	199	1,287
telecommute_time == 0	20,225	13,171	33,396
telecommute_time > 0	16,434	5,690	22,124

Table 64: Telework minutes and work-trip presence across complete-household worker days.

This cross-tab maps cleanly to the common diary-day categories:

Telework only: telecommute_time > 0 and no work trip
Hybrid: telecommute_time > 0 and a work trip is present
In-person only: telecommute_time == 0 and a work trip is present
Non-work day: telecommute_time == 0 and no work trip

Keep missing telework responses separate rather than forcing them into 0. In the prepared MassDOT files, missing telecommute_time values are already stored as NA.
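
To move from record counts to population shares, the same day types can be summed with the person-day weight from Table 54. The base R sketch below uses made-up worker-day records; the weights and counts are illustrative and do not reproduce study results.

```r
# Made-up worker-day records; day_weight values are illustrative.
worker_days <- data.frame(
  work_day_type = c("Telework only", "Hybrid", "In-person only",
                    "Non-work day", "In-person only"),
  day_weight = c(100, 50, 200, 100, 50)
)

# Sum day weights within each day type.
weighted_by_type <- aggregate(
  day_weight ~ work_day_type,
  data = worker_days,
  FUN = sum
)
names(weighted_by_type)[names(weighted_by_type) == "day_weight"] <-
  "weighted_worker_days"

# Convert weighted totals to shares of all weighted worker days.
weighted_by_type$share <- weighted_by_type$weighted_worker_days /
  sum(weighted_by_type$weighted_worker_days)

weighted_by_type
```

For published estimates, build the same shares from a survey design object so that standard errors accompany each share.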

16 Day-of-Week Analysis

Use the alternate day-of-week weights when the question is explicitly about differences across Monday through Sunday. The standard weights remain the default for overall household, person, and average-day reporting; the alternate day-of-week weights are for day-specific person-day and trip analysis.

16.1 Trip Rates by Day of Week

The key pattern is:

reshape the weekday-specific day weights to long form
reshape the matching weekday-specific trip weights to long form
aggregate weighted trips to the day level
join those weighted trips back to the weighted person-day denominator
estimate weekday means with a survey design that uses the weekday-specific day weights
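
The first step in that pattern, reshaping weekday-specific weights to long form, can be sketched in base R as follows. The weight column names match the day_weight_* fields in the day table; the full weekday names, the lookup layout, and the toy records are assumptions made for illustration.

```r
# Assumed lookup: one row per weekday, naming the matching day-weight column.
day_weight_lookup <- data.frame(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"),
  weight_column = c("day_weight_mon", "day_weight_tue", "day_weight_wed",
                    "day_weight_thu", "day_weight_fri", "day_weight_sat",
                    "day_weight_sun"),
  stringsAsFactors = FALSE
)

# Made-up wide day records: each diary day carries one weekday-specific weight.
day_wide <- data.frame(day_id = c(1, 2))
day_wide[day_weight_lookup$weight_column] <- NA_real_
day_wide$day_weight_mon[1] <- 65.1
day_wide$day_weight_tue[2] <- 59.9

# Stack one block of rows per weekday weight column.
day_long <- do.call(rbind, lapply(seq_len(nrow(day_weight_lookup)), function(i) {
  data.frame(
    day_id = day_wide$day_id,
    weekday = day_weight_lookup$weekday[i],
    day_weight_dow = day_wide[[day_weight_lookup$weight_column[i]]],
    stringsAsFactors = FALSE
  )
}))

# Keep only rows with a usable weekday-specific weight.
day_long <- day_long[!is.na(day_long$day_weight_dow) & day_long$day_weight_dow > 0, ]
day_long
```

After the filter, each diary day appears once, paired with the weekday whose weight it carries.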

Use the prep step below to build the weekday-specific person-day analysis file before calculating the final estimates.

day_weight_lookup <- day_of_week_day_weight_columns %>%
  dplyr::select(weekday, weight_column)

trip_weight_lookup <- day_of_week_trip_weight_columns %>%
  dplyr::select(weekday, weight_column)

day_long <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::select(hh_id, sample_segment),
    by = "hh_id"
  ) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(day_weight_lookup$weight_column),
    names_to = "weight_column",
    values_to = "day_weight_dow"
  ) %>%
  dplyr::left_join(
    day_weight_lookup,
    by = "weight_column"
  ) %>%
  dplyr::mutate(
    weekday = factor(weekday, levels = day_of_week_weekday_order)
  ) %>%
  dplyr::filter(
    !is.na(sample_segment),
    !is.na(day_weight_dow),
    day_weight_dow > 0
  )

trip_long <- hts[[day_of_week_trip_table_name]] %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  tidyr::pivot_longer(
    cols = dplyr::all_of(trip_weight_lookup$weight_column),
    names_to = "weight_column",
    values_to = "trip_weight_dow"
  ) %>%
  dplyr::left_join(
    trip_weight_lookup,
    by = "weight_column"
  ) %>%
  dplyr::mutate(
    weekday = factor(weekday, levels = day_of_week_weekday_order)
  ) %>%
  dplyr::filter(
    !is.na(trip_weight_dow),
    trip_weight_dow > 0
  )

weighted_trips_by_day <- trip_long %>%
  dplyr::group_by(day_id, weekday) %>%
  dplyr::summarize(
    weighted_trips = sum(trip_weight_dow, na.rm = TRUE),
    .groups = "drop"
  )

day_trip_rates_dow <- day_long %>%
  dplyr::left_join(
    weighted_trips_by_day,
    by = c("day_id", "weekday")
  ) %>%
  dplyr::mutate(
    weighted_trips = ifelse(is.na(weighted_trips), 0, weighted_trips),
    wtd_trips_on_day = weighted_trips / day_weight_dow
  )

trip_rate_by_weekday <- day_trip_rates_dow %>%
  dplyr::filter(!is.na(wtd_trips_on_day)) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight_dow
  ) %>%
  dplyr::group_by(weekday) %>%
  dplyr::summarize(
    trip_rate = srvyr::survey_mean(wtd_trips_on_day, vartype = "ci"),
    .groups = "drop"
  )

Table 65 shows the weekday-specific trip-rate estimates after the long-format weights and day-level totals have been assembled.

gt::gt(trip_rate_by_weekday) %>%
  gt::fmt_number(
    columns = c(trip_rate, trip_rate_low, trip_rate_upp),
    decimals = 2
  ) %>%
  gt::cols_label(
    weekday = "Weekday",
    trip_rate = "Trip Rate",
    trip_rate_low = "CI Low",
    trip_rate_upp = "CI High"
  ) %>%
  gt::tab_header(title = "Trip Rates by Day of Week")

Trip Rates by Day of Week
Weekday	Trip Rate	CI Low	CI High
Monday	4.54	4.40	4.68
Tuesday	4.70	4.57	4.82
Wednesday	4.76	4.64	4.89
Thursday	4.74	4.62	4.87
Friday	4.98	4.82	5.13
Saturday	4.91	4.74	5.09
Sunday	4.07	3.92	4.23

Table 65: Trip rates by day of week.

This workflow keeps zero-trip days in the denominator, which is critical for valid person-day trip rates.

It also keeps the day-of-week estimates inside the complete-household analytic universe used for most MassDOT reporting.
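
To see why the zero-trip days matter, consider a small sketch on made-up numbers (not MTS data). For simplicity it uses raw trip counts and a single day weight rather than the weekday-specific day and trip weights used above; the data frame and object names are illustrative only.

```r
# Toy person-days: two of the five days have no reported trips.
days <- data.frame(
  day_id = 1:5,
  day_weight = c(100, 100, 200, 100, 100),
  n_trips = c(4, 0, 5, 0, 6)
)

# Valid rate: every person-day, including zero-trip days, is in the denominator.
rate_all_days <- weighted.mean(days$n_trips, days$day_weight)

# Inflated rate: dropping zero-trip days shrinks the denominator.
travel_days <- days[days$n_trips > 0, ]
rate_travel_days_only <- weighted.mean(travel_days$n_trips, travel_days$day_weight)

round(c(all_days = rate_all_days, travel_days_only = rate_travel_days_only), 2)
```

In this toy case the rate rises from about 3.33 to 5.00 trips per person-day once the zero-trip days are dropped, which is why the workflow left-joins trips onto the day table and fills missing totals with zero.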

16.2 Telework Rates by Day of Week

Use the same weekday-specific person-day design for telework participation. In the prepared MassDOT files, missing telecommute_time values are already stored as NA, and positive minutes indicate that some telework occurred on that day.

telework_rate_by_weekday <- day_long %>%
  dplyr::mutate(
    telework_min = telecommute_time,
    # telework_min > 0 is NA when minutes are missing, so missing responses stay NA
    teleworked_any = telework_min > 0
  ) %>%
  dplyr::filter(!is.na(teleworked_any)) %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight_dow
  ) %>%
  dplyr::group_by(weekday) %>%
  dplyr::summarize(
    telework_rate = srvyr::survey_mean(teleworked_any, vartype = "ci"),
    .groups = "drop"
  )

Table 66 reports the weekday-specific telework participation rates from the same day-level survey design.

gt::gt(telework_rate_by_weekday) %>%
  gt::fmt_percent(
    columns = c(telework_rate, telework_rate_low, telework_rate_upp),
    decimals = 1
  ) %>%
  gt::cols_label(
    weekday = "Weekday",
    telework_rate = "Telework Rate",
    telework_rate_low = "CI Low",
    telework_rate_upp = "CI High"
  ) %>%
  gt::tab_header(title = "Telework Rates by Day of Week")

Telework Rates by Day of Week
Weekday	Telework Rate	CI Low	CI High
Monday	43.5%	41.4%	45.6%
Tuesday	44.1%	42.2%	46.1%
Wednesday	43.9%	42.1%	45.8%
Thursday	43.2%	41.3%	45.1%
Friday	45.4%	43.1%	47.6%
Saturday	14.5%	12.7%	16.4%
Sunday	12.9%	11.1%	14.6%

Table 66: Telework rates by day of week.

When the goal is a single overall estimate for the study area, return to the standard average-day workflow in Section 14 and Section 15. Use the alternate day-of-week weights only when the day itself is part of the analytic question.

17 Advanced Analysis

17.1 From Description to Inference: Using Weighted Models

Simple weighted proportions, with accompanying standard errors or confidence intervals, are an excellent first tool for describing population patterns. However, there are many situations where weighted proportions alone are not sufficient for reliable inference. When subgroup sample sizes are small or design effects are large, analysts should use weighted multivariate models rather than relying solely on repeated subgroup tabulations.

Weighted models keep the full sample intact, improve statistical precision, and allow analysts to estimate the unique contribution of each factor while holding others constant. This approach avoids the instability that arises from slicing the data into many small subpopulations.

Using Survey Weights in Regression Models

Most analysts will work in R, Stata, SPSS, or SAS. Each platform provides dedicated tools for fitting regression models that correctly incorporate survey weights, clustering, and stratification. Across platforms, the key principle is the same: define the survey design once, then fit models using functions that respect the sampling structure to obtain valid, population-representative inferences.
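
In R, the "define once, reuse everywhere" principle might look like the sketch below. It uses the `survey` package functions `svydesign()`, `svymean()`, and `svyglm()`; the toy data frame and all of its column names are made up for illustration and do not correspond to delivered MTS variables.

```r
# Toy data: 8 households (PSUs) in 2 strata, one record each.
# All column names here are hypothetical.
toy <- data.frame(
  hh_id = 1:8,
  stratum = rep(c("A", "B"), each = 4),
  wt = c(10, 12, 8, 10, 20, 18, 22, 20),
  teleworks = c(0, 1, 0, 1, 1, 0, 1, 0),
  daily_miles = c(30, 12, 25, 10, 8, 28, 5, 35)
)

# Define the design once: clusters, strata, and weights.
toy_design <- survey::svydesign(
  ids = ~hh_id,
  strata = ~stratum,
  weights = ~wt,
  data = toy
)

# Reuse the same design object for descriptive and model-based estimates.
mean_miles <- survey::svymean(~daily_miles, toy_design)
miles_model <- survey::svyglm(daily_miles ~ teleworks, design = toy_design)
```

Because both estimates draw on the same design object, their standard errors reflect the same clustering and stratification assumptions.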

Does Telework Reduce VMT?

One common example is a survey-weighted regression that estimates daily VMT as a function of telework status while controlling for household and person characteristics.

For MassDOT, the model example below begins from complete households so the day-level outcome and predictors reflect the same household-complete analytic universe used elsewhere in the guide.

Start by aggregating trip-level VMT to the diary-day level so the outcome matches the day-level telework measure.

day_trip_vmt <- hts$trip_unlinked %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::mutate(occupancy = num_travelers) %>%
  dplyr::filter(!is.na(occupancy), occupancy > 0, occupancy != 995) %>%
  dplyr::mutate(vmt = distance_miles / occupancy) %>%
  dplyr::group_by(day_id) %>%
  dplyr::summarize(
    total_wtd_vmt_on_day = sum(
      vmt * trip_weight,
      na.rm = TRUE
    ),
    .groups = "drop"
  )

day_trip_vmt <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::transmute(
    day_id,
    day_weight
  ) %>%
  dplyr::left_join(
    day_trip_vmt,
    by = "day_id"
  ) %>%
  dplyr::mutate(
    total_wtd_vmt_on_day = ifelse(is.na(total_wtd_vmt_on_day), 0, total_wtd_vmt_on_day),
    wtd_vmt_per_day = total_wtd_vmt_on_day / day_weight
  )
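
Dividing the weighted day total by `day_weight` can look odd at first. A minimal sketch on toy values (not MTS data) suggests why it works: when the result is later averaged with `day_weight` as the weight, the day weights cancel and the estimate equals total weighted VMT divided by total day weight. The object names `toy_days`, `mean_from_rates`, and `mean_from_totals` are illustrative only.

```r
toy_days <- data.frame(
  day_id = 1:3,
  day_weight = c(100, 200, 100),
  # Sum of vmt * trip_weight on each day; 0 marks a day with no trips
  total_wtd_vmt_on_day = c(1500, 0, 2500)
)
toy_days$wtd_vmt_per_day <- toy_days$total_wtd_vmt_on_day / toy_days$day_weight

# Day-weighted mean of the per-day rate
mean_from_rates <- weighted.mean(toy_days$wtd_vmt_per_day, toy_days$day_weight)

# Ratio of weighted totals
mean_from_totals <- sum(toy_days$total_wtd_vmt_on_day) / sum(toy_days$day_weight)

all.equal(mean_from_rates, mean_from_totals)
```

Both quantities come out to 10 miles per person-day in this toy case.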

Next, assemble the model dataset by joining the day-level outcome to the household and person predictors used in the regression.

model_data <- hts$day %>%
  dplyr::filter(hh_id %in% complete_hh_ids) %>%
  dplyr::select(person_id, hh_id, day_id, day_weight, telecommute_time) %>%
  dplyr::left_join(
    day_trip_vmt %>%
      dplyr::select(day_id, wtd_vmt_per_day),
    by = "day_id"
  ) %>%
  dplyr::left_join(
    hts$hh %>%
      dplyr::filter(is_complete == 1) %>%
      dplyr::transmute(
        hh_id,
        sample_segment,
        num_vehicles,
        income_broad,
        income_key = as.character(income_broad)
      ),
    by = "hh_id"
  ) %>%
  dplyr::left_join(
    income_value_labels,
    by = "income_key"
  ) %>%
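  # As written, this join selects only person_id, so it adds no columns;
  # extend the select() to bring in person-level predictors when needed.
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::select(person_id),
    by = "person_id"
  ) %>%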
  dplyr::filter(day_weight > 0) %>%
  dplyr::mutate(
    telework_min = telecommute_time,
    telework_group = dplyr::case_when(
      is.na(telework_min) ~ "Missing",
      telework_min == 0 ~ "0 min",
      telework_min <= 120 ~ "1-120 min",
      telework_min <= 240 ~ "121-240 min",
      telework_min > 240 ~ "240+ min"
    ),
    telework_group = factor(
      telework_group,
      levels = c("0 min", "1-120 min", "121-240 min", "240+ min", "Missing")
    ),
    num_vehicles_group = factor(
      dplyr::case_when(
        num_vehicles %in% c(995, 999) ~ "Missing",
        num_vehicles >= 4 ~ "4+",
        TRUE ~ as.character(num_vehicles)
      ),
      levels = c("0", "1", "2", "3", "4+", "Missing")
    ),
    income_broad_label = factor(
      income_broad_label,
      levels = income_value_labels$income_broad_label
    )
  )

model_data <- model_data %>%
  dplyr::mutate(
    telework_group = droplevels(telework_group),
    num_vehicles_group = droplevels(num_vehicles_group),
    income_broad_label = droplevels(income_broad_label)
  )

If the analytic question also depends on age, a useful extension is to collapse the delivered age categories into broader groups before fitting the regression.

age_value_labels <- codebook$value_labels %>%
  dplyr::filter(variable == "age") %>%
  dplyr::transmute(
    age_key = as.character(value),
    age_label = sub("^Age\\s+", "", label)
  ) %>%
  dplyr::mutate(
    age_label = ifelse(age_label == "85 up", "85 or older", age_label)
  )

model_data <- model_data %>%
  dplyr::left_join(
    hts$person %>%
      dplyr::filter(hh_id %in% complete_hh_ids) %>%
      dplyr::transmute(
        person_id,
        age_key = as.character(age)
      ) %>%
      dplyr::left_join(
        age_value_labels,
        by = "age_key"
      ),
    by = "person_id"
  ) %>%
  dplyr::mutate(
    # Any age label not listed below (including ages under 18, if present)
    # falls through to "Missing".
    age_group = dplyr::case_when(
      age_label %in% c("18-24", "25-34") ~ "18-34",
      age_label %in% c("35-44", "45-54") ~ "35-54",
      age_label %in% c("55-64", "65-74", "75-84", "85 or older") ~ "55+",
      TRUE ~ "Missing"
    ),
    age_group = factor(age_group, levels = c("18-34", "35-54", "55+", "Missing"))
  ) %>%
  dplyr::mutate(age_group = droplevels(age_group))

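# Extended formula with age, stored under its own name so the base formula
# defined in the next step does not overwrite it.
vmt_model_formula_age <- wtd_vmt_per_day ~ telework_group + num_vehicles_group + income_broad_label + age_group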

Finally, define the survey design, fit the weighted model, and tidy the coefficients for display.

vmt_design <- model_data %>%
  srvyr::as_survey_design(
    ids = hh_id,
    strata = sample_segment,
    weights = day_weight
  )

vmt_model_formula <- wtd_vmt_per_day ~ telework_group + num_vehicles_group + income_broad_label

vmt_model <- survey::svyglm(
  vmt_model_formula,
  design = vmt_design
)

model_tbl <- broom::tidy(vmt_model, conf.int = TRUE) %>%
  dplyr::mutate(
    term_clean = dplyr::case_when(
      term == "(Intercept)" ~ "Intercept",
      stringr::str_detect(term, "^telework_group") ~ stringr::str_replace(term, "telework_group", "Telework: "),
      stringr::str_detect(term, "^num_vehicles_group") ~ stringr::str_replace(term, "num_vehicles_group", "Vehicles: "),
      stringr::str_detect(term, "^income_broad_label") ~ stringr::str_replace(term, "income_broad_label", "Income: "),
      TRUE ~ term
    ),
    stars = dplyr::case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    )
  ) %>%
  dplyr::select(term_clean, estimate, std.error, statistic, p.value, stars, conf.low, conf.high)

Table 67 presents the base weighted model with standard errors and confidence intervals.

gt::gt(model_tbl) %>%
  gt::fmt_number(
    columns = c(estimate, std.error, statistic, conf.low, conf.high),
    decimals = 2
  ) %>%
  gt::fmt_number(columns = p.value, decimals = 3) %>%
  gt::cols_label(
    term_clean = "Term",
    estimate = "Estimate",
    std.error = "Std. Error",
    statistic = "t-value",
    p.value = "p-value",
    stars = "",
    conf.low = "CI Low",
    conf.high = "CI High"
  ) %>%
  gt::tab_header(
    title = "Base Survey-Weighted Linear Model of Daily VMT",
    subtitle = "Outcome: Weighted Vehicle-Miles Traveled per Diary Day"
  ) %>%
  gt::tab_options(
    table.font.size = gt::px(13),
    data_row.padding = gt::px(4)
  )

Base Survey-Weighted Linear Model of Daily VMT
Outcome: Weighted Vehicle-Miles Traveled per Diary Day
Term	Estimate	Std. Error	t-value	p-value		CI Low	CI High
Intercept	25.45	5.58	4.56	0.000	***	14.51	36.40
Telework: 1-120 min	2.98	7.18	0.41	0.679		−11.11	17.06
Telework: 121-240 min	−0.13	6.43	−0.02	0.983		−12.74	12.47
Telework: 240+ min	−17.63	5.75	−3.07	0.002	**	−28.89	−6.37
Telework: Missing	−22.98	6.27	−3.66	0.000	***	−35.27	−10.69
Vehicles: 1	6.08	3.35	1.82	0.069		−0.48	12.65
Vehicles: 2	17.76	5.69	3.12	0.002	**	6.61	28.90
Vehicles: 3	17.77	9.62	1.85	0.065		−1.10	36.64
Vehicles: 4+	21.04	9.44	2.23	0.026	*	2.54	39.54
Income: $25,000-$49,999	11.98	9.45	1.27	0.205		−6.54	30.51
Income: $50,000-$74,999	3.48	3.84	0.91	0.364		−4.04	11.01
Income: $75,000-$99,999	−0.31	3.04	−0.10	0.920		−6.27	5.66
Income: $100,000-$199,999	13.84	7.55	1.83	0.067		−0.96	28.65
Income: $200,000 or more	2.76	4.12	0.67	0.503		−5.31	10.83
Income: Prefer not to answer	1.52	4.53	0.34	0.737		−7.36	10.40

Table 67: Base survey-weighted daily VMT model.
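
Because the model is linear with categorical predictors, a predicted value is the intercept plus the coefficient for each non-reference level. As a rough worked example, the sketch below adds the rounded point estimates from Table 67 for two profiles. It assumes the reference levels are `0 min` of telework, zero vehicles, and the omitted income category, as the factor ordering in the model code implies. The names `coefs` and `predict_profile` are illustrative only, and confidence intervals for such sums would need the model's covariance matrix, which the table does not report.

```r
# Rounded point estimates copied from Table 67
coefs <- c(
  intercept = 25.45,
  telework_240_plus = -17.63,
  vehicles_2 = 17.76
)

# Hypothetical helper: intercept plus the named non-reference terms
predict_profile <- function(terms) sum(coefs[c("intercept", terms)])

# 240+ min of telework, reference vehicles and income
vmt_telework_ref <- predict_profile("telework_240_plus")

# 240+ min of telework in a two-vehicle household, reference income
vmt_telework_2veh <- predict_profile(c("telework_240_plus", "vehicles_2"))

round(c(vmt_telework_ref, vmt_telework_2veh), 2)
# 7.82 25.58
```

Read these as model-based point predictions for the stated profiles rather than as observed averages.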

Why Use Weighted Models?

Weighted models become especially useful when analysts need to compare groups, adjust for multiple factors at once, or stabilize estimates for small subgroups. They do not replace descriptive tables, but they provide a more reliable route to inference when the question extends beyond simple description.