Group PUMAs by sample rate using spectral clustering
spectral_zone_groups.Rd
Create PUMA zone groups that combine similar sample rates with spatial adjacency, using spectral clustering. Use this to balance sample rates against spatial contiguity in weighting pipelines.
Arguments
- seed
data.table with required columns:
  - puma_id: PUMA identifier
  - p_total: total persons
  - h_total: total households
Rows: one per household. Keys: (puma_id). Modified by reference: no (returns copy).
- targets
data.table with required columns:
  - puma_id
  - p_total
  - h_total
Rows: one per household. Keys: (puma_id). Modified by reference: no.
- puma_sf
sf object with PUMA geometries. Must include a PUMA ID column.
- k_groups
integer or 'auto'. Number of groups to create. If 'auto', selects optimal k.
- rng_seed
integer. Random seed for reproducibility. Default 4321.
Value
data.table with columns:
  - puma_id: PUMA identifier
  - zone_group: assigned group
Rows: one per PUMA. Keys: (puma_id).
Details
Constructs adjacency matrix from PUMA geometries (sf polygons).
Calculates node and edge weights using household sample rates.
Computes Laplacian matrix and eigenvectors for clustering.
Assigns zone groups via k-means on spectral features.
Returns a copy; does not modify by reference.
Assumes valid PUMA IDs and geometries; errors if either is missing.
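The steps above can be illustrated with a minimal base-R sketch on toy data. It is not the package's implementation. Several choices here are assumptions made only for illustration: the sample rate is taken as sampled households divided by target households, the edge weight is adjacency scaled by an exponential similarity of sample rates, the Laplacian is the unnormalized form, and the adjacency matrix is hard-coded for six hypothetical PUMAs arranged in a line (with real geometries it would more likely come from an sf predicate such as sf::st_touches()).

```r
# Toy inputs: six hypothetical PUMAs arranged in a line, 01-02-03-04-05-06.
puma_id  <- c("01", "02", "03", "04", "05", "06")
seed_h   <- c(50, 55, 52, 200, 210, 190)   # sampled households per PUMA
target_h <- rep(1000, 6)                   # target (control) households per PUMA

# Assumption: sample rate = sampled households / target households.
rate <- seed_h / target_h

# Binary adjacency matrix for the line layout (hard-coded for this sketch).
n <- length(puma_id)
A <- matrix(0, n, n, dimnames = list(puma_id, puma_id))
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Assumption: edge weight = adjacency * similarity of sample rates,
# so neighbours with very different rates are only weakly connected.
sigma <- sd(rate)
W <- A * exp(-abs(outer(rate, rate, "-")) / sigma)

# Unnormalized graph Laplacian: L = D - W, with D the weighted degree matrix.
D <- diag(rowSums(W))
L <- D - W

# eigen() returns eigenvalues in decreasing order for symmetric input,
# so the eigenvectors of the k smallest eigenvalues are the last k columns.
k <- 2
eig <- eigen(L, symmetric = TRUE)
U <- eig$vectors[, (n - k + 1):n, drop = FALSE]

# k-means on the spectral features, seeded for reproducibility.
set.seed(4321)
km <- kmeans(U, centers = k, nstart = 10)

zone_groups <- data.frame(puma_id = puma_id, zone_group = km$cluster)
print(zone_groups)
```

In this toy input the only weak edge is the one between PUMAs 03 and 04, whose sample rates differ most, so the two groups separate there. Group labels themselves are arbitrary and may swap between runs with different seeds.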
See also
scripts/weighting/zone_groups.R
Other clustering utilities:
entropy_zone_groups(),
get_kmeans_opt_k(),
kmeans_zone_groups()