Group PUMAs by sample rate using spectral clustering

Groups PUMAs into zone groups whose members are spatially adjacent and have similar sample rates, using spectral clustering. Use it when a weighting pipeline needs zone groups that balance sample-rate similarity against spatial contiguity.

Usage

spectral_zone_groups(
  seed,
  targets,
  puma_sf,
  k_groups = "auto",
  rng_seed = 4321
)

Arguments

seed

data.table with required columns:

puma_id — PUMA identifier
p_total — total persons
h_total — total households

Rows: one per household. Keys: (puma_id). Modified by reference: no (returns copy).

targets

data.table with required columns:

puma_id
p_total
h_total

Rows: one per household. Keys: (puma_id). Modified by reference: no.
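
As a rough illustration of how seed and targets could combine into a household sample rate per PUMA, the sketch below assumes the rate is seed households divided by target households. That definition, the toy values, and the helper names (seed_h, target_h, rates, sample_rate) are assumptions made for illustration; they are not taken from the function's source.

```r
library(data.table)

# Toy inputs; all values are illustrative only.
seed    <- data.table(puma_id = c("A", "A", "B"), h_total = c(40, 60, 50))
targets <- data.table(puma_id = c("A", "B"),      h_total = c(5000, 1000))

# Sum within PUMA so the sketch works whether a table has one or many rows per PUMA.
seed_h   <- seed[,    .(h_seed   = sum(h_total)), by = puma_id]
target_h <- targets[, .(h_target = sum(h_total)), by = puma_id]

# Assumed definition: sample rate = seed households / target households.
rates <- merge(seed_h, target_h, by = "puma_id")
rates[, sample_rate := h_seed / h_target]
```

Aggregating by puma_id before dividing keeps the result at one row per PUMA, which matches the shape of the returned table.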

puma_sf

sf object with PUMA polygon geometries. Must include a PUMA ID column.

k_groups

integer or "auto". Number of zone groups to create. If "auto" (the default), the function selects the number of groups itself.

rng_seed

integer. Random seed for reproducibility. Default 4321.

Value

data.table with columns:

puma_id — PUMA identifier
zone_group — assigned group

Rows: one per PUMA. Keys: (puma_id).

Details

Constructs adjacency matrix from PUMA geometries (sf polygons).
Calculates node and edge weights using household sample rates.
Computes Laplacian matrix and eigenvectors for clustering.
Assigns zone groups via k-means on spectral features.
Returns a copy; does not modify by reference.
Assumes valid PUMA IDs and geometries; errors if missing.
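
The steps above can be sketched in base R on a toy graph. This is a minimal illustration of spectral clustering in general, not the function's actual implementation: the Gaussian edge-weight formula, the symmetric normalized Laplacian, the row normalization, and every variable name here are assumptions. The adjacency matrix is built by hand instead of from sf geometries so the block stays self-contained.

```r
# Toy input: 6 PUMAs on a chain (1-2-3-4-5-6); rates are hypothetical sample rates.
rates <- c(0.010, 0.011, 0.012, 0.030, 0.031, 0.029)
n <- length(rates)

# Binary adjacency matrix: 1 where two PUMAs share a border.
adj <- matrix(0, n, n)
for (i in seq_len(n - 1)) adj[i, i + 1] <- adj[i + 1, i] <- 1

# Edge weights (assumed form): Gaussian similarity of sample rates,
# kept only for adjacent pairs.
sigma <- sd(rates)
w <- adj * exp(-outer(rates, rates, "-")^2 / (2 * sigma^2))

# Symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2).
d_inv_sqrt <- diag(1 / sqrt(rowSums(w)))
lap <- diag(n) - d_inv_sqrt %*% w %*% d_inv_sqrt

# Spectral features: eigenvectors belonging to the k smallest eigenvalues.
k <- 2
eig <- eigen(lap, symmetric = TRUE)   # eigenvalues come back in decreasing order
feat <- eig$vectors[, (n - k + 1):n, drop = FALSE]
feat <- feat / sqrt(rowSums(feat^2))  # scale each row to unit length

# k-means on the spectral features assigns each PUMA to a zone group.
set.seed(4321)
zone_group <- kmeans(feat, centers = k, nstart = 10)$cluster
```

Restricting edge weights to adjacent pairs means similarity can only flow along shared borders, which favors spatially contiguous groups. Fixing the seed before kmeans plays the role that rng_seed plays in the function's signature.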

Settings

None.

Examples

## Not run:
# seed, targets, and puma_sf must already exist in the session
groups <- spectral_zone_groups(seed, targets, puma_sf, k_groups = 3)
## End(Not run)