Specs
Built and hosted in the United States.
Where it runs
- App + API + cron — Vercel (a Delaware corp, headquartered in San Francisco). Next.js App Router, Server Components only, Fluid Compute functions for server-side rendering + API routes. Deployed to U.S. regions: Vercel's default iad1 (US-East / Washington, DC) is our primary, with edge cache distributed across all Vercel POPs for read-only HTML + JSON responses.
- Database — Neon Postgres (a Delaware corp), provisioned through the Vercel Marketplace in us-east-1 (AWS Virginia). Single primary + managed read replicas for hot paths. Encrypted at rest (AES-256) and in transit (TLS 1.2+).
- Object storage — Vercel Blob in U.S. regions, used for OG images + occasionally cached source PDFs.
- Vector search — Voyage AI embeddings (US-hosted), with the 1024-dim vectors stored back in Neon Postgres via pgvector (HNSW index). Queries never leave U.S. infrastructure.
- Transactional mail — SendGrid (Twilio subsidiary, U.S. company). Sign-in magic links only. No marketing email, no newsletter list, no transactional notifications beyond auth.
- Payments — Stripe (Delaware corp, U.S. processor). Stripe-hosted Checkout — no card data touches pac.dog. Donations only, via /donate.
- Auxiliary ingest worker — a 4 vCPU / 8 GB DigitalOcean droplet in NYC3 (New York City) runs the heavier FEC bulk + paginated LDA ingest jobs that exceed Vercel's maximum function duration. It tracks every request and reports back to the primary Neon database through a daily /api/cron/* handoff.
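The vector-search item above comes down to a k-nearest-neighbour query over pgvector. What follows is a minimal sketch. It assumes a hypothetical documents table with an embedding vector(1024) column; the table, column, and function names are illustrative and are not pac.dog's schema. pgvector accepts vectors as text literals such as [0.1,0.2]. Its <=> operator computes cosine distance, and an HNSW index built with vector_cosine_ops can serve that ordering.

```typescript
// Sketch only: the "documents" table and "embedding" column are hypothetical.
const EMBEDDING_DIM = 1024; // Voyage AI vector size stated in the spec

// Serialize a number array into pgvector's text input format, e.g. "[0.1,0.2]".
function toPgvectorLiteral(vec: readonly number[]): string {
  if (vec.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dims, got ${vec.length}`);
  }
  if (!vec.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${vec.join(",")}]`;
}

// Build a parameterized k-NN query. "<=>" is pgvector's cosine-distance operator;
// an HNSW index created with vector_cosine_ops can serve this ORDER BY ... LIMIT.
function nearestNeighbourQuery(
  vec: readonly number[],
  k: number,
): { text: string; values: unknown[] } {
  if (!Number.isInteger(k) || k < 1) throw new Error("k must be a positive integer");
  return {
    text:
      "SELECT id, embedding <=> $1::vector AS distance " +
      "FROM documents ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toPgvectorLiteral(vec), k],
  };
}
```

The function returns a text/values pair, which fits a parameterized driver call (node-postgres style, as an assumption). The Voyage AI request that produces the query embedding is not shown.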
Where the data comes from
Every row in pac.dog is mirrored from a primary U.S. government source. No third-party data vendors, no scraped aggregators, no editorial intermediation:
- FEC — candidates, committees, contribution caps, all 4 schedule bulks (A/B/E/F + pas2)
- congress.gov — members + bills + sponsorships
- govinfo.gov BILLSTATUS — per-bill action timeline + text-version metadata
- clerk.house.gov + senate.gov — roll-call vote XML
- clerk.house.gov MemberData + senate.gov senators_cfm — legislator DC office contact info
- IRS Form 990 (via ProPublica Nonprofit Explorer mirror) — 501(c) financials
- lda.senate.gov — federal Lobbying Disclosure Act filings
- Each state Secretary of State / election commission — state campaign-finance limits + filing thresholds + voter-reg + primary/general dates
See /status for live row counts + last-write timestamps per source.
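To make the FEC bulk mirroring concrete, here is a minimal sketch that parses one line of the individual-contributions (Schedule A) bulk file. It assumes the pipe-delimited, header-less 21-column layout from FEC's published data dictionary; check the column order against the current dictionary before relying on it. The Contribution type and the parseItcontLine name are hypothetical.

```typescript
// Column order assumed from FEC's individual-contributions data dictionary.
const ITCONT_COLUMNS = [
  "CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM",
  "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE",
  "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT",
  "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID",
] as const;

type ItcontColumn = (typeof ITCONT_COLUMNS)[number];
type ItcontRow = Record<ItcontColumn, string>;

interface Contribution {
  committeeId: string;
  contributorName: string;
  date: string;        // ISO 8601, YYYY-MM-DD
  amountCents: number;
  subId: string;       // FEC's unique record number
}

function parseItcontLine(line: string): Contribution {
  const fields = line.split("|");
  // Fail loudly on layout drift instead of silently shifting columns.
  if (fields.length !== ITCONT_COLUMNS.length) {
    throw new Error(`itcont: expected ${ITCONT_COLUMNS.length} fields, got ${fields.length}`);
  }
  const row = Object.fromEntries(
    ITCONT_COLUMNS.map((c, i): [string, string] => [c, fields[i] ?? ""]),
  ) as ItcontRow;

  // TRANSACTION_DT is MMDDYYYY in the bulk files; convert to ISO.
  const dt = row.TRANSACTION_DT;
  if (!/^\d{8}$/.test(dt)) throw new Error(`itcont: bad TRANSACTION_DT "${dt}"`);
  const date = `${dt.slice(4, 8)}-${dt.slice(0, 2)}-${dt.slice(2, 4)}`;

  const amount = Number(row.TRANSACTION_AMT);
  if (!Number.isFinite(amount)) {
    throw new Error(`itcont: bad TRANSACTION_AMT "${row.TRANSACTION_AMT}"`);
  }

  return {
    committeeId: row.CMTE_ID,
    contributorName: row.NAME,
    date,
    amountCents: Math.round(amount * 100), // store money as integer cents
    subId: row.SUB_ID,
  };
}
```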
Staying current with rule + form changes
Filing forms, contribution caps, itemization thresholds, and per-state report cadences change. Here's how pac.dog keeps the codified rules + the e-file generator in sync with the underlying law:
- FEC + 11 CFR changes — every change to FEC regulations or forms is published on federalregister.gov first. A daily cron (/api/cron/watch-federal-register) queries the Federal Register API filtered to agency=federal-election-commission + document_types=RULE,PRORULE,NOTICE + a 24h window; new notices land in a watchlist for review. The codified modules in src/lib/content/federal-limits.ts + federal-thresholds.ts carry a generator_version string that we bump on each rule change. Each bump writes a new limit_rules / filing_thresholds row with effective_from set to today's date and closes the prior row, so historical filings keep matching against the rule that was in effect on their date.
- FEC indexed amounts (BCRA inflation) — the FEC publishes new individual-to-candidate, individual-to-national-party, and national-party-to-Senate-candidate caps early in each odd-numbered year, covering the two-year cycle that starts that year (e.g. early 2027 for the 2027-28 cycle). The cron flags the announcement; the operator adds the new amounts to the INDEXED table in src/lib/content/federal-limits.ts and bumps the generator version.
- FECFile format versioning — the canonical .fec file carries a header version (e.g. FECFILE^^^^^^^^v8.4^). The e-file generator emits the current spec version explicitly. When the FEC publishes a new FECFile spec, the generator templates get a corresponding version bump, and the prior generator stays available for filings covering periods before the cut-over.
- State forms + cadences — there is no national clearinghouse. The state-deadlines.ts + state-limits.ts + state-thresholds.ts + state-lobbying-sources.ts registries cite each state's controlling statute. A quarterly manual review walks all 51 jurisdictions. Changes a commission publishes between reviews get caught when we notice the state's rule update in a state-bar bulletin, a news scrape, or a tip. Each state row has its own generator_version string.
- How state filing forms stay aligned (the part vendors struggle with) — instead of one bespoke generator per state per form, each form is a schema-driven spec in src/lib/efile/forms/<state>-<form>.ts: column list, per-field validation rules, conditional logic, and output-format encoder. When a state revises a form, the diff is almost always a single added or renamed field. We edit the spec and bump generator_version, and the new spec is used for filings dated after the revision. The prior spec stays on file for regenerating historical periods. Cross-validation: for states whose commission publishes past filings (CA Cal-Access, FL, TX, etc.), a nightly regression test regenerates a sample of historical filings through our exporter and diffs them against the official versions. Any drift surfaces as a test failure naming the offending field + form spec. The boring shared structure helps: receipts, disbursements, beginning balance, ending balance, summary totals. Every state form is a remix of those primitives, not a fundamentally different shape, so per-state field mappings ride on top of a shared abstract engine instead of being 51 ground-up rewrites.
- congress.gov + clerk + senate XML — each source publishes its own schemas (the Library of Congress for congress.gov, the House Clerk and the Senate for their XML feeds). When the Clerk updates MemberData.xml or the Senate changes its LIS roll-call XML format, our parsers (src/lib/ingest/legislator-contacts.ts, house-votes.ts, senate-votes.ts) need a corresponding edit. Each parser logs the schema version it expects; mismatch errors are noisy on purpose.
- IRS Form 990 + LDA + FARA — same pattern: each ingest module pins the source URL + the column layout it parses. Upstream schema drift surfaces as parse errors in the ingest log. We hit a live header rename in FEC's independent-expenditure (IE) CSV during the initial build, and the same check catches future renames.
- The /status dashboard is the canary. Every source has a last-write timestamp; anything past its expected refresh window flags ⚠ on /status. A daily cron-failure email is the planned addition once the ops surface is fleshed out.
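The daily Federal Register watch can be written as a URL builder. This sketch assumes the Federal Register API v1 documents endpoint and its conditions[...] query parameters. It reads the agency= and document_types= shorthand above as those conditions. The function name is hypothetical, and the parameters should be checked against the API documentation.

```typescript
// Sketch of the daily watch query against the public Federal Register API (v1).
const FR_API = "https://www.federalregister.gov/api/v1/documents.json";

function federalRegisterWatchUrl(now: Date, windowHours = 24): string {
  const since = new Date(now.getTime() - windowHours * 60 * 60 * 1000);
  const params = new URLSearchParams();
  params.append("conditions[agencies][]", "federal-election-commission");
  for (const t of ["RULE", "PRORULE", "NOTICE"]) {
    params.append("conditions[type][]", t); // final rules, proposed rules, notices
  }
  // Publication-date filters are day-granular (UTC date here), so a 24h window
  // becomes "published on or after yesterday".
  params.append("conditions[publication_date][gte]", since.toISOString().slice(0, 10));
  params.append("order", "newest");
  params.append("per_page", "100");
  return `${FR_API}?${params.toString()}`;
}
```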
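The effective-dated row pattern behind generator_version takes only a few lines to model. This is a sketch, and everything in it is an assumption made for illustration: the LimitRuleRow shape, the end-exclusive effectiveTo column, and both helpers. None of it is the actual limit_rules schema.

```typescript
// Hypothetical in-memory model of limit_rules rows.
interface LimitRuleRow {
  key: string;                // which cap, e.g. "individual_to_candidate" (illustrative)
  amountCents: number;
  generatorVersion: string;
  effectiveFrom: string;      // ISO date, inclusive
  effectiveTo: string | null; // ISO date, exclusive; null = currently in force
}

// Close the open row for this key and append the new version starting on `today`.
function supersede(
  rows: LimitRuleRow[], key: string, amountCents: number,
  generatorVersion: string, today: string,
): LimitRuleRow[] {
  const closed = rows.map((r) =>
    r.key === key && r.effectiveTo === null ? { ...r, effectiveTo: today } : r);
  return [...closed, { key, amountCents, generatorVersion, effectiveFrom: today, effectiveTo: null }];
}

// Return the row in force on `date`, so a historical filing matches its historical rule.
// ISO date strings compare correctly as plain strings.
function ruleInEffect(rows: LimitRuleRow[], key: string, date: string): LimitRuleRow | undefined {
  return rows.find((r) =>
    r.key === key && r.effectiveFrom <= date && (r.effectiveTo === null || date < r.effectiveTo));
}
```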
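The odd-year indexing cadence comes down to a cycle lookup. This sketch assumes a hypothetical INDEXED shape keyed by the FEC's even-year cycle label. As a simplification, it keys by the cycle that contains the contribution date. The two figures are the per-election individual-to-candidate limits the FEC announced for 2023-24 ($3,300) and 2025-26 ($3,500). Later cycles would be added by hand after each announcement.

```typescript
// Two-year cycle containing a date: an odd year opens a cycle, an even year closes it.
// The FEC labels a cycle by its even year (e.g. 2026 for 2025-26).
function cycleFor(isoDate: string): number {
  const year = Number(isoDate.slice(0, 4));
  return year % 2 === 0 ? year : year + 1;
}

// Hypothetical shape of the INDEXED table; per-election amounts in whole dollars.
const INDEXED: Record<number, { individualToCandidate: number }> = {
  2024: { individualToCandidate: 3300 },
  2026: { individualToCandidate: 3500 },
};

function individualToCandidateLimit(isoDate: string): number {
  const cycle = cycleFor(isoDate);
  const entry = INDEXED[cycle];
  // No entry means the new cycle's announcement hasn't been codified yet: fail loudly.
  if (!entry) throw new Error(`no indexed limits codified for cycle ${cycle}`);
  return entry.individualToCandidate;
}
```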
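A schema-driven form spec with one shared engine might look like the following. Everything here is hypothetical: the FormSpec and FieldSpec shapes, the specFor and render helpers, and the illustrative xx-cf1 form id. The sketch only shows how column lists, per-field validators, conditional requirements, an encoder, and effective-dated spec selection can live in data rather than in per-state code.

```typescript
// Hypothetical spec shapes; "xx-cf1" style ids are illustrative only.
type Rec = Record<string, string | number | undefined>;

interface FieldSpec {
  name: string;
  required?: boolean | ((r: Rec) => boolean);        // static or conditional
  validate?: (v: string | number) => string | null;  // error message, or null if valid
}

interface FormSpec {
  id: string;
  generatorVersion: string;
  effectiveFrom: string; // ISO date; filings dated on/after this use the spec
  fields: FieldSpec[];   // column list, in output order
  encode: (values: (string | number)[]) => string; // output-format encoder
}

// Pick the newest spec whose effectiveFrom is on or before the filing date.
function specFor(specs: FormSpec[], filingDate: string): FormSpec {
  const newest = specs
    .filter((s) => s.effectiveFrom <= filingDate)
    .sort((a, b) => b.effectiveFrom.localeCompare(a.effectiveFrom))[0];
  if (!newest) throw new Error(`no spec in force on ${filingDate}`);
  return newest;
}

// Shared engine: validate every record against the spec, then encode it.
function render(spec: FormSpec, records: Rec[]): string[] {
  return records.map((r, i) => {
    const values = spec.fields.map((f) => {
      const v = r[f.name];
      const required = typeof f.required === "function" ? f.required(r) : !!f.required;
      if (v === undefined || v === "") {
        if (required) throw new Error(`${spec.id} row ${i}: missing ${f.name}`);
        return "";
      }
      const err = f.validate?.(v);
      if (err) throw new Error(`${spec.id} row ${i}: ${f.name} ${err}`);
      return v;
    });
    return spec.encode(values);
  });
}
```

Under this design, a form revision becomes a new FormSpec object with a later effectiveFrom and a bumped generatorVersion. The old object stays in the list, so regenerating an earlier period still selects it.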
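The nightly cross-validation step is essentially a field-level diff. This is a minimal sketch. It assumes both the official filing and the regenerated one have already been split into rows of fields. The Drift type and the diffFilings name are hypothetical, and a field index would map back to a field name through the form spec.

```typescript
// One mismatching cell between the official filing and our regenerated copy.
interface Drift { row: number; field: number; expected: string; actual: string }

// Compare row by row and field by field, so drift names the exact offending field.
// Missing rows or fields on either side are reported with a "<missing>" marker.
function diffFilings(official: string[][], regenerated: string[][]): Drift[] {
  const drifts: Drift[] = [];
  const rowCount = Math.max(official.length, regenerated.length);
  for (let r = 0; r < rowCount; r++) {
    const a = official[r] ?? [];
    const b = regenerated[r] ?? [];
    const fieldCount = Math.max(a.length, b.length);
    for (let f = 0; f < fieldCount; f++) {
      const expected = a[f] ?? "<missing>";
      const actual = b[f] ?? "<missing>";
      if (expected !== actual) drifts.push({ row: r, field: f, expected, actual });
    }
  }
  return drifts;
}
```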
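The noisy schema-drift checks can be as simple as comparing the header row an ingest module pins with the one upstream actually sent. This is a hypothetical helper, not pac.dog's code. A rename shows up as one missing column and one unexpected column, which is the kind of failure a header rename like the IE CSV one above would produce.

```typescript
// Throw a deliberately noisy error unless the upstream header matches the pinned layout.
function assertHeader(source: string, expected: readonly string[], actual: readonly string[]): void {
  // Exact match (same names, same order, same count) is the only passing case.
  if (actual.length === expected.length && expected.every((c, i) => actual[i] === c)) return;
  const missing = expected.filter((c) => !actual.includes(c));
  const unexpected = actual.filter((c) => !expected.includes(c));
  const detail = missing.length || unexpected.length
    ? `missing=[${missing.join(", ")}] unexpected=[${unexpected.join(", ")}]`
    : "same column names, but reordered or duplicated";
  // One line naming the source and exactly what moved.
  throw new Error(`[schema-drift] ${source}: ${detail}`);
}
```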
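The /status canary rule fits in one function: flag a source once its last write is older than its expected refresh window. The SourceStatus shape and the per-source window are assumptions for illustration.

```typescript
// Hypothetical per-source status row; windows are illustrative.
interface SourceStatus { source: string; lastWriteAt: Date; expectedEveryHours: number }

// A source is stale once its last write is older than its expected refresh window.
function statusFlag(s: SourceStatus, now: Date): "ok" | "⚠" {
  const ageHours = (now.getTime() - s.lastWriteAt.getTime()) / 3_600_000;
  return ageHours > s.expectedEveryHours ? "⚠" : "ok";
}
```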
Security posture
- Authentication — magic-link only (no passwords). Tokens are single-use and expire after 15 minutes; only their SHA-256 hashes are stored in the database, and the plaintext is never stored or logged. Session cookies are HttpOnly, SameSite=Lax, with a 30-day expiry. No third-party SSO yet.
- Encryption — at rest in Neon (AES-256) and Vercel Blob; in transit, TLS 1.2+ for every external hop. HSTS preload on the public domain.
- Cookies — exactly one cookie, pacdog_session, set only on sign-in. No analytics cookies, no ad cookies, no third-party tracking. See /privacy.
- Third-party scripts — none on the public site. No Google Analytics, no Meta Pixel, no Hotjar, no LinkedIn Insight, no Twitter conversion pixel, no New Relic Browser, no Sentry Replay. The browser fetches HTML + your session cookie and that's it.
- Secrets — environment variables only, scoped to Vercel (production / preview separated). No secret ever appears in a deploy log or browser URL. Sensitive treasurer credentials (planned for Direct mode) will be encrypted with pgsodium per the README §Schema spec.
- CORS + CSRF — the public read-only API is open and CDN-cacheable. User-bound endpoints (/api/v1/me, /api/v1/auth/*, /api/v1/reactions) are session-cookie-gated + no-store. SameSite=Lax mitigates the obvious CSRF surfaces, since the browser withholds the cookie on cross-site POSTs and subresource requests. Server Actions use Next.js' built-in CSRF protection.
- Rate limits — magic-link issuance is rate-limited per email (5 per 15 min). Public browse endpoints are CDN-cached, so a misbehaving client mostly hits Vercel's edge, not Neon.
- Auditability — every public row links to its originating primary source. Codified rules carry a generator_version column, so a future query can pin a historical decision to the exact rule version in effect on that date.
- Supply chain — production builds install with npm ci against a committed lockfile. No install-time scripts run beyond those shipped by the vendored Postgres / React / Next.js packages. The dependency surface is intentionally small: no UI component library, no state library, no analytics SDKs.
- Backups + DR — Neon's managed point-in-time recovery covers the database (7-day window). Codified content (limits, thresholds, deadlines) is reproducible from the committed TS generators in src/lib/content/, and the weekly /api/cron/sync-limits cron re-seeds from those source-of-truth files.
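The magic-link scheme in the Authentication item can be sketched with Node's built-in crypto module. The flow is to mint a random token, persist only its SHA-256 digest with a 15-minute expiry, and burn the token on first use. The Map stands in for the database table, and the names are hypothetical. In Postgres, the burn step should be a single conditional UPDATE so that two concurrent clicks cannot both succeed.

```typescript
import { createHash, randomBytes } from "node:crypto";

const TOKEN_TTL_MS = 15 * 60 * 1000; // 15-minute lifetime, per the spec

// Only the digest is persisted; the plaintext exists only in the emailed link.
interface StoredToken { hash: string; email: string; expiresAt: number; usedAt: number | null }

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

function issueToken(email: string, now: number): { plaintext: string; row: StoredToken } {
  const plaintext = randomBytes(32).toString("base64url"); // 256 bits of randomness
  return {
    plaintext,
    row: { hash: sha256Hex(plaintext), email, expiresAt: now + TOKEN_TTL_MS, usedAt: null },
  };
}

// Look up by digest, reject expired or already-used tokens, then burn it (single use).
function redeemToken(store: Map<string, StoredToken>, presented: string, now: number): string | null {
  const row = store.get(sha256Hex(presented));
  if (!row || row.usedAt !== null || now >= row.expiresAt) return null;
  row.usedAt = now;
  return row.email;
}
```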
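The session cookie attributes map directly onto a Set-Cookie value. In this sketch, Path=/ and Secure are assumptions that the spec does not state; Secure is the usual pairing for an HSTS-preloaded, HTTPS-only domain. sessionCookie is a hypothetical helper.

```typescript
const SESSION_MAX_AGE_S = 30 * 24 * 60 * 60; // 30 days in seconds

// Serialize the single session cookie. Path=/ and Secure are assumptions.
function sessionCookie(value: string): string {
  // Restrict to base64url characters so the value needs no cookie escaping.
  if (!/^[A-Za-z0-9_-]+$/.test(value)) throw new Error("session id must be base64url");
  return [
    `pacdog_session=${value}`,
    "Path=/",
    `Max-Age=${SESSION_MAX_AGE_S}`,
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```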
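The per-email limit of 5 magic links per 15 minutes can be modeled as a sliding window. This is a minimal in-memory sketch. On serverless functions, instances don't share memory, so the timestamps would need shared storage such as a Postgres table. The email normalization step is an assumption.

```typescript
// Sliding-window limiter: at most LIMIT issuances per email within WINDOW_MS.
const LIMIT = 5;
const WINDOW_MS = 15 * 60 * 1000;

// In-memory stand-in for shared storage: email -> issuance timestamps (ms).
const attempts = new Map<string, number[]>();

function allowMagicLink(email: string, now: number): boolean {
  const key = email.trim().toLowerCase(); // assumption: case-insensitive keying
  const recent = (attempts.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    attempts.set(key, recent); // keep the pruned list; do not record the rejected try
    return false;
  }
  recent.push(now);
  attempts.set(key, recent);
  return true;
}
```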
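The split between CDN-cacheable public endpoints and no-store user-bound ones can be written as a route-to-Cache-Control policy. The user-bound routes come from the CORS + CSRF item. The public s-maxage and stale-while-revalidate values are illustrative assumptions, not pac.dog's settings.

```typescript
// User-bound routes from the spec; each pattern also covers sub-paths.
const USER_BOUND: RegExp[] = [
  /^\/api\/v1\/me(\/|$)/,
  /^\/api\/v1\/auth\//,
  /^\/api\/v1\/reactions(\/|$)/,
];

// Session-gated responses must never be stored by a shared cache; public reads may be.
function cacheControlFor(path: string): string {
  if (USER_BOUND.some((re) => re.test(path))) return "private, no-store";
  return "public, s-maxage=300, stale-while-revalidate=86400"; // illustrative TTLs
}
```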
Where it's not
- No user data leaves U.S. jurisdiction. Origin compute and the database stay in the U.S. Vercel's edge cache can replicate read-only HTML + JSON responses to POPs outside the U.S., but user-bound responses are no-store, so only public data is cached there.
- No PII is shared with any third party beyond what delivery requires. Email addresses live in Neon and are handed to SendGrid only to send the sign-in link, and go nowhere else.
- No ad networks, no affiliate networks, no data brokers, no syndicated content from third-party publishers, no AI training partnerships, no anonymized data product.