|
| 1 | +# Congress.gov Relational Data Model |
| 2 | + |
| 3 | +The schema described below synthesizes official Congress.gov documentation, publicly available data |
| 4 | +samples, and structural cues from the website. It is designed to accommodate every major collection |
| 5 | +published by the API while remaining normalized and query-friendly. |
| 6 | + |
| 7 | +> **Note:** Congress.gov evolves continuously. Treat this model as a strong baseline and monitor the |
| 8 | +> API changelog for new fields or entities that may require schema extensions. |
| 9 | + |
| 10 | +## Core Reference Tables |
| 11 | + |
| 12 | +| Table | Purpose | |
| 13 | +| --- | --- | |
| 14 | +| `congress.sessions` | One row per numbered Congress (e.g., 118th), including start/end dates and calendar year range. | |
| 15 | +| `congress.chambers` | Enumerates the House, Senate, and Joint designations used by multiple resources. | |
| 16 | +| `congress.parties` | Catalogues political parties for member affiliation history. | |
| 17 | +| `congress.states` | ISO-like references for U.S. states and territories, re-used across members and committees. | |
| 18 | + |
| 19 | +## People and Organizations |
| 20 | + |
| 21 | +- **Members**: Stored in `congress.members` with biographical metadata. Temporal service information |
| 22 | + lives in `congress.member_terms`: one member has many terms, and each term references a single |
| 23 | + Congress session, chamber, and party. |
| 24 | +- **Committees**: Captured by `congress.committees` with optional `parent_committee_id` for |
| 25 | + subcommittees. Committee membership (including leadership roles) is managed via |
| 26 | + `congress.committee_members`. |
| 27 | + |
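The member/term split can be illustrated with a minimal in-memory sketch. The field names (`member_id`, `congress`, `chamber`, `party`, `start_date`, `end_date`), the helper `chambers_served`, and the sample rows are all hypothetical; the actual columns of `congress.member_terms` are not spelled out in this document.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class MemberTerm:
    """One row of a hypothetical congress.member_terms table."""
    member_id: str                   # assumed key into congress.members
    congress: int                    # assumed key into congress.sessions
    chamber: str                     # assumed key into congress.chambers
    party: str                       # assumed key into congress.parties
    start_date: date
    end_date: Optional[date] = None  # None while the term is ongoing


def chambers_served(terms: List[MemberTerm], member_id: str) -> List[str]:
    """Return the distinct chambers a member served in, ordered by first service."""
    seen: List[str] = []
    own_terms = sorted(
        (t for t in terms if t.member_id == member_id),
        key=lambda t: t.start_date,
    )
    for term in own_terms:
        if term.chamber not in seen:
            seen.append(term.chamber)
    return seen


# Made-up sample data: one member with two terms, another with one.
TERMS = [
    MemberTerm("M0001", 116, "House", "D", date(2019, 1, 3), date(2021, 1, 3)),
    MemberTerm("M0001", 117, "Senate", "D", date(2021, 1, 3)),
    MemberTerm("M0002", 117, "House", "R", date(2021, 1, 3)),
]
```

Keeping service history out of the member row means a chamber or party change adds a term row rather than overwriting biographical data.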
| 28 | +## Legislative Instruments |
| 29 | + |
| 30 | +| Entity | Tables | Highlights | |
| 31 | +| --- | --- | --- | |
| 32 | +| Bills & Resolutions | `congress.bills`, `congress.bill_titles`, `congress.bill_actions`, `congress.bill_text_versions`, `congress.bill_summaries`, `congress.bill_subjects`, `congress.bill_cosponsors`, `congress.related_bills` | Covers metadata, multi-lingual titles, action history, full-text versions, CRS summaries, topical subjects, and co-sponsorship data. | |
| 33 | +| Amendments | `congress.amendments`, `congress.amendment_actions`, `congress.amendment_sponsors` | Mirrors the bill structure for amendment records. | |
| 34 | +| Nominations | `congress.nominations`, `congress.nomination_actions`, `congress.nomination_candidates` | Tracks presidential nominations and Senate action. | |
| 35 | +| Treaties | `congress.treaties`, `congress.treaty_actions`, `congress.treaty_topics` | Stores treaty documents and consideration steps. | |
| 36 | +| Congressional Records | `congress.congressional_record_sections`, `congress.congressional_record_pages` | Enables ingestion of daily Congressional Record text and metadata. | |
| 37 | +| Committee Materials | `congress.committee_reports`, `congress.hearings`, `congress.hearing_witnesses` | Supports published reports, hearing schedules, and witness rosters. | |
| 38 | + |
| 39 | +## Legislative Activity |
| 40 | + |
| 41 | +- **Actions and Votes**: All legislative actions are normalized in `congress.bill_actions` and related |
| 42 | + tables. Roll call votes from both chambers live in `congress.roll_calls` with individual positions in |
| 43 | + `congress.roll_call_votes`. |
| 44 | +- **Calendars**: Floor calendars and schedule entries are modeled in `congress.floor_calendars` and |
| 45 | + `congress.floor_calendar_entries`. |
| 46 | + |
| 47 | +## Supporting Structures |
| 48 | + |
| 49 | +- **Documents & Media**: `congress.documents` stores references to PDFs, XML, and other artifacts, |
| 50 | + linking them back to primary entities through join tables. |
| 51 | +- **Search Indexing**: The schema includes `tsvector` columns (e.g., `search_document`) in several |
| 52 | + tables to speed up PostgreSQL full-text search. |
| 53 | +- **Audit Columns**: Every table carries `created_at` and `updated_at` timestamps plus immutable |
| 54 | + natural keys from the API, so re-running an ingestion updates existing rows instead of duplicating them. |
| 55 | + |
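The idempotency that natural keys provide can be sketched as an upsert. This example uses Python's built-in `sqlite3` (SQLite 3.24 or newer) as a stand-in for PostgreSQL, which accepts a similar `INSERT ... ON CONFLICT ... DO UPDATE` statement. The table layout, the record field names, and the helpers `make_db` and `ingest` are assumptions for illustration, not the project's actual DDL.

```python
import sqlite3

# Upsert keyed on the natural key: a second ingest of the same bill
# updates the row in place instead of inserting a duplicate.
UPSERT = """
INSERT INTO bills (congress, bill_type, bill_number, title, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (congress, bill_type, bill_number)
DO UPDATE SET title = excluded.title, updated_at = excluded.updated_at
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical bills table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE bills (
            congress    INTEGER NOT NULL,
            bill_type   TEXT    NOT NULL,
            bill_number INTEGER NOT NULL,
            title       TEXT,
            created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
            updated_at  TEXT,
            PRIMARY KEY (congress, bill_type, bill_number)
        )
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, record: dict) -> None:
    """Insert or update one bill record (record keys are assumed names)."""
    conn.execute(
        UPSERT,
        (
            record["congress"],
            record["type"],
            record["number"],
            record["title"],
            record["updated_at"],
        ),
    )
    conn.commit()
```

Calling `ingest` twice with the same `(congress, type, number)` leaves a single row carrying the most recent `title` and `updated_at`, while `created_at` keeps its original value.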
| 56 | +## Entity Relationship Diagram (Textual) |
| 57 | + |
| 58 | +``` |
| 59 | +members 1---* member_terms *---1 chambers |
| 60 | +members 1---* bill_sponsors *---1 bills |
| 61 | +bills 1---* bill_actions |
| 62 | +bills 1---* bill_text_versions |
| 63 | +bills *---* subjects (via bill_subjects) |
| 64 | +bills *---* committees (via bill_committees) |
| 65 | +bills *---* roll_calls *---* members (via roll_call_votes) |
| 66 | +amendments *---1 bills (via parent_bill_id) |
| 67 | +``` |
| 68 | + |
| 69 | +## API Alignment |
| 70 | + |
| 71 | +- **Pagination**: API collections are paginated with `offset` and `limit` parameters, and each response |
| 72 | + carries a `next` URL. The ingestion pipeline stores that `next` value in `congress.ingest_checkpoints` for resumability. |
| 73 | +- **Change Tracking**: Congress.gov publishes `lastModifiedDate` fields. These populate |
| 74 | + `updated_at` columns and support incremental refreshes with `--changed-since` CLI filters. |
| 75 | +- **Identifiers**: Primary keys follow the API's composite keys (e.g., bill type + number + congress) |
| 76 | + to avoid synthetic IDs where unnecessary. |
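The resumable pagination described in this list can be sketched with the HTTP call factored out. Here `fetch_page` is a hypothetical callable supplied by the caller, the `checkpoints` dict stands in for `congress.ingest_checkpoints`, and the page shape (`items` plus `next`) is a simplifying assumption rather than the exact API response.

```python
from typing import Callable, Dict, Iterator, Optional

Page = Dict[str, object]


def ingest_collection(
    name: str,
    fetch_page: Callable[[Optional[str]], Page],
    checkpoints: Dict[str, str],
) -> Iterator[object]:
    """Yield every item of a collection, resuming from a stored `next` value."""
    next_ref = checkpoints.get(name)  # None means "start at the first page"
    while True:
        page = fetch_page(next_ref)
        yield from page["items"]
        next_ref = page.get("next")
        if next_ref is None:
            # Collection exhausted: clear the checkpoint so the next run starts over.
            checkpoints.pop(name, None)
            return
        # Record progress only after the page's items have been consumed.
        checkpoints[name] = next_ref


# Made-up two-page collection keyed by the `next` reference used to reach each page.
FAKE_PAGES = {
    None: {"items": ["hr1", "hr2"], "next": "page-2"},
    "page-2": {"items": ["hr3"], "next": None},
}
```

With `FAKE_PAGES.__getitem__` passed as `fetch_page`, a fresh run yields all three items, while a run that starts with `{"bill": "page-2"}` in `checkpoints` yields only the last one.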
| 77 | + |
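A composite bill key of the kind described above (bill type + number + congress) might be modeled as follows. `BillKey`, its `slug` format, and the record field names `congress`, `type`, and `number` are illustrative assumptions.

```python
from typing import NamedTuple


class BillKey(NamedTuple):
    """Hypothetical composite natural key for a row in congress.bills."""
    congress: int
    bill_type: str
    number: int

    def slug(self) -> str:
        """Render the key as a single string, e.g. for log lines or file names."""
        return f"{self.congress}-{self.bill_type}-{self.number}"


def bill_key_from_record(record: dict) -> BillKey:
    """Build a normalized key from an API-style record (field names assumed)."""
    return BillKey(
        congress=int(record["congress"]),
        bill_type=str(record["type"]).lower(),  # normalize case so "HR" and "hr" match
        number=int(record["number"]),
    )
```

Because the tuple is hashable and comparable, it can serve directly as a dictionary key when deduplicating records before they reach the database.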
| 78 | +## Future Extensions |
| 79 | + |
| 80 | +1. **Historical Data Normalization**: Some early Congresses have incomplete metadata. Consider |
| 81 | + augmenting the schema with archival datasets when available. |
| 82 | +2. **Event Streams**: If near-real-time updates are required, add Kafka topics fed by the ingestion |
| 83 | + script and consumers that apply change sets to the database. |
| 84 | +3. **Analytic Warehousing**: Mirror the normalized schema into star schemas in your analytics layer |
| 85 | + for simplified reporting (e.g., `fact_votes`, `dim_member`). |
| 86 | + |