USGS Fix & Other Updates by jacob-a-brown · Pull Request #66 · DataIntegrationGroup/DataIntegrationEngine

jacob-a-brown · 2026-04-29T17:05:10Z

Why

This PR addresses the following problem / context:

The USGS is in the process of or has already deprecated water services APIs. We need to use their new APIs
We should be able to pass a tag of None to the _execute_json_request method when we need to use the full response object, such as features and links
BoR does not report bicarbonate data, so it should not be queryable when running the DIE for bicarbonate

How

Implementation summary - the following was changed / added / removed:

USGS

There are two issues that needed to be resolved when retrieving USGS data:

I needed to get all records, with or without pagination.
I needed to stay within the allowed request rate limit.

It turns out that both of these issues could be addressed with the same solution: set the limit query parameter to the maximum allowed (50,000). By doing this I would get all of the records back in one page and I would severely cut the number of necessary requests.

It also turns out, however, that if I use the limit query parameter in a POST request the cursor-based pagination breaks. I'm not sure why, but if limit is less than the expected number of records pagination returns all USGS records and doesn't apply the filters. There are ~32,000 wells in NM that have groundwater field measurements, and I can get measurements for up to 250 wells at a time, so using ?limit=50000 in my GET and POST requests should be safe for a very long time. Nevertheless, I have commented-out code that can handle pagination if/when it is fixed.

_execute_json_request tags

Previously if no tag was specified (i.e. set or defaulted to None) then the method _execute_json_request would default to tag to "data". This had the unintended consequence of disallowing multiple keys to be used to access data from the response object, such as "features" and "links" (the latter used to follow pagination). This default was removed, allowing the full response object to be returned. I then updated BoR and NMOSE ISC Seven Rivers to explicitly set tag="data" when invoking the _execute_json_request method.

Notes

Any special considerations, workarounds, or follow-up work to note?

Use bullet points here

Previously if the tag parameter was set to Null or was not provided it would default to "data". This caused issues for some endpoints that did not have a "data" key in their response or for which the full object was desired.

BoR does not report bicarbonate data

The USGS deprecated the old water level retrieval API endpoints, so this commit updates the NWISWaterLevelSource to use the new OGC API endpoints. Additionally, the chunk size for site retrieval is reduced to 5 to avoid URI length issues and httpx read timeouts.

Use the new USGS APIs to retrieve water level data.

Copilot

Pull request overview

Updates the USGS connector to use the newer USGS Water Data OGC API and adjusts _execute_json_request behavior so callers can optionally receive the full JSON response object (e.g., to access features/links).

Changes:

Added support for returning full JSON responses from _execute_json_request when tag is not provided, and updated BoR/ISC Seven Rivers callers to pass tag="data" explicitly.
Migrated USGS NWIS site/waterlevel retrieval to the USGS OGC API endpoints (including batching for complex queries and a large limit).
Extended datetime parsing to handle an additional +00:00 ISO-like format; updated parameter-to-agency mapping for bicarbonate.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
backend/transformer.py	Adds an additional datetime format for parsing timestamps returned by sources.
backend/source.py	Changes `_execute_json_request` to no longer default `tag` to `"data"`, allowing full response objects to be returned.
backend/connectors/usgs/transformer.py	Updates USGS site transformation to match OGC API field shapes/keys.
backend/connectors/usgs/source.py	Reworks USGS site/waterlevel retrieval to use OGC API endpoints and batching/limits.
backend/connectors/isc_seven_rivers/source.py	Passes `tag="data"` explicitly to maintain prior behavior after `_execute_json_request` change.
backend/connectors/bor/source.py	Passes `tag="data"` explicitly to maintain prior behavior after `_execute_json_request` change.
backend/config.py	Adds a bicarbonate-specific agency configuration branch.

Comments suppressed due to low confidence (2)

backend/config.py:309

BICARBONATE is now handled in its own branch, but it also still appears in the following multi-parameter branch, making that entry unreachable for bicarbonate and easy to misread later. Remove BICARBONATE from the later list (or consolidate the branches) so each parameter is mapped in exactly one place.

        elif self.parameter in [BICARBONATE]:
            config_agencies = ["nmbgmr_amp", "nmed_dwb", "nmose_isc_seven_rivers", "wqp"]
            false_agencies = [
                "bor",
                "bernco",
                "cabq",
                "ebid",
                "nmose_roswell",
                "nmose_pod",
                "nwis",
                "pvacd",
            ]
        elif self.parameter in [
            BICARBONATE,
            CALCIUM,
            CHLORIDE,
            FLUORIDE,
            MAGNESIUM,
            NITRATE,
            PH,
            POTASSIUM,
            SILICA,
            SODIUM,
            SULFATE,
            TDS,
        ]:

backend/connectors/usgs/transformer.py:50

The site transformer now sets source to "USGS", but NWISWaterLevelTransformer.source_tag is still "USGS-NWIS". Other connectors keep site source and parameter source_tag aligned (e.g., BOR-RISE). Consider standardizing these to the same value to avoid inconsistent source labels in outputs.

        rec = {
            "source": "USGS",
            "id": record["properties"]["monitoring_location_id"],
            "name": record["properties"]["monitoring_location_name"],
            "latitude": record["geometry"]["coordinates"][1],
            "longitude": record["geometry"]["coordinates"][0],
            "elevation": elevation,
            "elevation_units": "ft",
            "horizontal_datum": "WGS84",
            "vertical_datum": record["properties"]["vertical_datum"],
            "aquifer": record["properties"]["national_aquifer_code"],
            "well_depth": record["properties"]["well_constructed_depth"],
            "well_depth_units": "ft",
        }
        return rec


class NWISWaterLevelTransformer(WaterLevelTransformer):
    source_tag = "USGS-NWIS"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jacob-a-brown · 2026-04-29T18:48:19Z

@jirhiker @likithabommasani-sketch I'm setting timeout=1800 to have it timeout after 30 minutes. If that occurs it tries again. CoPilot suggested I not use timeout=None so that there are no indefinite hangups.

jacob-a-brown · 2026-04-29T18:52:35Z

Also, @jirhiker and @likithabommasani-sketch, should we continue to call it NWIS or should it just be USGS now?

jirhiker · 2026-04-30T16:20:32Z

"should we continue to call it NWIS or should it just be USGS now? "

where? in the codebase or in a public interface?

jacob-a-brown · 2026-04-30T16:22:03Z

Both the code base and publicly. So I'd rename everything from NWIS... to USGS... and then to exclude USGS data people would use the --no-usgs flag instead of the --no-nwis flag

jirhiker

API Keys should never be committed to github.

Requiring an API key means we cannot continue to support USGS data in the current state. We either have to add an authentication layer to the CLI or pivot to a new approach

jacob-a-brown · 2026-04-30T16:30:18Z

I removed the API key and with the limit set to 50,000 I was able to reduce the number of API calls so that a key is not needed. If someone runs the tool more than once, though, it will run into 429 rate limit issues. I've also added a --usgs-api-key flag to the CLI so that people can add their own API keys if they have them. I'll push these updates to GitHub and re-request review in a moment.

jirhiker · 2026-04-30T16:31:11Z

"Both the code base and publicly. So I'd rename everything from NWIS... to USGS... and then to exclude USGS data people would use the --no-usgs flag instead of the --no-nwis flag"

CLI should be backwards compatible. so --no-usgs should behave same a --no-nwis

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

backend/source.py:260

_execute_json_request is annotated as returning dict | None, but it can return non-dict JSON payloads (e.g., a top-level list) when tag is None/falsy. Since this PR intentionally allows returning the full response object, please widen the return type annotation and docstring to match actual behavior (e.g., dict | list | None or Any).

        resp = httpx.get(url, params=params, **kw)
        if resp.status_code == 200:
            try:
                obj = resp.json()
                if tag and isinstance(obj, dict):
                    return obj[tag]
                return obj

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ements

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

tests/test_sources/test_nmbgmr_amp.py:16

This fixture forces IS_TESTING_ENV to "False" in teardown even if it had a different value before the test run (or was unset), which can leak state across test modules. Consider saving the prior value and restoring it, or os.environ.pop("IS_TESTING_ENV", None) when it wasn’t originally set.

@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
    # SETUP CODE -----------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "True"

    # RUN TESTS ------------------------------------------------------------
    yield

    # TEARDOWN CODE ---------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "False"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…api-key flag

Copilot

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

tests/test_sources/test_nmbgmr_amp.py:16

This fixture overwrites IS_TESTING_ENV and then forces it to the string "False" on teardown, which can clobber a pre-existing value and leak state across tests. Prefer saving the previous value (or absence) and restoring/removing it after the test to keep the environment isolated.

@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
    # SETUP CODE -----------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "True"

    # RUN TESTS ------------------------------------------------------------
    yield

    # TEARDOWN CODE ---------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "False"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jacob-a-brown added 6 commits April 27, 2026 13:32

fix(source): enable tag paramter for json requests to be Null

c322402

Previously if the tag parameter was set to Null or was not provided it would default to "data". This caused issues for some endpoints that did not have a "data" key in their response or for which the full object was desired.

fix(config): update config/false agencies for bicarbonate

ec9f6e3

BoR does not report bicarbonate data

fix(usgs): update transformer for combined-metadata endpoint

036e9e7

fix(usgs): update USGS water level retrieval for new APIs

2425094

Use the new USGS APIs to retrieve water level data.

fix(usgs): set limit to 50000 to avoid pagination/rate limits

500a0b1

jacob-a-brown requested review from Copilot, jirhiker and likithabommasani21 April 29, 2026 17:05

Copilot started reviewing on behalf of jacob-a-brown April 29, 2026 17:06 View session

refactor(usgs): check usgs health with get request

d93869d

Copilot AI reviewed Apr 29, 2026

View reviewed changes

jacob-a-brown added 8 commits April 29, 2026 11:14

fix(config): set agencies for bicarbonate

12247f5

fix(usgs): add timeout to usgs api calls and retry logic if errors

b342922

fix(usgs): add retry logic to pagination retrieval

3f2d104

refactor(usgs): set datum in site transformer

6005f70

fix(usgs): return False if health check fails

dba8213

fix(usgs): shorten error message

b17bbde

fix(usgs): set limit of 1 for health check

5570c36

fix(usgs): remove unused imports

836243e

fix(usgs): call self.warn not warning

a51b13e

jirhiker requested changes Apr 30, 2026

View reviewed changes

fix(usgs): remove hardcoded API key

31271c6

fix(usgs): have users pass USGS API keys

deed95f

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/connectors/usgs/source.py

Comment thread backend/connectors/usgs/source.py

fix(partial records): don't persist partial records

690ba41

jacob-a-brown requested a review from Copilot April 30, 2026 19:26

Copilot started reviewing on behalf of jacob-a-brown April 30, 2026 19:27 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/connectors/usgs/source.py

Comment thread backend/connectors/usgs/transformer.py

Comment thread backend/unifier.py

jacob-a-brown added 7 commits April 30, 2026 13:40

fix(usgs): make site source USGS-NWIS to correspond with field measur…

7424177

…ements

feature(usgs): add dt params

013406d

note(usgs): type hint variables

7d38c37

test(usgs): use .env for api key for tests

8183300

fix(usgs): Handle USGS rate limit and partial/no data errors

9bf3fc2

fix(test): don't override parent setup fixture

43409f3

fix(test): remove env var outside of fixture

441ff81

jacob-a-brown requested a review from Copilot April 30, 2026 20:14

Copilot started reviewing on behalf of jacob-a-brown April 30, 2026 20:14 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread tests/test_sources/test_nwis.py Outdated

Comment thread setup.py Outdated

Comment thread backend/connectors/usgs/source.py Outdated

Comment thread backend/unifier.py

Comment thread backend/exceptions.py

jacob-a-brown added 7 commits April 30, 2026 14:35

fix(requirements): download python-dotenv

e524ec7

fix(test): load usgs api key from .env for every test

7af57d6

fix(unifier): remove partial persister.records on failure

d7382bd

fix(usgs): use truthiness check for usgs api key header

245b01f

fix(style): two blank lines between classes

f5b2e2f

fix(usgs): fail health check on non 2xx status codes

85ab811

test(usgs api key): test usgs api key is env var when set via --usgs-…

5cf9904

…api-key flag

jacob-a-brown requested a review from Copilot May 1, 2026 14:50

Copilot started reviewing on behalf of jacob-a-brown May 1, 2026 14:51 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread tests/test_cli/__init__.py

Comment thread backend/connectors/usgs/source.py

fix(test): reset usgs api key after each test

4f3b913

jacob-a-brown requested review from jirhiker and likithabommasani21 and removed request for jirhiker and likithabommasani21 May 1, 2026 19:14

Conversation

jacob-a-brown commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

How

USGS

_execute_json_request tags

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jacob-a-brown commented Apr 29, 2026

Uh oh!

jacob-a-brown commented Apr 29, 2026

Uh oh!

jirhiker commented Apr 30, 2026

Uh oh!

jacob-a-brown commented Apr 30, 2026

Uh oh!

jirhiker left a comment

Choose a reason for hiding this comment

Uh oh!

jacob-a-brown commented Apr 30, 2026

Uh oh!

jirhiker commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jacob-a-brown commented Apr 29, 2026 •

edited

Loading