Skip to content

USGS Fix & Other Updates#66

Open
jacob-a-brown wants to merge 44 commits intomainfrom
jab-usgs-fix
Open

USGS Fix & Other Updates#66
jacob-a-brown wants to merge 44 commits intomainfrom
jab-usgs-fix

Conversation

@jacob-a-brown
Copy link
Copy Markdown
Contributor

@jacob-a-brown jacob-a-brown commented Apr 29, 2026

Why

This PR addresses the following problem / context:

  • The USGS is in the process of or has already deprecated water services APIs. We need to use their new APIs
  • We should be able to pass a tag of None to the _execute_json_request method when we need to use the full response object, such as features and links
  • BoR does not report bicarbonate data, so it should not be queryable when running the DIE for bicarbonate

How

Implementation summary - the following was changed / added / removed:

USGS

There are two issues that needed to be resolved when retrieving USGS data:

  1. I needed to get all records, with or without pagination.
  2. I needed to stay within the allowed request rate limit.

It turns out that both of these issues could be addressed with the same solution: set the limit query parameter to the maximum allowed (50,000). By doing this I would get all of the records back in one page and I would severely cut the number of necessary requests.

It also turns out, however, that if I use the limit query parameter in a POST request the cursor-based pagination breaks. I'm not sure why, but if limit is less than the expected number of records pagination returns all USGS records and doesn't apply the filters. There are ~32,000 wells in NM that have groundwater field measurements, and I can get measurements for up to 250 wells at a time, so using ?limit=50000 in my GET and POST requests should be safe for a very long time. Nevertheless, I have commented-out code that can handle pagination if/when it is fixed.

_execute_json_request tags

Previously if no tag was specified (i.e. set or defaulted to None) then the method _execute_json_request would default to tag to "data". This had the unintended consequence of disallowing multiple keys to be used to access data from the response object, such as "features" and "links" (the latter used to follow pagination). This default was removed, allowing the full response object to be returned. I then updated BoR and NMOSE ISC Seven Rivers to explicitly set tag="data" when invoking the _execute_json_request method.

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Use bullet points here

Previously if the tag parameter was set to Null or was not provided
it would default to "data". This caused issues for some endpoints that did
not have a "data" key in their response or for which the full object was desired.
The USGS deprecated the old water level retrieval API endpoints, so this
commit updates the NWISWaterLevelSource to use the new OGC API endpoints.

Additionally, the chunk size for site retrieval is reduced to 5 to avoid
URI length issues and httpx read timeouts.
Use the new USGS APIs to retrieve water level data.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the USGS connector to use the newer USGS Water Data OGC API and adjusts _execute_json_request behavior so callers can optionally receive the full JSON response object (e.g., to access features/links).

Changes:

  • Added support for returning full JSON responses from _execute_json_request when tag is not provided, and updated BoR/ISC Seven Rivers callers to pass tag="data" explicitly.
  • Migrated USGS NWIS site/waterlevel retrieval to the USGS OGC API endpoints (including batching for complex queries and a large limit).
  • Extended datetime parsing to handle an additional +00:00 ISO-like format; updated parameter-to-agency mapping for bicarbonate.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
backend/transformer.py Adds an additional datetime format for parsing timestamps returned by sources.
backend/source.py Changes _execute_json_request to no longer default tag to "data", allowing full response objects to be returned.
backend/connectors/usgs/transformer.py Updates USGS site transformation to match OGC API field shapes/keys.
backend/connectors/usgs/source.py Reworks USGS site/waterlevel retrieval to use OGC API endpoints and batching/limits.
backend/connectors/isc_seven_rivers/source.py Passes tag="data" explicitly to maintain prior behavior after _execute_json_request change.
backend/connectors/bor/source.py Passes tag="data" explicitly to maintain prior behavior after _execute_json_request change.
backend/config.py Adds a bicarbonate-specific agency configuration branch.
Comments suppressed due to low confidence (2)

backend/config.py:309

  • BICARBONATE is now handled in its own branch, but it also still appears in the following multi-parameter branch, making that entry unreachable for bicarbonate and easy to misread later. Remove BICARBONATE from the later list (or consolidate the branches) so each parameter is mapped in exactly one place.
        elif self.parameter in [BICARBONATE]:
            config_agencies = ["nmbgmr_amp", "nmed_dwb", "nmose_isc_seven_rivers", "wqp"]
            false_agencies = [
                "bor",
                "bernco",
                "cabq",
                "ebid",
                "nmose_roswell",
                "nmose_pod",
                "nwis",
                "pvacd",
            ]
        elif self.parameter in [
            BICARBONATE,
            CALCIUM,
            CHLORIDE,
            FLUORIDE,
            MAGNESIUM,
            NITRATE,
            PH,
            POTASSIUM,
            SILICA,
            SODIUM,
            SULFATE,
            TDS,
        ]:

backend/connectors/usgs/transformer.py:50

  • The site transformer now sets source to "USGS", but NWISWaterLevelTransformer.source_tag is still "USGS-NWIS". Other connectors keep site source and parameter source_tag aligned (e.g., BOR-RISE). Consider standardizing these to the same value to avoid inconsistent source labels in outputs.
        rec = {
            "source": "USGS",
            "id": record["properties"]["monitoring_location_id"],
            "name": record["properties"]["monitoring_location_name"],
            "latitude": record["geometry"]["coordinates"][1],
            "longitude": record["geometry"]["coordinates"][0],
            "elevation": elevation,
            "elevation_units": "ft",
            "horizontal_datum": "WGS84",
            "vertical_datum": record["properties"]["vertical_datum"],
            "aquifer": record["properties"]["national_aquifer_code"],
            "well_depth": record["properties"]["well_constructed_depth"],
            "well_depth_units": "ft",
        }
        return rec


class NWISWaterLevelTransformer(WaterLevelTransformer):
    source_tag = "USGS-NWIS"


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/transformer.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
@jacob-a-brown
Copy link
Copy Markdown
Contributor Author

@jirhiker @likithabommasani-sketch I'm setting timeout=1800 to have it timeout after 30 minutes. If that occurs it tries again. CoPilot suggested I not use timeout=None so that there are no indefinite hangups.

@jacob-a-brown
Copy link
Copy Markdown
Contributor Author

Also, @jirhiker and @likithabommasani-sketch, should we continue to call it NWIS or should it just be USGS now?

@jirhiker
Copy link
Copy Markdown
Member

"should we continue to call it NWIS or should it just be USGS now? "

where? in the codebase or in a public interface?

@jacob-a-brown
Copy link
Copy Markdown
Contributor Author

Both the code base and publicly. So I'd rename everything from NWIS... to USGS... and then to exclude USGS data people would use the --no-usgs flag instead of the --no-nwis flag

Copy link
Copy Markdown
Member

@jirhiker jirhiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API Keys should never be committed to github.

Requiring an API key means we cannot continue to support USGS data in the current state. We either have to add an authentication layer to the CLI or pivot to a new approach

@jacob-a-brown
Copy link
Copy Markdown
Contributor Author

I removed the API key and with the limit set to 50,000 I was able to reduce the number of API calls so that a key is not needed. If someone runs the tool more than once, though, it will run into 429 rate limit issues. I've also added a --usgs-api-key flag to the CLI so that people can add their own API keys if they have them. I'll push these updates to GitHub and re-request review in a moment.

@jirhiker
Copy link
Copy Markdown
Member

"Both the code base and publicly. So I'd rename everything from NWIS... to USGS... and then to exclude USGS data people would use the --no-usgs flag instead of the --no-nwis flag"

CLI should be backwards compatible. so --no-usgs should behave same a --no-nwis

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py
Comment thread backend/connectors/usgs/source.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

backend/source.py:260

  • _execute_json_request is annotated as returning dict | None, but it can return non-dict JSON payloads (e.g., a top-level list) when tag is None/falsy. Since this PR intentionally allows returning the full response object, please widen the return type annotation and docstring to match actual behavior (e.g., dict | list | None or Any).
        resp = httpx.get(url, params=params, **kw)
        if resp.status_code == 200:
            try:
                obj = resp.json()
                if tag and isinstance(obj, dict):
                    return obj[tag]
                return obj

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/connectors/usgs/source.py
Comment thread backend/connectors/usgs/transformer.py
Comment thread backend/unifier.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

tests/test_sources/test_nmbgmr_amp.py:16

  • This fixture forces IS_TESTING_ENV to "False" in teardown even if it had a different value before the test run (or was unset), which can leak state across test modules. Consider saving the prior value and restoring it, or os.environ.pop("IS_TESTING_ENV", None) when it wasn’t originally set.
@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
    # SETUP CODE -----------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "True"

    # RUN TESTS ------------------------------------------------------------
    yield

    # TEARDOWN CODE ---------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "False"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_sources/test_nwis.py Outdated
Comment thread setup.py Outdated
Comment thread backend/connectors/usgs/source.py Outdated
Comment thread backend/unifier.py
Comment thread backend/exceptions.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

tests/test_sources/test_nmbgmr_amp.py:16

  • This fixture overwrites IS_TESTING_ENV and then forces it to the string "False" on teardown, which can clobber a pre-existing value and leak state across tests. Prefer saving the previous value (or absence) and restoring/removing it after the test to keep the environment isolated.
@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
    # SETUP CODE -----------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "True"

    # RUN TESTS ------------------------------------------------------------
    yield

    # TEARDOWN CODE ---------------------------------------------------------
    os.environ["IS_TESTING_ENV"] = "False"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_cli/__init__.py
Comment thread backend/connectors/usgs/source.py
@jacob-a-brown jacob-a-brown requested review from jirhiker and likithabommasani21 and removed request for jirhiker and likithabommasani21 May 1, 2026 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants