USGS Fix & Other Updates#66
Conversation
Previously if the tag parameter was set to Null or was not provided it would default to "data". This caused issues for some endpoints that did not have a "data" key in their response or for which the full object was desired.
BoR does not report bicarbonate data
The USGS deprecated the old water level retrieval API endpoints, so this commit updates the NWISWaterLevelSource to use the new OGC API endpoints. Additionally, the chunk size for site retrieval is reduced to 5 to avoid URI length issues and httpx read timeouts.
Use the new USGS APIs to retrieve water level data.
There was a problem hiding this comment.
Pull request overview
Updates the USGS connector to use the newer USGS Water Data OGC API and adjusts _execute_json_request behavior so callers can optionally receive the full JSON response object (e.g., to access features/links).
Changes:
- Added support for returning full JSON responses from
_execute_json_requestwhentagis not provided, and updated BoR/ISC Seven Rivers callers to passtag="data"explicitly. - Migrated USGS NWIS site/waterlevel retrieval to the USGS OGC API endpoints (including batching for complex queries and a large
limit). - Extended datetime parsing to handle an additional
+00:00ISO-like format; updated parameter-to-agency mapping for bicarbonate.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/transformer.py | Adds an additional datetime format for parsing timestamps returned by sources. |
| backend/source.py | Changes _execute_json_request to no longer default tag to "data", allowing full response objects to be returned. |
| backend/connectors/usgs/transformer.py | Updates USGS site transformation to match OGC API field shapes/keys. |
| backend/connectors/usgs/source.py | Reworks USGS site/waterlevel retrieval to use OGC API endpoints and batching/limits. |
| backend/connectors/isc_seven_rivers/source.py | Passes tag="data" explicitly to maintain prior behavior after _execute_json_request change. |
| backend/connectors/bor/source.py | Passes tag="data" explicitly to maintain prior behavior after _execute_json_request change. |
| backend/config.py | Adds a bicarbonate-specific agency configuration branch. |
Comments suppressed due to low confidence (2)
backend/config.py:309
BICARBONATEis now handled in its own branch, but it also still appears in the following multi-parameter branch, making that entry unreachable for bicarbonate and easy to misread later. RemoveBICARBONATEfrom the later list (or consolidate the branches) so each parameter is mapped in exactly one place.
elif self.parameter in [BICARBONATE]:
config_agencies = ["nmbgmr_amp", "nmed_dwb", "nmose_isc_seven_rivers", "wqp"]
false_agencies = [
"bor",
"bernco",
"cabq",
"ebid",
"nmose_roswell",
"nmose_pod",
"nwis",
"pvacd",
]
elif self.parameter in [
BICARBONATE,
CALCIUM,
CHLORIDE,
FLUORIDE,
MAGNESIUM,
NITRATE,
PH,
POTASSIUM,
SILICA,
SODIUM,
SULFATE,
TDS,
]:
backend/connectors/usgs/transformer.py:50
- The site transformer now sets
sourceto "USGS", butNWISWaterLevelTransformer.source_tagis still "USGS-NWIS". Other connectors keep sitesourceand parametersource_tagaligned (e.g., BOR-RISE). Consider standardizing these to the same value to avoid inconsistent source labels in outputs.
rec = {
"source": "USGS",
"id": record["properties"]["monitoring_location_id"],
"name": record["properties"]["monitoring_location_name"],
"latitude": record["geometry"]["coordinates"][1],
"longitude": record["geometry"]["coordinates"][0],
"elevation": elevation,
"elevation_units": "ft",
"horizontal_datum": "WGS84",
"vertical_datum": record["properties"]["vertical_datum"],
"aquifer": record["properties"]["national_aquifer_code"],
"well_depth": record["properties"]["well_constructed_depth"],
"well_depth_units": "ft",
}
return rec
class NWISWaterLevelTransformer(WaterLevelTransformer):
source_tag = "USGS-NWIS"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@jirhiker @likithabommasani-sketch I'm setting |
|
Also, @jirhiker and @likithabommasani-sketch, should we continue to call it |
|
"should we continue to call it NWIS or should it just be USGS now? " where? in the codebase or in a public interface? |
|
Both the code base and publicly. So I'd rename everything from |
jirhiker
left a comment
There was a problem hiding this comment.
API Keys should never be committed to github.
Requiring an API key means we cannot continue to support USGS data in the current state. We either have to add an authentication layer to the CLI or pivot to a new approach
|
I removed the API key and with the limit set to 50,000 I was able to reduce the number of API calls so that a key is not needed. If someone runs the tool more than once, though, it will run into |
|
"Both the code base and publicly. So I'd rename everything from NWIS... to USGS... and then to exclude USGS data people would use the --no-usgs flag instead of the --no-nwis flag" CLI should be backwards compatible. so --no-usgs should behave same a --no-nwis |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
backend/source.py:260
_execute_json_requestis annotated as returningdict | None, but it can return non-dict JSON payloads (e.g., a top-level list) whentagisNone/falsy. Since this PR intentionally allows returning the full response object, please widen the return type annotation and docstring to match actual behavior (e.g.,dict | list | NoneorAny).
resp = httpx.get(url, params=params, **kw)
if resp.status_code == 200:
try:
obj = resp.json()
if tag and isinstance(obj, dict):
return obj[tag]
return obj
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
tests/test_sources/test_nmbgmr_amp.py:16
- This fixture forces
IS_TESTING_ENVto "False" in teardown even if it had a different value before the test run (or was unset), which can leak state across test modules. Consider saving the prior value and restoring it, oros.environ.pop("IS_TESTING_ENV", None)when it wasn’t originally set.
@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
# SETUP CODE -----------------------------------------------------------
os.environ["IS_TESTING_ENV"] = "True"
# RUN TESTS ------------------------------------------------------------
yield
# TEARDOWN CODE ---------------------------------------------------------
os.environ["IS_TESTING_ENV"] = "False"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
tests/test_sources/test_nmbgmr_amp.py:16
- This fixture overwrites
IS_TESTING_ENVand then forces it to the string "False" on teardown, which can clobber a pre-existing value and leak state across tests. Prefer saving the previous value (or absence) and restoring/removing it after the test to keep the environment isolated.
@pytest.fixture(autouse=True)
def setup_nmbgmr_amp():
# SETUP CODE -----------------------------------------------------------
os.environ["IS_TESTING_ENV"] = "True"
# RUN TESTS ------------------------------------------------------------
yield
# TEARDOWN CODE ---------------------------------------------------------
os.environ["IS_TESTING_ENV"] = "False"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Why
This PR addresses the following problem / context:
Noneto the_execute_json_requestmethod when we need to use the full response object, such asfeaturesandlinksHow
Implementation summary - the following was changed / added / removed:
USGS
There are two issues that needed to be resolved when retrieving USGS data:
It turns out that both of these issues could be addressed with the same solution: set the
limitquery parameter to the maximum allowed (50,000). By doing this I would get all of the records back in one page and I would severely cut the number of necessary requests.It also turns out, however, that if I use the
limitquery parameter in a POST request the cursor-based pagination breaks. I'm not sure why, but iflimitis less than the expected number of records pagination returns all USGS records and doesn't apply the filters. There are ~32,000 wells in NM that have groundwater field measurements, and I can get measurements for up to 250 wells at a time, so using?limit=50000in my GET and POST requests should be safe for a very long time. Nevertheless, I have commented-out code that can handle pagination if/when it is fixed._execute_json_request tags
Previously if no tag was specified (i.e. set or defaulted to
None) then the method_execute_json_requestwould default totagto"data". This had the unintended consequence of disallowing multiple keys to be used to access data from the response object, such as"features"and"links"(the latter used to follow pagination). This default was removed, allowing the full response object to be returned. I then updated BoR and NMOSE ISC Seven Rivers to explicitly settag="data"when invoking the_execute_json_requestmethod.Notes
Any special considerations, workarounds, or follow-up work to note?