Add an admin client for dataset management #17
Merged
Changes from all commits (7 commits)
- `8080219` *: Add dependencies and model generation for admin client (fordN)
- `e7674f2` admin: Implement admin client infrastructure (fordN)
- `5fee02b` client: Integrate admin client with unified Client class (fordN)
- `8c622ab` tests: Add admin client tests (fordN)
- `91f43dd` docs: Update README with admin client features (fordN)
- `48e4160` docs: Add admin client documentation and examples (fordN)
- `088a0ec` linting and formatting (fordN)
README.md

````diff
@@ -5,9 +5,16 @@
 [](https://github.com/edgeandnode/amp-python/actions/workflows/ruff.yml)

 ## Overview

-Client for issuing queries to an Amp server and working with the returned data.
+Python client for Amp - a high-performance data infrastructure for blockchain data.
+
+**Features:**
+- **Query Client**: Issue Flight SQL queries to Amp servers
+- **Admin Client**: Manage datasets, deployments, and jobs programmatically
+- **Data Loaders**: Zero-copy loading into PostgreSQL, Redis, Snowflake, Delta Lake, Iceberg, and more
+- **Parallel Streaming**: High-throughput parallel data ingestion with automatic resume
+- **Manifest Generation**: Fluent API for creating and deploying datasets from SQL queries

 ## Installation

@@ -21,7 +28,57 @@ Client for issuing queries to an Amp server and working with the returned data.
 uv venv
 ```

-## Useage
+## Quick Start
+
+### Querying Data
+
+```python
+from amp import Client
+
+# Connect to Amp server
+client = Client(url="grpc://localhost:8815")
+
+# Execute query and convert to pandas
+df = client.query("SELECT * FROM eth.blocks LIMIT 10").to_pandas()
+print(df)
+```
+
+### Admin Operations
+
+```python
+from amp import Client
+
+# Connect with admin capabilities
+client = Client(
+    query_url="grpc://localhost:8815",
+    admin_url="http://localhost:8080",
+    auth_token="your-token"
+)
+
+# Register and deploy a dataset
+job = (
+    client.query("SELECT block_num, hash FROM eth.blocks")
+    .with_dependency('eth', '_/eth_firehose@<version>')
+    .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
+    .deploy(parallelism=4, end_block='latest', wait=True)
+)
+
+print(f"Deployment completed: {job.status}")
+```
+
+### Loading Data
+
+```python
+# Load query results into PostgreSQL
+loader = client.query("SELECT * FROM eth.blocks").load(
+    loader_type='postgresql',
+    connection='my_pg_connection',
+    table_name='eth_blocks'
+)
+print(f"Loaded {loader.rows_written} rows")
+```
+
+## Usage

 ### Marimo

@@ -30,19 +87,23 @@ Start up a marimo workspace editor
 uv run marimo edit
 ```

 The Marimo app will open a new browser tab where you can create a new notebook, view helpful resources, and
 browse existing notebooks in the workspace.

 ### Apps

 You can execute python apps and scripts using `uv run <path>` which will give them access to the dependencies
 and the `amp` package. For example, you can run the `execute_query` app with the following command.
 ```bash
 uv run apps/execute_query.py
 ```
+
+## Documentation
+
+### Getting Started
+- **[Admin Client Guide](docs/admin_client_guide.md)** - Complete guide for dataset management and deployment
+- **[Admin API Reference](docs/api/admin_api.md)** - Full API documentation for admin operations
+
+### Features
+- **[Parallel Streaming Usage Guide](docs/parallel_streaming_usage.md)** - User guide for high-throughput parallel data loading
+- **[Parallel Streaming Design](docs/parallel_streaming.md)** - Technical design documentation for parallel streaming architecture
````
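Taken together, the three Quick Start snippets in the diff compose into a single end-to-end flow. The following is a minimal sketch using only the calls shown above; the URLs, token, `<version>` placeholder, and the `my_pg_connection` loader connection are stand-ins carried over from the README examples, not verified values:

```python
from amp import Client

# Connect with both query and admin capabilities (per the Admin Operations example)
client = Client(
    query_url="grpc://localhost:8815",
    admin_url="http://localhost:8080",
    auth_token="your-token",
)

# 1. Inspect the source data
df = client.query("SELECT * FROM eth.blocks LIMIT 10").to_pandas()
print(df.head())

# 2. Register and deploy a derived dataset, blocking until the job finishes
job = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@<version>')  # placeholder version
    .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
    .deploy(parallelism=4, end_block='latest', wait=True)
)
print(f"Deployment completed: {job.status}")

# 3. Load query results into PostgreSQL through a named loader connection
loader = client.query("SELECT * FROM eth.blocks").load(
    loader_type='postgresql',
    connection='my_pg_connection',
    table_name='eth_blocks',
)
print(f"Loaded {loader.rows_written} rows")
```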
Conversations

I don't understand what `with_dependency` is doing here in the context of that query. Does the eth_firehose dataset contain `eth.blocks` as well as `logs` and `transactions` datasets, etc.? Is it possible to automatically detect and populate dependencies based on the SQL, or to construct the SQL query in a way that automatically pulls in the dependency?
In this case the dataset's fully qualified name (FQN) is `_/eth_firehose@<version>` (the pattern is `<namespace>/<dataset>@<version>`) and it's being aliased as simply `eth`. `blocks` is a table in the dataset, along with `transactions` and `logs`.

Agree that this is probably exposing more of the internals to the user than necessary. It reflects the structure of the manifest, which requires a section listing all dependencies used in the SQL.
I think the dependencies list could be generated from the SQL, but I'm going to let the structure of these datasets and the recommended way to specify dependencies stabilize before making assumptions for the user. Right now we have a few ways of setting up these derived dataset dependencies: using the FQN in all places, specifying an alias like this, or simply using the dataset name without namespace and version (which defaults to latest). The server-side work for validating and deploying datasets employing all of these options has been WIP and is stabilizing.
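For illustration, the three styles described in that comment might look roughly as follows. This is a hedged sketch only: the accepted forms are still stabilizing per the discussion above, `<version>` is a placeholder, and styles 2 and 3 are extrapolated from the prose rather than taken from the PR.

```python
from amp import Client

client = Client(query_url="grpc://localhost:8815", admin_url="http://localhost:8080")

# 1. Alias form (the style used in the PR example): 'eth' in the SQL
#    resolves to the fully qualified dataset name.
q1 = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@<version>')  # placeholder version
)

# 2. FQN in all places (extrapolated from the comment): the SQL references the
#    dataset by its fully qualified name, so no alias mapping is involved.
q2 = (
    client.query('SELECT block_num, hash FROM "_/eth_firehose@<version>".blocks')
    .with_dependency('_/eth_firehose@<version>', '_/eth_firehose@<version>')
)

# 3. Bare dataset name (extrapolated from the comment): namespace and version
#    are omitted and the version defaults to latest.
q3 = (
    client.query("SELECT block_num, hash FROM eth_firehose.blocks")
    .with_dependency('eth_firehose', 'eth_firehose')
)
```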