Add autogluon-cloud python setup API by AnirudhDagar · Pull Request #213 · autogluon/autogluon-cloud

AnirudhDagar · 2026-05-08T15:42:28Z

Introduces the autogluon.cloud.bootstrap/register/status/teardown Python API. A subsequent PR will add an equivalent CLI built with click and rich. Both will use the shared cloud_setup engine, which provisions the CloudFormation stack, writes a per-profile config at ~/.autogluon/cloud.yaml, and tears the stack down cleanly.

Usage

import boto3

from autogluon.cloud import bootstrap, status, teardown

bootstrap()       # uses the defaults
# OR
# CloudFormation stack names allow only letters, digits, and hyphens
bootstrap(backend="sagemaker", session=boto3.Session(), stack_name="my-stack")

status()                                    # health check
teardown(delete_bucket_contents=True)       # cleanup

bootstrap() deploys the CloudFormation stack, then calls register() to save the stack outputs to ~/.autogluon/cloud.yaml.

Follow-up PRs, in order:

1. CLI equivalent (autogluon-cloud bootstrap/status/teardown) built on the same setup engine
2. Wire the config auto-load into CloudPredictor.__init__ so users don't need to pass cloud_output_path= after bootstrap()
3. Update docs/tutorials/autogluon-cloud.md to lead with the bootstrap() quick setup; add bootstrap/register/status/teardown to docs/api.rst

Note: Used opus 4.7 for development, please review carefully.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

shchur

Thanks! Left a few comments

- Public API is now `bootstrap`, `register`, `status`, `teardown`, exposed at top level (`from autogluon.cloud import bootstrap, ...`).
- Flat single-config YAML; removed Profile / multi-profile machinery.
- `register()` lets users persist existing role_arn/bucket without touching AWS; `bootstrap()` calls it after a successful CFN deploy.
- Replace `aws_profile` string with `session: Optional[boto3.Session]`.
- Verbose progress prints now include account ID and region.
- Strict RuntimeError when no AWS region can be detected.
- Rename local `Backend` Literal to `BackendName` (was shadowing the Backend ABC). Source `SUPPORTED_SETUP_BACKENDS` from backend/constant.py.
- Drop unused CONFIG_VERSION field.

shchur

Just a bunch of minor comments, but overall this looks great!

shchur · 2026-05-18T15:55:14Z

+
+__all__ = ["bootstrap", "register", "status", "teardown"]
+
+BackendName = Literal[SAGEMAKER, RAY_AWS]


This looks like a syntax error - I think Literal requires the arguments to be strings, not variables.

i see, wasn't aware of that. changing this now
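
A minimal sketch of the fix discussed in this thread. The constant values are assumed (the diff shows only the names `SAGEMAKER` and `RAY_AWS`), and `validate_backend` is a hypothetical helper. Note that `Literal[SAGEMAKER, RAY_AWS]` evaluates fine at runtime, so this is a type-checking problem rather than a Python syntax error: PEP 586 requires literal expressions as parameters, and static checkers such as mypy reject names.

```python
from typing import Literal, get_args

# Assumed values; the diff only shows the constant names.
SAGEMAKER = "sagemaker"
RAY_AWS = "ray_aws"

# PEP 586: Literal parameters must be literal expressions, not variable names.
BackendName = Literal["sagemaker", "ray_aws"]

# Guard against the Literal and the constants drifting apart.
assert set(get_args(BackendName)) == {SAGEMAKER, RAY_AWS}


def validate_backend(name: str) -> str:
    """Return `name` unchanged if it is a supported backend, else raise."""
    supported = get_args(BackendName)
    if name not in supported:
        raise ValueError(f"Unsupported backend {name!r}; expected one of {supported}")
    return name
```

The module-level assert is one way to keep a single source of truth for the backend names while still satisfying the type checker.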

shchur · 2026-05-18T16:26:30Z

+def teardown(
+    *,
+    session: Optional[boto3.Session] = None,
+    delete_bucket_contents: bool = False,


I wonder if we can put any more guardrails in place, this looks like a really dangerous operation 😬

Two ideas:

1. Maybe we can just tell the user to empty the bucket themselves?

2. Ask the user to type in the bucket name as a confirmation.

I am leaning towards 1

yeah, i removed the s3 bucket deletion then. If it is empty it will anyway be removed with the stack removal. If it is not empty, let's not touch it.
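
For illustration, a rough sketch of the "don't touch a non-empty bucket" behaviour agreed on above. Both function names are hypothetical and not taken from the PR; `s3_client` is assumed to behave like boto3's S3 client, whose `list_objects_v2` response includes a `KeyCount` field.

```python
from typing import Any


def bucket_is_empty(s3_client: Any, bucket: str) -> bool:
    """Return True if the bucket has no current objects.

    Caveat: on a versioned bucket, old object versions and delete markers
    are not listed by this call but still prevent bucket deletion.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1)
    return response.get("KeyCount", 0) == 0


def teardown_message(s3_client: Any, bucket: str) -> str:
    """Hypothetical helper: tell the user what teardown does with the bucket."""
    if bucket_is_empty(s3_client, bucket):
        return f"Bucket {bucket!r} is empty and will be removed with the stack."
    return f"Bucket {bucket!r} is not empty; empty it manually if you want it deleted."
```

Asking for a single key keeps the check cheap regardless of bucket size, and nothing in the sketch ever deletes data.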

shchur · 2026-05-18T16:33:51Z

+    return {
+        "found": True,
+        "config_path": str(get_config_path()),
+        "config": config,
+        "checks": checks,
+    }


Two comments

1. status() returns a loose Dict[str, Any] with two shapes (found=True/False). Let's maybe return a TypedDict or dataclass, or None if no config is found.

2. Drop the check_role param. Instead, handle AccessDenied gracefully in _check_role/_check_stack by returning something like "ok (unverified)" rather than "failed". Currently they report failure when it's actually a caller permissions issue, not a broken resource.

Thanks that makes sense, created a dataclass
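
A possible shape for the typed return value, as a sketch only. The name `StatusReport` comes from the follow-up commit message; the fields mirror the dict keys in the hunk above, while `healthy` and `status_from` are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class StatusReport:
    """Assumed fields, mirroring the keys of the dict returned in the hunk."""

    config_path: str
    config: Dict[str, str]
    checks: Dict[str, str] = field(default_factory=dict)

    @property
    def healthy(self) -> bool:
        # Treat "ok" and "ok (unverified ...)" as passing.
        return all(value.startswith("ok") for value in self.checks.values())


def status_from(
    config: Optional[Dict[str, str]], config_path: str, checks: Dict[str, str]
) -> Optional[StatusReport]:
    """Return None when no config exists, instead of a second dict shape."""
    if config is None:
        return None
    return StatusReport(config_path=config_path, config=config, checks=checks)
```

With this shape, callers test `report is None` rather than inspecting a `found` key.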

shchur · 2026-05-18T16:39:44Z

+            raise
+        print(f"Stack {stack_name!r} already exists — reusing it.")
+
+    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)


Comment from our good friend:

_provision_stack waits on stack_create_complete even when the stack already exists — if it's already CREATE_COMPLETE, the waiter succeeds immediately, but if it's in UPDATE_ROLLBACK_COMPLETE or another terminal state, this could hang or error confusingly. Should either use describe_stacks to check current state after AlreadyExistsException, or just report the outputs directly.
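
One way the describe_stacks suggestion could look, as a sketch under assumptions: `outputs_of_existing_stack` and `REUSABLE_STATES` are hypothetical names, and `cfn` is assumed to behave like boto3's CloudFormation client.

```python
from typing import Any, Dict

# Which states count as reusable is a judgment call for this sketch.
REUSABLE_STATES = frozenset({"CREATE_COMPLETE", "UPDATE_COMPLETE"})


def outputs_of_existing_stack(cfn: Any, stack_name: str) -> Dict[str, str]:
    """Return the outputs of an existing stack, or raise if it is not usable."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    state = stack["StackStatus"]
    if state not in REUSABLE_STATES:
        raise RuntimeError(
            f"Stack {stack_name!r} already exists but is in state {state}; "
            "fix or delete it before bootstrapping again."
        )
    # Flatten CloudFormation's list of output records into a plain dict.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

Checking the state explicitly gives an immediate, readable error instead of a waiter timeout when the stack is stuck in a rollback state.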

- Config restructured: cloud.yaml is now keyed by backend name, so a user can have entries for sagemaker and ray_aws side by side. Introduces BackendConfig (per-backend record); CloudConfig wraps Dict[str, BackendConfig].
- bootstrap()/register() take backend= to select the slot.
- status() returns Dict[str, StatusReport], one entry per configured backend.
- teardown(backend=...) tears down that backend's stack and removes its config entry; teardown() (no arg) tears down everything.
- Typed status return via StatusReport dataclass.
- AccessDenied / Forbidden errors in _check_* now surface as "ok (unverified — caller lacks <perm>)" instead of "failed".
- Drop delete_bucket_contents from teardown(); user empties buckets first.
- _provision_stack: skip the create-waiter when stack already existed (avoids confusing hangs on ROLLBACK_COMPLETE etc).
- Rename register parameter role_arn → role to match SageMaker SDK convention.
- BackendName Literal uses string literals (PEP 586 compliant).
- Add inline comment explaining iam:GetRole RoleName parsing for ARNs with paths.

AnirudhDagar

Thanks for the review @shchur, I've addressed the comments and pushed an update for the same.

AnirudhDagar · 2026-05-19T08:31:45Z

+
+__all__ = ["bootstrap", "register", "status", "teardown"]
+
+BackendName = Literal[SAGEMAKER, RAY_AWS]


i see, wasn't aware of that. changing this now

AnirudhDagar · 2026-05-19T11:31:38Z

+    we only check existence, not the caller's permission to assume it.
+    """
+    try:
+        session.client("iam").get_role(RoleName=role_arn.rsplit("/", 1)[-1])


iam:GetRole's RoleName parameter takes the bare role name without the path. Per the IAM docs, an ARN like arn:aws:iam::123:role/service-role/MyRole has path = /service-role/ and role name = MyRole. RoleName accepts only the bare name (MyRole) and rejects path-prefixed values.
rsplit("/", 1)[-1] always gets the last segment (the role name) regardless of how many path components exist in between, so it works.

I ran a quick smoke test against real IAM to confirm:

iam.get_role(RoleName="NonExistentBareName")            # → NoSuchEntity (format accepted, role just doesn't exist)
iam.get_role(RoleName="service-role/NonExistentName")   # → ValidationError: roleName must contain only alphanumeric and +=,.@_-
iam.get_role(RoleName="team/prod/NonExistentName")      # → ValidationError: same
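
The parsing itself can be checked offline. A small sketch, where `role_name_from_arn` is a hypothetical helper wrapping the `rsplit` expression from the hunk:

```python
def role_name_from_arn(role_arn: str) -> str:
    """Return the bare role name: the last '/'-separated segment of the ARN."""
    return role_arn.rsplit("/", 1)[-1]


# No path, one path component, several path components.
print(role_name_from_arn("arn:aws:iam::123456789012:role/MyRole"))               # MyRole
print(role_name_from_arn("arn:aws:iam::123456789012:role/service-role/MyRole"))  # MyRole
print(role_name_from_arn("arn:aws:iam::123456789012:role/team/prod/MyRole"))     # MyRole
```

Because `rsplit` with `maxsplit=1` splits only at the final slash, the number of path components does not matter.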

AnirudhDagar · 2026-05-19T12:46:28Z

+def teardown(
+    *,
+    session: Optional[boto3.Session] = None,
+    delete_bucket_contents: bool = False,


yeah, i removed the s3 bucket deletion then. If it is empty it will anyway be removed with the stack removal. If it is not empty, let's not touch it.

AnirudhDagar · 2026-05-19T12:47:51Z

+    return {
+        "found": True,
+        "config_path": str(get_config_path()),
+        "config": config,
+        "checks": checks,
+    }


Thanks that makes sense, created a dataclass

shchur

Thanks! Feel free to merge after addressing the remaining comments

- SUPPORTED_SETUP_BACKENDS → SUPPORTED_BACKENDS
- AUTOGLUON_CLOUD_CONFIG_DIR → AG_CONFIG_DIR (match repo's AG_* env-var convention)
- status(): collapse `for name in list(...)` + None-check to `for name, cfg in config.backends.items()`
- Fold _PERMISSION_ERROR_CODES into _is_permission_error (single call site)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AnirudhDagar · 2026-05-19T15:09:05Z

Thanks @shchur for multiple rounds of reviews and helping make the design much better! I'll merge once the CI is green.

AnirudhDagar force-pushed the improve_setup branch 2 times, most recently from f867ee5 to 8f35030, on May 12, 2026 13:05

AnirudhDagar changed the title from ~~Add autogluon-cloud CLI and python setup API~~ to Add autogluon-cloud python setup API on May 12, 2026

AnirudhDagar marked this pull request as ready for review May 12, 2026 13:12

AnirudhDagar requested a review from shchur May 12, 2026 13:13

AnirudhDagar mentioned this pull request May 12, 2026

Add a skeleton for the foundation model class #217

Merged

1 task

AnirudhDagar force-pushed the improve_setup branch from 8f35030 to 209855c on May 18, 2026 09:00

Add Python setup API (autogluon.cloud.initialize/status/teardown)

3960b60

AnirudhDagar force-pushed the improve_setup branch from 209855c to 3960b60 on May 18, 2026 09:08

AnirudhDagar added 2 commits May 18, 2026 11:22

remove redundant logging

5855fd3

lint

7f9cb48

shchur reviewed May 18, 2026

Comment thread src/autogluon/cloud/cloud_setup.py Outdated

Comment thread src/autogluon/cloud/cloud_setup.py

Comment thread setup.py

Comment thread src/autogluon/cloud/cloud_setup.py Outdated

shchur reviewed May 18, 2026

AnirudhDagar commented May 19, 2026

shchur approved these changes May 19, 2026

Comment thread src/autogluon/cloud/backend/constant.py Outdated

Comment thread src/autogluon/cloud/config.py Outdated

Comment thread src/autogluon/cloud/cloud_setup.py Outdated

Comment thread src/autogluon/cloud/cloud_setup.py Outdated

Comment thread setup.py

AnirudhDagar merged commit cdbd945 into autogluon:master May 19, 2026
12 checks passed

AnirudhDagar mentioned this pull request May 19, 2026

[WIP] Add Setup CLI #222

Draft

