
[C++] HeadBucket called in S3FS breaking IAM scoped prefixes #49949

@rpep

Description


If you have a bucket s3://mybucket containing prefixes that you are allowed to write into via IAM roles, e.g. s3://mybucket/allowed_dir, you cannot currently use S3FileSystem::CreateDir(path, recursive=true), because it issues a HeadBucket call.

For example:

(base) ray@ryan-test-head-wkzmw:~$ python
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> import pyarrow.fs as fs
>>> s3 = fs.S3FileSystem()
>>> s3.create_dir("test-bucket/pepperr/test", recursive=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_fs.pyx", line 638, in pyarrow._fs.FileSystem.create_dir
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When testing for existence of bucket 'test-bucket': AWS Error ACCESS_DENIED during HeadBucket operation: No response body.

In this case the bucket exists, but the user is not allowed to perform the HEAD operation on the bucket itself. Nonetheless, the user is able to write to this location, so they should not be blocked by the check:

>>> import boto3
>>> boto3.client("s3").put_object(Bucket="test-bucket", Key="pepperr/test/probe.txt", Body=b"hello")
{'ResponseMetadata': {'RequestId': 'FXJ3YG658VKQ8725', 'HostId': '37ezZQHJ0KzbBAQEU6uuDYWIIBa9XpNqs/E6neTpT/8XDpTIh7TE6hBXrHJDT+19STrC6QRcUKlp1zqeIMlLfCnvU/lPvtCp', 'HTTPStatusCode': 200, ....}
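For context, here is a sketch of the kind of prefix-scoped IAM policy that produces this situation (bucket and prefix names are illustrative). HeadBucket authorization is governed by the s3:ListBucket permission on the bucket itself, which a prefix-scoped policy like this does not grant:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPrefixScopedObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::test-bucket/pepperr/*"
    }
  ]
}
```

With this policy, PutObject on keys under pepperr/ succeeds, while HeadBucket on test-bucket returns 403 ACCESS_DENIED, exactly as in the traceback above.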

I think the check comes from here, around line 3177: when recursive=true, you hit a code block that explicitly checks whether the bucket exists:

Status S3FileSystem::CreateDir(const std::string& s, bool recursive) {
  ARROW_ASSIGN_OR_RAISE(auto path, S3Path::FromString(s));

  if (path.key.empty()) {
    // Create bucket
    return impl_->CreateBucket(path.bucket);
  }

  ARROW_ASSIGN_OR_RAISE(auto backend, impl_->GetBackend());

  FileInfo file_info;

  if (recursive) {
    // Ensure bucket exists
    ARROW_ASSIGN_OR_RAISE(bool bucket_exists, impl_->BucketExists(path.bucket));
    if (!bucket_exists) {
      RETURN_NOT_OK(impl_->CreateBucket(path.bucket));
    }

    auto key_i = path.key_parts.begin();

This needs to be more permissive: rather than performing an explicit bucket-existence check up front, the S3 call itself should be issued, and any failure reported up the stack.

Component(s)

C++
