[Bug]: S3FileStorage raises ValueError when using IAM roles — no fallback to boto3 credential chain #3086

@mvecchionespringhealth

Bug Description

S3FileStorage.__init__ raises ValueError whenever aws_access_key_id or aws_secret_access_key is None, with no fallback to boto3's standard credential chain. This makes S3 storage unusable on AWS-managed compute (ECS Fargate, EC2, Lambda), where credentials are provided via IAM roles rather than static key/secret pairs.

Steps to Reproduce

  1. Deploy Cognee on ECS Fargate with STORAGE_BACKEND=s3 and an IAM task role that has S3 permissions
  2. Do not set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (relying on the task role instead)
  3. Call /health. The file_storage check reports unhealthy:
"details": "Storage test failed: S3 credentials are not set in the configuration."

Expected Behavior

When explicit credentials are absent, S3FileStorage should fall back to boto3's credential discovery chain (ECS task metadata endpoint, EC2 instance metadata, environment variables, ~/.aws/credentials, etc.) — the same chain that every other AWS SDK client uses by default.

s3fs.S3FileSystem(anon=False) already does this automatically when no key/secret are passed.

Actual Behavior

S3FileStorage.__init__ in cognee/infrastructure/files/storage/S3FileStorage.py:

if s3_config.aws_access_key_id is not None and s3_config.aws_secret_access_key is not None:
    self.s3 = s3fs.S3FileSystem(
        key=s3_config.aws_access_key_id,
        secret=s3_config.aws_secret_access_key,
        token=s3_config.aws_session_token,
        anon=False,
        endpoint_url=s3_config.aws_endpoint_url,
        client_kwargs={"region_name": s3_config.aws_region},
    )
else:
    raise ValueError("S3 credentials are not set in the configuration.")
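
The effect of this check can be reproduced locally without any AWS access. The following is a minimal sketch, assuming a stand-in FakeS3Config dataclass and two helper functions (check_credentials, raises_value_error). All three names are hypothetical and are not part of Cognee; only the condition and the error message are copied from the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeS3Config:
    """Hypothetical stand-in for Cognee's S3 config (only the fields the check reads)."""
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None


def check_credentials(s3_config: FakeS3Config) -> None:
    """Mirror the condition in S3FileStorage.__init__ without importing s3fs."""
    if s3_config.aws_access_key_id is not None and s3_config.aws_secret_access_key is not None:
        return  # the real code would construct s3fs.S3FileSystem here
    raise ValueError("S3 credentials are not set in the configuration.")


def raises_value_error(s3_config: FakeS3Config) -> bool:
    """Return True if the check rejects the given config."""
    try:
        check_credentials(s3_config)
    except ValueError:
        return True
    return False
```

With no static keys set, which is the normal state under an IAM task role, raises_value_error(FakeS3Config()) returns True, so the storage never gets a chance to use the role's credentials.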

Proposed Fix

Replace the else branch that raises ValueError with one that initialises S3FileSystem without explicit credentials, allowing boto3 to discover them automatically:

if s3_config.aws_access_key_id is not None and s3_config.aws_secret_access_key is not None:
    self.s3 = s3fs.S3FileSystem(
        key=s3_config.aws_access_key_id,
        secret=s3_config.aws_secret_access_key,
        token=s3_config.aws_session_token,
        anon=False,
        endpoint_url=s3_config.aws_endpoint_url,
        client_kwargs={"region_name": s3_config.aws_region},
    )
else:
    # Fall back to boto3 credential chain:
    # ECS task role, EC2 instance metadata, env vars, ~/.aws/credentials, etc.
    self.s3 = s3fs.S3FileSystem(
        anon=False,
        endpoint_url=s3_config.aws_endpoint_url,
        client_kwargs={"region_name": s3_config.aws_region},
    )

This is a one-line change in spirit — s3fs already handles credential discovery correctly when key/secret are omitted. The explicit-credential path is unchanged.
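
Because the two branches above differ only in the key, secret and token arguments, the same behaviour could also be written as a single constructor call fed by a small helper. This is only a sketch of one possible shape, not the project's code: S3ConfigStub and build_s3fs_kwargs are hypothetical names, and the stub assumes the config exposes exactly the fields used in the proposed fix.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class S3ConfigStub:
    """Hypothetical stand-in exposing the fields the proposed fix reads."""
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    aws_session_token: Optional[str] = None
    aws_endpoint_url: Optional[str] = None
    aws_region: Optional[str] = None


def build_s3fs_kwargs(s3_config: S3ConfigStub) -> Dict[str, Any]:
    """Build keyword arguments for s3fs.S3FileSystem (hypothetical helper)."""
    # Arguments shared by both branches of the proposed fix.
    kwargs: Dict[str, Any] = {
        "anon": False,
        "endpoint_url": s3_config.aws_endpoint_url,
        "client_kwargs": {"region_name": s3_config.aws_region},
    }
    # Pass static credentials only when both halves of the pair are present.
    # Otherwise leave them out so the default credential discovery applies.
    if s3_config.aws_access_key_id is not None and s3_config.aws_secret_access_key is not None:
        kwargs["key"] = s3_config.aws_access_key_id
        kwargs["secret"] = s3_config.aws_secret_access_key
        kwargs["token"] = s3_config.aws_session_token
    return kwargs
```

Inside S3FileStorage.__init__ this would be used as self.s3 = s3fs.S3FileSystem(**build_s3fs_kwargs(s3_config)). Keeping the argument construction separate from the constructor call also makes the fallback testable without s3fs or network access.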

Environment

  • Cognee version: 1.1.2-local (also present in main)
  • Deployment: AWS ECS Fargate with IAM task role
  • STORAGE_BACKEND=s3
  • AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY not set (intentionally, using IAM role)
