Harvey S3 Client Replacement — Implementation Plan

See s3-replacement-design.md for the full design rationale and API mapping.


Phase A — Dependency Swap

Goal: Remove minio-go/v7 from go.mod and add the four aws-sdk-go-v2 modules (aws, config, credentials, service/s3). Verify that the build compiles (with a stub implementation if needed).

Files to modify

File    Change
go.mod  Remove github.com/minio/minio-go/v7; add github.com/aws/aws-sdk-go-v2/{aws,config,credentials,service/s3}
go.sum  Regenerated by go mod tidy

Commands

cd harvey
go get github.com/aws/aws-sdk-go-v2/aws
go get github.com/aws/aws-sdk-go-v2/config
go get github.com/aws/aws-sdk-go-v2/service/s3
go get github.com/aws/aws-sdk-go-v2/credentials
go mod tidy

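go mod tidy drops minio-go/v7 from go.mod and prunes it, along with its now-unused transitive dependencies, from go.sum. This only happens once no Go file in the module still imports minio-go; until remote_s3.go is rewritten (Phase B) or replaced by a stub, the requirement stays in place.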

Acceptance criteria


Phase B — Rewrite remote_s3.go

Goal: Replace the MinIO client with aws-sdk-go-v2 while keeping the RemoteReader interface unchanged.

Files to modify

File            Change
remote_s3.go    Full rewrite of the implementation; RemoteReader interface unchanged
remote_test.go  Update mock server to serve AWS-SDK-compatible responses

New s3Reader struct

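import (
    "context"
    "errors" // needed by isNotFound (errors.As)
    "fmt"
    "io"
    // NOTE: "os" and "strings" are not used by the code shown in this plan.
    // Keep them only if other code in remote_s3.go (e.g. parseS3URI) needs
    // them, since Go rejects unused imports.
    "os"
    "strings"

    "github.com/aws/aws-sdk-go-v2/aws"
    awsconfig "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/aws-sdk-go-v2/service/s3/types"
)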

// s3Reader implements RemoteReader for s3:// URIs using the AWS SDK v2.
// It works with AWS S3, MinIO server, Cloudflare R2, and any S3-compatible
// endpoint.
type s3Reader struct {
    client   *s3.Client
    endpoint string
}

newS3Reader constructor

func newS3Reader(ctx context.Context, endpoint, accessKey, secretKey, region string) (*s3Reader, error) {
    if region == "" {
        region = "us-east-1"
    }

    var opts []func(*awsconfig.LoadOptions) error
    opts = append(opts, awsconfig.WithRegion(region))

    if accessKey != "" && secretKey != "" {
        opts = append(opts, awsconfig.WithCredentialsProvider(
            credentials.NewStaticCredentialsProvider(accessKey, secretKey, ""),
        ))
    }

    cfg, err := awsconfig.LoadDefaultConfig(ctx, opts...)
    if err != nil {
        return nil, fmt.Errorf("s3: load config: %w", err)
    }

    client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        if endpoint != "" {
            o.BaseEndpoint = aws.String(endpoint)
        }
        o.UsePathStyle = true // required for non-AWS endpoints
    })

    return &s3Reader{client: client, endpoint: endpoint}, nil
}

Stat method

func (r *s3Reader) Stat(ctx context.Context, uri string) (RemoteFileInfo, error) {
    bucket, key, ok := parseS3URI(uri)
    if !ok {
        return RemoteFileInfo{}, fmt.Errorf("s3: not an S3 URI: %q", uri)
    }

    resp, err := r.client.HeadObject(ctx, &s3.HeadObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    if err != nil {
        if isNotFound(err) {
            return RemoteFileInfo{}, ErrNotFound
        }
        return RemoteFileInfo{}, fmt.Errorf("s3: stat %s: %w", uri, err)
    }

    size := int64(0)
    if resp.ContentLength != nil {
        size = *resp.ContentLength
    }
    return RemoteFileInfo{
        URI:          uri,
        Size:         size,
        LastModified: aws.ToTime(resp.LastModified),
        ContentType:  aws.ToString(resp.ContentType),
    }, nil
}

Get method

func (r *s3Reader) Get(ctx context.Context, uri string, dst io.Writer) error {
    bucket, key, ok := parseS3URI(uri)
    if !ok {
        return fmt.Errorf("s3: not an S3 URI: %q", uri)
    }

    resp, err := r.client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    if err != nil {
        if isNotFound(err) {
            return ErrNotFound
        }
        return fmt.Errorf("s3: get %s: %w", uri, err)
    }
    defer resp.Body.Close()

    if _, err := io.Copy(dst, resp.Body); err != nil {
        return fmt.Errorf("s3: read %s: %w", uri, err)
    }
    return nil
}

List method

func (r *s3Reader) List(ctx context.Context, uri string) ([]RemoteFileInfo, error) {
    bucket, prefix, ok := parseS3URI(uri)
    if !ok {
        return nil, fmt.Errorf("s3: not an S3 URI: %q", uri)
    }

    paginator := s3.NewListObjectsV2Paginator(r.client, &s3.ListObjectsV2Input{
        Bucket: aws.String(bucket),
        Prefix: aws.String(prefix),
    })

    var results []RemoteFileInfo
    for paginator.HasMorePages() {
        page, err := paginator.NextPage(ctx)
        if err != nil {
            return nil, fmt.Errorf("s3: list %s: %w", uri, err)
        }
        for _, obj := range page.Contents {
            size := int64(0)
            if obj.Size != nil {
                size = *obj.Size
            }
            results = append(results, RemoteFileInfo{
                URI:          "s3://" + bucket + "/" + aws.ToString(obj.Key),
                Size:         size,
                LastModified: aws.ToTime(obj.LastModified),
            })
        }
    }
    return results, nil
}

isNotFound helper

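func isNotFound(err error) bool {
    // HeadObject 404 responses carry no error body, so the SDK reports them
    // as *types.NotFound rather than *types.NoSuchKey.
    var nf *types.NotFound
    var nsk *types.NoSuchKey
    var nsb *types.NoSuchBucket
    return errors.As(err, &nf) || errors.As(err, &nsk) || errors.As(err, &nsb)
}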
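
The helper relies on errors.As seeing through the wrapping the SDK adds around service errors. The following stdlib-only sketch illustrates that behaviour. Everything in it is a stand-in introduced for illustration: noSuchKey plays the role of *types.NoSuchKey, errNotFound plays the role of ErrNotFound, and wrapLikeSDK only imitates the SDK's wrapping.

```go
package main

import (
    "errors"
    "fmt"
)

// noSuchKey is a hypothetical stand-in for the SDK's *types.NoSuchKey.
type noSuchKey struct{}

func (*noSuchKey) Error() string { return "NoSuchKey: the specified key does not exist" }

// errNotFound stands in for the package-level ErrNotFound sentinel.
var errNotFound = errors.New("not found")

// errTimeout is an unrelated error used to show the non-matching path.
var errTimeout = errors.New("timeout")

// wrapLikeSDK imitates the SDK wrapping a service error in an operation error.
func wrapLikeSDK(err error) error {
    return fmt.Errorf("operation error S3: GetObject: %w", err)
}

// classify maps a wrapped "no such key" error to the sentinel and wraps
// everything else with context, mirroring the shape of Stat and Get.
func classify(err error) error {
    var nsk *noSuchKey
    if errors.As(err, &nsk) {
        return errNotFound
    }
    return fmt.Errorf("s3: get: %w", err)
}
```

With this shape, callers can test for a missing object with errors.Is(err, ErrNotFound) without knowing which SDK error type was behind it.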

parseS3URI helper (unchanged from current implementation)

The existing parseS3URI function parses s3://bucket/key into (bucket, key, true). No change needed.
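
For readers without the current source at hand, a function with this contract might look like the sketch below. It is not the existing implementation: the name parseS3URISketch and the edge-case choices (an empty key is accepted, an empty bucket is rejected) are assumptions made for illustration.

```go
package main

import "strings"

// parseS3URISketch is a hypothetical stand-in for the existing parseS3URI.
// It splits "s3://bucket/key" into its bucket and key parts.
func parseS3URISketch(uri string) (bucket, key string, ok bool) {
    const scheme = "s3://"
    if !strings.HasPrefix(uri, scheme) {
        return "", "", false
    }
    rest := strings.TrimPrefix(uri, scheme)
    // Everything up to the first slash is the bucket; the remainder is the key.
    bucket, key, _ = strings.Cut(rest, "/")
    if bucket == "" {
        return "", "", false
    }
    return bucket, key, true
}
```

Under these assumptions, "s3://my-bucket/docs/readme.txt" yields ("my-bucket", "docs/readme.txt", true), and "s3://my-bucket" yields an empty key, which List can use as an empty prefix.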


Phase C — Test Updates

Goal: Update remote_test.go to work with AWS SDK v2’s request format.

Background

The current tests use a minimal HTTP mock server. The AWS SDK v2 sends HeadObject, GetObject, and ListObjectsV2 requests to the endpoint. The mock server must handle these correctly.

Mock server changes

The main differences from the MinIO client are the request signing and the ListObjectsV2 XML response format. The mock server must:

  1. Accept requests without validating signatures (in test setup, use awsconfig.WithCredentialsProvider(aws.AnonymousCredentials{}) so the SDK sends unsigned requests, or supply dummy static credentials and have the server ignore the Authorization header).
  2. Return HeadObject responses with Content-Length and Last-Modified headers (same as before).
  3. Return GetObject responses with the object body.
  4. Return ListObjectsV2 XML for list requests (slightly different XML from ListObjects v1).

ListObjectsV2 XML format

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>my-bucket</Name>
  <Prefix>docs/</Prefix>
  <KeyCount>2</KeyCount>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>docs/readme.txt</Key>
    <LastModified>2026-06-18T10:00:00.000Z</LastModified>
    <Size>1024</Size>
  </Contents>
  <Contents>
    <Key>docs/spec.pdf</Key>
    <LastModified>2026-06-18T11:00:00.000Z</LastModified>
    <Size>204800</Size>
  </Contents>
</ListBucketResult>
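
Rather than hand-writing this XML in each test, the mock server could generate it with encoding/xml. The sketch below is one way to do that; the type and function names (listBucketResult, listObject, renderListXML) are illustrative and not taken from remote_test.go, and timestamps are passed in as preformatted strings.

```go
package main

import "encoding/xml"

// listBucketResult mirrors the fields of the ListObjectsV2 response above.
type listBucketResult struct {
    XMLName     xml.Name     `xml:"http://s3.amazonaws.com/doc/2006-03-01/ ListBucketResult"`
    Name        string       `xml:"Name"`
    Prefix      string       `xml:"Prefix"`
    KeyCount    int          `xml:"KeyCount"`
    MaxKeys     int          `xml:"MaxKeys"`
    IsTruncated bool         `xml:"IsTruncated"`
    Contents    []listObject `xml:"Contents"`
}

// listObject is one <Contents> entry.
type listObject struct {
    Key          string `xml:"Key"`
    LastModified string `xml:"LastModified"` // e.g. "2026-06-18T10:00:00.000Z"
    Size         int64  `xml:"Size"`
}

// renderListXML builds a single-page (IsTruncated=false) listing document.
func renderListXML(bucket, prefix string, objs []listObject) ([]byte, error) {
    body, err := xml.MarshalIndent(listBucketResult{
        Name:     bucket,
        Prefix:   prefix,
        KeyCount: len(objs),
        MaxKeys:  1000,
        Contents: objs,
    }, "", "  ")
    if err != nil {
        return nil, err
    }
    // Prepend the standard XML declaration.
    return append([]byte(xml.Header), body...), nil
}
```

Because the result is never truncated, the SDK paginator in List stops after the first page; testing pagination would additionally need IsTruncated and NextContinuationToken, which this sketch leaves out.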

The mock server distinguishes ListObjectsV2 from HeadObject/GetObject by the URL path and query parameters: list requests include ?list-type=2 in the query string.
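
The dispatch rule above can be sketched with net/http alone. This is a minimal illustration rather than the actual test server: mockObject, newMockS3Handler, and probe are hypothetical names, requests are assumed to be path-style (/bucket/key), and the Authorization header is simply ignored.

```go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "strconv"
    "strings"
)

// mockObject is a hypothetical fixture type for this sketch.
type mockObject struct {
    Body         []byte
    LastModified string // HTTP date, e.g. "Thu, 18 Jun 2026 10:00:00 GMT"
}

// newMockS3Handler serves objects keyed by "bucket/key" and returns listXML
// for any ListObjectsV2 request. Signatures are not checked.
func newMockS3Handler(objects map[string]mockObject, listXML []byte) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // ListObjectsV2: GET with list-type=2 in the query string.
        if r.Method == http.MethodGet && r.URL.Query().Get("list-type") == "2" {
            w.Header().Set("Content-Type", "application/xml")
            w.Write(listXML)
            return
        }
        obj, ok := objects[strings.TrimPrefix(r.URL.Path, "/")]
        if !ok {
            w.Header().Set("Content-Type", "application/xml")
            w.WriteHeader(http.StatusNotFound)
            if r.Method != http.MethodHead { // HEAD responses carry no body
                fmt.Fprint(w, "<Error><Code>NoSuchKey</Code><Message>not found</Message></Error>")
            }
            return
        }
        w.Header().Set("Content-Length", strconv.Itoa(len(obj.Body)))
        w.Header().Set("Last-Modified", obj.LastModified)
        if r.Method == http.MethodHead {
            return // HeadObject: headers only
        }
        w.Write(obj.Body) // GetObject: headers plus body
    })
}

// probe sends one in-memory request to h and returns status, headers, and body.
func probe(h http.Handler, method, target string) (int, http.Header, string) {
    rec := httptest.NewRecorder()
    h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
    return rec.Code, rec.Header(), rec.Body.String()
}
```

In a real test the handler would be wrapped in httptest.NewServer and its URL passed to newS3Reader as the endpoint; probe is only a convenience for checking the handler without the SDK.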

Acceptance criteria


Dependency Graph

Phase A (dependency swap)
    └─► Phase B (rewrite remote_s3.go)
            └─► Phase C (test updates)

The phases must be done in order. Phases A and B can, however, land as a single commit: without a stub implementation, Phase A alone leaves the build broken, because the minio-go dependency is removed while remote_s3.go still imports it.


Open Questions