Skip to content

Minimum throughput detection incorrectly raises an error #1202

@GeorgeHahn

Description

@GeorgeHahn

Describe the bug

We observe frequent low throughput errors when reading a streamed S3 body in a lambda function. Our theory is that the decompression and business logic for processing the stream cause the async runtime to behave in a CPU-constrained manner and appears to break keepalive on the underlying stream. This appears to be a worst-case scenario for the throughput estimation logic. The attached reproduction returns an error even though reads separated by 120ms are able to receive data.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Minimum throughput errors should not be raised unless the connection is unable to provide data.

Current Behavior

A minimum throughput error is incorrectly raised.

Reproduction Steps

Cargo.toml

[package]
name = "s3-throughput-min-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"
aws-config = { version = "1.5", default-features = false, features = ["client-hyper", "rt-tokio", "rustls"] }
aws-sdk-s3 = { version = "1.42", default-features = false, features = ["rt-tokio", "rustls"] }
clap = { version = "4.5", default-features = false, features = ["std", "derive"]}
tokio = { version = "1.23", default-features = false, features = ["full"] }
tracing = { version = "0.1", default-features = false }
tracing-subscriber = { version = "0.3", default-features = false, features = ["env-filter", "fmt", "std"] }

main.rs

use aws_config::BehaviorVersion;
use aws_sdk_s3::operation::RequestIdExt;
use clap::Parser;
use tokio::io::{AsyncRead, AsyncReadExt};
use tracing::info;

/// Read a file from S3. Reproduces a bug where the AWS SDK returns a low throughput error.
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Cli {
    /// S3 bucket to read from
    #[arg(long)]
    bucket: String,
    /// S3 path to read
    #[arg(long)]
    key: String,
}

pub async fn remote_object<'a>(
    s3_client: &'a aws_sdk_s3::Client,
    bucket: &str,
    key: &str,
) -> Result<impl AsyncRead, anyhow::Error> {
    let object = s3_client
        .get_object()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;
    if let Some(id) = object.extended_request_id() {
        info!("Request ID: {id}");
    }
    Ok(object.body.into_async_read())
}

async fn repro(opts: Cli) -> Result<(), anyhow::Error> {
    let config = aws_config::defaults(BehaviorVersion::latest())
        .region("us-west-2")
        .load()
        .await;
    let s3_client = aws_sdk_s3::Client::new(&config);

    let mut object = remote_object(&s3_client, &opts.bucket, &opts.key).await?;
    let mut buffer = Vec::with_capacity(5000000);

    // Low throughput detection uses a 10-bin[1] window[2] to track requests
    // over the last second[3,4]. There is a 20 second grace period before a
    // throughput error is raised, implemented as a delay future[5,6]. Errors
    // are suppressed if the stream is not being polled, detected by half of the
    // bins being empty[7].
    //
    // 1: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L273
    // 2: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L334-L339
    // 3: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/options.rs#L79
    // 4: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L309
    // 5: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime-api/src/client/stalled_stream_protection.rs#L16-L20
    // 6: https://github.com/awslabs/aws-sdk-rust/blob/bbf69c2279fa9cdd1af5327ba58c7dad0cb62633/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L111-L127
    // 7: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L401

    // This repro works by filling the bins with pending requests, which are
    // counted as no data in the throughput calculation. The incoming bytes are
    // logged into bins, but the pending request label is sticky and takes
    // precedence.
    //
    // Full runtime pauses are used to simulate running alongside a CPU-bound
    // workload. (In our case, a lambda that's busy decompressing incoming
    // data.)
    //
    // Seems to me this is effectively the worst-case scenario for slow readers.
    // The throughput estimation does not work if most polls return pending
    // before becoming ready.
    //

    // Send requests just above the resolution limit to fill the throughput buckets
    loop {
        let n = object.read_buf(&mut buffer).await?;
        buffer.clear();

        std::thread::sleep(std::time::Duration::from_millis(120));

        // Good-enough EOF check
        if n == 0 {
            break Ok(());
        }
    }
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "trace");
    }
    tracing_subscriber::fmt::init();
    let opts = Cli::parse();
    let res = repro(opts).await;
    if let Err(e) = res {
        eprintln!("{:?}", e);
    }
}

Possible Solution

No response

Additional Information/Context

No response

Version

├── aws-config v1.5.7
│   ├── aws-credential-types v1.2.1
│   │   ├── aws-smithy-async v1.2.1
│   │   ├── aws-smithy-runtime-api v1.7.2
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-types v1.2.7
│   │   ├── aws-smithy-types v1.2.7 (*)
│   ├── aws-runtime v1.4.3
│   │   ├── aws-credential-types v1.2.1 (*)
│   │   ├── aws-sigv4 v1.2.4
│   │   │   ├── aws-credential-types v1.2.1 (*)
│   │   │   ├── aws-smithy-eventstream v0.60.5
│   │   │   │   ├── aws-smithy-types v1.2.7 (*)
│   │   │   ├── aws-smithy-http v0.60.11
│   │   │   │   ├── aws-smithy-eventstream v0.60.5 (*)
│   │   │   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   │   │   ├── aws-smithy-types v1.2.7 (*)
│   │   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   │   ├── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-eventstream v0.60.5 (*)
│   │   ├── aws-smithy-http v0.60.11 (*)
│   │   ├── aws-smithy-runtime v1.7.1
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-http v0.60.11 (*)
│   │   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   │   ├── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   ├── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-types v1.3.3
│   │   │   ├── aws-credential-types v1.2.1 (*)
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   │   ├── aws-smithy-types v1.2.7 (*)
│   ├── aws-sdk-sts v1.44.0
│   │   ├── aws-credential-types v1.2.1 (*)
│   │   ├── aws-runtime v1.4.3 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.11 (*)
│   │   ├── aws-smithy-json v0.60.7
│   │   │   └── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-smithy-query v0.60.7
│   │   │   ├── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-smithy-runtime v1.7.1 (*)
│   │   ├── aws-smithy-runtime-api v1.7.2 (*)
│   │   ├── aws-smithy-types v1.2.7 (*)
│   │   ├── aws-smithy-xml v0.60.9
│   │   ├── aws-types v1.3.3 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.11 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.7.1 (*)
│   ├── aws-smithy-runtime-api v1.7.2 (*)
│   ├── aws-smithy-types v1.2.7 (*)
│   ├── aws-types v1.3.3 (*)
├── aws-sdk-s3 v1.52.0
│   ├── aws-credential-types v1.2.1 (*)
│   ├── aws-runtime v1.4.3 (*)
│   ├── aws-sigv4 v1.2.4 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-checksums v0.60.12
│   │   ├── aws-smithy-http v0.60.11 (*)
│   │   ├── aws-smithy-types v1.2.7 (*)
│   ├── aws-smithy-eventstream v0.60.5 (*)
│   ├── aws-smithy-http v0.60.11 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.7.1 (*)
│   ├── aws-smithy-runtime-api v1.7.2 (*)
│   ├── aws-smithy-types v1.2.7 (*)
│   ├── aws-smithy-xml v0.60.9 (*)
│   ├── aws-types v1.3.3 (*)


### Environment details (OS name and version, etc.)

Repro tested on macos. We're hitting this in production on x86_64 lambdas running al2023 and 375 MiB of memory.

### Logs

_No response_

Metadata

Metadata

Assignees

Labels

bugThis issue is a bug.pending-releaseThis issue will be fixed by an approved PR that hasn't been released yet.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions