Describe the bug
We observe frequent low-throughput errors when reading a streamed S3 body in a Lambda function. Our theory is that the decompression and business logic for processing the stream leave the async runtime CPU-constrained, which appears to break keepalive on the underlying stream. This looks like a worst-case scenario for the throughput estimation logic: the attached reproduction returns an error even though reads separated by 120ms are able to receive data.
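The suspected mechanism is easiest to see in isolation. Below is a minimal sketch (ours, not SDK code): on a current_thread runtime, a blocking sleep stands in for CPU-bound decompression, and while the thread is blocked nothing else, including the task driving the HTTP connection, can make progress.

use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let driver = tokio::spawn(async {
        // Stands in for the connection/keepalive task: it wants to tick
        // every 10ms, but only runs when the executor thread is free.
        for _ in 0..20 {
            tokio::time::sleep(Duration::from_millis(10)).await;
            println!("connection task ticked");
        }
    });
    for _ in 0..5 {
        // Blocking the thread starves the "connection" task entirely;
        // with tokio::time::sleep here instead, it would tick roughly
        // twelve times per iteration rather than once.
        std::thread::sleep(Duration::from_millis(120));
        tokio::task::yield_now().await; // one brief chance to run
    }
    driver.abort();
}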
Expected Behavior
Minimum throughput errors should not be raised unless the connection is unable to provide data.
Current Behavior
A minimum throughput error is incorrectly raised.
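For anyone hitting the same thing: as far as we can tell the check can be switched off entirely on the client, which sidesteps the error at the cost of losing real stall detection. A minimal sketch, assuming the public StalledStreamProtectionConfig API (re-exported in aws_sdk_s3::config); a longer grace period via StalledStreamProtectionConfig::enabled() may also help.

use aws_config::BehaviorVersion;
use aws_sdk_s3::config::StalledStreamProtectionConfig;

// Sketch of a possible mitigation (not a fix): disable stalled stream
// protection when the consumer is known to be CPU-bound between reads.
async fn client_without_stall_protection() -> aws_sdk_s3::Client {
    let config = aws_config::defaults(BehaviorVersion::latest())
        .stalled_stream_protection(StalledStreamProtectionConfig::disabled())
        .load()
        .await;
    aws_sdk_s3::Client::new(&config)
}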
Reproduction Steps
Cargo.toml
[package]
name = "s3-throughput-min-repro"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1.0"
aws-config = { version = "1.5", default-features = false, features = ["client-hyper", "rt-tokio", "rustls"] }
aws-sdk-s3 = { version = "1.42", default-features = false, features = ["rt-tokio", "rustls"] }
clap = { version = "4.5", default-features = false, features = ["std", "derive"]}
tokio = { version = "1.23", default-features = false, features = ["full"] }
tracing = { version = "0.1", default-features = false }
tracing-subscriber = { version = "0.3", default-features = false, features = ["env-filter", "fmt", "std"] }
main.rs
use aws_config::BehaviorVersion;
use aws_sdk_s3::operation::RequestIdExt;
use clap::Parser;
use tokio::io::{AsyncRead, AsyncReadExt};
use tracing::info;

/// Read a file from S3. Reproduces a bug where the AWS SDK returns a low throughput error.
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Cli {
    /// S3 bucket to read from
    #[arg(long)]
    bucket: String,
    /// S3 path to read
    #[arg(long)]
    key: String,
}

pub async fn remote_object(
    s3_client: &aws_sdk_s3::Client,
    bucket: &str,
    key: &str,
) -> Result<impl AsyncRead, anyhow::Error> {
    let object = s3_client
        .get_object()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;
    if let Some(id) = object.extended_request_id() {
        info!("Request ID: {id}");
    }
    Ok(object.body.into_async_read())
}

async fn repro(opts: Cli) -> Result<(), anyhow::Error> {
    let config = aws_config::defaults(BehaviorVersion::latest())
        .region("us-west-2")
        .load()
        .await;
    let s3_client = aws_sdk_s3::Client::new(&config);
    let mut object = remote_object(&s3_client, &opts.bucket, &opts.key).await?;
    let mut buffer = Vec::with_capacity(5_000_000);
    // Low throughput detection uses a 10-bin[1] window[2] to track requests
    // over the last second[3,4]. There is a 20 second grace period before a
    // throughput error is raised, implemented as a delay future[5,6]. Errors
    // are suppressed if the stream is not being polled, detected by half of
    // the bins being empty[7].
    //
    // 1: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L273
    // 2: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L334-L339
    // 3: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/options.rs#L79
    // 4: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L309
    // 5: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime-api/src/client/stalled_stream_protection.rs#L16-L20
    // 6: https://github.com/awslabs/aws-sdk-rust/blob/bbf69c2279fa9cdd1af5327ba58c7dad0cb62633/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L111-L127
    // 7: https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L401
    //
    // This repro works by filling the bins with pending requests, which are
    // counted as no data in the throughput calculation. The incoming bytes
    // are logged into bins, but the pending request label is sticky and
    // takes precedence.
    //
    // Full runtime pauses are used to simulate running alongside a CPU-bound
    // workload. (In our case, a lambda that's busy decompressing incoming
    // data.)
    //
    // This seems to be effectively the worst-case scenario for slow readers:
    // the throughput estimation does not work if most polls return Pending
    // before becoming Ready. (A toy model of this bin accounting follows
    // after main.rs below.)
    //
    // Issue reads spaced just above the bin resolution to fill the
    // throughput buckets.
    loop {
        let n = object.read_buf(&mut buffer).await?;
        buffer.clear();
        // Deliberately a blocking sleep (std::thread::sleep, not
        // tokio::time::sleep): the runtime thread itself must stall to
        // simulate the CPU-bound workload described above.
        std::thread::sleep(std::time::Duration::from_millis(120));
        // Good-enough EOF check: read_buf returns Ok(0) at end of stream.
        if n == 0 {
            break Ok(());
        }
    }
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "trace");
    }
    tracing_subscriber::fmt::init();
    let opts = Cli::parse();
    let res = repro(opts).await;
    if let Err(e) = res {
        eprintln!("{:?}", e);
    }
}
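To make the failure mode concrete, here is a toy model of the windowed accounting described in the comments above. This is our paraphrase of the linked throughput.rs source, not the SDK's actual code; names and bin mechanics are simplified.

// Toy model of the 10-bin, 1-second throughput window.
#[derive(Clone, Copy, Default)]
struct Bin {
    bytes: u64,
    pending: bool, // sticky: once set, this bin counts as "no data"
}

struct Window {
    bins: [Bin; 10], // each bin covers ~100ms of the trailing second
}

impl Window {
    fn record_pending(&mut self, idx: usize) {
        self.bins[idx].pending = true;
    }
    fn record_bytes(&mut self, idx: usize, n: u64) {
        // Bytes are still logged, but the pending label takes
        // precedence when throughput is computed below.
        self.bins[idx].bytes += n;
    }
    // Bytes over the trailing second, counting pending-labeled bins as zero.
    fn bytes_per_sec(&self) -> u64 {
        self.bins
            .iter()
            .map(|b| if b.pending { 0 } else { b.bytes })
            .sum()
    }
}

fn main() {
    let mut w = Window { bins: [Bin::default(); 10] };
    // A reader that parks ~120ms between reads: every bin sees a Pending
    // poll before (or while) its bytes arrive, so every bin is discounted.
    for idx in 0..10 {
        w.record_pending(idx);
        w.record_bytes(idx, 64 * 1024); // the data did arrive
    }
    // Estimated throughput is 0 B/s even though 640 KiB were delivered,
    // which trips the minimum-throughput check once the grace period ends.
    println!("estimated: {} B/s", w.bytes_per_sec());
    assert_eq!(w.bytes_per_sec(), 0);
}

The real implementation tracks more label states than this, but per our reading of the linked code, the precedence of the sticky Pending label is what produces the near-zero estimate for a slow reader.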
Possible Solution
No response
Additional Information/Context
No response
Version
├── aws-config v1.5.7
│ ├── aws-credential-types v1.2.1
│ │ ├── aws-smithy-async v1.2.1
│ │ ├── aws-smithy-runtime-api v1.7.2
│ │ │ ├── aws-smithy-async v1.2.1 (*)
│ │ │ ├── aws-smithy-types v1.2.7
│ │ ├── aws-smithy-types v1.2.7 (*)
│ ├── aws-runtime v1.4.3
│ │ ├── aws-credential-types v1.2.1 (*)
│ │ ├── aws-sigv4 v1.2.4
│ │ │ ├── aws-credential-types v1.2.1 (*)
│ │ │ ├── aws-smithy-eventstream v0.60.5
│ │ │ │ ├── aws-smithy-types v1.2.7 (*)
│ │ │ ├── aws-smithy-http v0.60.11
│ │ │ │ ├── aws-smithy-eventstream v0.60.5 (*)
│ │ │ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ │ │ ├── aws-smithy-types v1.2.7 (*)
│ │ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ │ ├── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-smithy-async v1.2.1 (*)
│ │ ├── aws-smithy-eventstream v0.60.5 (*)
│ │ ├── aws-smithy-http v0.60.11 (*)
│ │ ├── aws-smithy-runtime v1.7.1
│ │ │ ├── aws-smithy-async v1.2.1 (*)
│ │ │ ├── aws-smithy-http v0.60.11 (*)
│ │ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ │ ├── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ ├── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-types v1.3.3
│ │ │ ├── aws-credential-types v1.2.1 (*)
│ │ │ ├── aws-smithy-async v1.2.1 (*)
│ │ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ │ ├── aws-smithy-types v1.2.7 (*)
│ ├── aws-sdk-sts v1.44.0
│ │ ├── aws-credential-types v1.2.1 (*)
│ │ ├── aws-runtime v1.4.3 (*)
│ │ ├── aws-smithy-async v1.2.1 (*)
│ │ ├── aws-smithy-http v0.60.11 (*)
│ │ ├── aws-smithy-json v0.60.7
│ │ │ └── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-smithy-query v0.60.7
│ │ │ ├── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-smithy-runtime v1.7.1 (*)
│ │ ├── aws-smithy-runtime-api v1.7.2 (*)
│ │ ├── aws-smithy-types v1.2.7 (*)
│ │ ├── aws-smithy-xml v0.60.9
│ │ ├── aws-types v1.3.3 (*)
│ ├── aws-smithy-async v1.2.1 (*)
│ ├── aws-smithy-http v0.60.11 (*)
│ ├── aws-smithy-json v0.60.7 (*)
│ ├── aws-smithy-runtime v1.7.1 (*)
│ ├── aws-smithy-runtime-api v1.7.2 (*)
│ ├── aws-smithy-types v1.2.7 (*)
│ ├── aws-types v1.3.3 (*)
├── aws-sdk-s3 v1.52.0
│ ├── aws-credential-types v1.2.1 (*)
│ ├── aws-runtime v1.4.3 (*)
│ ├── aws-sigv4 v1.2.4 (*)
│ ├── aws-smithy-async v1.2.1 (*)
│ ├── aws-smithy-checksums v0.60.12
│ │ ├── aws-smithy-http v0.60.11 (*)
│ │ ├── aws-smithy-types v1.2.7 (*)
│ ├── aws-smithy-eventstream v0.60.5 (*)
│ ├── aws-smithy-http v0.60.11 (*)
│ ├── aws-smithy-json v0.60.7 (*)
│ ├── aws-smithy-runtime v1.7.1 (*)
│ ├── aws-smithy-runtime-api v1.7.2 (*)
│ ├── aws-smithy-types v1.2.7 (*)
│ ├── aws-smithy-xml v0.60.9 (*)
│ ├── aws-types v1.3.3 (*)
Environment details (OS name and version, etc.)
Repro tested on macOS. We're hitting this in production on x86_64 lambdas running al2023 with 375 MiB of memory.
Logs
No response