
Service endpoint changes not updated in envoy #293

@drobinson123

I'm seeing some weird issues with Contour 0.4. It seems like Contour configures Envoy correctly upon startup but then fails to keep Envoy updated as resources change (service endpoints specifically). If I restart a Contour pod, it starts with the configuration I expect and routes requests correctly, until the endpoints change again. Here's what I see in the logs upon startup -- should I be concerned by the two "gRPC update ... failed" messages?

$ kubectl -n heptio-contour logs contour-gv9gk -c envoy -f
[2018-03-19 19:52:23.066][1][info][main] source/server/server.cc:178] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-03-19 19:52:23.072][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:128] cm init: initializing cds
[2018-03-19 19:52:23.073][1][info][config] source/server/configuration_impl.cc:52] loading 0 listener(s)
[2018-03-19 19:52:23.073][1][info][config] source/server/configuration_impl.cc:92] loading tracing configuration
[2018-03-19 19:52:23.073][1][info][config] source/server/configuration_impl.cc:119] loading stats sink configuration
[2018-03-19 19:52:23.073][1][info][main] source/server/server.cc:353] starting main dispatch loop
[2018-03-19 19:52:23.075][1][warning][upstream] source/common/config/grpc_mux_impl.cc:205] gRPC config stream closed: 1,
[2018-03-19 19:52:23.075][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.Cluster failed
[2018-03-19 19:52:23.075][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-03-19 19:52:23.075][1][info][main] source/server/server.cc:337] all clusters initialized. initializing init manager
[2018-03-19 19:52:23.075][1][warning][upstream] source/common/config/grpc_mux_impl.cc:205] gRPC config stream closed: 1,
[2018-03-19 19:52:23.075][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.Listener failed
[2018-03-19 19:52:23.075][1][info][config] source/server/listener_manager_impl.cc:583] all dependencies initialized. starting workers
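
For reference, the trailing "1" in "gRPC config stream closed: 1" is the gRPC status code (1 = CANCELLED), which suggests the xDS stream to Contour is being established and then closed rather than never connecting. As a quick sanity check that the Envoy container can reach Contour's xDS gRPC endpoint at all, a minimal probe along these lines can be run from inside the pod; the 127.0.0.1:8001 address is an assumption based on the stock bootstrap and should be adjusted to whatever /config/contour.yaml actually points at:

    // xds_probe.go -- a minimal sketch, not part of Contour, that checks whether
    // the Envoy container can open a gRPC connection to Contour's xDS endpoint.
    // The address 127.0.0.1:8001 is an assumption (the usual ds-hostnet bootstrap);
    // adjust it to match the cluster address in /config/contour.yaml.
    package main

    import (
        "context"
        "log"
        "time"

        "google.golang.org/grpc"
    )

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()

        // WithBlock makes DialContext fail if no connection can be established
        // before the timeout, which separates "Contour unreachable" from
        // "stream opened, then closed by the server".
        conn, err := grpc.DialContext(ctx, "127.0.0.1:8001",
            grpc.WithInsecure(), grpc.WithBlock())
        if err != nil {
            log.Fatalf("cannot reach Contour xDS endpoint: %v", err)
        }
        defer conn.Close()
        log.Printf("connected, channel state: %s", conn.GetState())
    }

If that probe connects cleanly, the problem is likely on the streaming side (the server closing the stream or not pushing updates) rather than basic reachability.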

Contour is running in AWS, with an NLB and TLS. It was deployed with the ds-hostnet config, with the changes below (for TLS):

--- deployment/ds-hostnet/02-contour.yaml
+++ deployment/ds-hostnet/02-contour.yaml
@@ -28,8 +28,10 @@ spec:
         ports:
         - containerPort: 8080
           name: http
+        - containerPort: 8443
+          name: https
         command: ["envoy"]
-        args: ["-c", "/config/contour.yaml", "--service-cluster", "cluster0", "--service-node", "node0"]
+        args: ["-c", "/config/contour.yaml", "--service-cluster", "cluster0", "--service-node", "node0", "-l", "info", "--v2-config-only"]
         volumeMounts:
         - name: contour-config
           mountPath: /config

Snippet from Envoy's /clusters admin endpoint when routing is broken:

admin/fleet/8080::default_priority::max_connections::1024
admin/fleet/8080::default_priority::max_pending_requests::1024
admin/fleet/8080::default_priority::max_requests::1024
admin/fleet/8080::default_priority::max_retries::3
admin/fleet/8080::high_priority::max_connections::1024
admin/fleet/8080::high_priority::max_pending_requests::1024
admin/fleet/8080::high_priority::max_requests::1024
admin/fleet/8080::high_priority::max_retries::3
admin/fleet/8080::added_via_api::true

Snippet from Envoy's /clusters admin endpoint when routing is working:

admin/fleet/8080::default_priority::max_connections::1024
admin/fleet/8080::default_priority::max_pending_requests::1024
admin/fleet/8080::default_priority::max_requests::1024
admin/fleet/8080::default_priority::max_retries::3
admin/fleet/8080::high_priority::max_connections::1024
admin/fleet/8080::high_priority::max_pending_requests::1024
admin/fleet/8080::high_priority::max_requests::1024
admin/fleet/8080::high_priority::max_retries::3
admin/fleet/8080::added_via_api::true
admin/fleet/8080::100.96.1.166:8080::cx_active::0
admin/fleet/8080::100.96.1.166:8080::cx_connect_fail::0
admin/fleet/8080::100.96.1.166:8080::cx_total::0
admin/fleet/8080::100.96.1.166:8080::rq_active::0
admin/fleet/8080::100.96.1.166:8080::rq_error::0
admin/fleet/8080::100.96.1.166:8080::rq_success::0
admin/fleet/8080::100.96.1.166:8080::rq_timeout::0
admin/fleet/8080::100.96.1.166:8080::rq_total::0
admin/fleet/8080::100.96.1.166:8080::health_flags::healthy
admin/fleet/8080::100.96.1.166:8080::weight::1
admin/fleet/8080::100.96.1.166:8080::region::
admin/fleet/8080::100.96.1.166:8080::zone::
admin/fleet/8080::100.96.1.166:8080::sub_zone::
admin/fleet/8080::100.96.1.166:8080::canary::false
admin/fleet/8080::100.96.1.166:8080::success_rate::-1
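
The difference between the two snippets is that in the broken state the cluster has no per-endpoint lines at all, i.e. Envoy holds an empty load assignment for it. To watch for that without eyeballing the output, here is a rough diagnostic sketch (not part of Contour) that scrapes /clusters and prints the endpoint addresses Envoy currently knows for one cluster, so they can be compared against the Kubernetes Endpoints object after pods churn. The admin address 127.0.0.1:9001 and the cluster name admin/fleet/8080 are assumptions; adjust them for your deployment:

    // clusters_diff.go -- a rough sketch that lists the endpoint addresses Envoy
    // currently has for one cluster, by scraping the admin /clusters output shown
    // above. The admin address 127.0.0.1:9001 and the cluster name
    // "admin/fleet/8080" are assumptions; adjust for your setup.
    package main

    import (
        "bufio"
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    func main() {
        const cluster = "admin/fleet/8080"

        resp, err := http.Get("http://127.0.0.1:9001/clusters")
        if err != nil {
            log.Fatalf("fetching /clusters: %v", err)
        }
        defer resp.Body.Close()

        seen := map[string]bool{}
        sc := bufio.NewScanner(resp.Body)
        for sc.Scan() {
            // Lines look like: admin/fleet/8080::100.96.1.166:8080::cx_active::0
            parts := strings.Split(sc.Text(), "::")
            if len(parts) < 3 || parts[0] != cluster {
                continue
            }
            // The second field is either a per-endpoint "ip:port" or a cluster-wide
            // key such as "default_priority"; keep only the address-looking ones.
            if strings.Contains(parts[1], ":") {
                seen[parts[1]] = true
            }
        }
        if err := sc.Err(); err != nil {
            log.Fatalf("reading /clusters: %v", err)
        }

        fmt.Printf("%s has %d endpoint(s):\n", cluster, len(seen))
        for addr := range seen {
            fmt.Println(" ", addr)
        }
        // Compare this list against the corresponding Kubernetes Endpoints object
        // (here presumably `kubectl -n admin get endpoints fleet`); if they diverge
        // after pods churn, Envoy is not receiving endpoint updates.
    }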

Labels: kind/bug (Categorizes issue or PR as related to a bug), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release)
