
Commit c675bf0

Debug info ci failures (#3400)
* add bash logic to find more context of why things go wrong
* fail to schedule rabbit; and fail to start brig to see if the logs are as expected. To be reverted
* changelog
* fixup
* PR feedback
* take bash from env
* undo one on-purpose-unschedulable bug
* no need for context before Events:
* undo brig misconfiguration
1 parent 7cb9ab2 commit c675bf0


3 files changed: +43 -0 lines changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+On CI runs, provide additional context when 'helmfile install' fails.

hack/bin/integration-setup-federation.sh

Lines changed: 13 additions & 0 deletions
@@ -49,7 +49,20 @@ export FEDERATION_DOMAIN_2="federation-test-helper.$FEDERATION_DOMAIN_BASE"
 
 echo "Installing charts..."
 
+set +e
 helmfile --environment "$HELMFILE_ENV" --file "${TOP_LEVEL}/hack/helmfile.yaml" sync --skip-deps --concurrency 0
+EXIT_CODE=$?
+
+if (( EXIT_CODE > 0)); then
+    echo "!! Helm install failed. Attempting to get some more information ..."
+
+    kubectl -n "$NAMESPACE_1" get events | grep -v "Normal "
+    kubectl -n "$NAMESPACE_2" get events | grep -v "Normal "
+    "${DIR}/kubectl-get-debug-info.sh" "$NAMESPACE_1"
+    "${DIR}/kubectl-get-debug-info.sh" "$NAMESPACE_2"
+    exit $EXIT_CODE
+fi
+set -e
 
 # wait for fakeSNS to create resources. TODO, cleaner: make initiate-fake-aws-sns a post hook. See cassandra-migrations chart for an example.
 resourcesReady() {
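The exit-code handling added in this hunk (disable errexit, run `helmfile`, capture `$?`, dump diagnostics, re-exit with the same code) can also be packaged as a reusable function. The following is a minimal sketch and not part of this commit: `run_with_diagnostics` and `collect_diagnostics` are hypothetical names, and the `kubectl` calls are replaced by a plain `echo` so the sketch runs without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical diagnostics hook. In integration-setup-federation.sh this is
# where `kubectl get events` and kubectl-get-debug-info.sh run per namespace;
# here it only prints a marker to stderr so the sketch runs anywhere.
collect_diagnostics() {
  echo "!! Command failed. Collecting diagnostics ..." >&2
}

# Hypothetical helper: run_with_diagnostics CMD [ARGS...]
# Runs CMD; if it fails, calls collect_diagnostics and then returns CMD's
# original exit code, so the caller (or set -e) still sees the failure.
run_with_diagnostics() {
  local exit_code=0
  # bash does not apply errexit to the left-hand side of ||,
  # so there is no need to toggle set +e / set -e around the call.
  "$@" || exit_code=$?
  if (( exit_code > 0 )); then
    collect_diagnostics
  fi
  return "$exit_code"
}
```

Called at the top level of a `set -e` script, for example as `run_with_diagnostics helmfile --environment "$HELMFILE_ENV" --file "${TOP_LEVEL}/hack/helmfile.yaml" sync --skip-deps --concurrency 0`, the non-zero return then terminates the script with helmfile's exit code, mirroring `exit $EXIT_CODE` above. One caveat of the `||` approach: if the wrapped command is itself a shell function, errexit is also suspended inside that function's body.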

hack/bin/kubectl-get-debug-info.sh

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+
+USAGE="$0 <NAMESPACE>"
+NAMESPACE=${1:?$USAGE}
+
+echo "Checking pods in namespace '${NAMESPACE}' that failed to schedule..."
+
+# Get pods that failed to schedule
+UNSCHEDULED_PODS=$(kubectl get pods --namespace "$NAMESPACE" -o json | jq -r '.items[] | select(.status.phase=="Pending") | .metadata.name')
+
+for POD in $UNSCHEDULED_PODS; do
+    echo "Pod $POD failed to schedule for the following reasons:"
+    # Get events for pod
+    kubectl describe pod "$POD" --namespace "$NAMESPACE" | grep -A 10 "Events:"
+    echo ""
+done
+
+echo "Checking pods in namespace '${NAMESPACE}' that are crashlooping..."
+
+# Get pods that are crashlooping
+CRASHLOOPING_PODS=$(kubectl get pods --namespace "$NAMESPACE" -o json | jq -r '.items[] | select(.status.containerStatuses[]?.state.waiting.reason=="CrashLoopBackOff") | .metadata.name')
+
+for POD in $CRASHLOOPING_PODS; do
+    echo "Pod $POD is crashlooping for the following reasons:"
+    # Get logs of previous run for pod
+    kubectl logs "$POD" --namespace "$NAMESPACE" --previous
+    echo ""
+done
+
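To see which pods the two `jq` selectors in `kubectl-get-debug-info.sh` pick out without a live cluster, the sketch below runs them against a small hand-written payload shaped like `kubectl get pods -o json` output. The payload and pod names are made up for illustration, and `jq` must be installed (the script already assumes it). Note that phase `Pending` also covers pods that were scheduled but are still pulling images, so the first list can be broader than "failed to schedule". The sketch also shows a quirk of the second filter: `select()` emits a pod once per matching container.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical, hand-written payload shaped like a trimmed-down
# `kubectl get pods -o json` response; only fields the filters read are kept.
PODS_JSON='{"items": [
  {"metadata": {"name": "pending-pod"}, "status": {"phase": "Pending"}},
  {"metadata": {"name": "crashloop-pod"}, "status": {"phase": "Running",
    "containerStatuses": [
      {"state": {"waiting": {"reason": "CrashLoopBackOff"}}},
      {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
  {"metadata": {"name": "healthy-pod"}, "status": {"phase": "Running",
    "containerStatuses": [{"state": {"running": {}}}]}}
]}'

# Same filter as UNSCHEDULED_PODS in kubectl-get-debug-info.sh.
pending=$(jq -r '.items[] | select(.status.phase=="Pending") | .metadata.name' <<<"$PODS_JSON")

# Same filter as CRASHLOOPING_PODS. `[]?` skips pods without containerStatuses,
# but select() emits the pod once for every container that matches.
crashlooping=$(jq -r '.items[] | select(.status.containerStatuses[]?.state.waiting.reason=="CrashLoopBackOff") | .metadata.name' <<<"$PODS_JSON")

# Variant using jq's any/2, which emits each matching pod exactly once.
crashlooping_once=$(jq -r '.items[] | select(any(.status.containerStatuses[]?; .state.waiting.reason=="CrashLoopBackOff")) | .metadata.name' <<<"$PODS_JSON")

printf 'pending:\n%s\n' "$pending"
printf 'crashlooping (original filter):\n%s\n' "$crashlooping"
printf 'crashlooping (any/2 variant):\n%s\n' "$crashlooping_once"
```

For pods with a single crashlooping container both filters agree; the difference only matters for multi-container pods, where the original loop would print the same pod's previous logs once per crashlooping container.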
