ETCD backup script will delete other files when there is no space left on device #1625

@24sama

Description

Which version of KubeKey has the issue?

v3.0.1, v3.0.0, v2.3.0, v2.2.2, v2.2.1, v2.2.0, v2.1.1, v2.1.0, v2.0.0, v1.2.1, v1.2.0, v1.1.1, v1.1.0, v1.0.1

What is your os environment?

none

KubeKey config file

No response

A clear and concise description of what happened.

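In an extreme case, the kk etcd backup script may erroneously delete files in the / directory when the node has no space left to create a directory (i.e., not even 4096K). When the backup directory cannot be created, the cd in the last line of the script fails, but the cleanup command after the ; still runs, in whatever directory the script was started from.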
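The difference between ; and && after a failed cd can be reproduced safely in a throwaway directory. The following is a minimal sketch, not the KubeKey script itself: the sandbox layout, the file name important-file, and the variable names are hypothetical, and ls stands in for the destructive rm -rf so that nothing is deleted.

```shell
#!/bin/bash
# Sandbox demo: everything happens inside a fresh temporary directory.
sandbox="$(mktemp -d)"
mkdir "$sandbox/cwd"
touch "$sandbox/cwd/important-file"

# Hypothetical path standing in for a backup directory that mkdir could not create.
missing_dir="$sandbox/does-not-exist"

# With ';' the command after cd runs even though cd failed, so it acts on
# the current working directory instead of the backup directory.
unsafe_listing="$(cd "$sandbox/cwd" && { cd "$missing_dir" 2>/dev/null; ls; })"

# With '&&' the command after cd is skipped when cd fails.
safe_listing="$(cd "$sandbox/cwd" && { cd "$missing_dir" 2>/dev/null && ls; } || true)"

echo "unsafe: ${unsafe_listing}"
echo "safe: ${safe_listing:-<nothing listed>}"

# Clean up the sandbox created above.
rm -rf "$sandbox"
```

In the unsafe variant the listing contains important-file from the working directory, which is exactly the set of names the cleanup pipeline would have passed to rm -rf.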

Suggest using the latest version:

Binary downloads of the latest kk can be found on the Releases page.
Or
Download the latest kk with the following command:

curl -sSL https://get-kk.kubesphere.io | sh -

For an existing cluster installed with the KubeKey command (kk), apply the following workaround.

  1. Manually edit the script:
$ vi /usr/local/bin/kube-scripts/etcd-backup.sh
  2. Modify the script as follows:

    1. Add the set -o options (errexit, nounset, pipefail) at the beginning of the script.
    2. Replace the ; after the cd command with && in the last line.

    Here is an example:

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

ETCDCTL_PATH='/usr/local/bin/etcdctl'
ENDPOINTS='https://192.168.100.3:2379'
ETCD_DATA_DIR="/var/lib/etcd"
BACKUP_DIR="/var/backups/kube_etcd/etcd-$(date +%Y-%m-%d-%H-%M-%S)"
KEEPBACKUPNUMBER='6'
ETCDBACKUPSCIPT='/usr/local/bin/kube-scripts'

ETCDCTL_CERT="/etc/ssl/etcd/ssl/admin-node1.pem"
ETCDCTL_KEY="/etc/ssl/etcd/ssl/admin-node1-key.pem"
ETCDCTL_CA_FILE="/etc/ssl/etcd/ssl/ca.pem"

[ ! -d $BACKUP_DIR ] && mkdir -p $BACKUP_DIR

export ETCDCTL_API=2;$ETCDCTL_PATH backup --data-dir $ETCD_DATA_DIR --backup-dir $BACKUP_DIR

sleep 3

{
export ETCDCTL_API=3;$ETCDCTL_PATH --endpoints="$ENDPOINTS" snapshot save $BACKUP_DIR/snapshot.db \
                                   --cacert="$ETCDCTL_CA_FILE" \
                                   --cert="$ETCDCTL_CERT" \
                                   --key="$ETCDCTL_KEY"
} > /dev/null 

sleep 3

cd $BACKUP_DIR/../ && ls -lt |awk '{if(NR > '$KEEPBACKUPNUMBER'){print "rm -rf "$9}}'|sh
  3. Reload the new script:
$ systemctl daemon-reload
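As a further hardening idea, the cleanup step could avoid piping generated rm commands into sh altogether. The sketch below is an assumption-laden alternative, not what KubeKey ships: the function name prune_old_backups is hypothetical, it relies on the etcd-YYYY-mm-dd-HH-MM-SS directory naming used by BACKUP_DIR above (so name order equals age order, rather than the modification-time order that ls -lt uses), and it keeps exactly the requested number of directories.

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# prune_old_backups DIR KEEP
# Hypothetical helper: delete all but the KEEP newest etcd-* directories in DIR.
prune_old_backups() {
  local parent="$1" keep="$2"

  # Refuse to do anything unless the backup parent directory really exists.
  if [ ! -d "$parent" ]; then
    echo "backup directory not found: $parent" >&2
    return 1
  fi

  local -a backups=()
  local dir
  # Directory names embed a fixed-width timestamp, so glob order is oldest first.
  for dir in "$parent"/etcd-*/; do
    if [ -d "$dir" ]; then
      backups+=("${dir%/}")
    fi
  done

  # Number of directories beyond the retention limit (may be zero or negative).
  local excess=$(( ${#backups[@]} - keep ))
  local i
  for (( i = 0; i < excess; i++ )); do
    # Only absolute paths under the verified parent are ever removed.
    rm -rf -- "${backups[i]}"
  done
}
```

In the script above, this would be called with the parent of BACKUP_DIR and KEEPBACKUPNUMBER as its two arguments, in place of the final cd and ls pipeline.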

Relevant log output

No response

Additional information

No response

Labels: bug (Something isn't working)
