WSREP: wsrep::connect() failed: 7 #4

@trompx

Hello,

After I launch the cluster through Marathon, the seed runs on host 1 (IP 192.168.33.101). I then scale the number of node instances to 1 (Marathon tries to launch the node on host 2, IP 192.168.33.102) and get the error "WSREP: wsrep::connect() failed: 7". Here is the full log:

CLUSTERCHECK_PASSWORD=62d50cf100bcbb8755a85f9936df32a9a39b1830171d25e59de17cb56e9d10da
+ QCOMM=
+ CLUSTER_NAME=cluster
+ MYSQL_MODE_ARGS=
+ case "$1" in
+ '[' -z galera.service.consul ']'
+ ADDRS=galera.service.consul
+ SEP=
+ for ADDR in '${ADDRS//,/ }'
+ expr galera.service.consul : '^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$'
++ paste -sd ,
++ awk '{ print $4 }'
++ host -t A galera.service.consul
+ QCOMM+=192.168.33.102
+ SEP=,
+ shift 2
+ echo 'Starting node, connecting to qcomm://192.168.33.102'
Starting node, connecting to qcomm://192.168.33.102
+ set +e -m
+ trap shutdown TERM INT
+ wait 19
+ /mysqld.sh --console --wsrep_cluster_name=cluster --wsrep_cluster_address=gcomm://192.168.33.102 --wsrep_sst_auth=xtrabackup:3240fd7as9f8798 --default-time-zone=+00:00
+ /bin/galera-healthcheck -password=62d50cf100bcbb8755a85f9936df32a9a39b1830171d25e59de17cb56e9d10da -pidfile=/var/run/galera-healthcheck.pid -user clustercheck
/usr/sbin/mysqld
Docker startscript:  Get the GTID positon
150527  1:21:58 [Note] mysqld (mysqld 10.0.19-MariaDB-1~trusty-wsrep-log) starting as process 19 ...
150527  1:21:59 [Note] WSREP: Read nil XID from storage engines, skipping position init
150527  1:21:59 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
150527  1:21:59 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <[email protected]> loaded successfully.
150527  1:21:59 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
150527  1:21:59 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
150527  1:21:59 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
150527  1:21:59 [Note] WSREP: Passing config to GCS: base_host = 172.17.0.96; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery
150527  1:21:59 [Note] WSREP: Service thread queue flushed.
150527  1:21:59 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
150527  1:21:59 [Note] WSREP: wsrep_sst_grab()
150527  1:21:59 [Note] WSREP: Start replication
150527  1:21:59 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
150527  1:21:59 [Note] WSREP: protonet asio version 0
150527  1:21:59 [Note] WSREP: Using CRC-32C for message checksums.
150527  1:21:59 [Note] WSREP: backend: asio
150527  1:21:59 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
150527  1:21:59 [Note] WSREP: restore pc from disk failed
150527  1:21:59 [Note] WSREP: GMCast version 0
150527  1:21:59 [Note] WSREP: (c5446e45, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
150527  1:21:59 [Note] WSREP: (c5446e45, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
150527  1:21:59 [Note] WSREP: EVS version 0
150527  1:21:59 [Note] WSREP: gcomm: connecting to group 'cluster', peer '192.168.33.102:'
150527  1:22:02 [Warning] WSREP: no nodes coming from prim view, prim not possible
150527  1:22:02 [Note] WSREP: view(view_id(NON_PRIM,c5446e45,1) memb {
        c5446e45,0
} joined {
} left {
} partitioned {
})
150527  1:22:02 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50748S), skipping check
150527  1:22:32 [Note] WSREP: view((empty))
150527  1:22:32 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():161
150527  1:22:32 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -110 (Connection timed out)
150527  1:22:32 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'cluster' at 'gcomm://192.168.33.102': -110 (Connection timed out)
150527  1:22:32 [ERROR] WSREP: gcs connect failed: Connection timed out
150527  1:22:32 [ERROR] WSREP: wsrep::connect() failed: 7
150527  1:22:32 [ERROR] Aborting

150527  1:22:32 [Note] WSREP: Service disconnected.
150527  1:22:33 [Note] WSREP: Some threads may fail to exit.
150527  1:22:33 [Note] mysqld: Shutdown complete

+ RC=1
+ test -s /var/run/galera-healthcheck.pid
++ cat /var/run/galera-healthcheck.pid
+ kill 18
+ exit 1

In the start script, the gcomm address is built like this:

ADDRS="$2" # with $2 = galera.service.consul
for ADDR in ${ADDRS//,/ }; do
    if expr "$ADDR" : '^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$' >/dev/null; then
        QCOMM+="$SEP$ADDR"
    else
        QCOMM+="$SEP$(host -t A "$ADDR" | awk '{ print $4 }' | paste -sd ",")"
    fi
    SEP=,
done

The result is always the IP of the host where the Galera node is being launched, not the list of all IPs in the Galera cluster. Do you think that is the culprit?
In the case of galera.service.consul, the branch that runs is QCOMM+="$SEP$(host -t A "$ADDR" | awk '{ print $4 }' | paste -sd ",")". Do you think I have to change this, or does it work out of the box for you?
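
To make the question concrete, here is a minimal sketch of what that pipeline does with the output of `host -t A`, plus one extra step that drops the node's own address from the list. The function name `peer_addrs` and its `self_ip` argument are assumptions made for illustration; they are not part of the start script, and whether filtering out the local IP is the right fix is still an open question.

```shell
# Sketch only: peer_addrs and its self_ip argument are hypothetical,
# not part of the original start script.
# Reads `host -t A <name>` output on stdin, one record per line, e.g.
#   galera.service.consul has address 192.168.33.101
# and prints the addresses as a comma-separated list, skipping self_ip.
peer_addrs() {
    self_ip="${1:-}"
    awk -v self="$self_ip" '$3 == "address" && $4 != self { print $4 }' |
        paste -sd "," -
}

# Example with canned input instead of a live DNS query.
printf '%s\n' \
    'galera.service.consul has address 192.168.33.101' \
    'galera.service.consul has address 192.168.33.102' |
    peer_addrs 192.168.33.102
# prints 192.168.33.101
```

In the log above the lookup returned only 192.168.33.102, so with this filter the list would come out empty, which suggests the seed's address is missing from the DNS answer in the first place.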

I was trying to get the IPs of all the Galera cluster hosts with:

dig galera.service.consul +tcp SRV
or
curl http://192.168.33.101:8500/v1/catalog/service/galera

but I am not sure that is the way to go...
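
If the SRV route were used, the records would still need to be turned into a host list. A rough sketch, assuming the short output form of `dig` (`dig +short SRV ...`), where each line is `<priority> <weight> <port> <target>`. The helper name `srv_targets` is hypothetical and the host names in the example are made-up sample data, not output from this cluster.

```shell
# Sketch only: srv_targets is a hypothetical helper, not part of the start script.
# Reads `dig +short SRV <name>` output on stdin, one record per line:
#   <priority> <weight> <port> <target>
# and prints each target host name once, without the trailing dot.
srv_targets() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 }' | sort -u
}

# Example with made-up sample records instead of a live query.
printf '%s\n' \
    '1 1 3306 host1.node.dc1.consul.' \
    '1 1 3306 host2.node.dc1.consul.' |
    srv_targets
```

Each target is a host name rather than an IP, so it would still have to be resolved (for example with `host -t A`) before it could go into the gcomm address.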

Hopefully you have some tips :)

Thank you
