Description
Problem: I am running two OpenSearch instances, one primary and one secondary. We observe that Fluentd hangs when one of the instances is down.
Expected behavior: I want logs to keep being pushed to the other instance even if one is down. For example, when the secondary is down, Fluentd hangs and no data is pushed to the primary. I want the flow to continue without the healthy instance being affected.
Here are the Fluentd logs when one of the instances is down:
As suggested in issue #935, I tried the same configuration as well, but the logs still show the same and data is not being pushed.
Can anyone help me resolve this issue?
Here is the output of fluent-gem list:
*** LOCAL GEMS ***
abbrev (default: 0.1.1)
activesupport (7.0.2.4)
addressable (2.8.7)
base64 (0.2.0, default: 0.1.1)
benchmark (default: 0.2.1)
bigdecimal (default: 3.1.3)
bundler (2.6.2, default: 2.4.10)
cgi (default: 0.3.6)
concurrent-ruby (1.3.4)
cool.io (1.9.0)
csv (default: 3.2.6)
date (default: 3.3.3)
debug (1.7.1)
delegate (default: 0.3.0)
did_you_mean (default: 1.6.3)
digest (default: 3.1.1)
domain_name (0.6.20240107)
drb (default: 2.1.1)
elasticsearch (7.10.1)
elasticsearch-api (7.10.1)
elasticsearch-transport (7.10.1)
english (default: 0.7.2)
erb (default: 4.0.2)
error_highlight (default: 0.5.1)
etc (default: 1.4.2)
excon (1.2.3)
faraday (1.10.4)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.1.0)
faraday-net_http (1.0.2)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
fcntl (default: 1.0.2)
ffi (1.17.1 x86_64-linux-gnu)
ffi-compiler (1.3.2)
fiddle (default: 1.1.1)
fileutils (default: 1.7.0)
find (default: 0.1.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-concat (2.5.0)
fluent-plugin-detect-exceptions (0.0.16)
fluent-plugin-elasticsearch (5.2.5)
fluent-plugin-kubernetes_metadata_filter (2.9.5)
fluent-plugin-kvp-filter (0.1.1)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-parser-cri (0.1.1)
fluent-plugin-prometheus (2.0.3)
fluent-plugin-record-modifier (2.1.1)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-systemd (1.0.5)
fluent-plugin-throttle (0.0.5)
fluentd (1.14.6)
forwardable (default: 1.3.3)
getoptlong (default: 0.2.0)
http (5.2.0)
http-accept (1.7.0)
http-cookie (1.0.8)
http-form_data (2.3.0)
http_parser.rb (0.8.0)
i18n (1.14.6)
io-console (default: 0.6.0)
io-nonblock (default: 0.2.0)
io-wait (default: 0.3.0)
ipaddr (default: 1.2.5)
irb (default: 1.6.2)
json (default: 2.6.3)
jsonpath (1.1.5)
kubeclient (4.12.0)
llhttp-ffi (0.5.0)
logfmt (0.0.10)
logger (1.6.4, default: 1.5.3)
lru_redux (1.1.0)
matrix (0.4.2)
mime-types (3.6.0)
mime-types-data (3.2024.1203)
minitest (5.25.4, 5.16.3)
msgpack (1.7.5)
multi_json (1.15.0)
multipart-post (2.4.1)
mutex_m (default: 0.1.2)
net-ftp (0.2.0)
net-http (default: 0.3.2)
net-imap (0.3.4)
net-pop (0.1.2)
net-protocol (default: 0.2.1)
net-smtp (0.3.3)
netrc (0.11.0)
nkf (default: 0.1.2)
observer (default: 0.1.1)
oj (3.13.23)
open-uri (default: 0.3.0)
open3 (default: 0.1.2)
openssl (default: 3.1.0)
optparse (default: 0.3.1)
ostruct (0.6.1, default: 0.5.5)
pathname (default: 0.2.1)
power_assert (2.0.3)
pp (default: 0.4.0)
prettyprint (default: 0.1.1)
prime (0.1.2)
prometheus-client (4.2.3)
pstore (default: 0.1.2)
psych (default: 5.0.1)
public_suffix (6.0.1)
racc (default: 1.6.2)
rake (13.2.1, 13.0.6)
rbs (2.8.2)
rdoc (default: 6.5.0)
readline (default: 0.0.3)
readline-ext (default: 0.1.5)
recursive-open-struct (1.3.1)
reline (default: 0.3.2)
resolv (default: 0.2.2)
resolv-replace (default: 0.1.1)
rest-client (2.1.0)
rexml (3.2.5)
rinda (default: 0.1.1)
rss (0.2.9)
ruby2_keywords (default: 0.0.5)
securerandom (default: 0.2.2)
serverengine (2.4.0)
set (default: 1.0.3)
shellwords (default: 0.1.0)
sigdump (0.2.5)
singleton (default: 0.1.1)
stringio (default: 3.0.4)
strptime (0.2.5)
strscan (default: 3.0.5)
syntax_suggest (default: 1.0.2)
syslog (default: 0.1.1)
systemd-journal (1.4.2)
tempfile (default: 0.1.3)
test-unit (3.5.7)
time (default: 0.2.2)
timeout (default: 0.3.1)
tmpdir (default: 0.1.3)
tsort (default: 0.1.1)
typeprof (0.21.3)
tzinfo (2.0.6)
tzinfo-data (1.2024.2)
un (default: 0.2.1)
uri (default: 0.12.1)
weakref (default: 0.1.2)
webrick (1.7.0)
yajl-ruby (1.4.3)
yaml (default: 0.2.1)
zlib (default: 3.0.0)
And here is the Fluentd configuration (store ignore_error does not seem to be working?):
output.conf: |-
  <match **>
    @type copy
    <store ignore_error>
      @type relabel
      @label @primary
    </store>
    {{- if .Values.opensearch.mossSecondary.enabled }}
    <store ignore_error>
      @type relabel
      @label @secondary
    </store>
    {{- end }}
  </match>
  <label @primary>
    <match **>
      @id elasticsearch_moss_pm
      @type elasticsearch
      @log_level trace
      // as per #935 comments
      verify_es_version_at_startup false
      default_elasticsearch_version 7
      max_retry_get_es_version 20
      max_retry_putting_template 20
      scheme https
      ssl_verify false
      ssl_version TLSv1_2
      suppress_type_name true
      host xxxxxxxxxxxx // correct url was specified
      port 443
      user xxxxxxxxxxx
      password xxxxxxxxxxx
      templates xxxxxxxxxxxxx
      template_overwrite true
      write_operation upsert
      target_index_key @target_index
      index_name defaulmosstindex
      type_name _doc
      id_key request_id
      remove_keys request_id
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      request_timeout 45s
      bulk_message_request_threshold -1
      <buffer>
        @type file
        path xxxxxx
        flush_mode interval
        flush_thread_count 4
        flush_interval 60s
        retry_type periodic
        retry_max_times 20
        retry_wait 20s
        chunk_limit_size 64M
        queued_chunks_limit_size 100
        overflow_action throw_exception
      </buffer>
    </match>
  </label>
  {{- if .Values.opensearch.mossSecondary.enabled }}
  <label @secondary>
    <filter **>
      @type record_modifier
      <record>
        @target_index ${record["@target_index"]}_${(((Time.at(time).strftime("%j").to_i - 1) / 3) * 3 + 1)}
      </record>
    </filter>
    <match **>
      @id secondary_elasticsearch_moss_pm
      @type elasticsearch
      @log_level trace
      verify_es_version_at_startup false
      default_elasticsearch_version 7
      max_retry_get_es_version 20
      max_retry_putting_template 20
      scheme https
      ssl_verify false
      ssl_version TLSv1_2
      suppress_type_name true
      host testapi // specified wrong url intentionally to check the case
      port 443
      user xxxxxxxx
      password xxxxx
      templates xxxxxx
      template_overwrite true
      write_operation upsert
      target_index_key @target_index
      index_name duplicatedefaulmosstindex
      type_name _doc
      id_key request_id
      remove_keys request_id
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      request_timeout 45s
      bulk_message_request_threshold -1
      <buffer>
        @type file
        path xxxxxxxxxxxxxxxxxx
        flush_mode interval
        flush_thread_count 4
        flush_interval 60s
        retry_type periodic
        retry_max_times 20
        retry_wait 20s
        chunk_limit_size 64M
        queued_chunks_limit_size 100
        overflow_action throw_exception
      </buffer>
    </match>
  </label>
  {{- end }}
{{- end }}
@cosmo0920, could you please take a look at this? It would be really helpful. Thanks.