Fluentd hangs if unable to connect to opensearch. #1058

@Sameer0998

Problem: I am using two OpenSearch instances, one primary and one secondary. Fluentd hangs when one of the instances is down.

Expected behavior: Logs should still be pushed to the healthy instance when the other one is down. For example, when the secondary is down, Fluentd hangs and no data is pushed to the primary. I want the flow to the primary to continue without being affected by the other instance.

Here are the Fluentd logs when one of the instances is down:

[Image: Fluentd log output]

As suggested in issue #935, I tried the same configuration, but the logs show the same errors and data is still not being pushed.
Can anyone help me resolve this issue?
Here is the output of fluent-gem list:
*** LOCAL GEMS ***

abbrev (default: 0.1.1)
activesupport (7.0.2.4)
addressable (2.8.7)
base64 (0.2.0, default: 0.1.1)
benchmark (default: 0.2.1)
bigdecimal (default: 3.1.3)
bundler (2.6.2, default: 2.4.10)
cgi (default: 0.3.6)
concurrent-ruby (1.3.4)
cool.io (1.9.0)
csv (default: 3.2.6)
date (default: 3.3.3)
debug (1.7.1)
delegate (default: 0.3.0)
did_you_mean (default: 1.6.3)
digest (default: 3.1.1)
domain_name (0.6.20240107)
drb (default: 2.1.1)
elasticsearch (7.10.1)
elasticsearch-api (7.10.1)
elasticsearch-transport (7.10.1)
english (default: 0.7.2)
erb (default: 4.0.2)
error_highlight (default: 0.5.1)
etc (default: 1.4.2)
excon (1.2.3)
faraday (1.10.4)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.1.0)
faraday-net_http (1.0.2)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
fcntl (default: 1.0.2)
ffi (1.17.1 x86_64-linux-gnu)
ffi-compiler (1.3.2)
fiddle (default: 1.1.1)
fileutils (default: 1.7.0)
find (default: 0.1.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-concat (2.5.0)
fluent-plugin-detect-exceptions (0.0.16)
fluent-plugin-elasticsearch (5.2.5)
fluent-plugin-kubernetes_metadata_filter (2.9.5)
fluent-plugin-kvp-filter (0.1.1)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-parser-cri (0.1.1)
fluent-plugin-prometheus (2.0.3)
fluent-plugin-record-modifier (2.1.1)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-systemd (1.0.5)
fluent-plugin-throttle (0.0.5)
fluentd (1.14.6)
forwardable (default: 1.3.3)
getoptlong (default: 0.2.0)
http (5.2.0)
http-accept (1.7.0)
http-cookie (1.0.8)
http-form_data (2.3.0)
http_parser.rb (0.8.0)
i18n (1.14.6)
io-console (default: 0.6.0)
io-nonblock (default: 0.2.0)
io-wait (default: 0.3.0)
ipaddr (default: 1.2.5)
irb (default: 1.6.2)
json (default: 2.6.3)
jsonpath (1.1.5)
kubeclient (4.12.0)
llhttp-ffi (0.5.0)
logfmt (0.0.10)
logger (1.6.4, default: 1.5.3)
lru_redux (1.1.0)
matrix (0.4.2)
mime-types (3.6.0)
mime-types-data (3.2024.1203)
minitest (5.25.4, 5.16.3)
msgpack (1.7.5)
multi_json (1.15.0)
multipart-post (2.4.1)
mutex_m (default: 0.1.2)
net-ftp (0.2.0)
net-http (default: 0.3.2)
net-imap (0.3.4)
net-pop (0.1.2)
net-protocol (default: 0.2.1)
net-smtp (0.3.3)
netrc (0.11.0)
nkf (default: 0.1.2)
observer (default: 0.1.1)
oj (3.13.23)
open-uri (default: 0.3.0)
open3 (default: 0.1.2)
openssl (default: 3.1.0)
optparse (default: 0.3.1)
ostruct (0.6.1, default: 0.5.5)
pathname (default: 0.2.1)
power_assert (2.0.3)
pp (default: 0.4.0)
prettyprint (default: 0.1.1)
prime (0.1.2)
prometheus-client (4.2.3)
pstore (default: 0.1.2)
psych (default: 5.0.1)
public_suffix (6.0.1)
racc (default: 1.6.2)
rake (13.2.1, 13.0.6)
rbs (2.8.2)
rdoc (default: 6.5.0)
readline (default: 0.0.3)
readline-ext (default: 0.1.5)
recursive-open-struct (1.3.1)
reline (default: 0.3.2)
resolv (default: 0.2.2)
resolv-replace (default: 0.1.1)
rest-client (2.1.0)
rexml (3.2.5)
rinda (default: 0.1.1)
rss (0.2.9)
ruby2_keywords (default: 0.0.5)
securerandom (default: 0.2.2)
serverengine (2.4.0)
set (default: 1.0.3)
shellwords (default: 0.1.0)
sigdump (0.2.5)
singleton (default: 0.1.1)
stringio (default: 3.0.4)
strptime (0.2.5)
strscan (default: 3.0.5)
syntax_suggest (default: 1.0.2)
syslog (default: 0.1.1)
systemd-journal (1.4.2)
tempfile (default: 0.1.3)
test-unit (3.5.7)
time (default: 0.2.2)
timeout (default: 0.3.1)
tmpdir (default: 0.1.3)
tsort (default: 0.1.1)
typeprof (0.21.3)
tzinfo (2.0.6)
tzinfo-data (1.2024.2)
un (default: 0.2.1)
uri (default: 0.12.1)
weakref (default: 0.1.2)
webrick (1.7.0)
yajl-ruby (1.4.3)
yaml (default: 0.2.1)
zlib (default: 3.0.0)

And here is the Fluentd configuration (store ignore_error does not seem to be working?):

 output.conf: |-
    <match **>
      @type copy
      <store ignore_error>
        @type relabel
        @label @primary
      </store>
  {{- if .Values.opensearch.mossSecondary.enabled }}
      <store ignore_error>
        @type relabel
        @label @secondary
      </store>
  {{- end }}
    </match>
    
    <label @primary>
      <match **>
        @id elasticsearch_moss_pm
        @type elasticsearch
        @log_level trace
        # as per #935 comments
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        max_retry_get_es_version 20
        max_retry_putting_template 20
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        suppress_type_name true
        # correct URL was specified
        host xxxxxxxxxxxx
        port 443
        user xxxxxxxxxxx
        password xxxxxxxxxxx
        templates xxxxxxxxxxxxx
        template_overwrite true
        write_operation upsert
        target_index_key @target_index
        index_name defaulmosstindex
        type_name _doc
        id_key request_id
        remove_keys request_id
        reconnect_on_error true
        reload_on_failure true
        reload_connections false
        request_timeout 45s
        bulk_message_request_threshold -1
        <buffer>
          @type file
          path xxxxxx
          flush_mode interval
          flush_thread_count 4
          flush_interval 60s
          retry_type periodic
          retry_max_times 20
          retry_wait 20s
          chunk_limit_size 64M
          queued_chunks_limit_size 100
          overflow_action throw_exception
        </buffer>
      </match>
    </label>
  {{- if .Values.opensearch.mossSecondary.enabled }}
    <label @secondary>
      <filter **>
        @type record_modifier
        <record>
          @target_index ${record["@target_index"]}_${(((Time.at(time).strftime("%j").to_i - 1) / 3) * 3 + 1)}
        </record>
      </filter>
      <match **>
        @id secondary_elasticsearch_moss_pm
        @type elasticsearch
        @log_level trace
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        max_retry_get_es_version 20
        max_retry_putting_template 20
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        suppress_type_name true
        # wrong URL specified intentionally to test this case
        host testapi
        port 443
        user xxxxxxxx
        password xxxxx
        templates xxxxxx
        template_overwrite true
        write_operation upsert
        target_index_key @target_index
        index_name duplicatedefaulmosstindex
        type_name _doc
        id_key request_id
        remove_keys request_id
        reconnect_on_error true
        reload_on_failure true
        reload_connections false
        request_timeout 45s
        bulk_message_request_threshold -1
        <buffer>
          @type file
          path xxxxxxxxxxxxxxxxxx
          flush_mode interval
          flush_thread_count 4
          flush_interval 60s
          retry_type periodic
          retry_max_times 20
          retry_wait 20s
          chunk_limit_size 64M
          queued_chunks_limit_size 100
          overflow_action throw_exception
        </buffer>
      </match>
    </label>
  {{- end }}
{{- end }}
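
For reference, the @target_index suffix in the secondary filter above groups the day of the year into 3-day windows. Below is a minimal Ruby sketch of that arithmetic only. The helper names bucket_start_day and index_suffix are hypothetical and are not part of the config, and epoch stands in for the time variable that record_modifier exposes.

```ruby
# Hypothetical helper (not part of the config): maps a day of year (1..366)
# to the first day of its 3-day window, mirroring the filter expression.
def bucket_start_day(day_of_year)
  ((day_of_year - 1) / 3) * 3 + 1
end

# Mirrors the record_modifier expression; `epoch` stands in for `time`.
# Like the expression in the config, this uses the local time zone.
def index_suffix(epoch)
  bucket_start_day(Time.at(epoch).strftime("%j").to_i)
end

# Print the suffix for the first week of the year.
(1..7).each do |day|
  puts "day #{day} -> suffix #{bucket_start_day(day)}"
end
```

Days 1 to 3 map to suffix 1, days 4 to 6 map to suffix 4, and so on, so the secondary index name should change every three days.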

@cosmo0920, could you please take a look at this? It would be really helpful. Thanks.
