Python 3.9 Support #3720

@luminoso

Describe the bug

I have several Seldon deployments across multiple versions. Each pod runs a 3-step graph with three different containers, all using the Python wrapper.

Occasionally, one of the containers fails to start and enters a CrashLoop state, from which it either recovers or not, seemingly at random. Sometimes the only solution is to kill the container outright.

The error is the following:

2021-11-03 14:37:40,657 - seldon_core.microservice:main:206 - INFO:  Starting microservice.py:main
2021-11-03 14:37:40,657 - seldon_core.microservice:main:207 - INFO:  Seldon Core version: 1.11.2
2021-11-03 14:37:40,660 - seldon_core.microservice:main:362 - INFO:  Parse JAEGER_EXTRA_TAGS []
2021-11-03 14:37:40,660 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation buidId:38 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation buildJob:factory/webhooks/update-fused-seldon 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation deployment_version:v1 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation kubernetes.io/config.seen:2021-11-03T14:37:24.189445793Z 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation kubernetes.io/config.source:api 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation predictor_version:v1 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation prometheus.io/path:/prometheus 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation prometheus.io/scrape:true 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation sidecar.istio.io/inject:true 
2021-11-03 14:37:40,661 - seldon_core.microservice:load_annotations:158 - INFO:  Found annotation v1: 
2021-11-03 14:37:40,661 - seldon_core.microservice:main:365 - INFO:  Annotations: {'buidId': '38', 'buildJob': 'factory/webhooks/update-fused-seldon', 'deployment_version': 'v1', 'kubernetes.io/config.seen': '2021-11-03T14:37:24.189445793Z', 'kubernetes.io/config.source': 'api', 'predictor_version': 'v1', 'prometheus.io/path': '/prometheus', 'prometheus.io/scrape': 'true', 'sidecar.istio.io/inject': 'true', 'v1': ''}
2021-11-03 14:37:40,661 - seldon_core.microservice:main:369 - INFO:  Importing Model
2021-11-03 14:37:43,548 - Model:__init__:13 - INFO:  Clean transformer initiated
Process SyncManager-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 583, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 156, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 453, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 596, in __init__
    self._socket.bind(address)
OSError: [Errno 98] Address already in use
Traceback (most recent call last):
  File "/usr/local/bin/seldon-core-microservice", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/seldon_core/microservice.py", line 385, in main
    seldon_metrics = SeldonMetrics(worker_id_func=os.getpid)
  File "/usr/local/lib/python3.9/site-packages/seldon_core/metrics.py", line 89, in __init__
    self._manager = Manager()
  File "/usr/local/lib/python3.9/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 558, in start
    self._address = reader.recv()
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
Exception ignored in: <function SeldonMetrics.__del__ at 0x7f76680523a0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/seldon_core/metrics.py", line 96, in __del__
    self._manager.shutdown()
AttributeError: 'SeldonMetrics' object has no attribute '_manager'

Initially, it looked like a concurrency bug, since it happens at random in any of the three containers and not on every start. I therefore followed the documentation to reduce possible concurrency problems by setting the following env flags on every container:

- env:
    - name: LOG_LEVEL_ENV
      value: 'DEBUG'
    - name: FLASK_DEBUG
      value: '1'
    - name: GUNICORN_WORKERS
      value: '1'
    - name: GUNICORN_THREADS
      value: '1'
    - name: FLASK_SINGLE_THREADED
      value: '1'
    - name: SELDON_DEBUG
      value: '1'

I verified that the flags are indeed being applied.
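For anyone who wants to repeat that check from inside a running container, a minimal sketch follows. The expected values are copied from the env block above; the helper name find_mismatched_flags is hypothetical and not part of seldon_core.

```python
import os

# Values copied from the env block above.
EXPECTED_FLAGS = {
    "LOG_LEVEL_ENV": "DEBUG",
    "FLASK_DEBUG": "1",
    "GUNICORN_WORKERS": "1",
    "GUNICORN_THREADS": "1",
    "FLASK_SINGLE_THREADED": "1",
    "SELDON_DEBUG": "1",
}


def find_mismatched_flags(expected, environ=None):
    """Return {name: actual_value} for every flag that differs from `expected`.

    Variables that are not set at all are reported with the value None.
    """
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name)
        for name, wanted in expected.items()
        if environ.get(name) != wanted
    }


if __name__ == "__main__":
    mismatches = find_mismatched_flags(EXPECTED_FLAGS)
    if mismatches:
        print(f"mismatched flags: {mismatches}")
    else:
        print("all flags applied")
```

Running this in each of the three containers would confirm that all of them see the same settings.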

I'm also following the def load(): pattern described in the Python wrapper documentation.

So right now I'm out of ideas on how to debug this further, and I'm looking for suggestions.
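One thing that can be read directly off the traceback: the final AttributeError is a follow-on error rather than a separate bug. Manager() raised inside SeldonMetrics.__init__ before self._manager was assigned, so __del__ then ran against a half-built object. The sketch below reproduces that pattern with hypothetical stand-in classes (FragileMetrics, GuardedMetrics and the manager_factory parameter are illustrative names, not seldon_core code) and shows one way a destructor could tolerate a failed constructor.

```python
class FragileMetrics:
    """Hypothetical stand-in: __del__ assumes __init__ always completed."""

    def __init__(self, manager_factory):
        # If manager_factory() raises, self._manager is never assigned.
        self._manager = manager_factory()

    def __del__(self):
        # Fails with AttributeError when __init__ did not finish.
        self._manager.shutdown()


class GuardedMetrics:
    """Hypothetical variant whose destructor tolerates a failed __init__."""

    def __init__(self, manager_factory):
        self._manager = None  # defined before anything can fail
        self._manager = manager_factory()

    def __del__(self):
        manager = getattr(self, "_manager", None)
        if manager is not None:
            manager.shutdown()


def failing_factory():
    """Simulates the manager process dying before it reports its address."""
    raise EOFError


if __name__ == "__main__":
    for cls in (FragileMetrics, GuardedMetrics):
        try:
            cls(failing_factory)
        except EOFError:
            print(f"{cls.__name__}: constructor raised EOFError")
```

With FragileMetrics, Python additionally reports an "Exception ignored in ... __del__" message like the one in the log. With GuardedMetrics, only the original EOFError is reported. Either way, the root cause remains the earlier OSError in the SyncManager process.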

To reproduce

  1. Deploy seldon with a graph defined by 3 different images
  2. Start the pod multiple times until the error happens
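Restarting whole pods is slow, so the failing step could also be probed in isolation. The traceback shows the crash happening while multiprocessing's Manager() starts its server process, and that step can be exercised in a loop from a shell inside one of the containers. This is a minimal sketch, not a confirmed reproduction: the function name probe_manager_startup and the attempt count are hypothetical, and it assumes a Linux container where the "fork" start method is available.

```python
import multiprocessing


def probe_manager_startup(attempts=10):
    """Start and stop a SyncManager `attempts` times.

    Returns a list of (attempt_index, error_repr) for every failed startup.
    """
    # Assumption: Linux, where "fork" is available (the default on Python 3.9).
    ctx = multiprocessing.get_context("fork")
    failures = []
    for attempt in range(attempts):
        try:
            # Manager() starts the server process and waits for its address;
            # if the server dies while binding, the parent sees EOFError.
            manager = ctx.Manager()
        except (EOFError, OSError) as exc:
            failures.append((attempt, repr(exc)))
            continue
        manager.shutdown()
    return failures


if __name__ == "__main__":
    errors = probe_manager_startup()
    print(f"{len(errors)} failed startups: {errors}")
```

If this loop also fails intermittently, the problem would sit below Seldon, in how the manager's listener address is chosen in that environment. If it never fails, the timing at container start is more likely to be involved.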

Environment

  • Cloud Provider: GKE
#### Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.13-gke.1900", GitCommit:"ee714a7b695ca42b9bd0c8fe2c0159024cdcba5e", GitTreeState:"clean", BuildDate:"2021-08-11T09:19:42Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"} 

#### Seldon Images:
          value: docker.io/seldonio/engine:1.11.2
          value: docker.io/seldonio/seldon-core-executor:1.11.2
        image: docker.io/seldonio/seldon-core-operator:1.11.2

Model Details
