Chocolate service db exhausted #1122

@StefanoFioravanzo

/kind bug

What steps did you take and what happened:
When running experiments that use the grid search algorithm, trials sometimes stop being generated even though the experiment is still reported as running.
The logs of the suggestion pods show the following:

ERROR:sqlalchemy.pool.impl.NullPool:Exception during reset or similar
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 502, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140636731225856 and this is thread id 140636053288704.
ERROR:sqlalchemy.pool.impl.NullPool:Exception closing connection <sqlite3.Connection object at 0x7fe8b4397650>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 502, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140636731225856 and this is thread id 140636053288704.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 270, in _close_connection
    self._dialect.do_close(connection)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 508, in do_close
    dbapi_connection.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140636731225856 and this is thread id 140636053288704.
DEBUG:filelock:Attempting to release lock 140636764717688 on my_db.db.lock
INFO:filelock:Lock 140636764717688 released on my_db.db.lock
INFO:BaseChocolateService:{'_chocolate_id': 0, '_loss': -83.971292, 'bm9kZXNfbnVtYmVy': 512.0}
INFO:BaseChocolateService:{'_chocolate_id': 1, '_loss': -82.655502, 'bm9kZXNfbnVtYmVy': 768.0}
DEBUG:filelock:Attempting to acquire lock 140636764717688 on my_db.db.lock
INFO:filelock:Lock 140636764717688 acquired on my_db.db.lock
DEBUG:filelock:Attempting to release lock 140636764717688 on my_db.db.lock
INFO:filelock:Lock 140636764717688 released on my_db.db.lock
INFO:BaseChocolateService:Chocolate db is exhausted, increase Search Space or decrease maxTrialCount!
INFO:BaseChocolateService:Chocolate db is exhausted, increase Search Space or decrease maxTrialCount!
INFO:BaseChocolateService:Chocolate db is exhausted, increase Search Space or decrease maxTrialCount!

Note that this happens only intermittently, and I was not able to pin down a specific configuration that causes it. With some parameter configurations it happens; with others the experiment completes and explores the search space as expected.
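
For reference, the sqlite3.ProgrammingError in the traceback can be reproduced in isolation with the Python standard library alone. This is only a minimal sketch of the thread-affinity check, not the suggestion service's code path; the function name is made up for illustration, and it assumes the connection is opened with the default check_same_thread=True.

```python
import sqlite3
import threading


def rollback_from_other_thread():
    """Open a connection in the calling thread, then roll it back from a worker.

    Hypothetical helper for illustration only; returns the error messages
    collected in the worker thread.
    """
    # check_same_thread defaults to True, so the connection is bound
    # to the thread that created it.
    conn = sqlite3.connect(":memory:")
    errors = []

    def worker():
        try:
            # Same call that fails in the traceback (dbapi_connection.rollback()).
            conn.rollback()
        except sqlite3.ProgrammingError as exc:
            errors.append(str(exc))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

    conn.close()  # closing from the creating thread is allowed
    return errors


if __name__ == "__main__":
    for message in rollback_from_other_thread():
        print(message)
```

sqlite3.connect also accepts check_same_thread=False, which disables this check, but the caller then has to serialize access to the connection itself. Whether that would be an appropriate fix here is not established.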

What did you expect to happen:
I would at least expect the experiment to fail (related to #1120), ideally with a more informative message. The message "Chocolate db is exhausted, increase Search Space or decrease maxTrialCount!" is misleading here: maxTrialCount has not been reached yet, and some configurations in the search space have not been tried.
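
To make the expectation concrete, here is a rough sketch of what "exhausted" would mean for a grid search: every combination in the grid has already been tried. It is not the Chocolate or Katib implementation. The helper name and the grid values are hypothetical; only the two tried values (512.0 and 768.0) come from the log above, where the key bm9kZXNfbnVtYmVy is the base64 encoding of nodes_number.

```python
from itertools import product


def remaining_grid_points(grid, tried):
    """Return the grid combinations that have not been tried yet.

    grid:  dict mapping parameter name -> list of candidate values
    tried: list of dicts, one per trial that has already been generated
    (Hypothetical helper, for illustration only.)
    """
    names = sorted(grid)
    all_points = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    return [point for point in all_points if point not in tried]


# Assumed search space: the third value is invented for the example.
grid = {"nodes_number": [512.0, 768.0, 1024.0]}
# The two trials that appear in the suggestion pod log.
tried = [{"nodes_number": 512.0}, {"nodes_number": 768.0}]

remaining = remaining_grid_points(grid, tried)
print(remaining)
```

Under these assumptions one combination is still untried, so an "exhausted" message would only be expected once this list is empty.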

Environment:

  • Kubeflow version: 1.0
  • Minikube version: 1.2.0 (MiniKF Latest)
  • Kubernetes version: 1.14
