kemingy
Member

@kemingy kemingy commented Aug 12, 2022

Signed-off-by: Keming [email protected]

cc @aseaday @terrytangyuan

@kemingy
Member Author

kemingy commented Aug 12, 2022

Member

@aseaday aseaday left a comment

Where will the stdout logs go?

@kemingy
Member Author

kemingy commented Aug 12, 2022

> Where will the stdout logs go?

Some ideas:

  • manually redirect to files
  • use `journalctl` if the process is controlled by systemd
  • auto redirect to a file and provide access like `envd log --name <container_name>`
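
The first and third ideas can be combined. As a minimal sketch in plain Python (this is not envd code; the `start_daemon` and `read_log` helpers and the log directory layout are hypothetical), each daemon's stdout and stderr go to its own file, and a lookup by name stands in for something like `envd log --name`:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def start_daemon(name, command, log_dir):
    """Start `command` with stdout and stderr redirected to <log_dir>/<name>.log."""
    log_path = Path(log_dir) / f"{name}.log"
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)
    log_file.close()  # the child keeps its own copy of the descriptor
    return proc, log_path


def read_log(name, log_dir):
    """Return the captured output of the daemon called `name`."""
    return (Path(log_dir) / f"{name}.log").read_text()


log_dir = tempfile.mkdtemp(prefix="daemon-logs-")
# Two short-lived stand-ins for real daemons such as jupyter or tensorboard.
procs = [
    start_daemon("alpha", [sys.executable, "-c", "print('alpha ready')"], log_dir),
    start_daemon("beta", [sys.executable, "-c", "print('beta ready')"], log_dir),
]
for proc, _ in procs:
    proc.wait()

alpha_log = read_log("alpha", log_dir)
beta_log = read_log("beta", log_dir)
print(alpha_log, end="")
```

Because every process writes to a separate file, the output of one daemon never interleaves with another's, at the cost of having to manage the files (rotation, cleanup) somewhere.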

@aseaday
Member

aseaday commented Aug 12, 2022

> > Where will the stdout logs go?
>
> Some ideas:
>
>   • manually redirect to files
>   • use `journalctl` if the process is controlled by systemd
>   • auto redirect to a file and provide access like `envd log --name <container_name>`

Manually redirecting to files, with the approach recorded in the docs, LGTM.

@gaocegege
Member

Do we need to store the logs in files? I think we can just print to STDOUT and show them in `envd logs`.

@VoVAllen
Member

@gaocegege Since there are multiple processes running at the same time, it's hard to put everything in the same stdout.

## API

```python
runtime.daemon(commands=[
```
Member

@gaocegege gaocegege Aug 12, 2022

Then how would this feature support services like TensorBoard?

Member Author

That needs several features working together:

  • run a daemon tensorboard process (this proposal)
  • expose the port to host
  • specify the log dir mount

@gaocegege
Member

> @gaocegege Since there are multiple processes running at the same time, it's hard to put everything in the same stdout.

Currently, the stdout looks like:

```
time="2022-08-11T09:41:16Z" level=info msg="zsh exists at /usr/bin/zsh"
time="2022-08-11T09:41:16Z" level=info msg="ssh server v0.2.0-alpha.13+1fec011 started in 0.0.0.0:2222"
[I 09:41:17.144 NotebookApp] Writing notebook server cookie secret to /home/envd/.local/share/jupyter/runtime/notebook_cookie_secret
[I 09:41:17.280 NotebookApp] Serving notebooks from local directory: /home/envd/mnist
[I 09:41:17.280 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 09:41:17.280 NotebookApp] http://96956beaaa6c:8888/
[I 09:41:17.280 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 09:41:17.282 NotebookApp] No web browser found: could not locate runnable browser.
time="2022-08-11T09:41:17Z" level=info msg="starting ssh session with command 'zsh'" session.id=179e909a-21b0-4031-b901-6b8e81cc8036
time="2022-08-11T09:41:17Z" level=info msg="agent requested" session.id=179e909a-21b0-4031-b901-6b8e81cc8036
time="2022-08-11T09:41:17Z" level=info msg="handling PTY session" session.id=179e909a-21b0-4031-b901-6b8e81cc8036
[I 09:41:34.769 NotebookApp] 302 GET / (172.17.0.1) 0.740000ms
[I 09:41:34.773 NotebookApp] 302 GET /tree? (172.17.0.1) 0.580000ms
[W 09:41:37.119 NotebookApp] 401 POST /login?next=%2Ftree%3F (172.17.0.1) 2.280000ms referer=http://localhost:38571/login?next=%2Ftree%3F
[W 09:41:39.143 NotebookApp] 401 POST /login?next=%2Ftree%3F (172.17.0.1) 1.890000ms referer=http://localhost:38571/login?next=%2Ftree%3F
```

@VoVAllen
Member

SSHD is also a daemon service. A single stdout makes it hard to track the logs. Say a user launches an envd container for a training job with TensorBoard running as well: `envd logs` should show only the stdout of `python train.py` instead of combining everything. Storing the logs separately also helps with debugging when something goes wrong.
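
To illustrate why a shared stdout is hard to track: separating it after the fact means guessing the source of each line from its shape. The sketch below is only an illustration, not envd code. The patterns are derived from the combined log quoted above, the source labels (`envd-ssh`, `jupyter`) and the `Epoch 1/5` training line are assumptions, and anything that matches no pattern cannot be attributed at all:

```python
import re
from collections import defaultdict

# Line shapes taken from the combined log above; real daemons may log differently.
PATTERNS = {
    "envd-ssh": re.compile(r'^time="[^"]+" level=\w+ '),
    "jupyter": re.compile(r"^\[[A-Z] \d{2}:\d{2}:\d{2}\.\d{3} NotebookApp\] "),
}


def split_by_source(lines):
    """Group lines of a shared stdout by the daemon that most likely wrote them."""
    groups = defaultdict(list)
    for line in lines:
        for source, pattern in PATTERNS.items():
            if pattern.match(line):
                groups[source].append(line)
                break
        else:
            # No pattern matched, so the writer of this line is unknown.
            groups["unknown"].append(line)
    return dict(groups)


combined = [
    'time="2022-08-11T09:41:16Z" level=info msg="zsh exists at /usr/bin/zsh"',
    "[I 09:41:17.280 NotebookApp] Jupyter Notebook 6.4.12 is running at:",
    'time="2022-08-11T09:41:17Z" level=info msg="agent requested" session.id=179e909a-21b0-4031-b901-6b8e81cc8036',
    "Epoch 1/5",  # hypothetical output from `python train.py`
]
groups = split_by_source(combined)
for source, source_lines in groups.items():
    print(source, len(source_lines))
```

Pattern matching like this is fragile, which is an argument for keeping the streams separate at the point where the daemons are started.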


## Goals

* able to run multiple daemon processes controlled by `tini`
Member

What is the implementation plan? Any architectural considerations that we should discuss here?

Member Author

  • Plan: add the commands to the `tini` entrypoint.

    envd/pkg/lang/ir/compile.go

    Lines 166 to 176 in b9f0af8

    ```go
    ep := []string{
    	"tini",
    	"--",
    	"bash",
    	"-c",
    }
    template := `set -e
    /var/envd/bin/envd-ssh --authorized-keys %s --port %d --shell %s &
    %s
    wait -n`
    ```
  • I'd like to discuss whether this is a general approach (it needs to work with other features like mount and expose) to solving issues like feat(lang): Support TensorBoard #527, or whether we should do it another way.
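
As a rough sketch of that plan, written in Python rather than Go for illustration only: the template below mirrors the bash template quoted from compile.go, while the `render_entrypoint` helper, the idea of placing each daemon on its own backgrounded line, and the authorized-keys path are assumptions, not the actual envd implementation.

```python
import shlex

# Mirrors the bash template quoted from compile.go, with named fields.
TEMPLATE = """set -e
/var/envd/bin/envd-ssh --authorized-keys {keys} --port {port} --shell {shell} &
{daemons}
wait -n"""


def render_entrypoint(keys, port, shell, daemons):
    """Build the tini argv; each daemon is an argv list that runs in the background."""
    daemon_lines = "\n".join(shlex.join(cmd) + " &" for cmd in daemons)
    script = TEMPLATE.format(
        keys=shlex.quote(keys),
        port=port,
        shell=shlex.quote(shell),
        daemons=daemon_lines,
    )
    return ["tini", "--", "bash", "-c", script]


entrypoint = render_entrypoint(
    keys="/var/envd/authorized_keys",  # hypothetical path
    port=2222,
    shell="zsh",
    daemons=[["jupyter-lab"], ["tensorboard", "--logdir", "/logs"]],
)
print(entrypoint[-1])
```

With `wait -n`, bash returns as soon as any one background job exits, so a crashing daemon would end the whole script. Whether that is the desired behavior for user-defined daemons is part of what needs discussing.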

@aseaday
Member

aseaday commented Aug 12, 2022

> Do we need to store the logs in files? I think we can just print to STDOUT and show them in `envd logs`.

The stdin/stdout/stderr question comes down to which file descriptors the supervising process passes to each daemon subprocess as fd 0, 1, and 2. We could either gather them into one file or split them out.
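
One possible middle ground, sketched here in plain Python under stated assumptions (the `pump` and `run_tagged` helpers are hypothetical and this is not how envd works today): the supervising process hands each child its own pipe as fd 1 and 2, then gathers the lines into a single stream with the daemon name attached, so they can still be split out later.

```python
import subprocess
import sys
import threading


def pump(name, stream, sink, lock):
    """Copy lines from one child's pipe into the shared sink, tagged with its name."""
    for line in stream:
        with lock:
            sink.append(f"[{name}] {line.rstrip()}")


def run_tagged(commands):
    """Run every command with its own stdout pipe and gather the tagged lines."""
    sink, lock, workers = [], threading.Lock(), []
    for name, argv in commands.items():
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        thread = threading.Thread(target=pump, args=(name, proc.stdout, sink, lock))
        thread.start()
        workers.append((proc, thread))
    for proc, thread in workers:
        thread.join()  # returns once the child closes its end of the pipe
        proc.wait()
        proc.stdout.close()
    return sink


# Short-lived stand-ins for real daemons.
lines = run_tagged({
    "ssh": [sys.executable, "-c", "print('started')"],
    "train": [sys.executable, "-c", "print('epoch 1')"],
})
print(sorted(lines))
```

The order of lines across daemons depends on scheduling, which is why the example sorts them before printing.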

@kemingy
Member Author

kemingy commented Aug 15, 2022

Here is an example to demonstrate how to use it for jupyter-lab:

```python
def jupyter_lab():
    expose(local_port=8888, host_port=8888, svc="jupyter")
    runtime.daemon(commands=["jupyter-lab"])


def build():
    base(os="ubuntu20.04", language="python")
    install.pip_packages(["numpy", "jupyterlab"])
    jupyter_lab()
```

cc @Xiaoaier-Z-L

@gaocegege
Member

> > Do we need to store the logs in files? I think we can just print to STDOUT and show them in `envd logs`.
>
> The stdin/stdout/stderr question comes down to which file descriptors the supervising process passes to each daemon subprocess as fd 0, 1, and 2. We could either gather them into one file or split them out.

SGTM

Member

@gaocegege gaocegege left a comment

Member

@aseaday aseaday left a comment

LGTM

Member

@VoVAllen VoVAllen left a comment

LGTM. Would it be better to put `expose` under the `runtime` namespace as well?

@kemingy
Member Author

kemingy commented Aug 15, 2022

> LGTM. Would it be better to put `expose` under the `runtime` namespace as well?

Agreed. BTW, `expose` is not implemented yet.

@gaocegege
Member

I am merging this to move forward. But feel free to comment if there is any problem.

@gaocegege gaocegege merged commit 2f82fa5 into tensorchord:main Aug 15, 2022
@kemingy kemingy deleted the proposal_daemon branch August 16, 2022 14:43

5 participants