Commit a4a3376

Remove SDK (#2657)
* Remove SDK
* Fix e2e test
* Remove unnecessary `pip install ./sdk` for unit tests
* Remove sdk from pre-commit config
* Update docstring for `generate` target
* Update CONTRIBUTING.md
* Remove .gitignore for OpenAPI Generator CLI JAR
* README.md: Add link to `kubeflow/sdk` + some minor edits
* Add TODO to repurpose SDK release info if needed
* Add SDK installation step to all notebooks
* Comment out SDK installation command in all notebooks

Signed-off-by: Eoin Fennessy <[email protected]>
1 parent b83e058 commit a4a3376

385 files changed: 94 additions & 39591 deletions

.github/workflows/test-e2e.yaml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ jobs:
 pip install papermill==2.6.0 jupyter==1.1.1 ipykernel==6.29.5

 echo "Install Kubeflow SDK"
-pip install ./sdk
+pip install git+https://github.com/kubeflow/sdk.git@main#subdirectory=python

 - name: Setup cluster
 run: |
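
The replacement install line packs a Git repository, a branch, and a subdirectory into one URL. A minimal sketch of how that string decomposes, using only the Python standard library; the helper name `split_vcs_url` is hypothetical and the parsing is illustrative, not pip's own implementation:

```python
from urllib.parse import parse_qs, urlsplit


def split_vcs_url(url: str) -> dict:
    """Split a pip-style VCS URL into its parts (illustrative only)."""
    # "git+https://..." -> VCS prefix "git" and the transport URL.
    vcs, _, rest = url.partition("+")
    parts = urlsplit(rest)
    # The Git ref (branch, tag, or commit) follows "@" in the path.
    path, _, ref = parts.path.partition("@")
    # The fragment carries key=value options such as "subdirectory=python".
    fragment = parse_qs(parts.fragment)
    return {
        "vcs": vcs,
        "repo": f"{parts.scheme}://{parts.netloc}{path}",
        "ref": ref or None,
        "subdirectory": fragment.get("subdirectory", [None])[0],
    }


info = split_vcs_url(
    "git+https://github.com/kubeflow/sdk.git@main#subdirectory=python"
)
print(info)
```

Read this way, the e2e workflow now installs the package found in the `python` subdirectory of the `main` branch of `kubeflow/sdk`, rather than the local `./sdk` directory that this commit deletes.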

.gitignore

Lines changed: 0 additions & 3 deletions
@@ -20,9 +20,6 @@ artifacts
 __pycache__/
 *.egg-info/

-# OpenAPI Generator CLI JAR file
-hack/python-sdk/openapi-generator-cli.jar
-
 # Coverage
 cover.out

.pre-commit-config.yaml

Lines changed: 0 additions & 4 deletions
@@ -26,9 +26,5 @@ exclude: |
 (?x)^(
 docs/images/.*|
 pkg/client/.*|
-sdk/kubeflow/trainer/__init__.py|
-sdk/kubeflow/trainer/api/__init__.py|
-sdk/kubeflow/trainer/models/.*|
 api/python_api/kubeflow_trainer_api/models/.*|
-sdk/docs/.*
 )$
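
To see what the trimmed `exclude` pattern still covers, it can be exercised directly with Python's `re` module. This is a sketch under the assumption that pre-commit treats `exclude` as a Python regular expression searched against each file path; the sample paths and the helper `is_excluded` are hypothetical:

```python
import re

# Pattern as it reads after this commit, copied from the hunk above.
# (?x) enables verbose mode, so the newlines and indentation are ignored.
EXCLUDE = r"""(?x)^(
    docs/images/.*|
    pkg/client/.*|
    api/python_api/kubeflow_trainer_api/models/.*|
)$"""


def is_excluded(path: str) -> bool:
    """Return True if the path matches the exclude pattern."""
    return re.search(EXCLUDE, path) is not None


# Hypothetical paths, chosen only to exercise each alternative.
for path in [
    "pkg/client/clientset/versioned/doc.go",
    "docs/images/logo.svg",
    "sdk/docs/README.md",
]:
    print(path, is_excluded(path))
```

Because `sdk/docs/.*` was the last alternative, removing it leaves a trailing `|` before `)$`. In this sketch the resulting empty alternative matches only an empty path, so it does not exclude any real file.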

CONTRIBUTING.md

Lines changed: 3 additions & 21 deletions
@@ -21,7 +21,7 @@ Note for Lima the link is to the Adopters, which supports several different cont
 The Kubeflow Trainer project includes a Makefile with several helpful commands to streamline your development workflow:

 ```sh
-# Generate manifests, APIs and SDK
+# Generate manifests and APIs.
 make generate
 ```

@@ -215,24 +215,6 @@ kubectl logs -n kubeflow -l training.kubeflow.org/job-name=pytorch-simple --foll

 ### SDK Development

-To generate Python SDK for the operator, run:
+Changes to the Kubeflow Trainer Python SDK can be made in the https://github.com/kubeflow/sdk repo.

-```sh
-./hack/python-sdk/gen-sdk.sh
-```
-
-This command will re-generate the api and model files together with the documentation and model tests.
-The following files/folders in `sdk/python` are auto-generated and should not be modified directly:
-
-```
-sdk/python/docs
-sdk/python/kubeflow/training/models
-sdk/python/kubeflow/training/*.py
-sdk/python/test/*.py
-```
-
-The Training Operator client and public APIs are located here:
-
-```
-sdk/python/kubeflow/training/api
-```
+The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/python/kubeflow/trainer.

Makefile

Lines changed: 1 addition & 4 deletions
@@ -120,13 +120,11 @@ manifests: controller-gen ## Generate manifests.
 output:rbac:artifacts:config=manifests/base/rbac \
 output:webhook:artifacts:config=manifests/base/webhook

-## TODO (kramaranya): Remove gen-sdk.sh when moving SDK
 .PHONY: generate
-generate: go-mod-download manifests ## Generate APIs and SDK.
+generate: go-mod-download manifests ## Generate APIs.
 $(CONTROLLER_GEN) object:headerFile="hack/boilerplate/boilerplate.go.txt" paths="./pkg/apis/..."
 hack/update-codegen.sh
 CONTAINER_RUNTIME=$(CONTAINER_RUNTIME) hack/python-api/gen-api.sh
-CONTAINER_RUNTIME=$(CONTAINER_RUNTIME) hack/python-sdk/gen-sdk.sh

 .PHONY: go-mod-download
 go-mod-download: ## Run go mod download to download modules.
@@ -163,7 +161,6 @@ test-integration: ginkgo envtest jobset-operator-crd scheduler-plugins-crd ## Ru
 test-python: ## Run Python unit test.
 pip install pytest
 pip install -r ./cmd/initializers/dataset/requirements.txt
-pip install ./sdk

 PYTHONPATH=$(PROJECT_DIR) pytest ./pkg/initializers/dataset
 PYTHONPATH=$(PROJECT_DIR) pytest ./pkg/initializers/model

README.md

Lines changed: 3 additions & 2 deletions
@@ -20,8 +20,9 @@ You can integrate other ML libraries such as [HuggingFace](https://huggingface.c
 [DeepSpeed](https://github.com/microsoft/DeepSpeed), or [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
 with Kubeflow Training to orchestrate their ML training on Kubernetes.

-Kubeflow Trainer allows you effortlessly develop your LLMs with the Kubeflow Python SDK and
-build Kubernetes-native Training Runtimes with Kubernetes Custom Resources APIs.
+Kubeflow Trainer enables you to effortlessly develop your LLMs with the
+[Kubeflow Python SDK](https://github.com/kubeflow/sdk/), and build Kubernetes-native Training
+Runtimes using Kubernetes Custom Resource APIs.

 <h1 align="center">
 <img src="./docs/images/trainer-tech-stack.drawio.svg" alt="logo" width="500">

docs/release/README.md

Lines changed: 3 additions & 0 deletions
@@ -81,6 +81,9 @@ cherry pick your changes from the `master` branch and submit a PR.

 ### Release Training SDK

+TODO (eoinfennessy): This information no longer applies to Trainer V2 after moving the SDK to kubeflow/sdk.
+However, we should repurpose this information if we ever decide to publish the Trainer APIs.
+
 1. Update the the VERSION in the [gen-sdk.sh script](../../hack/python-sdk/gen-sdk.sh#L27),
 packageVersion in the [swagger_config.json file](../../hack/python-sdk/swagger_config.json#L4),
 and the version in the [setup.py file](../../sdk/python/setup.py#L36).

examples/deepspeed/text-summarization/T5-Fine-Tuning.ipynb

Lines changed: 29 additions & 7 deletions
@@ -20,21 +20,40 @@
 ]
 },
 {
+"metadata": {},
 "cell_type": "markdown",
-"id": "b5ad0e01-d893-484c-8988-d25c2f322b80",
+"source": [
+"## Install the Kubeflow SDK\n",
+"\n",
+"You need to install the Kubeflow SDK to interact with Kubeflow Trainer APIs:"
+],
+"id": "1c461b2984e77d99"
+},
+{
+"metadata": {},
+"cell_type": "code",
+"outputs": [],
+"execution_count": null,
+"source": "# !pip install git+https://github.com/kubeflow/sdk.git@main#subdirectory=python",
+"id": "4900404c5d532bdf"
+},
+{
 "metadata": {},
+"cell_type": "markdown",
 "source": [
 "## Create Script to Fine-Tune T5 with DeepSpeed\n",
 "\n",
 "We need to wrap our fine-tuning script into a function to create Kubeflow TrainJob."
-]
+],
+"id": "47534ee4955f3ff6"
 },
 {
+"metadata": {
+"jupyter": {
+"is_executing": true
+}
+},
 "cell_type": "code",
-"execution_count": null,
-"id": "57952ce1-5752-4976-8a35-c71d25935b74",
-"metadata": {},
-"outputs": [],
 "source": [
 "def deepspeed_train_t5(args):\n",
 " import os\n",
@@ -208,7 +227,10 @@
 " file_path = os.path.join(HOME_PATH, \"global_step94/mp_rank_00_model_states.pt\")\n",
 " bucket = boto3.resource(\"s3\").Bucket(args[\"BUCKET\"])\n",
 " bucket.upload_file(file_path, f\"deepspeed/{file_path}\")"
-]
+],
+"id": "35f06c45b614ecd0",
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",

examples/mlx/image-classification/MLX-Distributed-Mnist.ipynb

Lines changed: 18 additions & 0 deletions
@@ -66,6 +66,24 @@
 "```"
 ]
 },
+{
+"metadata": {},
+"cell_type": "markdown",
+"source": [
+"## Install the Kubeflow SDK\n",
+"\n",
+"You need to install the Kubeflow SDK to interact with Kubeflow Trainer APIs:"
+],
+"id": "3cb9ab4c8e12221a"
+},
+{
+"metadata": {},
+"cell_type": "code",
+"outputs": [],
+"execution_count": null,
+"source": "# !pip install git+https://github.com/kubeflow/sdk.git@main#subdirectory=python",
+"id": "bd62189280760f42"
+},
 {
 "cell_type": "markdown",
 "id": "b5ad0e01-d893-484c-8988-d25c2f322b80",

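The two cells added to the T5 and MLX notebooks are the same pair: a markdown heading followed by a commented-out install command. A minimal sketch of that insertion on a notebook loaded as a plain dictionary, assuming the nbformat 4 JSON layout; the helper `add_install_cells`, the demo notebook, and the generated cell ids are hypothetical and not how the commit author necessarily produced the change:

```python
import json
import uuid

INSTALL_CMD = (
    "# !pip install git+https://github.com/kubeflow/sdk.git@main#subdirectory=python"
)


def add_install_cells(notebook: dict, index: int) -> dict:
    """Insert a markdown heading cell and a commented-out install cell at `index`."""
    markdown_cell = {
        "cell_type": "markdown",
        "id": uuid.uuid4().hex[:16],  # hypothetical id, generated for the sketch
        "metadata": {},
        "source": [
            "## Install the Kubeflow SDK\n",
            "\n",
            "You need to install the Kubeflow SDK to interact with Kubeflow Trainer APIs:",
        ],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": None,
        "id": uuid.uuid4().hex[:16],  # hypothetical id, generated for the sketch
        "metadata": {},
        "outputs": [],
        "source": INSTALL_CMD,
    }
    # Slice assignment inserts both cells without replacing existing ones.
    notebook["cells"][index:index] = [markdown_cell, code_cell]
    return notebook


# Hypothetical one-cell notebook, used only for demonstration.
nb = {
    "cells": [
        {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": ["# Title"]}
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
add_install_cells(nb, 1)
print(json.dumps(nb["cells"][2], indent=1))
```

The install command stays commented out, matching the final commit in the list ("Comment out SDK installation command in all notebooks"), so running the notebook does not install anything unless the reader removes the `#`.
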
examples/pytorch/image-classification/mnist.ipynb

Lines changed: 1 addition & 4 deletions
@@ -31,10 +31,7 @@
 "execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": [
-"# TODO (astefanutti): Change to the Kubeflow SDK when it's available.\n",
-"# !pip install git+https://github.com/kubeflow/trainer.git@master#subdirectory=sdk"
-]
+"source": "# !pip install git+https://github.com/kubeflow/sdk.git@main#subdirectory=python"
 },
 {
 "cell_type": "markdown",

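The `mnist.ipynb` hunk swaps the old `kubeflow/trainer` install URL for the `kubeflow/sdk` one and collapses the cell `source` from a list of lines to a single string. A rough sketch of that rewrite; the helper `update_install_source` is hypothetical, and dropping every line that contains `TODO` is a simplification made for this example:

```python
OLD_URL = "git+https://github.com/kubeflow/trainer.git@master#subdirectory=sdk"
NEW_URL = "git+https://github.com/kubeflow/sdk.git@main#subdirectory=python"


def update_install_source(source) -> str:
    """Return the cell source as one string with the old SDK URL replaced.

    Notebook `source` may be a string or a list of strings; both are handled.
    Lines containing "TODO" are dropped, mirroring the hunk above.
    """
    text = "".join(source) if isinstance(source, list) else source
    kept = [line for line in text.splitlines() if "TODO" not in line]
    return "\n".join(kept).replace(OLD_URL, NEW_URL)


# The pre-commit cell source, as shown on the removed lines of the hunk.
old_source = [
    "# TODO (astefanutti): Change to the Kubeflow SDK when it's available.\n",
    "# !pip install git+https://github.com/kubeflow/trainer.git@master#subdirectory=sdk",
]
print(update_install_source(old_source))
```

Applied to the removed lines, the function yields the single added line from the hunk.
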