test: enforce PSS restricted for CI user namespace#3444
test: enforce PSS restricted for CI user namespace#3444abdullahpathan22 wants to merge 7 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hello @juliusvonkohout, |
|
Thank you, please try to fix the tests. |
|
@abdullahpathan22 do you plan to continue here ? Otherwise @Raakshass could take over. |
|
YOu probably have to wait for #3463 being implemented |
Yeah i am fixing this PR |
|
/retest |
|
Hi all! I have updated this PR to resolve the failing CI integration tests by cleanly merging the latest |
2aa11d7 to
ac0cfc3
Compare
- Enforce PSS restricted labels on user namespace during CI tests via kubeflow_profile_install.sh - Update workflow triggers for broader PSS test coverage across katib, pipeline, trainer, training-operator, and dex workflows - Add PSS-compliant securityContext and workingDir to test Notebook and Katib trial manifests to prevent permission issues - Add seccompProfile to JupyterLab WorkspaceKind sample - Add PSS-compliant overrides to istio_validation test-client pod - Upgrade Istio manifests from 1.29 to 1.30.0-rc.0 for native PSS Restricted compatibility (CRDs, install, sidecar injector, cluster-local-gateway, ztunnel, profile) Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
9faf80a to
8f7f127
Compare
There was a problem hiding this comment.
Pull request overview
The PR's stated purpose is to enforce PSS restricted on the kubeflow-user-example-com namespace during CI by relabeling it in tests/kubeflow_profile_install.sh. However, the diff is substantially broader: it also bumps Istio from 1.29.2 to the release‑candidate 1.30.0-rc.0 across the Istio manifests, adds pod/container securityContext hardening to several test and upstream sample/runtime manifests, and changes path triggers in six GitHub Actions workflows.
Changes:
- Relabel CI namespace to PSS
restricted(withenforce-version=latest) intests/kubeflow_profile_install.sh. - Upgrade Istio manifests/CRDs/install to
1.30.0-rc.0. - Add PSS-restricted-compatible
securityContextto test job manifests, an upstream trainer runtime, and an upstream workspace sample; expand workflow trigger paths.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/kubeflow_profile_install.sh | Relabels CI namespace to PSS restricted with enforce-version=latest. |
| tests/training_operator_job.yaml | Adds pod/container securityContext to both PyTorchJob replica specs. |
| tests/notebook.test.kubeflow-user-example.com.yaml | Adds pod/container securityContext for the test notebook. |
| tests/katib_test.yaml | Adds securityContext to the trial container/pod. |
| applications/trainer/upstream/base/runtimes/torch_distributed.yaml | Adds securityContext to upstream-synced runtime (modifies an /upstream/ path). |
| applications/workspaces/upstream/controller/samples/jupyterlab_v1beta1_workspacekind.yaml | Adds seccompProfile to upstream-synced sample. |
| scripts/synchronize-istio-manifests.sh | Bumps COMMIT to RC 1.30.0-rc.0. |
| README.md | Updates Istio version in components table. |
| common/istio/profile.yaml | Updates Istio tag to RC. |
| common/istio/istio-install/base/install.yaml | Regenerated Istio install for 1.30.0-rc.0 (new RBAC entry, env vars, volumes). |
| common/istio/istio-install/base/patches/istio-sidecar-injector-patch.yaml | Updates injector tag. |
| common/istio/istio-install/overlays/insecure/configmap-patch.yaml | Updates ConfigMap tag. |
| common/istio/istio-install/components/ambient-mode/ztunnel.yaml | Updates ztunnel image/chart labels to RC. |
| common/istio/istio-crds/base/crd.yaml | Regenerated CRDs (adds TrafficExtension, notTrustDomains, disableContextPropagation, fixes port 65535 rule). |
| common/istio/cluster-local-gateway/base/cluster-local-gateway.yaml | Updates labels/image to RC. |
| .github/workflows/{katib,pipeline,pipeline_run_from_notebook,dex_oauth2-proxy,trainer,training_operator}_test.yaml | Replaces experimental/security/PSS/* trigger with tests/kubeflow_profile_install.sh + tests/PSS_enable.sh; broadens some globs. |
| .github/workflows/istio_validation.yaml | Adds inline --overrides JSON to kubectl run for PSS compatibility. |
| kubectl label namespace $KF_PROFILE \ | ||
| pod-security.kubernetes.io/enforce=restricted \ | ||
| pod-security.kubernetes.io/enforce-version=latest \ | ||
| --overwrite |
| COMMIT="1.30.0-rc.0" | ||
| PREVIOUS_COMMIT="1.29.2" |
| securityContext: | ||
| runAsNonRoot: true | ||
| runAsUser: 1000 | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| containers: | ||
| - name: node | ||
| image: pytorch/pytorch:2.10.0-cuda12.8-cudnn9-runtime | ||
| workingDir: /tmp | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| add: [] | ||
| runAsNonRoot: true |
| seccompProfile: | ||
| type: RuntimeDefault |
| capabilities: | ||
| drop: | ||
| - ALL | ||
| add: [] |
| drop: | ||
| - ALL | ||
| add: [] | ||
| runAsNonRoot: true |
| kubectl run test-client --image=busybox --rm -i --restart=Never -n $KF_PROFILE \ | ||
| --overrides='{"spec": {"securityContext": {"runAsNonRoot": true, "runAsUser": 1000, "seccompProfile": {"type": "RuntimeDefault"}}, "containers": [{"name": "test-client", "image": "busybox", "securityContext": {"allowPrivilegeEscalation": false, "capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "runAsUser": 1000}}]}}' -- \ |
| - tests/pipeline_v1_test.py | ||
| - tests/pipeline_v2_test.py | ||
| - experimental/security/PSS/* | ||
| - tests/pipeline* |
| @@ -7,7 +7,8 @@ on: | |||
| - common/cert-manager/** | |||
| - common/oauth2-proxy/** | |||
| - common/istio*/** | |||
| kubectl label namespace $KF_PROFILE pod-security.kubernetes.io/enforce=baseline --overwrite | ||
| kubectl label namespace $KF_PROFILE \ | ||
| pod-security.kubernetes.io/enforce=restricted \ | ||
| pod-security.kubernetes.io/enforce-version=latest \ |
|
@abdullahpathan22, some checks are still failing. Also, please address all the Copilot review comments as well |
|
Yeah sure working on it! |
…ew points - Pin Istio in synchronize script to stable GA 1.30.0 and reject pre-releases. - Create local Kustomize overlays for Katib and Trainer to avoid modifying upstream manifests directly. - Revert all manual changes inside upstream directories (torch_distributed.yaml and jupyterlab_v1beta1_workspacekind.yaml). - Harden container-level securityContext parameters (runAsUser, seccompProfile) explicitly across all test and runtime manifests. - Fix Training Operator worker timeout by explicitly defining a custom init-pytorch container under replica specs. - Fix Katib experiment timeout by enabling injectSecurityContext: true inside Katib's local ConfigMap patch. - Refactor kubectl run overrides in istio_validation.yaml to use a structured multi-line heredoc variable with stdin/tty/seccomp parameters. - Pin PSS enforce-version in profile install to stable v1.29. Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
| kubectl label namespace $KF_PROFILE \ | ||
| pod-security.kubernetes.io/enforce=restricted \ | ||
| pod-security.kubernetes.io/enforce-version=v1.29 \ | ||
| --overwrite |
| hub: registry.istio.io/release | ||
| profile: default | ||
| tag: 1.29.2 | ||
| tag: 1.30.0-rc.0 |
| COMMIT="1.30.0" | ||
| PREVIOUS_COMMIT="1.29.2" | ||
| if [[ "${COMMIT}" =~ -(rc|beta|alpha)([.-]|$) ]]; then | ||
| echo "Refusing to synchronize pre-release Istio version: ${COMMIT}. Pin COMMIT to a stable GA release tag." | ||
| exit 1 | ||
| fi |
| | Kubeflow Hub | applications/hub/upstream | [v0.3.9](https://github.com/kubeflow/hub/tree/v0.3.9/manifests/kustomize) | 510m | 2112Mi | 20GB | | ||
| | Spark Operator | applications/spark/spark-operator | [2.5.0](https://github.com/kubeflow/spark-operator/tree/v2.5.0) | 9m | 41Mi | 0GB | | ||
| | Istio | common/istio | [1.29.2](https://github.com/istio/istio/releases/tag/1.29.2) | 750m | 2364Mi | 0GB | | ||
| | Istio | common/istio | [1.30.0-rc.0](https://github.com/istio/istio/releases/tag/1.30.0-rc.0) | 750m | 2364Mi | 0GB | |
| apiVersion: kubeflow.org/v1beta1 | ||
| kind: WorkspaceKind | ||
| metadata: | ||
| name: jupyterlab | ||
| spec: | ||
| ## ================================================================ | ||
| ## SPAWNER CONFIGS | ||
| ## - how the WorkspaceKind is displayed in the Workspace Spawner UI | ||
| ## ================================================================ | ||
| spawner: | ||
|
|
||
| ## the display name of the WorkspaceKind | ||
| displayName: "JupyterLab Notebook" | ||
|
|
||
| ## the description of the WorkspaceKind | ||
| description: "A Workspace which runs JupyterLab in a Pod" | ||
|
|
||
| ## if this WorkspaceKind should be hidden from the Workspace Spawner UI | ||
| hidden: false | ||
|
|
||
| ## if this WorkspaceKind is deprecated | ||
| deprecated: false | ||
|
|
||
| ## a message to show in Workspace Spawner UI when the WorkspaceKind is deprecated | ||
| #deprecationMessage: "This WorkspaceKind will be removed on 20XX-XX-XX, please use another WorkspaceKind." | ||
|
|
||
| ## the icon of the WorkspaceKind | ||
| ## - a small (favicon-sized) icon used in the Workspace Spawner UI | ||
| ## | ||
| icon: | ||
| url: "https://jupyter.org/assets/favicons/apple-touch-icon-152x152.png" | ||
| #configMap: | ||
| # name: "my-logos" | ||
| # key: "apple-touch-icon-152x152.png" | ||
|
|
||
| ## the logo of the WorkspaceKind | ||
| ## - a 1:1 (card size) logo used in the Workspace Spawner UI | ||
| ## | ||
| logo: | ||
| url: "https://upload.wikimedia.org/wikipedia/commons/3/38/Jupyter_logo.svg" | ||
| #configMap: | ||
| # name: "my-logos" | ||
| # key: "Jupyter_logo.svg" | ||
|
|
||
| ## ================================================================ | ||
| ## DEFINITION CONFIGS | ||
| ## - currently the only supported type is `podTemplate` | ||
| ## - in the future, there will be MORE types like `virtualMachine` | ||
| ## to run the Workspace on systems like KubeVirt/EC2 rather than in a Pod | ||
| ## ================================================================ | ||
| podTemplate: | ||
|
|
||
| ## metadata for Workspace Pods (MUTABLE) | ||
| ## | ||
| podMetadata: | ||
| labels: | ||
| my-workspace-kind-label: "my-value" | ||
| annotations: | ||
| my-workspace-kind-annotation: "my-value" | ||
|
|
||
| ## service account configs for Workspace Pods | ||
| ## | ||
| serviceAccount: | ||
|
|
||
| ## the name of the ServiceAccount (NOT MUTABLE) | ||
| ## - this Service Account MUST already exist in the Namespace | ||
| ## of the Workspace, the controller will NOT create it | ||
| ## - we will not show this WorkspaceKind in the Spawner UI | ||
| ## if the SA does not exist in the Namespace | ||
| ## | ||
| name: "default-editor" | ||
|
|
||
| ## activity culling configs (MUTABLE) | ||
| ## - for pausing inactive Workspaces | ||
| ## | ||
| culling: | ||
|
|
||
| ## if the culling feature is enabled | ||
| ## | ||
| enabled: true | ||
|
|
||
| ## the maximum number of seconds a Workspace can be inactive | ||
| ## | ||
| maxInactiveSeconds: 86400 | ||
|
|
||
| ## the probe used to determine if the Workspace is active | ||
| ## | ||
| activityProbe: | ||
|
|
||
| ## OPTION 1: a shell command probe | ||
| ## - if the Workspace had activity in the last 60 seconds this command | ||
| ## should return status 0, otherwise it should return status 1 | ||
| ## | ||
| #exec: | ||
| # command: | ||
| # - "bash" | ||
| # - "-c" | ||
| # - "exit 0" | ||
|
|
||
| ## OPTION 2: a Jupyter-specific probe | ||
| ## - will poll the `/api/status` endpoint of the Jupyter API, and use the `last_activity` field | ||
| ## https://github.com/jupyter-server/jupyter_server/blob/v2.13.0/jupyter_server/services/api/handlers.py#L62-L67 | ||
| ## - note, users need to be careful that their other probes don't trigger a "last_activity" update | ||
| ## e.g. they should only check the health of Jupyter using the `/api/status` endpoint | ||
| ## | ||
| jupyter: | ||
| lastActivity: true | ||
|
|
||
| ## standard probes to determine Container health (MUTABLE) | ||
| ## - spec for Probe: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#probe-v1-core | ||
| ## | ||
| probes: | ||
|
|
||
| ## startup probe for the "main" container | ||
| ## | ||
| #startupProbe: | ||
| # ... | ||
|
|
||
| ## liveness probe for the "main" container | ||
| ## | ||
| #livenessProbe: | ||
| # ... | ||
|
|
||
| ## readiness probe for the "main" container | ||
| ## | ||
| #readinessProbe: | ||
| # ... | ||
|
|
||
| ## volume mount paths | ||
| ## | ||
| volumeMounts: | ||
|
|
||
| ## the path to mount the home PVC (NOT MUTABLE) | ||
| ## | ||
| home: "/home/jovyan" | ||
|
|
||
| ## port definitions which can be referenced in image config values (MUTABLE) | ||
| ## - think of port definitions as the "types" of services which could be provided by a specific image | ||
| ## - a port definition has a common id (URL path) for consistency if the listening TCP port changes | ||
| ## - ports are referenced in image config values by their `id` and their definition here establishes | ||
| ## their protocol type, and default display name in the UI | ||
| ## | ||
| ports: | ||
|
|
||
| - id: "jupyterlab" | ||
| defaultDisplayName: "JupyterLab" | ||
| protocol: "HTTP" | ||
|
|
||
| ## http proxy configs (MUTABLE) | ||
| ## only "HTTP" protocol ports are supported | ||
| ## | ||
| httpProxy: | ||
|
|
||
| ## if the path prefix is stripped from incoming HTTP requests | ||
| ## - if true, the '/workspace/connect/{profile_name}/{workspace_name}/' path prefix | ||
| ## is stripped from incoming requests, the application sees the request | ||
| ## as if it was made to '/...' | ||
| ## - this only works if the application serves RELATIVE URLs for its assets | ||
| ## | ||
| removePathPrefix: false | ||
|
|
||
| ## header manipulation rules for incoming HTTP requests | ||
| ## - sets the `spec.http[].headers.request` of the Istio VirtualService | ||
| ## https://istio.io/latest/docs/reference/config/networking/virtual-service/#Headers-HeaderOperations | ||
| ## - the following string templates are available: | ||
| ## - `.PathPrefix`: the path prefix of the Workspace (e.g. '/workspace/connect/{profile_name}/{workspace_name}/') | ||
| ## | ||
| requestHeaders: {} | ||
|
|
||
| ## environment variables for Workspace Pods (MUTABLE) | ||
| ## - spec for EnvVar: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvar-v1-core | ||
| ## - the following go template functions are available: | ||
| ## - `httpPathPrefix(portId string)`: returns the HTTP path prefix of the specified port | ||
| ## | ||
| extraEnv: | ||
|
|
||
| ## to enable backwards compatibility with old Jupyter images from Kubeflow Notebooks V1 | ||
| ## https://github.com/kubeflow/kubeflow/blob/v1.8.0/components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run#L12 | ||
| - name: "NB_PREFIX" | ||
| value: |- | ||
| {{ httpPathPrefix "jupyterlab" }} | ||
|
|
||
| ## extra volume mounts for Workspace Pods (MUTABLE) | ||
| ## - spec for VolumeMount: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#volumemount-v1-core | ||
| ## | ||
| extraVolumeMounts: | ||
|
|
||
| ## frameworks like PyTorch use shared memory for inter-process communication and expect a tmpfs at /dev/shm | ||
| ## https://en.wikipedia.org/wiki/Shared_memory | ||
| - name: "dshm" | ||
| mountPath: "/dev/shm" | ||
|
|
||
| ## extra volumes for Workspace Pods (MUTABLE) | ||
| ## - spec for Volume: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#volume-v1-core | ||
| ## | ||
| extraVolumes: | ||
| - name: "dshm" | ||
| emptyDir: | ||
| medium: "Memory" | ||
|
|
||
| ## security context for Workspace Pods (MUTABLE) | ||
| ## - spec for PodSecurityContext: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podsecuritycontext-v1-core | ||
| ## | ||
| securityContext: | ||
| fsGroup: 100 | ||
| runAsNonRoot: true | ||
| runAsUser: 1000 | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
|
|
||
| ## container SecurityContext for Workspace Pods (MUTABLE) | ||
| ## - spec for SecurityContext: | ||
| ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#securitycontext-v1-core | ||
| ## | ||
| containerSecurityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| runAsNonRoot: true | ||
| runAsUser: 1000 | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
|
|
||
| ## ============================================================== | ||
| ## WORKSPACE OPTIONS | ||
| ## - options are the user-selectable fields, | ||
| ## they determine the PodSpec of the Workspace | ||
| ## ============================================================== | ||
| options: | ||
|
|
||
| ## | ||
| ## About the `values` fields: | ||
| ## - the `values` field is a list of options that the user can select | ||
| ## - elements of `values` can NOT be removed, only HIDDEN or REDIRECTED | ||
| ## - this prevents options being removed that are still in use by existing Workspaces | ||
| ## - this limitation may be removed in the future | ||
| ## - options may be "hidden" by setting `spawner.hidden` to `true` | ||
| ## - hidden options are NOT visible by default in the Spawner UI | ||
| ## - hidden options are still available to the controller and manually created Workspace resources | ||
| ## - options may be "redirected" by setting `redirect.to` to another option: | ||
| ## - redirected options are NOT shown in the Spawner UI | ||
| ## - redirected options are computed by the controller and shown in status fields | ||
| ## - users must explicitly update their Workspace via the API to apply redirects | ||
| ## - the Spawner UI will warn users about Workspaces with pending restarts | ||
| ## | ||
|
|
||
| ## ============================================================ | ||
| ## IMAGE CONFIG OPTIONS | ||
| ## - SETS: image, imagePullPolicy, ports | ||
| ## ============================================================ | ||
| imageConfig: | ||
|
|
||
| ## spawner ui configs | ||
| ## | ||
| spawner: | ||
|
|
||
| ## the id of the default option | ||
| ## - this will be selected by default in the spawner ui | ||
| ## | ||
| default: "jupyter-scipy:v1.10.0" | ||
|
|
||
| ## the list of image configs that are available | ||
| ## | ||
| values: | ||
|
|
||
| ## ================================================================ | ||
| ## jupyter-scipy:v1.8.0 | ||
| ## ================================================================ | ||
| - id: "jupyter-scipy:v1.8.0" | ||
| spawner: | ||
| displayName: "jupyter-scipy:v1.8.0" | ||
| description: "JupyterLab, with SciPy Packages" | ||
| labels: | ||
| - key: "python_version" | ||
| value: "3.11.6" | ||
| ## NOTE: this option is hidden | ||
| hidden: true | ||
| redirect: | ||
| to: "jupyter-scipy:v1.9.2" | ||
| message: | ||
| level: "Info" | ||
| text: > | ||
| This update does not introduce any breaking changes in Python packages from SciPy. | ||
| However, the version of JupyterLab has been updated from 3.6.6 to 4.2.5. | ||
| spec: | ||
| ## the container image to use | ||
| ## | ||
| image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.8.0" | ||
|
|
||
| ## the pull policy for the container image | ||
| ## - default: "IfNotPresent" | ||
| ## | ||
| imagePullPolicy: "IfNotPresent" | ||
|
|
||
| ## ports that the container listens on | ||
| ## - currently, only HTTP is supported for `protocol` | ||
| ## - currently, all ports use the same `httpProxy` settings | ||
| ## - if multiple ports are defined, the user will see multiple "Connect" buttons | ||
| ## in a dropdown menu on the Workspace overview page | ||
| ## | ||
| ports: | ||
| - id: "jupyterlab" | ||
| port: 8888 | ||
|
|
||
| ## ================================================================ | ||
| ## jupyter-scipy:v1.9.2 | ||
| ## ================================================================ | ||
| - id: "jupyter-scipy:v1.9.2" | ||
| spawner: | ||
| displayName: "jupyter-scipy:v1.9.2" | ||
| description: "JupyterLab, with SciPy Packages" | ||
| labels: | ||
| - key: "python_version" | ||
| value: "3.11.10" | ||
| redirect: | ||
| to: "jupyter-scipy:v1.10.0" | ||
| message: | ||
| level: "Info" | ||
| text: > | ||
| This update does not introduce any breaking changes in Python packages from SciPy. | ||
| spec: | ||
| image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.9.2" | ||
| imagePullPolicy: "IfNotPresent" | ||
| ports: | ||
| - id: "jupyterlab" | ||
| port: 8888 | ||
|
|
||
| ## ================================================================ | ||
| ## jupyter-scipy:v1.10.0 | ||
| ## ================================================================ | ||
| - id: "jupyter-scipy:v1.10.0" | ||
| spawner: | ||
| displayName: "jupyter-scipy:v1.10.0" | ||
| description: "JupyterLab, with SciPy Packages" | ||
| labels: | ||
| - key: "python_version" | ||
| value: "3.11.11" | ||
| spec: | ||
| image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.10.0" | ||
| imagePullPolicy: "IfNotPresent" | ||
| ports: | ||
| - id: "jupyterlab" | ||
| port: 8888 | ||
|
|
||
| ## ================================================================ | ||
| ## jupyter-pytorch-cuda-full:v1.9.2 | ||
| ## ================================================================ | ||
| - id: "jupyter-pytorch-cuda-full:v1.9.2" | ||
| spawner: | ||
| displayName: "jupyter-pytorch-cuda-full:v1.9.2" | ||
| description: "JupyterLab, with PyTorch (CUDA), and Common Python Packages" | ||
| labels: | ||
| - key: "python_version" | ||
| value: "3.11.10" | ||
| - key: "pytorch_version" | ||
| value: "2.3.1" | ||
| - key: "cuda_version" | ||
| value: "12.1" | ||
| - key: "nccl_version" | ||
| value: "2.20.5" | ||
| redirect: | ||
| to: "jupyter-pytorch-cuda-full:v1.10.0" | ||
| message: | ||
| level: "Warning" | ||
| text: > | ||
| This update changes the version of PyTorch from 2.3.1 to 2.5.1. | ||
| This only breaking change in a common python package is xgboost, which updated from 1.7.6 to 2.1.4. | ||
| spec: | ||
| image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-pytorch-cuda-full:v1.9.2" | ||
| imagePullPolicy: "IfNotPresent" | ||
| ports: | ||
| - id: "jupyterlab" | ||
| port: 8888 | ||
|
|
||
| ## ================================================================ | ||
| ## jupyter-pytorch-cuda-full:v1.10.0 | ||
| ## ================================================================ | ||
| - id: "jupyter-pytorch-cuda-full:v1.10.0" | ||
| spawner: | ||
| displayName: "jupyter-pytorch-cuda-full:v1.10.0" | ||
| description: "JupyterLab, with PyTorch (CUDA), and Common Python Packages" | ||
| labels: | ||
| - key: "python_version" | ||
| value: "3.11.11" | ||
| - key: "pytorch_version" | ||
| value: "2.5.1" | ||
| - key: "cuda_version" | ||
| value: "12.4" | ||
| - key: "nccl_version" | ||
| value: "2.21.5" | ||
| spec: | ||
| image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-pytorch-cuda-full:v1.10.0" | ||
| imagePullPolicy: "IfNotPresent" | ||
| ports: | ||
| - id: "jupyterlab" | ||
| port: 8888 | ||
|
|
||
| ## ============================================================ | ||
| ## POD CONFIG OPTIONS | ||
| ## - SETS: affinity, nodeSelector, tolerations, resources | ||
| ## ============================================================ | ||
| podConfig: | ||
|
|
||
| ## spawner ui configs | ||
| ## | ||
| spawner: | ||
|
|
||
| ## the id of the default option | ||
| ## - this will be selected by default in the spawner ui | ||
| ## | ||
| default: "tiny_cpu" | ||
|
|
||
| ## the list of pod configs that are available | ||
| ## | ||
| values: | ||
|
|
||
| ## ================================================================ | ||
| ## EXAMPLE 1: a tiny CPU pod | ||
| ## ================================================================ | ||
| - id: "tiny_cpu" | ||
| spawner: | ||
| displayName: "Tiny CPU" | ||
| description: "Pod with 0.1 CPU, 128 Mb RAM" | ||
| labels: | ||
| - key: "cpu" | ||
| value: "100m" | ||
| - key: "memory" | ||
| value: "128Mi" | ||
| spec: | ||
| ## affinity configs for the pod | ||
| ## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#affinity-v1-core | ||
| ## | ||
| affinity: {} | ||
|
|
||
| ## node selector configs for the pod | ||
| ## - https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector | ||
| ## | ||
| nodeSelector: {} | ||
|
|
||
| ## toleration configs for the pod | ||
| ## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#toleration-v1-core | ||
| ## | ||
| tolerations: [] | ||
|
|
||
| ## resource configs for the "main" container in the pod | ||
| ## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#resourcerequirements-v1-core | ||
| ## | ||
| resources: | ||
| requests: | ||
| cpu: 100m | ||
| memory: 128Mi | ||
|
|
||
| ## ================================================================ | ||
| ## EXAMPLE 2: a small CPU pod | ||
| ## ================================================================ | ||
| - id: "small_cpu" | ||
| spawner: | ||
| displayName: "Small CPU" | ||
| description: "Pod with 1 CPU, 2 GB RAM" | ||
| labels: | ||
| - key: "cpu" | ||
| value: "1000m" | ||
| - key: "memory" | ||
| value: "2Gi" | ||
| spec: | ||
| resources: | ||
| requests: | ||
| cpu: 1000m | ||
| memory: 2Gi | ||
|
|
||
| ## ================================================================ | ||
| ## EXAMPLE 3: a big GPU pod | ||
| ## ================================================================ | ||
| - id: "big_gpu" | ||
| spawner: | ||
| displayName: "Big GPU" | ||
| description: "Pod with 4 CPU, 16 GB RAM, and 1 GPU" | ||
| labels: | ||
| - key: "cpu" | ||
| value: "4000m" | ||
| - key: "memory" | ||
| value: "16Gi" | ||
| - key: "gpu" | ||
| value: "1" | ||
| spec: | ||
| affinity: {} | ||
| nodeSelector: {} | ||
| resources: | ||
| requests: | ||
| cpu: 4000m | ||
| memory: 16Gi | ||
| limits: | ||
| nvidia.com/gpu: 1 | ||
| tolerations: | ||
| - key: "nvidia.com/gpu" | ||
| operator: "Exists" | ||
| effect: "NoSchedule" |
| CLIENT_OVERRIDES=$(cat <<EOF | ||
| { | ||
| "spec": { | ||
| "securityContext": { | ||
| "runAsNonRoot": true, | ||
| "runAsUser": 1000, | ||
| "seccompProfile": { | ||
| "type": "RuntimeDefault" | ||
| } | ||
| }, | ||
| "containers": [ | ||
| { | ||
| "name": "test-client", | ||
| "image": "busybox", | ||
| "stdin": true, | ||
| "tty": true, | ||
| "securityContext": { | ||
| "allowPrivilegeEscalation": false, | ||
| "capabilities": { | ||
| "drop": ["ALL"] | ||
| }, | ||
| "runAsNonRoot": true, | ||
| "runAsUser": 1000, | ||
| "seccompProfile": { | ||
| "type": "RuntimeDefault" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| EOF | ||
| ) | ||
| kubectl run test-client --image=busybox --rm -i --restart=Never -n $KF_PROFILE \ | ||
| --overrides="${CLIENT_OVERRIDES}" -- \ |
| - tests/kubeflow_profile_install.sh | ||
| - tests/PSS_enable.sh |
| configMapGenerator: | ||
| - behavior: merge | ||
| files: | ||
| - katib-config.yaml | ||
| name: katib-config | ||
| options: | ||
| disableNameSuffixHash: true |
| - name: pytorch | ||
| image: docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727 | ||
| imagePullPolicy: Always | ||
| workingDir: /tmp |
| - name: init-pytorch | ||
| image: alpine:3.18 | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| runAsNonRoot: true | ||
| runAsUser: 1000 | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| command: | ||
| - sh | ||
| - -c | ||
| - until nslookup pytorch-simple-master-0; do echo waiting for master; sleep 2; done; |
…t check - Revert Istio upgrade completely to keep stable GA 1.29.2 in production manifests, eliminating release-candidate and RC guard discrepancy. - Replace duplicate jupyterlab_v1beta1_workspacekind.yaml file with dynamic Kustomize strategic merge patch overlay under tests/workspaces-kustomization. - Refactor test-client overrides in istio_validation.yaml to a single-line JSON string containing all required PSS Restricted fields, resolving the YAML linter failure. - Update tests/training_operator_job.yaml to pin and consolidate initContainers image from alpine:3.18 to stable busybox:1.36.1, reducing pull overhead. - Add descriptive developer comments explaining workingDir: /tmp workarounds and injectSecurityContext: true PSS overlays across test manifests. Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
| # Katib Config patch to enable PSS compliance. | ||
| # injectSecurityContext: true clones securityContext properties from primary | ||
| # trial containers to sidecar metrics-collectors, preventing admission blocks. | ||
| apiVersion: config.kubeflow.org/v1beta1 | ||
| kind: KatibConfig | ||
| init: | ||
| controller: | ||
| webhookPort: 8443 | ||
| injectSecurityContext: true |
| kubectl label namespace $KF_PROFILE pod-security.kubernetes.io/enforce=baseline --overwrite | ||
| kubectl label namespace $KF_PROFILE \ | ||
| pod-security.kubernetes.io/enforce=restricted \ | ||
| pod-security.kubernetes.io/enforce-version=v1.29 \ |
| kubectl label namespace $KF_PROFILE \ | ||
| pod-security.kubernetes.io/enforce=restricted \ | ||
| pod-security.kubernetes.io/enforce-version=v1.29 \ | ||
| --overwrite |
- Restored Istio 1.30.0-rc.0 to regain native PSS Restricted compatibility for injected sidecars. - Documented Istio version requirement in README.md to clarify 1.30+ is mandatory for PSS. - Refactored istio_dummy_deployment.yaml to use unprivileged nginx image, natively satisfying PSS Restricted policies. - Addressed label race condition with the Profile Controller in kubeflow_profile_install.sh by implementing a robust retry mechanism. - Resolved race condition in workspaces_pipeline_run_test.sh by adding sleep synchronization for the Notebook controller to observe the updated WorkspaceKind overlay. Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
…S in CI - Upgraded Istio to stable GA 1.30.0 by running the synchronized script. - Reconstructed the Katib overlay katib-config.yaml to restore full default configuration with injectSecurityContext: true for PSS Restricted support. - Dynamically patched Profiles Controller namespace-labels ConfigMap in profile_controller_install.sh to natively enforce PSS restricted on user namespaces in CI. - Simplified kubeflow_profile_install.sh to remove the retry loop in favor of a clean verification check. Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
…rofiles controller setup Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
…el checks - Removed internal certGenerator from Katib overlay config to use cert-manager for webhook certificate generation, preventing tls x509 verification errors. - Adjusted kubeflow_profile_install.sh validation check to support both restricted and privileged native labels, preventing insecure mode test failures. Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
|
@abdullahpathan22, don't modify manifests under the applications folder, as those manifests are synchronised from upstream repos. For example, the trainer manifest is synchronised and pulled into the application folder from the upstream trainer repo. If we make any changes under the applications folder, then it will be lost and overwritten when someone synchronises upstream manifests using the respective synchronisation scripts for trainer. And don't try to address all comments at once, do it one by one, same for CI checks |
|
Ok i will do it according. |
|
Please rebase soon after #3467 is merged |
What this PR does
Modifies
tests/kubeflow_profile_install.shto overwrite thekubeflow-user-example-com namespace label to
enforce: restrictedexclusively during CI testing.
Why
The Profile Controller sets
enforce: baselineby default forcustomer deployments. This change overwrites that label in CI only,
ensuring test workloads are verified under strict PSS restricted
enforcement without affecting production deployments.