
test: enforce PSS restricted for CI user namespace #3444

Open
abdullahpathan22 wants to merge 7 commits into kubeflow:master from abdullahpathan22:fix/pss-ci-restricted

Conversation

@abdullahpathan22
Contributor

@abdullahpathan22 abdullahpathan22 commented Apr 9, 2026

What this PR does

Modifies tests/kubeflow_profile_install.sh to overwrite the
kubeflow-user-example-com namespace label to enforce: restricted
exclusively during CI testing.

Why

The Profile Controller sets enforce: baseline by default for
customer deployments. This change overwrites that label in CI only,
ensuring test workloads are verified under strict PSS restricted
enforcement without affecting production deployments.
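
For illustration, a minimal sketch of the relabeling step described above. The helper name `label_namespace_restricted` and the `KUBECTL` dry-run switch are hypothetical and not taken from the PR; by default the function only prints the command it would run.

```shell
# Hypothetical helper, not taken from the PR.
# KUBECTL defaults to "echo kubectl", so the command is printed rather than run.
# Set KUBECTL=kubectl to apply the label against a real cluster.
label_namespace_restricted() {
  namespace="$1"
  # --overwrite replaces the enforce=baseline label set by the Profile Controller.
  ${KUBECTL:-echo kubectl} label namespace "$namespace" \
    pod-security.kubernetes.io/enforce=restricted \
    --overwrite
}

label_namespace_restricted kubeflow-user-example-com
```

Keeping the call in the test script rather than in the Profile Controller defaults is what limits the stricter level to CI.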

Copilot AI review requested due to automatic review settings April 9, 2026 20:57
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abdullahpathan22
Contributor Author

Hello @juliusvonkohout,
I have added back the workflow trigger improvements from the previous PR. Specifically:
- Broadened katib triggers: tests/katib_install.sh → tests/katib*
- Broadened pipeline triggers: individual test files → tests/pipeline*
- Added the tests/pipeline* trigger to the pipeline_run_from_notebook workflow
- Replaced the dead experimental/security/PSS/* path (the directory no longer exists in the repo) with the actual test files, tests/kubeflow_profile_install.sh and tests/PSS_enable.sh, across all affected workflows

Note: I kept the current directory paths (e.g. applications/dashboard/) instead of the restructured paths (applications/profiles/, applications/admission-webhook/**), since those directories don't exist on master yet.
Happy to adjust if you prefer adding the future paths now.
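
To make the effect of the broadened globs concrete, here is a rough sketch that checks which paths a pattern such as tests/katib* would cover. The function name `matches_trigger` is hypothetical. Shell `case` patterns are only an approximation of GitHub Actions path filters: in the shell `*` also matches `/`, whereas a workflow filter needs `**` for that.

```shell
# Approximation only: shell `case` patterns let `*` cross `/`,
# whereas GitHub Actions path filters need `**` for that.
matches_trigger() {
  path="$1"
  pattern="$2"
  case "$path" in
    # Unquoted on purpose so the variable is treated as a glob pattern.
    $pattern) echo "match" ;;
    *) echo "no match" ;;
  esac
}

matches_trigger tests/katib_install.sh 'tests/katib*'
matches_trigger tests/katib_test.yaml 'tests/katib*'
matches_trigger tests/pipeline_v2_test.py 'tests/katib*'
```

The second call is the point of the change: a file like tests/katib_test.yaml would not have triggered the workflow while only tests/katib_install.sh was listed.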

@juliusvonkohout
Member

Thank you, please try to fix the tests.

@juliusvonkohout
Member

@abdullahpathan22 do you plan to continue here? Otherwise, @Raakshass could take over.

@juliusvonkohout
Member

You probably have to wait for #3463 to be implemented.

@abdullahpathan22
Contributor Author

> @abdullahpathan22 do you plan to continue here? Otherwise @Raakshass could take over.

Yeah, I am fixing this PR.

@danish9039
Member

/retest

@google-oss-prow google-oss-prow Bot added size/XL and removed size/S labels May 17, 2026
@abdullahpathan22
Contributor Author

abdullahpathan22 commented May 17, 2026

Hi all! I have updated this PR to resolve the failing CI integration tests by cleanly merging the latest master and the unblocking Istio 1.30 upgrade.

@abdullahpathan22 abdullahpathan22 force-pushed the fix/pss-ci-restricted branch 2 times, most recently from 2aa11d7 to ac0cfc3 Compare May 17, 2026 16:04
- Enforce PSS restricted labels on user namespace during CI tests
  via kubeflow_profile_install.sh
- Update workflow triggers for broader PSS test coverage across
  katib, pipeline, trainer, training-operator, and dex workflows
- Add PSS-compliant securityContext and workingDir to test Notebook
  and Katib trial manifests to prevent permission issues
- Add seccompProfile to JupyterLab WorkspaceKind sample
- Add PSS-compliant overrides to istio_validation test-client pod
- Upgrade Istio manifests from 1.29 to 1.30.0-rc.0 for native
  PSS Restricted compatibility (CRDs, install, sidecar injector,
  cluster-local-gateway, ztunnel, profile)

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
@abdullahpathan22 abdullahpathan22 force-pushed the fix/pss-ci-restricted branch from 9faf80a to 8f7f127 Compare May 17, 2026 16:08
@juliusvonkohout juliusvonkohout requested review from Copilot and removed request for Copilot May 17, 2026 16:30
Contributor

Copilot AI left a comment


Pull request overview

The PR's stated purpose is to enforce PSS restricted on the kubeflow-user-example-com namespace during CI by relabeling it in tests/kubeflow_profile_install.sh. However, the diff is substantially broader: it also bumps Istio from 1.29.2 to the release‑candidate 1.30.0-rc.0 across the Istio manifests, adds pod/container securityContext hardening to several test and upstream sample/runtime manifests, and changes path triggers in six GitHub Actions workflows.

Changes:

  • Relabel CI namespace to PSS restricted (with enforce-version=latest) in tests/kubeflow_profile_install.sh.
  • Upgrade Istio manifests/CRDs/install to 1.30.0-rc.0.
  • Add PSS-restricted-compatible securityContext to test job manifests, an upstream trainer runtime, and an upstream workspace sample; expand workflow trigger paths.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tests/kubeflow_profile_install.sh | Relabels CI namespace to PSS restricted with enforce-version=latest. |
| tests/training_operator_job.yaml | Adds pod/container securityContext to both PyTorchJob replica specs. |
| tests/notebook.test.kubeflow-user-example.com.yaml | Adds pod/container securityContext for the test notebook. |
| tests/katib_test.yaml | Adds securityContext to the trial container/pod. |
| applications/trainer/upstream/base/runtimes/torch_distributed.yaml | Adds securityContext to upstream-synced runtime (modifies an /upstream/ path). |
| applications/workspaces/upstream/controller/samples/jupyterlab_v1beta1_workspacekind.yaml | Adds seccompProfile to upstream-synced sample. |
| scripts/synchronize-istio-manifests.sh | Bumps COMMIT to RC 1.30.0-rc.0. |
| README.md | Updates Istio version in components table. |
| common/istio/profile.yaml | Updates Istio tag to RC. |
| common/istio/istio-install/base/install.yaml | Regenerated Istio install for 1.30.0-rc.0 (new RBAC entry, env vars, volumes). |
| common/istio/istio-install/base/patches/istio-sidecar-injector-patch.yaml | Updates injector tag. |
| common/istio/istio-install/overlays/insecure/configmap-patch.yaml | Updates ConfigMap tag. |
| common/istio/istio-install/components/ambient-mode/ztunnel.yaml | Updates ztunnel image/chart labels to RC. |
| common/istio/istio-crds/base/crd.yaml | Regenerated CRDs (adds TrafficExtension, notTrustDomains, disableContextPropagation, fixes port 65535 rule). |
| common/istio/cluster-local-gateway/base/cluster-local-gateway.yaml | Updates labels/image to RC. |
| .github/workflows/{katib,pipeline,pipeline_run_from_notebook,dex_oauth2-proxy,trainer,training_operator}_test.yaml | Replaces experimental/security/PSS/* trigger with tests/kubeflow_profile_install.sh + tests/PSS_enable.sh; broadens some globs. |
| .github/workflows/istio_validation.yaml | Adds inline --overrides JSON to kubectl run for PSS compatibility. |

Comment thread tests/kubeflow_profile_install.sh Outdated
Comment on lines +9 to +12
kubectl label namespace $KF_PROFILE \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  --overwrite
Comment thread scripts/synchronize-istio-manifests.sh Outdated
Comment on lines +8 to +9
COMMIT="1.30.0-rc.0"
PREVIOUS_COMMIT="1.29.2"
Comment on lines +22 to +37
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: node
    image: pytorch/pytorch:2.10.0-cuda12.8-cudnn9-runtime
    workingDir: /tmp
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add: []
      runAsNonRoot: true
Comment on lines +211 to +212
seccompProfile:
type: RuntimeDefault
capabilities:
drop:
- ALL
add: []
drop:
- ALL
add: []
runAsNonRoot: true
Comment thread .github/workflows/istio_validation.yaml Outdated
Comment on lines +255 to +256
kubectl run test-client --image=busybox --rm -i --restart=Never -n $KF_PROFILE \
--overrides='{"spec": {"securityContext": {"runAsNonRoot": true, "runAsUser": 1000, "seccompProfile": {"type": "RuntimeDefault"}}, "containers": [{"name": "test-client", "image": "busybox", "securityContext": {"allowPrivilegeEscalation": false, "capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "runAsUser": 1000}}]}}' -- \
- tests/pipeline_v1_test.py
- tests/pipeline_v2_test.py
- experimental/security/PSS/*
- tests/pipeline*
@@ -7,7 +7,8 @@ on:
- common/cert-manager/**
- common/oauth2-proxy/**
- common/istio*/**
Comment thread tests/kubeflow_profile_install.sh Outdated
-kubectl label namespace $KF_PROFILE pod-security.kubernetes.io/enforce=baseline --overwrite
+kubectl label namespace $KF_PROFILE \
+  pod-security.kubernetes.io/enforce=restricted \
+  pod-security.kubernetes.io/enforce-version=latest \
@danish9039
Member

danish9039 commented May 17, 2026

@abdullahpathan22, some checks are still failing. Please also address all the Copilot review comments.

@abdullahpathan22
Contributor Author

Yeah, sure, working on it!

…ew points

- Pin Istio in synchronize script to stable GA 1.30.0 and reject pre-releases.
- Create local Kustomize overlays for Katib and Trainer to avoid modifying upstream manifests directly.
- Revert all manual changes inside upstream directories (torch_distributed.yaml and jupyterlab_v1beta1_workspacekind.yaml).
- Harden container-level securityContext parameters (runAsUser, seccompProfile) explicitly across all test and runtime manifests.
- Fix Training Operator worker timeout by explicitly defining a custom init-pytorch container under replica specs.
- Fix Katib experiment timeout by enabling injectSecurityContext: true inside Katib's local ConfigMap patch.
- Refactor kubectl run overrides in istio_validation.yaml to use a structured multi-line heredoc variable with stdin/tty/seccomp parameters.
- Pin PSS enforce-version in profile install to stable v1.29.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
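
As a side note on the last bullet, the `enforce-version` label accepts either `latest` or a Kubernetes minor version written as `v<major>.<minor>`. The validator below is a hypothetical sketch, not code from this PR, that shows which values fit that shape.

```shell
# Hypothetical validator, not part of the PR.
# Accepts "latest" or a Kubernetes minor version written as v<major>.<minor>.
is_valid_pss_version() {
  if printf '%s\n' "$1" | grep -Eq '^(latest|v[0-9]+\.[0-9]+)$'; then
    echo "valid"
  else
    echo "invalid"
  fi
}

is_valid_pss_version v1.29
is_valid_pss_version latest
is_valid_pss_version 1.29
```

Pinning to a value such as v1.29 keeps the checks tied to that release's policy definition, while `latest` follows whatever version the cluster runs.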
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 15 comments.

Comment thread tests/kubeflow_profile_install.sh Outdated
Comment on lines +9 to +12
kubectl label namespace $KF_PROFILE \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=v1.29 \
  --overwrite
Comment thread common/istio/profile.yaml Outdated
 hub: registry.istio.io/release
 profile: default
-tag: 1.29.2
+tag: 1.30.0-rc.0
Comment on lines +8 to +13
COMMIT="1.30.0"
PREVIOUS_COMMIT="1.29.2"
if [[ "${COMMIT}" =~ -(rc|beta|alpha)([.-]|$) ]]; then
  echo "Refusing to synchronize pre-release Istio version: ${COMMIT}. Pin COMMIT to a stable GA release tag."
  exit 1
fi
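
To see which tags the guard above accepts, the sketch below evaluates the same extended regular expression with `grep -E`, so it also runs under plain POSIX sh. The function name `classify_istio_tag` is hypothetical and only used for this illustration.

```shell
# Same extended regular expression as the guard above, evaluated with grep -E.
# classify_istio_tag is a hypothetical name used only for this illustration.
classify_istio_tag() {
  if printf '%s\n' "$1" | grep -Eq -e '-(rc|beta|alpha)([.-]|$)'; then
    echo "pre-release"
  else
    echo "stable"
  fi
}

classify_istio_tag 1.30.0-rc.0
classify_istio_tag 1.30.0
classify_istio_tag 1.30.0-rc1
```

Note that the pattern expects a `.`, a `-`, or the end of the string right after the marker, so a tag written as 1.30.0-rc1 would be classified as stable. That is harmless as long as tags follow the `-rc.N` form seen in this PR.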
Comment thread README.md Outdated
 | Kubeflow Hub | applications/hub/upstream | [v0.3.9](https://github.com/kubeflow/hub/tree/v0.3.9/manifests/kustomize) | 510m | 2112Mi | 20GB |
 | Spark Operator | applications/spark/spark-operator | [2.5.0](https://github.com/kubeflow/spark-operator/tree/v2.5.0) | 9m | 41Mi | 0GB |
-| Istio | common/istio | [1.29.2](https://github.com/istio/istio/releases/tag/1.29.2) | 750m | 2364Mi | 0GB |
+| Istio | common/istio | [1.30.0-rc.0](https://github.com/istio/istio/releases/tag/1.30.0-rc.0) | 750m | 2364Mi | 0GB |
Comment on lines +1 to +503
apiVersion: kubeflow.org/v1beta1
kind: WorkspaceKind
metadata:
name: jupyterlab
spec:
## ================================================================
## SPAWNER CONFIGS
## - how the WorkspaceKind is displayed in the Workspace Spawner UI
## ================================================================
spawner:

## the display name of the WorkspaceKind
displayName: "JupyterLab Notebook"

## the description of the WorkspaceKind
description: "A Workspace which runs JupyterLab in a Pod"

## if this WorkspaceKind should be hidden from the Workspace Spawner UI
hidden: false

## if this WorkspaceKind is deprecated
deprecated: false

## a message to show in Workspace Spawner UI when the WorkspaceKind is deprecated
#deprecationMessage: "This WorkspaceKind will be removed on 20XX-XX-XX, please use another WorkspaceKind."

## the icon of the WorkspaceKind
## - a small (favicon-sized) icon used in the Workspace Spawner UI
##
icon:
url: "https://jupyter.org/assets/favicons/apple-touch-icon-152x152.png"
#configMap:
# name: "my-logos"
# key: "apple-touch-icon-152x152.png"

## the logo of the WorkspaceKind
## - a 1:1 (card size) logo used in the Workspace Spawner UI
##
logo:
url: "https://upload.wikimedia.org/wikipedia/commons/3/38/Jupyter_logo.svg"
#configMap:
# name: "my-logos"
# key: "Jupyter_logo.svg"

## ================================================================
## DEFINITION CONFIGS
## - currently the only supported type is `podTemplate`
## - in the future, there will be MORE types like `virtualMachine`
## to run the Workspace on systems like KubeVirt/EC2 rather than in a Pod
## ================================================================
podTemplate:

## metadata for Workspace Pods (MUTABLE)
##
podMetadata:
labels:
my-workspace-kind-label: "my-value"
annotations:
my-workspace-kind-annotation: "my-value"

## service account configs for Workspace Pods
##
serviceAccount:

## the name of the ServiceAccount (NOT MUTABLE)
## - this Service Account MUST already exist in the Namespace
## of the Workspace, the controller will NOT create it
## - we will not show this WorkspaceKind in the Spawner UI
## if the SA does not exist in the Namespace
##
name: "default-editor"

## activity culling configs (MUTABLE)
## - for pausing inactive Workspaces
##
culling:

## if the culling feature is enabled
##
enabled: true

## the maximum number of seconds a Workspace can be inactive
##
maxInactiveSeconds: 86400

## the probe used to determine if the Workspace is active
##
activityProbe:

## OPTION 1: a shell command probe
## - if the Workspace had activity in the last 60 seconds this command
## should return status 0, otherwise it should return status 1
##
#exec:
# command:
# - "bash"
# - "-c"
# - "exit 0"

## OPTION 2: a Jupyter-specific probe
## - will poll the `/api/status` endpoint of the Jupyter API, and use the `last_activity` field
## https://github.com/jupyter-server/jupyter_server/blob/v2.13.0/jupyter_server/services/api/handlers.py#L62-L67
## - note, users need to be careful that their other probes don't trigger a "last_activity" update
## e.g. they should only check the health of Jupyter using the `/api/status` endpoint
##
jupyter:
lastActivity: true

## standard probes to determine Container health (MUTABLE)
## - spec for Probe:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#probe-v1-core
##
probes:

## startup probe for the "main" container
##
#startupProbe:
# ...

## liveness probe for the "main" container
##
#livenessProbe:
# ...

## readiness probe for the "main" container
##
#readinessProbe:
# ...

## volume mount paths
##
volumeMounts:

## the path to mount the home PVC (NOT MUTABLE)
##
home: "/home/jovyan"

## port definitions which can be referenced in image config values (MUTABLE)
## - think of port definitions as the "types" of services which could be provided by a specific image
## - a port definition has a common id (URL path) for consistency if the listening TCP port changes
## - ports are referenced in image config values by their `id` and their definition here establishes
## their protocol type, and default display name in the UI
##
ports:

- id: "jupyterlab"
defaultDisplayName: "JupyterLab"
protocol: "HTTP"

## http proxy configs (MUTABLE)
## only "HTTP" protocol ports are supported
##
httpProxy:

## if the path prefix is stripped from incoming HTTP requests
## - if true, the '/workspace/connect/{profile_name}/{workspace_name}/' path prefix
## is stripped from incoming requests, the application sees the request
## as if it was made to '/...'
## - this only works if the application serves RELATIVE URLs for its assets
##
removePathPrefix: false

## header manipulation rules for incoming HTTP requests
## - sets the `spec.http[].headers.request` of the Istio VirtualService
## https://istio.io/latest/docs/reference/config/networking/virtual-service/#Headers-HeaderOperations
## - the following string templates are available:
## - `.PathPrefix`: the path prefix of the Workspace (e.g. '/workspace/connect/{profile_name}/{workspace_name}/')
##
requestHeaders: {}

## environment variables for Workspace Pods (MUTABLE)
## - spec for EnvVar:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvar-v1-core
## - the following go template functions are available:
## - `httpPathPrefix(portId string)`: returns the HTTP path prefix of the specified port
##
extraEnv:

## to enable backwards compatibility with old Jupyter images from Kubeflow Notebooks V1
## https://github.com/kubeflow/kubeflow/blob/v1.8.0/components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run#L12
- name: "NB_PREFIX"
value: |-
{{ httpPathPrefix "jupyterlab" }}

## extra volume mounts for Workspace Pods (MUTABLE)
## - spec for VolumeMount:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#volumemount-v1-core
##
extraVolumeMounts:

## frameworks like PyTorch use shared memory for inter-process communication and expect a tmpfs at /dev/shm
## https://en.wikipedia.org/wiki/Shared_memory
- name: "dshm"
mountPath: "/dev/shm"

## extra volumes for Workspace Pods (MUTABLE)
## - spec for Volume:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#volume-v1-core
##
extraVolumes:
- name: "dshm"
emptyDir:
medium: "Memory"

## security context for Workspace Pods (MUTABLE)
## - spec for PodSecurityContext:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podsecuritycontext-v1-core
##
securityContext:
fsGroup: 100
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault

## container SecurityContext for Workspace Pods (MUTABLE)
## - spec for SecurityContext:
## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#securitycontext-v1-core
##
containerSecurityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault

## ==============================================================
## WORKSPACE OPTIONS
## - options are the user-selectable fields,
## they determine the PodSpec of the Workspace
## ==============================================================
options:

##
## About the `values` fields:
## - the `values` field is a list of options that the user can select
## - elements of `values` can NOT be removed, only HIDDEN or REDIRECTED
## - this prevents options being removed that are still in use by existing Workspaces
## - this limitation may be removed in the future
## - options may be "hidden" by setting `spawner.hidden` to `true`
## - hidden options are NOT visible by default in the Spawner UI
## - hidden options are still available to the controller and manually created Workspace resources
## - options may be "redirected" by setting `redirect.to` to another option:
## - redirected options are NOT shown in the Spawner UI
## - redirected options are computed by the controller and shown in status fields
## - users must explicitly update their Workspace via the API to apply redirects
## - the Spawner UI will warn users about Workspaces with pending restarts
##

## ============================================================
## IMAGE CONFIG OPTIONS
## - SETS: image, imagePullPolicy, ports
## ============================================================
imageConfig:

## spawner ui configs
##
spawner:

## the id of the default option
## - this will be selected by default in the spawner ui
##
default: "jupyter-scipy:v1.10.0"

## the list of image configs that are available
##
values:

## ================================================================
## jupyter-scipy:v1.8.0
## ================================================================
- id: "jupyter-scipy:v1.8.0"
spawner:
displayName: "jupyter-scipy:v1.8.0"
description: "JupyterLab, with SciPy Packages"
labels:
- key: "python_version"
value: "3.11.6"
## NOTE: this option is hidden
hidden: true
redirect:
to: "jupyter-scipy:v1.9.2"
message:
level: "Info"
text: >
This update does not introduce any breaking changes in Python packages from SciPy.
However, the version of JupyterLab has been updated from 3.6.6 to 4.2.5.
spec:
## the container image to use
##
image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.8.0"

## the pull policy for the container image
## - default: "IfNotPresent"
##
imagePullPolicy: "IfNotPresent"

## ports that the container listens on
## - currently, only HTTP is supported for `protocol`
## - currently, all ports use the same `httpProxy` settings
## - if multiple ports are defined, the user will see multiple "Connect" buttons
## in a dropdown menu on the Workspace overview page
##
ports:
- id: "jupyterlab"
port: 8888

## ================================================================
## jupyter-scipy:v1.9.2
## ================================================================
- id: "jupyter-scipy:v1.9.2"
spawner:
displayName: "jupyter-scipy:v1.9.2"
description: "JupyterLab, with SciPy Packages"
labels:
- key: "python_version"
value: "3.11.10"
redirect:
to: "jupyter-scipy:v1.10.0"
message:
level: "Info"
text: >
This update does not introduce any breaking changes in Python packages from SciPy.
spec:
image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.9.2"
imagePullPolicy: "IfNotPresent"
ports:
- id: "jupyterlab"
port: 8888

## ================================================================
## jupyter-scipy:v1.10.0
## ================================================================
- id: "jupyter-scipy:v1.10.0"
spawner:
displayName: "jupyter-scipy:v1.10.0"
description: "JupyterLab, with SciPy Packages"
labels:
- key: "python_version"
value: "3.11.11"
spec:
image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-scipy:v1.10.0"
imagePullPolicy: "IfNotPresent"
ports:
- id: "jupyterlab"
port: 8888

## ================================================================
## jupyter-pytorch-cuda-full:v1.9.2
## ================================================================
- id: "jupyter-pytorch-cuda-full:v1.9.2"
spawner:
displayName: "jupyter-pytorch-cuda-full:v1.9.2"
description: "JupyterLab, with PyTorch (CUDA), and Common Python Packages"
labels:
- key: "python_version"
value: "3.11.10"
- key: "pytorch_version"
value: "2.3.1"
- key: "cuda_version"
value: "12.1"
- key: "nccl_version"
value: "2.20.5"
redirect:
to: "jupyter-pytorch-cuda-full:v1.10.0"
message:
level: "Warning"
text: >
This update changes the version of PyTorch from 2.3.1 to 2.5.1.
This only breaking change in a common python package is xgboost, which updated from 1.7.6 to 2.1.4.
spec:
image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-pytorch-cuda-full:v1.9.2"
imagePullPolicy: "IfNotPresent"
ports:
- id: "jupyterlab"
port: 8888

## ================================================================
## jupyter-pytorch-cuda-full:v1.10.0
## ================================================================
- id: "jupyter-pytorch-cuda-full:v1.10.0"
spawner:
displayName: "jupyter-pytorch-cuda-full:v1.10.0"
description: "JupyterLab, with PyTorch (CUDA), and Common Python Packages"
labels:
- key: "python_version"
value: "3.11.11"
- key: "pytorch_version"
value: "2.5.1"
- key: "cuda_version"
value: "12.4"
- key: "nccl_version"
value: "2.21.5"
spec:
image: "ghcr.io/kubeflow/kubeflow/notebook-servers/jupyter-pytorch-cuda-full:v1.10.0"
imagePullPolicy: "IfNotPresent"
ports:
- id: "jupyterlab"
port: 8888

## ============================================================
## POD CONFIG OPTIONS
## - SETS: affinity, nodeSelector, tolerations, resources
## ============================================================
podConfig:

## spawner ui configs
##
spawner:

## the id of the default option
## - this will be selected by default in the spawner ui
##
default: "tiny_cpu"

## the list of pod configs that are available
##
values:

## ================================================================
## EXAMPLE 1: a tiny CPU pod
## ================================================================
- id: "tiny_cpu"
spawner:
displayName: "Tiny CPU"
description: "Pod with 0.1 CPU, 128 Mb RAM"
labels:
- key: "cpu"
value: "100m"
- key: "memory"
value: "128Mi"
spec:
## affinity configs for the pod
## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#affinity-v1-core
##
affinity: {}

## node selector configs for the pod
## - https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
##
nodeSelector: {}

## toleration configs for the pod
## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#toleration-v1-core
##
tolerations: []

## resource configs for the "main" container in the pod
## - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#resourcerequirements-v1-core
##
resources:
requests:
cpu: 100m
memory: 128Mi

## ================================================================
## EXAMPLE 2: a small CPU pod
## ================================================================
- id: "small_cpu"
spawner:
displayName: "Small CPU"
description: "Pod with 1 CPU, 2 GB RAM"
labels:
- key: "cpu"
value: "1000m"
- key: "memory"
value: "2Gi"
spec:
resources:
requests:
cpu: 1000m
memory: 2Gi

## ================================================================
## EXAMPLE 3: a big GPU pod
## ================================================================
- id: "big_gpu"
spawner:
displayName: "Big GPU"
description: "Pod with 4 CPU, 16 GB RAM, and 1 GPU"
labels:
- key: "cpu"
value: "4000m"
- key: "memory"
value: "16Gi"
- key: "gpu"
value: "1"
spec:
affinity: {}
nodeSelector: {}
resources:
requests:
cpu: 4000m
memory: 16Gi
limits:
nvidia.com/gpu: 1
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
Comment thread .github/workflows/istio_validation.yaml Outdated
Comment on lines +255 to +289
CLIENT_OVERRIDES=$(cat <<EOF
{
  "spec": {
    "securityContext": {
      "runAsNonRoot": true,
      "runAsUser": 1000,
      "seccompProfile": {
        "type": "RuntimeDefault"
      }
    },
    "containers": [
      {
        "name": "test-client",
        "image": "busybox",
        "stdin": true,
        "tty": true,
        "securityContext": {
          "allowPrivilegeEscalation": false,
          "capabilities": {
            "drop": ["ALL"]
          },
          "runAsNonRoot": true,
          "runAsUser": 1000,
          "seccompProfile": {
            "type": "RuntimeDefault"
          }
        }
      }
    ]
  }
}
EOF
)
kubectl run test-client --image=busybox --rm -i --restart=Never -n $KF_PROFILE \
  --overrides="${CLIENT_OVERRIDES}" -- \
Comment on lines +17 to +18
- tests/kubeflow_profile_install.sh
- tests/PSS_enable.sh
Comment on lines +5 to +11
configMapGenerator:
  - behavior: merge
    files:
      - katib-config.yaml
    name: katib-config
    options:
      disableNameSuffixHash: true
- name: pytorch
  image: docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727
  imagePullPolicy: Always
  workingDir: /tmp
Comment on lines +77 to +91
- name: init-pytorch
  image: alpine:3.18
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  command:
    - sh
    - -c
    - until nslookup pytorch-simple-master-0; do echo waiting for master; sleep 2; done;
…t check

- Revert Istio upgrade completely to keep stable GA 1.29.2 in production manifests, eliminating release-candidate and RC guard discrepancy.
- Replace duplicate jupyterlab_v1beta1_workspacekind.yaml file with dynamic Kustomize strategic merge patch overlay under tests/workspaces-kustomization.
- Refactor test-client overrides in istio_validation.yaml to a single-line JSON string containing all required PSS Restricted fields, resolving the YAML linter failure.
- Update tests/training_operator_job.yaml to pin and consolidate initContainers image from alpine:3.18 to stable busybox:1.36.1, reducing pull overhead.
- Add descriptive developer comments explaining workingDir: /tmp workarounds and injectSecurityContext: true PSS overlays across test manifests.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
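
A sketch of what the single-line override from the third bullet could look like, using the fields quoted in the istio_validation.yaml review thread. The helper name `pss_overrides` is hypothetical; the exact string in the PR may differ.

```shell
# Hypothetical helper: prints the override document on a single line,
# mirroring the fields shown in the review thread for istio_validation.yaml.
pss_overrides() {
  printf '%s' '{"spec":{"securityContext":{"runAsNonRoot":true,"runAsUser":1000,"seccompProfile":{"type":"RuntimeDefault"}},"containers":[{"name":"test-client","image":"busybox","securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"runAsNonRoot":true,"runAsUser":1000,"seccompProfile":{"type":"RuntimeDefault"}}}]}}'
  printf '\n'
}

pss_overrides
# Assumed usage, following the command quoted in the review:
# kubectl run test-client --image=busybox --rm -i --restart=Never -n "$KF_PROFILE" \
#   --overrides="$(pss_overrides)"
```

Keeping the JSON on one line avoids embedding a heredoc inside the workflow's YAML, which is presumably what tripped the linter.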
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comment on lines +1 to +9
# Katib Config patch to enable PSS compliance.
# injectSecurityContext: true clones securityContext properties from primary
# trial containers to sidecar metrics-collectors, preventing admission blocks.
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
init:
  controller:
    webhookPort: 8443
    injectSecurityContext: true
Comment thread tests/kubeflow_profile_install.sh Outdated
-kubectl label namespace $KF_PROFILE pod-security.kubernetes.io/enforce=baseline --overwrite
+kubectl label namespace $KF_PROFILE \
+  pod-security.kubernetes.io/enforce=restricted \
+  pod-security.kubernetes.io/enforce-version=v1.29 \
Comment thread tests/kubeflow_profile_install.sh Outdated
Comment on lines +9 to +12
kubectl label namespace $KF_PROFILE \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=v1.29 \
  --overwrite
- Restored Istio 1.30.0-rc.0 to regain native PSS Restricted compatibility for injected sidecars.
- Documented Istio version requirement in README.md to clarify 1.30+ is mandatory for PSS.
- Refactored istio_dummy_deployment.yaml to use unprivileged nginx image, natively satisfying PSS Restricted policies.
- Addressed label race condition with the Profile Controller in kubeflow_profile_install.sh by implementing a robust retry mechanism.
- Resolved race condition in workspaces_pipeline_run_test.sh by adding sleep synchronization for the Notebook controller to observe the updated WorkspaceKind overlay.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
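
The retry code itself is not shown in this conversation. As a rough idea of the technique, a generic retry helper in POSIX sh might look like the following; the name `retry` and its argument order are assumptions, not the PR's implementation.

```shell
# Hypothetical retry helper, not the PR's actual implementation.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  count=1
  # Re-run the command until it succeeds or the attempt budget is used up.
  while ! "$@"; do
    if [ "$count" -ge "$attempts" ]; then
      echo "failed after $count attempt(s)"
      return 1
    fi
    count=$((count + 1))
    sleep "$delay"
  done
  echo "succeeded after $count attempt(s)"
}

retry 3 0 true
```

In the labeling case the wrapped command would be the `kubectl label` call, re-applied if the Profile Controller resets the label in between.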
@google-oss-prow google-oss-prow Bot added size/XL and removed size/L labels May 19, 2026
…S in CI

- Upgraded Istio to stable GA 1.30.0 by running the synchronized script.
- Reconstructed the Katib overlay katib-config.yaml to restore full default configuration with injectSecurityContext: true for PSS Restricted support.
- Dynamically patched Profiles Controller namespace-labels ConfigMap in profile_controller_install.sh to natively enforce PSS restricted on user namespaces in CI.
- Simplified kubeflow_profile_install.sh to remove the retry loop in favor of a clean verification check.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
…rofiles controller setup

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
…el checks

- Removed internal certGenerator from Katib overlay config to use cert-manager for webhook certificate generation, preventing tls x509 verification errors.
- Adjusted kubeflow_profile_install.sh validation check to support both restricted and privileged native labels, preventing insecure mode test failures.

Signed-off-by: abdullahpathan22 <abdullahpathan22@users.noreply.github.com>
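
A minimal sketch of a check that accepts both label values mentioned in the second bullet. The function name `check_enforce_label` is hypothetical, and the real script may compare the label differently.

```shell
# Hypothetical check, not copied from the PR.
# Prints "ok" when the enforce label is one of the accepted values.
check_enforce_label() {
  case "$1" in
    restricted|privileged) echo "ok" ;;
    *) echo "unexpected: $1" ;;
  esac
}

check_enforce_label restricted
check_enforce_label baseline
# Assumed way to read the live value from the cluster:
# kubectl get namespace "$KF_PROFILE" \
#   -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
```

Accepting `privileged` as well lets the same script pass in the insecure-mode test run.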
@danish9039
Member

@abdullahpathan22, don't modify manifests under the applications folder, as those manifests are synchronised from upstream repos. For example, the trainer manifests are pulled into the applications folder from the upstream trainer repo. Any changes made under the applications folder will be lost and overwritten when someone synchronises the upstream manifests using the respective synchronisation script. Also, don't try to address all comments at once; do it one by one, and the same goes for the CI checks.
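
One way to catch such edits before pushing is to filter the list of changed files for paths under applications/. This is only a sketch; the function name `find_application_edits` is hypothetical and no such check exists in the repository as far as this conversation shows.

```shell
# Hypothetical local check, not part of the repository.
# Reads changed file paths on stdin and prints those under applications/.
find_application_edits() {
  while IFS= read -r path; do
    case "$path" in
      applications/*) echo "$path" ;;
    esac
  done
}

printf '%s\n' \
  tests/katib_test.yaml \
  applications/trainer/upstream/base/runtimes/torch_distributed.yaml \
  | find_application_edits
```

In practice the input could come from `git diff --name-only origin/master...HEAD`, assuming the upstream remote is named origin.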

@abdullahpathan22
Contributor Author

abdullahpathan22 commented May 19, 2026

OK, I will do it accordingly.

@juliusvonkohout
Member

Please rebase soon after #3467 is merged


5 participants